3 ways to do RPA with Python



In this post we’ll cover a few packages for doing robotic process automation with Python. Robotic process automation, or RPA, is the process of automating mouse clicks and keyboard presses – i.e. simulating what a human user would do. RPA is used in a variety of applications, including data entry, accounting, finance, and more.

We’ll be covering pynput, pyautogui, and pywinauto. Each of these three packages can be used as a starting point for building your own RPA application, as well as building UI testing apps.

pynput

The first package we’ll discuss is pynput. One of the advantages of pynput is that it works on Windows and macOS (as well as Linux). Another nice feature is that it can monitor keyboard and mouse input, not just control it. Let’s get started with pynput by installing it with pip:


pip install pynput

Once you have it installed, you can get started by importing the Controller and Button classes. Then, we’ll create an instance of the Controller class, which we’ll call mouse. This will simulate your computer’s mouse to allow you to programmatically click buttons and move the mouse around on the screen.


from pynput.mouse import Button, Controller

mouse = Controller()

Next, let’s look at a couple of simple commands. To left- or right-click, we can use the click method together with the Button class imported above. (The press method only holds a button down; it needs a matching release call to complete a click.)


# left-click
mouse.click(Button.left)

# right-click
mouse.click(Button.right)

To double-click, pass the number of clicks as the second argument.


mouse.click(Button.left, 2)

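We can also move the mouse pointer by using the move method. Note that move is relative: its two arguments are offsets in pixels from the pointer’s current position, not absolute screen coordinates. To jump to an absolute position, assign to mouse.position instead.


mouse.move(50, -50)

mouse.move(100, -200)

# move to an absolute position
mouse.position = (100, 200)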
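Because move is relative, a long movement can also be broken into several small steps so the pointer glides across the screen instead of jumping. The sketch below is one way to do that; step_offsets and smooth_move are hypothetical helper names, and mouse is assumed to be the pynput Controller created earlier (any object with a move(dx, dy) method will do).

```python
import time


def step_offsets(dx, dy, steps):
    """Split a relative move (dx, dy) into `steps` integer offsets that add up to exactly (dx, dy)."""
    offsets = []
    prev_x = prev_y = 0
    for i in range(1, steps + 1):
        # Round the cumulative position, then take the difference,
        # so rounding errors never pile up across steps
        x = round(dx * i / steps)
        y = round(dy * i / steps)
        offsets.append((x - prev_x, y - prev_y))
        prev_x, prev_y = x, y
    return offsets


def smooth_move(mouse, dx, dy, steps=10, delay=0.01):
    """Move `mouse` by (dx, dy) relative to its current position, in small steps."""
    for step_x, step_y in step_offsets(dx, dy, steps):
        mouse.move(step_x, step_y)
        time.sleep(delay)


# Example (needs pynput and a desktop session):
#   smooth_move(mouse, 100, -200, steps=20)
```

Computing the offsets in a separate function keeps the arithmetic testable without a display.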

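pynput can control the keyboard as well. To do that, we import the keyboard module’s Controller class (renamed here so it doesn’t clash with the mouse Controller) along with the Key class, and create a keyboard controller.


from pynput.keyboard import Key, Controller as KeyboardController

keyboard = KeyboardController()

To make your keyboard type, you can use the aptly named keyboard.type method.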


keyboard.type("this is a test")
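The Key class is what you use for special keys such as Enter, which are sent with the controller’s press and release methods. Below is a small sketch that types several lines of text, pressing Enter between them. It separates planning the keystrokes from sending them, so the plan can be checked without a desktop session; plan_keystrokes and replay are hypothetical helper names, and keyboard is assumed to be the pynput keyboard controller used above.

```python
def plan_keystrokes(lines, enter_key):
    """Build a list of ("type", text) and ("tap", key) actions for typing `lines`."""
    actions = []
    for i, line in enumerate(lines):
        actions.append(("type", line))
        # Press Enter between lines, but not after the last one
        if i < len(lines) - 1:
            actions.append(("tap", enter_key))
    return actions


def replay(keyboard, actions):
    """Send the planned actions through a keyboard controller."""
    for kind, value in actions:
        if kind == "type":
            keyboard.type(value)
        else:
            # A "tap" is a press immediately followed by a release
            keyboard.press(value)
            keyboard.release(value)


# Example (needs pynput and a desktop session):
#   replay(keyboard, plan_keystrokes(["first line", "second line"], Key.enter))
```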

As mentioned above, pynput can also monitor mouse movements and keyboard presses, using listener classes such as pynput.mouse.Listener and pynput.keyboard.Listener. To learn more about that functionality, see the pynput documentation.
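As a rough illustration of the monitoring side, the sketch below records the position and button of each mouse press. It assumes pynput’s mouse.Listener, which calls an on_click callback with the pointer position, the button, and whether the button was pressed or released, and which stops when a callback returns False. ClickRecorder is a hypothetical name; the listener itself only appears in comments, so the recorder can be tried out without a display.

```python
class ClickRecorder:
    """Collects (x, y, button) for each mouse press and stops after max_clicks presses."""

    def __init__(self, max_clicks=3):
        self.max_clicks = max_clicks
        self.clicks = []

    def on_click(self, x, y, button, pressed, *_extra):
        # The callback fires for both press and release; only record presses.
        # Any extra positional arguments are ignored defensively.
        if not pressed:
            return None
        self.clicks.append((x, y, button))
        # Returning False from a pynput callback stops the listener
        if len(self.clicks) >= self.max_clicks:
            return False
        return None


# Example (needs pynput and a desktop session):
#   from pynput import mouse
#   recorder = ClickRecorder(max_clicks=3)
#   with mouse.Listener(on_click=recorder.on_click) as listener:
#       listener.join()
#   print(recorder.clicks)
```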

pyautogui

Perhaps the most commonly known package for simulating mouse clicks and keyboard entries is the pyautogui library. pyautogui works on Windows, Linux, and macOS. If you don’t have it installed, you can get it using pip:


pip install pyautogui

pyautogui is also straightforward to use. For example, if you want to simulate typing a string of text, just use the typewrite method:


import pyautogui

pyautogui.typewrite("test pyautogui!")

To left-click your mouse, you can use the click method. To right-click, you can use the rightClick method. Both take the x and y screen coordinates to click at.


# left-click
pyautogui.click(100, 200)

# right-click
pyautogui.rightClick(100, 200)

Searching for an image on the screen

One of the coolest features of pyautogui is that it can search for an image on the computer screen. This is really helpful if you need to find a particular button to click. You can search for an image by passing the image file name to the locateOnScreen function. It returns a box describing where the image was found: the left and top coordinates of its top-left corner, followed by its width and height.


location = pyautogui.locateOnScreen("random_image.png")


To get the center of the identified image, use the center function. Then, you can pass the result to the click method to click on the center of the identified image, in this case a button on the screen.



center = pyautogui.center(location)

pyautogui.click(center)
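Since locateOnScreen describes the match as (left, top, width, height), its center is just the top-left corner plus half the width and half the height. The sketch below computes it by hand, roughly mirroring what pyautogui.center does; Box, Point, and box_center are names defined here for illustration, so the code runs without pyautogui or a screen.

```python
from collections import namedtuple

# Same field order as the box returned by locateOnScreen
Box = namedtuple("Box", ["left", "top", "width", "height"])
Point = namedtuple("Point", ["x", "y"])


def box_center(box):
    """Return the center of a (left, top, width, height) box as whole-pixel coordinates."""
    return Point(box.left + box.width // 2, box.top + box.height // 2)


# Example: a 50x30 match whose top-left corner is at (100, 200)
print(box_center(Box(100, 200, 50, 30)))  # Point(x=125, y=215)
```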

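Sometimes the image on the screen won’t exactly match the image file (for example, because of small rendering differences). In this case, you can pass the confidence parameter to locateOnScreen so that close but imperfect matches are accepted; lower values allow looser matches. Note that the confidence parameter requires OpenCV (pip install opencv-python) to be installed.


pyautogui.locateOnScreen("random_image.png", confidence=0.95)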
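Buttons often take a moment to appear, so in practice it helps to poll for the image instead of searching once. The sketch below is a generic polling loop; wait_for is a hypothetical helper, and locate is any zero-argument function, for example lambda: pyautogui.locateOnScreen("random_image.png", confidence=0.95). Depending on the pyautogui version, a failed search either returns None or raises an exception (pyautogui.ImageNotFoundException in recent releases, assuming your version exposes it), so the sketch treats any exception class passed in not_found the same as None.

```python
import time


def wait_for(locate, timeout=10.0, interval=0.5, not_found=()):
    """Call locate() until it returns a non-None result or `timeout` seconds pass.

    Exceptions whose classes are listed in `not_found` count as "not found yet".
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            result = locate()
        except not_found:
            result = None
        if result is not None:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"nothing found within {timeout} seconds")
        time.sleep(interval)


# Example (needs pyautogui, OpenCV, and a display):
#   box = wait_for(lambda: pyautogui.locateOnScreen("random_image.png", confidence=0.95),
#                  timeout=15, not_found=(pyautogui.ImageNotFoundException,))
#   pyautogui.click(pyautogui.center(box))
```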

Taking a screenshot

You can take a screenshot with pyautogui using the screenshot function, which returns a PIL Image object. Passing a filename will also save the screenshot to that file.


s = pyautogui.screenshot("sample_screenshot.png")

It’s also possible to take a screenshot of a specific region rather than the full screen. The region argument is a tuple of (left, top, width, height):


pyautogui.screenshot(region=(0, 0, 100, 200))
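If you know two opposite corners of the area you want (for example, by reading pyautogui.position() while hovering over each one), you can convert them into the (left, top, width, height) tuple that region expects. region_from_corners is a hypothetical helper written for this sketch:

```python
def region_from_corners(corner_a, corner_b):
    """Turn two opposite (x, y) corners, in any order, into a (left, top, width, height) tuple."""
    (x1, y1), (x2, y2) = corner_a, corner_b
    left, top = min(x1, x2), min(y1, y2)
    return (left, top, abs(x2 - x1), abs(y2 - y1))


# Example (needs pyautogui and a display):
#   pyautogui.screenshot("region.png", region=region_from_corners((100, 200), (0, 0)))
```

Taking the minimum of each coordinate means the corners can be given in either order.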

pywinauto

On Windows, another option we can look into is the pywinauto library. Its main disadvantage is that it does not work on macOS or Linux. However, it offers a couple of nice advantages for Windows users. First, its syntax is object-oriented and designed to be more Pythonic. Second, because it works with an application’s controls (windows, buttons, menus) rather than raw screen coordinates, it can make certain tasks easier, like clicking on specific buttons or finding menu items in an application.

For example, let’s launch Notepad, type some text, and save the file, using the code snippet below. Here, we start Notepad by using the Application class. Then, we refer to the newly opened Notepad window as UntitledNotepad, which pywinauto matches against the window title “Untitled - Notepad”. We can use Edit.type_keys to type text into the editor; passing with_spaces=True is needed because type_keys drops spaces by default.


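from pywinauto.application import Application

# start Notepad using the UI Automation ("uia") backend
app = Application(backend="uia").start("notepad.exe")
# type into the editor; with_spaces=True keeps the spaces in the text
app.UntitledNotepad.Edit.type_keys("Starting notepad...", with_spaces=True)
# open the Save As dialog from the menu
app.UntitledNotepad.menu_select("File->SaveAs")
# find the Save As dialog, enter a file name, and click Save
sub_app = app.UntitledNotepad.child_window(title_re="Save As")
sub_app.FileNameCombo.type_keys("test_file.txt")
sub_app.Save.click()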
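type_keys also interprets some characters as commands rather than text: per pywinauto’s keyboard documentation, +, ^ and % are the Shift, Ctrl and Alt modifiers, parentheses group keys, braces name special keys such as {ENTER}, and ~ presses Enter. To type such characters literally, each one can be wrapped in braces. The sketch below does that; escape_for_type_keys is a hypothetical helper, and spaces still need with_spaces=True.

```python
# Characters that pywinauto's type_keys treats as key commands rather than literal text
SPECIAL_CHARS = set("+^%~(){}")


def escape_for_type_keys(text):
    """Wrap each special character in braces so type_keys types it literally."""
    return "".join("{" + ch + "}" if ch in SPECIAL_CHARS else ch for ch in text)


# Example (Windows only, with a fresh Notepad window):
#   app = Application(backend="uia").start("notepad.exe")
#   app.UntitledNotepad.Edit.type_keys(escape_for_type_keys("50% off (today)"), with_spaces=True)
```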

Learn more about pywinauto in its official documentation.

Conclusion

That’s it for this post! We covered three packages for doing robotic process automation with Python.