Python Archives - Open Source Automation

04 Apr 2023 by Andrew Treadway

Software Engineering for Data Scientists (New book!)

I'm very excited to announce that the early-access preview (MEAP) of my upcoming book, Software Engineering for Data Scientists, is available now! Check it out at this link. Use promo code au35tre to save 30% on this book and any products sold from Manning.

Why Software Engineering for Data Scientists?

Data science and software engineering have been merging more and more, especially over the last decade. Software Engineering for Data Scientists will help you learn more about software engineering and how it can make your life easier as a data scientist! This book covers the following key topics:

Source control
How to implement exception handling and write robust code
Object-oriented programming for data scientists
How to monitor the progress of training machine learning models
Scaling your Python…

27 Nov 2021 by Andrew Treadway

How to stop long-running code in Python

Python

Ever had long-running code and no idea when it will finish? If so, Python's stopit library is for you. In a previous post, we talked about how to create a progress bar to monitor Python code. This post will show you how to automatically stop long-running code with the stopit package.

Getting started with stopit

To get started with stopit, you can install it via pip:

[code]
pip install stopit
[/code]

In our first example, we'll use a context manager to stop the code we want to execute after a timeout limit is reached.

[code lang="python"]
import stopit

with stopit.ThreadingTimeout(5) as context_manager:

    # sample code we want to run...
    for i in range(10**8):
        i = i * 2

# Did code finish running in under…

09 Oct 2021 by Andrew Treadway

Faster alternatives to pandas

Pandas, Python

Background

If you've done any type of data analysis in Python, chances are you've used pandas. Though pandas is widely used in the data world, you're not alone if you've run into memory or computational issues with it. This post discusses several faster alternatives to pandas.

R's data.table in Python

If you've used R, you're probably familiar with the data.table package. A port of this library, datatable, is also available in Python. In this example, we show how you can read in a CSV file faster than with standard pandas. For our purposes, we'll be using an open source dataset from the UCI repository.

[code lang="python"]
import time

import datatable

# time how long datatable takes to read the CSV file
start = time.time()
os_scan_data = datatable.fread("OS Scan_dataset.csv", header = None)
end = time.time()

print(end - start)
[/code]

Using datatable, we can read in…
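The start/end timing pattern in the excerpt above can be wrapped in a small reusable helper. The following is only a sketch under stated assumptions: the `timed` helper and the in-memory CSV are hypothetical stand-ins, and the standard library's csv module is used in place of datatable so that the snippet runs without extra installs.

```python
import csv
import io
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the elapsed wall-clock seconds of the enclosed block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


# Small in-memory CSV standing in for a real file (hypothetical data).
sample = io.StringIO("a,b\n1,2\n3,4\n")

timings = {}
with timed("csv_read", timings):
    rows = list(csv.reader(sample))

print(f"read {len(rows)} rows in {timings['csv_read']:.6f}s")
```

To compare readers, the same `with timed(...)` block could wrap a pandas or datatable call, with each result stored under its own label.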

02 Jul 2021 by Andrew Treadway

Automated EDA with Python

Pandas, Python

In this post, we will investigate the pandas_profiling and sweetviz packages, which can be used to speed up EDA (exploratory data analysis) with Python. In a previous article, we talked about an analogous package in R (see this link).

Getting started with pandas_profiling

pandas_profiling can be installed using pip, like this:

[code]
pip install pandas-profiling[notebook]
[/code]

Next, let's read in our dataset. The data we'll be using is a heart attack-related dataset, which can be found here.

[code lang="python"]
import pandas as pd

heart_data = pd.read_csv("heart.csv")

heart_data.head()
[/code]

Now, let's import ProfileReport from pandas_profiling.

[code lang="python"]
from pandas_profiling import ProfileReport

report = ProfileReport(heart_data, title = "Sample Report")

report
[/code]

If you're running this code in Jupyter Notebook, you should see the report generated within your notebook file. The report shows…

14 Apr 2021 by Andrew Treadway

Python collections tutorial

Python

In this post, we'll discuss the underrated Python collections module, which is part of the standard library. collections provides several data structures beyond Python's built-in types.

How to get a count of all the elements in a list

One very useful tool in collections is the Counter class, which you can use to return a count of all the elements in a list.

[code lang="python"]
import collections

nums = [3, 3, 4, 1, 10, 10, 10, 10, 5]

collections.Counter(nums)
[/code]

The Counter object that gets returned is also modifiable. Let's define a variable equal to the result above.

[code lang="python"]
counts = collections.Counter(nums)

counts[20] += 1
[/code]

Notice how we can add the number 20 to our Counter object without having to initialize it with a 0 value. Counter can…
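As a minimal, self-contained sketch of the Counter behavior described in this excerpt: the list reuses the excerpt's values, and the `most_common` call is an addition for illustration rather than something taken from the original post.

```python
from collections import Counter

nums = [3, 3, 4, 1, 10, 10, 10, 10, 5]
counts = Counter(nums)

# Missing keys behave as 0, so incrementing a new key needs no initialization.
counts[20] += 1

# most_common(n) returns the n highest counts as (element, count) pairs.
top_two = counts.most_common(2)
print(top_two)  # [(10, 4), (3, 2)]
```

Looking up a key that was never counted, such as `counts[99]`, returns 0 instead of raising a KeyError, which is why the increment works without setup.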

25 Mar 2021 by Andrew Treadway

How to create PDF files with Python

Python

In a previous article, we talked about several ways to read PDF files with Python. This post will cover two packages for creating PDF files with Python: pdfkit and ReportLab.

Create PDF files with Python and pdfkit

pdfkit was the first library I learned for creating PDF files. A nice feature of pdfkit is that you can use it to create PDF files from URLs. To get started, you'll need to install it along with a utility called wkhtmltopdf. Use pip to install pdfkit from PyPI:

[code]
pip install pdfkit
[/code]

Once you're set up, you can start using pdfkit. In the example below, we download Wikipedia's main page as a PDF file. To get pdfkit working, you'll need to either add wkhtmltopdf to your PATH, or configure…

16 Feb 2021 by Andrew Treadway

How to get stock earnings data with Python

Python

In this post, we'll walk through a few examples of getting stock earnings data with Python. We will be using yahoo_fin, which was recently updated. The latest version now includes functionality to easily pull earnings calendar information for individual stocks or dates. If you need to install yahoo_fin, you can use pip:

[code]
pip install yahoo_fin
[/code]

If you already have it installed and need to upgrade, you can update your version like this:

[code]
pip install yahoo_fin --upgrade
[/code]

To get started, let's import yahoo_fin:

[code lang="python"]
import yahoo_fin.stock_info as si
[/code]

Getting stock earnings calendar data

The first function we'll cover is get_earnings_history, which returns a list of dictionaries. Each dictionary contains an earnings date along with EPS actual / expected information. Let's test it out…

02 Feb 2021 by Andrew Treadway

Technical analysis with Python

Python

In this post, we will introduce how to do technical analysis with Python. Python has several libraries for performing technical analysis of investments. We're going to compare three libraries: ta, pandas_ta, and bta-lib.

The ta library for technical analysis

One of the nicest features of the ta package is that it allows you to add dozens of technical indicators all at once. To get started, install the ta library using pip:

[code]
pip install ta
[/code]

Next, let's import the packages we need. We'll be using yahoo_fin to pull in stock price data.

[code lang="python"]
# load packages
import yahoo_fin.stock_info as si
import pandas as pd
from ta import add_all_ta_features

# pull data from Yahoo Finance
data = si.get_data("aapl")
[/code]

Now, data contains the historical prices for AAPL. Next,…

05 Jan 2021 by Andrew Treadway

Python’s rich library – a tutorial

Python, System Administration

The Python rich library is a package for producing clearer, styled, and colored output in the terminal. rich works across multiple operating systems, including Windows, Linux, and macOS. In this post, we'll give an introduction to what it can do for you. You can get started with rich by installing it with pip.

[code]
pip install rich
[/code]

Once you have it installed, open up the command line and type in python. To get the additional functionality from rich, you'll need to do one more step, which you can see below. Running this snippet will give you styled, formatted output in the interactive session. You'll only need to do this once.

[code lang="python"]
from rich import pretty
pretty.install()
[/code]

Here are a couple of examples of automatic coloring for…

17 Dec 2020 by Andrew Treadway

3 ways to do RPA with Python

Python, System Administration

In this post, we'll cover a few packages for doing robotic process automation with Python. Robotic process automation, or RPA, is the process of automating mouse clicks and keyboard presses, i.e. simulating what a human user would do. RPA is used in a variety of applications, including data entry, accounting, finance, and more. We'll be covering pynput, pyautogui, and pywinauto. Each of these three packages can be used as a starting point for building your own RPA application, as well as for building UI testing apps.

pynput

The first package we'll discuss is pynput. One of the advantages of pynput is that it works on both Windows and macOS. Another nice feature is that it has functionality to monitor keyboard and mouse input. Let's get started with pynput by installing…
