How to get live stock prices with Python

Python, Web Scraping
In a previous post, I gave an introduction to the yahoo_fin package. The latest version of the package includes new functionality that allows you to scrape live (real-time) stock prices from Yahoo Finance. In this article, we'll go through a couple of ways of getting real-time data from Yahoo Finance for stocks, as well as how to pull cryptocurrency price information.

The get_live_price function

First, we just need to load the stock_info module from yahoo_fin.

[code lang="python"]
# import stock_info module from yahoo_fin
from yahoo_fin import stock_info as si
[/code]

Then, obtaining the current price of a stock is as simple as one line of code:

[code lang="python"]
# get live price of Apple
si.get_live_price("aapl")

# or Amazon
si.get_live_price("amzn")

# or any other ticker
si.get_live_price(ticker)
[/code]

Note: Passing tickers is not…
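The excerpt also promises cryptocurrency prices. As a minimal sketch, get_live_price should work the same way with Yahoo Finance's crypto tickers (the exact symbols below are assumptions -- check Yahoo Finance for the ones you need):

[code lang="python"]
# crypto prices use Yahoo Finance's crypto ticker format (assumed symbols)
si.get_live_price("BTC-USD")  # Bitcoin in US dollars
si.get_live_price("ETH-USD")  # Ethereum in US dollars
[/code]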
Read More
How to download image files with RoboBrowser

Python, Web Scraping
In a previous post, we showed how RoboBrowser can be used to fill out online forms for getting historical weather data from Wunderground. This article will talk about how to use RoboBrowser to batch download collections of image files from Pexels, a site which offers free downloads. If you're looking to work with images, or want to build a training set for an image classifier with Python, this post will help you do that. In the first part of the code, we'll load the RoboBrowser class from the robobrowser package, create a browser object which acts like a web browser, and navigate to the Pexels website.

[code lang="python"]
# load the RoboBrowser class from robobrowser
from robobrowser import RoboBrowser

# define base site
base = "https://www.pexels.com/"

# create browser object,…
[/code]
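The excerpt cuts off before the download step, but a hedged sketch of how the rest of a batch download might look is below (the img selector, filenames, and page structure are assumptions, not taken from the original post):

[code lang="python"]
import requests
from robobrowser import RoboBrowser

base = "https://www.pexels.com/"

# create browser object and navigate to the site
browser = RoboBrowser(parser="html.parser")
browser.open(base)

# collect image tags from the page (the selector is an assumption)
images = browser.select("img")

# download each image's source file with requests
for i, img in enumerate(images):
    src = img.get("src")
    if not src:
        continue
    response = requests.get(src)
    with open("image_{}.jpg".format(i), "wb") as f:
        f.write(response.content)
[/code]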
Read More
R: How to create, delete, move, and more with files

File Manipulation, R, System Administration
Though Python usually comes to mind before R for system administration tasks, R is actually quite useful in this regard. In this post we're going to talk about using R to create, delete, move, and obtain information on files.

How to get and change the current working directory

Before working with files, it's usually a good idea to first know what directory you're working in. The working directory is the folder where any files you create or refer to, without explicitly spelling out the full path, will live. In R, you can figure this out with the getwd function. To change this directory, you can use the aptly named setwd function.

[code lang="R"]
# get current working directory
getwd()

# set working directory
setwd("C:/Users")
[/code]

Creating Files and Directories…
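The excerpt stops at the file-creation section, but base R's standard file functions cover the operations the title promises. A minimal sketch (the filenames are just examples):

[code lang="R"]
# create an empty file and a new directory
file.create("notes.txt")
dir.create("archive")

# check that the file exists, then move (rename) it into the directory
file.exists("notes.txt")
file.rename("notes.txt", "archive/notes.txt")

# get metadata (size, modification time, etc.), then delete the file
file.info("archive/notes.txt")
file.remove("archive/notes.txt")
[/code]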
Read More
ICA on Images with Python

Machine Learning, Python
What is Independent Component Analysis (ICA)?

If you're already familiar with ICA, feel free to skip below to how we implement it in Python. ICA is a type of dimensionality reduction algorithm that transforms a set of variables into a new set of components; it does so such that the statistical independence between the new components is maximized. This is similar to Principal Component Analysis (PCA), which maps a collection of variables to statistically uncorrelated components, except that ICA goes a step further by maximizing statistical independence rather than just producing components that are uncorrelated. Like other dimensionality reduction methods, ICA seeks to reduce the number of variables in a set of data, while retaining key information. In the example we…
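The excerpt cuts off before the implementation; a minimal sketch using scikit-learn's FastICA (the toy signals, shapes, and component count are assumptions, not the post's image example) might look like this:

[code lang="python"]
import numpy as np
from sklearn.decomposition import FastICA

# build two toy source signals and mix them together
time = np.linspace(0, 8, 2000)
s1 = np.sin(2 * time)           # source 1: sinusoid
s2 = np.sign(np.sin(3 * time))  # source 2: square wave
sources = np.c_[s1, s2]
mixing = np.array([[1.0, 0.5], [0.5, 2.0]])
mixed = sources @ mixing.T      # observed, mixed signals

# recover statistically independent components from the mixture
ica = FastICA(n_components=2, random_state=0)
recovered = ica.fit_transform(mixed)  # estimated sources
[/code]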
Read More
Coding with the Yahoo_fin Package

Python, Web Scraping
Background on yahoo_fin

The yahoo_fin package contains functions to scrape stock-related data from Yahoo Finance and NASDAQ. You can view the official documentation by clicking this link, but the below post will provide a few more in-depth examples. Also, please check out my yahoo_fin playlist on YouTube. The first video is below, which covers installation and getting historical / real-time stock prices. The functions in yahoo_fin are divided into two modules, stock_info and options. This post will focus on introducing stock_info. For more on using the options module, check out this post. Let's get started by importing the stock_info module from yahoo_fin.

[code lang="python"]
import yahoo_fin.stock_info as si
[/code]

Downloading price data

One of the core functions available is called get_data, which retrieves historical price data for an individual stock.…
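As a quick sketch of how get_data is typically called (treat the parameter names and date format here as assumptions and check the documentation linked above):

[code lang="python"]
import yahoo_fin.stock_info as si

# pull historical daily prices for Apple over one year
# (start_date / end_date and their format are assumptions)
aapl = si.get_data("aapl", start_date="01/01/2019", end_date="12/31/2019")

# the result is a pandas DataFrame of open/high/low/close/volume rows
print(aapl.head())
[/code]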
Read More
Timing Python Processes

Python
Timing Python processes can be done with several different packages. One of the most common ways is to use the standard library's time module, which we'll demonstrate with an example. However, another package that is very useful for timing a process -- and particularly for telling you how far along a process is -- is tqdm. As we'll show a little further down the post, tqdm will actually print out a progress bar as a process is running.

Basic timing example

Suppose we want to scrape the HTML from some collection of links. In this case, we're going to get a collection of URLs from Bloomberg's homepage. To do this, we'll use BeautifulSoup to get a list of full-path URLs. From the code below, this gives us a list of around…
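A minimal sketch of both approaches together (the URL list is a stand-in, not the Bloomberg scrape from the post):

[code lang="python"]
import time
import requests
from tqdm import tqdm

# stand-in list of URLs -- the post builds this list by scraping Bloomberg
urls = ["https://www.example.com"] * 5

# record the start time with the time module
start = time.time()

# tqdm wraps the iterable and prints a progress bar as the loop runs
pages = []
for url in tqdm(urls):
    pages.append(requests.get(url).text)

elapsed = time.time() - start
print("Scraped {} pages in {:.2f} seconds".format(len(pages), elapsed))
[/code]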
Read More
Underrated R Functions

R
I wanted to write a post about a couple of handy functions in R that don't always get the recognition they deserve. This article will talk about a few functions that form part of R's core functional programming capabilities. R has thousands of functions, so this is just a short list, and I'll probably write other articles like this in the future to discuss some different R functions.

Reduce

Let's start with the Reduce function (note the capital "R"). Reduce takes a list or vector as input, and reduces it down to a single element. It works by applying a function to the first two elements of the vector or list, and then applying the same function to that result and the third element. This new result gets passed with…
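A quick sketch of Reduce in action (the addition example is mine, not from the truncated post):

[code lang="R"]
# sum 1 through 5 by repeatedly applying `+`:
# (((1 + 2) + 3) + 4) + 5
Reduce(`+`, 1:5)
# [1] 15

# accumulate = TRUE keeps every intermediate result
Reduce(`+`, 1:5, accumulate = TRUE)
# [1]  1  3  6 10 15
[/code]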
Read More
Vectorize Fuzzy Matching

R
One of the best things about R is its ability to vectorize code. This allows you to run code much faster than you would if you were using a for or while loop. In this post, we're going to show you how to use vectorization to speed up fuzzy matching. First, a little bit of background will be covered. If you're familiar with vectorization and / or fuzzy matching, feel free to skip further down the post.

What is vectorization?

Vectorization works by performing operations on entire vectors, or by extension, matrices, rather than iterating through each element in a collection of objects one at a time. A basic example is adding two vectors together, which can be done like this:

[code lang="R"]
a <- c(3, 4, 5)
b <-…
[/code]
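The excerpt cuts off before the fuzzy matching itself, but as a hedged sketch, base R's adist function computes edit distances across whole vectors at once, with no explicit loop (the example strings are mine):

[code lang="R"]
# candidate names to match against a query string
candidates <- c("Jon Smith", "Johnny Smithe", "Jane Doe")

# adist computes Levenshtein edit distances for the entire vector in one call
distances <- adist("John Smith", candidates)

# pick the closest match
candidates[which.min(distances)]
# [1] "Jon Smith"
[/code]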
Read More
Running R Code in Parallel

R
Background

Running R code in parallel can be very useful in speeding up performance. Basically, parallelization allows you to run multiple processes in your code simultaneously, rather than iterating over a list one element at a time, or running a single process at a time. Thankfully, running R code in parallel is relatively simple using the parallel package. This package provides parallelized counterparts of apply-family functions, such as parSapply and parLapply. Parallelizing code works best when you need to call a function or perform an operation on different elements of a list or vector, and doing so on any particular element of the list (or vector) has no impact on the evaluation of any other element. This could be running a large number of models across different elements of a list, scraping…
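A minimal sketch of the parallel package in use (the worker count and the toy function are arbitrary choices):

[code lang="R"]
library(parallel)

# start a cluster of worker processes (2 workers is an arbitrary choice)
cl <- makeCluster(2)

# parSapply is the parallel counterpart of sapply:
# each element is squared, with the work spread across the workers
results <- parSapply(cl, 1:10, function(x) x^2)

# always shut the cluster down when finished
stopCluster(cl)

print(results)
[/code]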
Read More
Word Frequency Analysis

Python, Web Scraping
In a previous article, we talked about using Python to scrape stock-related articles from the web. As an extension of this idea, we're going to show you how to use the NLTK package to figure out how often different words occur in text, using scraped stock articles.

Initial Setup

Let's import the NLTK package, along with requests and BeautifulSoup, which we'll need to scrape the stock articles.

[code lang="python"]
'''load packages'''
import nltk
import requests
from bs4 import BeautifulSoup
[/code]

Pulling the data we'll need

Below, we're copying code from my scraping stocks article. This gives us a function, scrape_all_articles (along with two other helper functions), which we can use to pull the actual raw text from articles linked to from NASDAQ's website.

[code lang="python"]
def scrape_news_text(news_url):

    news_html…
[/code]
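The excerpt stops before the frequency analysis itself; a minimal sketch of that step with NLTK (the sample text is a stand-in for the scraped articles) might look like this:

[code lang="python"]
import nltk
from nltk.tokenize import word_tokenize

# download tokenizer models (needed once per machine)
nltk.download("punkt")

# stand-in text -- in the post this comes from the scraped articles
text = "Stocks rose today as tech stocks led the market higher."

# tokenize, lowercase, and keep alphabetic tokens only
tokens = [w.lower() for w in word_tokenize(text) if w.isalpha()]

# count how often each word occurs
freq = nltk.FreqDist(tokens)
print(freq.most_common(10))
[/code]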
Read More