How to get options data with Python

How to get options data with Python

Python, Web Scraping
In a previous post, we talked about how to get real-time stock prices with Python. This post will go through how to download financial options data with Python. We will be using the yahoo_fin package. The yahoo_fin package comes with a module called options. This module allows you to scrape option chains and get option expiration dates. To get started we'll just import this module from yahoo_fin. [code lang="python"] from yahoo_fin import options [/code] How to get options expiration dates Any option contract has an expiration date. To get all of the option expiration dates for a particular stock, we can use the get_expiration_dates method in the options package. This method is equivalent to scraping all of the date selection boxes on the options page for an individual stock (e.g.…
Read More
Creating a word cloud on R-bloggers posts

Creating a word cloud on R-bloggers posts

R, Web Scraping
This post will go through how to create a word cloud of article titles scraped from the awesome R-bloggers. Our goal will be to use R's rvest package to search through 50 successive pages on the site for article titles. The stringr and tm packages will be used for string cleaning and for creating a term document frequency matrix (with tm). We will then create a word cloud based off the words comprising these titles. First, we'll load the packages we need. [code lang="R"] # load packages library(rvest) library(stringr) library(tm) library(wordcloud) [/code] Let's write a function that will take a webpage as input and return all the scraped article titles. [code lang="R"] scrape_post_titles <- function(site) { # scrape HTML from input site source_html <- read_html(site) # grab the title attributes…
Read More
Scraping data from a JavaScript webpage with Python

Scraping data from a JavaScript webpage with Python

Python, Web Scraping
This post will walk through how to use the requests_html package to scrape options data from a JavaScript-rendered webpage. requests_html serves as an alternative to Selenium and PhantomJS, and provides a clear syntax similar to the awesome requests package. The code we'll walk through is packaged into functions in the options module in the yahoo_fin package, but this article will show how to write the code from scratch using requests_html so that you can use the same idea to scrape other JavaScript-rendered webpages. Note: requests_html requires Python 3.6+. If you don't have requests_html installed, you can download it using pip: [cc] pip install requests_html [/cc] Motivation Let's say we want to scrape options data for a particular stock. As an example, let's look at Netflix (since it's well known). If…
Read More

How to get live stock prices with Python

Python, Web Scraping
In a previous post, I gave an introduction to the yahoo_fin package. The most updated version of the package includes new functionality allowing you to scrape live stock prices from Yahoo Finance (real-time). In this article, we'll go through a couple ways of getting real-time data from Yahoo Finance for stocks, as well as how to pull cryptocurrency price information. The get_live_price function First, we just need to load the stock_info module from yahoo_fin. [code lang="python"] # import stock_info module from yahoo_fin from yahoo_fin import stock_info as si [/code] Then, obtaining the current price of a stock is as simple as one line of code: [code lang="python"] # get live price of Apple si.get_live_price("aapl") # or Amazon si.get_live_price("amzn") # or any other ticker si.get_live_price(ticker) [/code] Note: Passing tickers is not…
Read More
How to download image files with RoboBrowser

How to download image files with RoboBrowser

Python, Web Scraping
In a previous post, we showed how RoboBrowser can be used to fill out online forms for getting historical weather data from Wunderground. This article will talk about how to use RoboBrowser to batch download collections of image files from Pexels, a site which offers free downloads. If you're looking to work with images, or want to build a training set for an image classifier with Python, this post will help you do that. In the first part of the code, we'll load the RoboBrowser class from the robobrowser package, create a browser object which acts like a web browser, and navigate to the Pexels website. [code lang="python"] # load the RoboBrowser class from robobrowser from robobrowser import RoboBrowser # define base site base = "https://www.pexels.com/" # create browser object,…
Read More
Coding with the Yahoo_fin Package

Coding with the Yahoo_fin Package

Python, Web Scraping
Subscribe to TheAutomatic.net via the area on the right side of the page. The yahoo_fin package contains functions to scrape stock-related data from Yahoo Finance and NASDAQ. You can view the official documentation by clicking this link, but the below post will provide a few more in-depth examples. All of the functions in yahoo_fin are contained within a single module inside yahoo_fin, called stock_info. You can import all the functions at once like this: [code lang="python"] from yahoo_fin.stock_info import * [/code] Downloading price data One of the core functions available is called get_data, which retrieves historical price data for an individual stock. To call this function, just pass whatever ticker you want: [code lang="python"] get_data("nflx") # gets Netflix's data get_data("aapl") # gets Apple's data get_data("amzn") # gets Amazon's data [/code]…
Read More
Word Frequency Analysis

Word Frequency Analysis

Python, Web Scraping
In a previous article, we talked about using Python to scrape stock-related articles from the web. As an extension of this idea, we’re going to show you how to use the NLTK package to figure out how often different words occur in text, using scraped stock articles. Initial Setup Let's import the NLTK package, along with requests and BeautifulSoup, which we'll need to scrape the stock articles. [code language="python" style="font-size: 8px"] '''load packages''' import nltk import requests from bs4 import BeautifulSoup [/code] Pulling the data we'll need Below, we're copying code from my scraping stocks article. This gives us a function, scrape_all_articles (along with two other helper functions), which we can use to pull the actual raw text from articles linked to from NASDAQ's website. [code language="python"] def scrape_news_text(news_url): news_html…
Read More
RoboBrowser: Automating Online Forms

RoboBrowser: Automating Online Forms

Python, Web Scraping
Background RoboBrowser is a Python 3.x package for crawling through the web and submitting online forms. It works similarly to the older Python 2.x package, mechanize. This post is going to give a simple introduction using RoboBrowser to submit a form on Wunderground for scraping historical weather data. Initial setup RoboBrowser can be installed via pip: [code lang="python"] pip install robobrowser [/code] Let's do the initial setup of the script by loading the RoboBrowser package. We'll also load pandas, as we'll be using that a little bit later. [code lang="python"] from robobrowser import RoboBrowser import pandas as pd [/code] Create RoboBrowser Object Next, we create a RoboBrowser object. This object functions similarly to an actual web browser. It allows you to navigate to different websites, fill in forms, and get…
Read More
Scraping Articles About Stocks

Scraping Articles About Stocks

Python, Web Scraping
See recommended books here. The following article will show you an example of how to scrape articles about stocks from the Web using Python 3.  Specifically, we'll be looking at articles linked from http://www.nasdaq.com. If you're not familiar with list comprehensions, you may want to check this, as we'll be using them in our code. Initial, Specific Example Let's start with a specific stock -- say, Netflix, for example.  Articles linked to a specific stock ticker from Nasdaq's website have the following pattern: http://www.nasdaq.com/symbol/TICKER/news-headlines, where TICKER is replaced with whatever ticker you want.  In our case, we will start by dealing specifically with Netflix's (NFLX) stock.  So our site of interest is: http://www.nasdaq.com/symbol/nflx/news-headlines The first step is to load the requests and BeautifulSoup packages.  Here, we'll also set the variable site equal to…
Read More