How to download image files with RoboBrowser

Python, Web Scraping
In a previous post, we showed how RoboBrowser can be used to fill out online forms for getting historical weather data from Wunderground. This article will talk about how to use RoboBrowser to batch download collections of image files from Pexels, a site which offers free downloads. If you're looking to work with images, or want to build a training set for an image classifier with Python, this post will help you do that.

In the first part of the code, we'll load the RoboBrowser class from the robobrowser package, create a browser object which acts like a web browser, and navigate to the Pexels website.

[code lang="python"]
# load the RoboBrowser class from robobrowser
from robobrowser import RoboBrowser

# define base site
base = "https://www.pexels.com/"

# create browser object,…
Read More
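For the image-saving step of the post excerpted above, a minimal sketch using only the standard library might look like the following. It assumes you already have a list of direct image URLs (for example, collected from img tags on the page). The helper names filename_from_url and download_images, the images folder, and the example URL are hypothetical and are not taken from the original post.

```python
import os
from urllib.parse import urlparse
from urllib.request import urlretrieve


def filename_from_url(url):
    """Return the last path component of a URL, ignoring any query string."""
    return os.path.basename(urlparse(url).path)


def download_images(urls, folder="images"):
    """Save each image URL into folder and return the local file paths."""
    os.makedirs(folder, exist_ok=True)
    saved = []
    for url in urls:
        path = os.path.join(folder, filename_from_url(url))
        urlretrieve(url, path)  # performs the HTTP request
        saved.append(path)
    return saved


# hypothetical URL, used only to show the filename logic (no request is made)
print(filename_from_url("https://example.com/photos/123/sample-photo.jpeg?w=500"))
```

Stripping the query string before taking the file name keeps resizing parameters out of the saved file names.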
ICA on Images with Python

Python
What is Independent Component Analysis (ICA)?

If you're already familiar with ICA, feel free to skip below to how we implement it in Python. ICA is a type of dimensionality reduction algorithm that transforms a set of variables to a new set of components; it does so such that the statistical independence between the new components is maximized. This is similar to Principal Component Analysis (PCA), which maps a collection of variables to statistically uncorrelated components, except that ICA goes a step further by maximizing statistical independence rather than just producing components that are uncorrelated. Like other dimensionality reduction methods, ICA seeks to reduce the number of variables in a set of data while retaining key information. In the example we…
Read More
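The distinction between PCA and ICA drawn in the excerpt above can be stated compactly. This is a sketch of the standard linear ICA model; the symbols x, s, A, and W are notation introduced here and are not taken from the original post.

```latex
% Observed variables x are modeled as an unknown linear mixture of sources s
x = A s
% ICA estimates an unmixing matrix W so that the recovered components
\hat{s} = W x
% are as statistically independent as possible:
p(\hat{s}_1, \dots, \hat{s}_n) \approx \prod_{i=1}^{n} p(\hat{s}_i)
% PCA only requires the components to be uncorrelated:
\operatorname{Cov}(\hat{s}_i, \hat{s}_j) = 0 \quad \text{for } i \neq j
```

Independent components are always uncorrelated, but uncorrelated components need not be independent, which is the sense in which ICA goes a step further.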
Coding with the Yahoo_fin Package

Python, Web Scraping
The yahoo_fin package contains functions to scrape stock-related data from Yahoo Finance and NASDAQ. You can view the official documentation by clicking this link, but the post below will provide a few more in-depth examples. All of the functions in yahoo_fin are contained within a single module inside yahoo_fin, called stock_info. You can import all the functions at once like this:

[code lang="python"]
from yahoo_fin.stock_info import *
[/code]

Downloading price data

One of the core functions available is called get_data, which retrieves historical price data for an individual stock. To call this function, just pass whatever ticker you want:

[code lang="python"]
get_data("nflx") # gets Netflix's data
get_data("aapl") # gets Apple's data
get_data("amzn") # gets Amazon's data
[/code]
…
Read More
Timing Python Processes

Python
Python processes can be timed with several different packages. One of the most common ways is to use the standard library module, time. Here's an example. Suppose we want to scrape the HTML from some collection of links. In this case, we're going to get a collection of URLs from Bloomberg's homepage. To do this, we'll use BeautifulSoup to get a list of full-path URLs. From the code below, this gives us a list of over 200 URLs. This first section of code should run pretty quickly; timing becomes useful if we want to cycle through some (or all) of these links and scrape the HTML from the respective pages.

[code lang="Python"]
# load packages
import time
from bs4 import BeautifulSoup
import requests

# get HTML…
Read More
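The basic timing pattern with the time module, as described in the excerpt above, can be sketched as follows. This is a minimal illustration rather than the code from the original post: the helper time_call and the stand-in task slow_square are hypothetical names, and a short sleep takes the place of a real page request.

```python
import time


def time_call(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.time()
    result = func(*args, **kwargs)
    elapsed = time.time() - start
    return result, elapsed


def slow_square(x):
    """Stand-in for a slow task such as requesting a web page."""
    time.sleep(0.1)
    return x * x


result, elapsed = time_call(slow_square, 4)
print(f"result={result}, took {elapsed:.3f} seconds")
```

For short intervals, time.perf_counter() can be swapped in for time.time(), since it is designed for measuring elapsed time.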
Word Frequency Analysis

Python, Web Scraping
In a previous article, we talked about using Python to scrape stock-related articles from the web. As an extension of this idea, we’re going to show you how to use the NLTK package to figure out how often different words occur in text, using scraped stock articles.

Initial Setup

Let's import the NLTK package, along with requests and BeautifulSoup, which we'll need to scrape the stock articles.

[code language="python" style="font-size: 8px"]
'''load packages'''
import nltk
import requests
from bs4 import BeautifulSoup
[/code]

Pulling the data we'll need

Below, we're copying code from my scraping stocks article. This gives us a function, scrape_all_articles (along with two other helper functions), which we can use to pull the actual raw text from articles linked to from NASDAQ's website.

[code language="python"]
def scrape_news_text(news_url):
    news_html…
Read More
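The core idea of counting how often each word occurs can be sketched with the standard library alone. The post itself uses NLTK; this example swaps in collections.Counter with a simple regular-expression tokenizer instead, and the function name word_frequencies and the sample text are made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, split it into words, and count each word."""
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


# made-up sample text standing in for a scraped article
sample = "Stocks rose today. Tech stocks rose the most, while energy stocks fell."
freq = word_frequencies(sample)
print(freq.most_common(2))
```

Lowercasing before counting means "Stocks" and "stocks" are treated as the same word; whether that is desirable depends on the analysis.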
Running Python from the Task Scheduler

Python, System Administration
Running Python from the Windows Task Scheduler is a really useful capability. It allows you to run Python in production on a Windows system, and can save countless hours of work. For instance, running code like that in this previous article about scraping stock articles on an automated, regular basis could come in handy as new stock articles are posted.

Before we go into how to schedule a Python script to run, you need to understand how to run Python from the command line. Just press the Windows key and type cmd into the search box to make the command prompt come up. Suppose your Python script is called cool_python_script.py, and is saved under C:\Users. You can run this script from the command prompt by typing the line below:

python C:\Users\cool_python_script.py

If…
Read More
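When a script runs unattended, it helps to have it leave a record of each run. The following is a hypothetical example of what a script like cool_python_script.py might contain, not the script from the original post; the function name log_run and the log file location are assumptions made for illustration.

```python
import datetime
import os
import tempfile


def log_run(log_path):
    """Append a timestamped line so each scheduled run leaves a record."""
    stamp = datetime.datetime.now().isoformat(timespec="seconds")
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(f"script ran at {stamp}\n")
    return stamp


if __name__ == "__main__":
    # hypothetical log location; a scheduled task may start in a different
    # working directory, so an absolute path is used
    log_file = os.path.join(tempfile.gettempdir(), "cool_python_script.log")
    print(log_run(log_file))
```

Running the script from the command prompt as shown above and then checking the log file is a quick way to confirm it works before scheduling it.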
RoboBrowser: Automating Online Forms

Python, Web Scraping
RoboBrowser is a Python 3.x package for crawling through the web and submitting online forms. It works similarly to the older Python 2.x package, mechanize. This post gives a simple introduction to using RoboBrowser to submit a form on Wunderground for scraping historical weather data.

Initial setup

RoboBrowser can be installed via pip:

[code lang="python"]
pip install robobrowser
[/code]

Let's do the initial setup of the script by loading the RoboBrowser package. We'll also load pandas, as we'll be using that a little bit later.

[code lang="python"]
from robobrowser import RoboBrowser
import pandas as pd
[/code]

Create RoboBrowser Object

Next, we create a RoboBrowser object. This object functions similarly to an actual web browser. It allows you to navigate to different websites, fill in forms, and get HTML…
Read More
Parsing Dates with Pandas

Pandas, Python
The pandas package is one of the most powerful Python packages available. One useful feature of pandas is its Timestamp class, which can convert strings in a variety of formats to dates. The problem we're trying to solve in this article is how to parse dates from strings that may contain additional text or words. We will look at this problem using pandas. In the first step, we'll load the pandas package.

[code lang="python"]
'''Load pandas package '''
import pandas as pd
[/code]

Next, let's create a sample string containing a made-up date with other text. For now, assume the dates will not contain spaces (we will re-examine this later). Taking this assumption, we use the split method, available for strings in Python, to create a list of…
Read More
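The split-and-try idea from the excerpt above can be sketched without pandas. This version uses datetime.strptime from the standard library with a single assumed date format in place of the pandas Timestamp approach the post describes; the function name find_date, the format string, and the sample sentence are all made up for illustration.

```python
from datetime import datetime


def find_date(text, fmt="%m/%d/%Y"):
    """Return the first whitespace-separated token that parses as a date."""
    for token in text.split():
        try:
            return datetime.strptime(token.strip(".,;:!?"), fmt)
        except ValueError:
            continue  # not a date in this format; try the next token
    return None


# made-up sample string with a date surrounded by other words
sample = "The report was filed on 10/25/2017 by the analyst."
print(find_date(sample))
```

Because the string is split on whitespace, this only works under the same assumption the post makes at this stage: the date itself contains no spaces.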
File Manipulation with Python

File Manipulation, Python, System Administration
Python is great for automating file creation, deletion, and other types of file manipulation. Two of the primary modules used to perform these types of tasks are os and shutil. We'll be covering a few useful highlights from each of these.

[code lang="python"]
import os
import shutil
[/code]

Batch Folder Creation

If you want to create a handful of folders / directories, it's not difficult to do so manually. But creating a few dozen folders manually gets mundane really fast. The os module contains a function, os.mkdir, that we can use in our situation. One optional step before you start is to change your working directory to where you want to create your list of folders:

[code lang="python"]
os.chdir('C:/Users/USERNAME/Documents')
[/code]

One problem I've…
Read More
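Batch folder creation with os.mkdir, as introduced in the excerpt above, might be sketched like this. The function name make_folders and the folder names are hypothetical, and the example writes into a throwaway temporary directory rather than a real Documents folder.

```python
import os
import tempfile


def make_folders(parent, names):
    """Create one sub-folder of parent per name, skipping any that exist."""
    created = []
    for name in names:
        path = os.path.join(parent, name)
        if not os.path.exists(path):
            os.mkdir(path)
            created.append(path)
    return created


# hypothetical folder names, created inside a throwaway temporary directory
parent = tempfile.mkdtemp()
names = [f"project_{i}" for i in range(1, 4)]
print(make_folders(parent, names))
```

Building full paths with os.path.join is an alternative to changing the working directory first; either approach works with os.mkdir.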