ICA on Images with Python

Python
What is Independent Component Analysis (ICA)?

If you're already familiar with ICA, feel free to skip below to how we implement it in Python. ICA is a type of dimensionality reduction algorithm that transforms a set of variables into a new set of components, chosen so that the statistical independence between the new components is maximized. This is similar to Principal Component Analysis (PCA), which maps a collection of variables to statistically uncorrelated components, except that ICA goes a step further by maximizing statistical independence rather than just producing components that are uncorrelated. Like other dimensionality reduction methods, ICA seeks to reduce the number of variables in a set of data while retaining key information. In the example we…
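The difference between "uncorrelated" and "independent" can be checked numerically. The snippet below is only a minimal sketch, not the post's ICA implementation: it assumes NumPy is installed and uses two made-up variables, x and y = x², to show that a pair of variables can have essentially zero correlation even though one fully determines the other.

```python
import numpy as np

# Hypothetical data: x is symmetric around zero, y is a deterministic function of x.
x = np.linspace(-1.0, 1.0, 1001)
y = x ** 2

# Pearson correlation between x and y: essentially zero, so they are uncorrelated.
corr_xy = np.corrcoef(x, y)[0, 1]

# Yet y depends entirely on x: squaring x reproduces y exactly,
# so the correlation between x**2 and y is 1 (up to floating-point error).
corr_x2_y = np.corrcoef(x ** 2, y)[0, 1]

print(f"corr(x, y)    = {corr_xy:.6f}")
print(f"corr(x**2, y) = {corr_x2_y:.6f}")
```

Removing correlation, as PCA does, would leave a relationship like this one untouched; ICA's independence criterion is the stricter requirement.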
Coding with the Yahoo_fin Package

Python, Web Scraping
The yahoo_fin package contains functions to scrape stock-related data from Yahoo Finance and NASDAQ. Official documentation is available, but the post below provides a few more in-depth examples. All of the functions in yahoo_fin live in a single module called stock_info. You can import all the functions at once like this:

[code lang="python"]
from yahoo_fin.stock_info import *
[/code]

Downloading price data

One of the core functions available is get_data, which retrieves historical price data for an individual stock. To call this function, just pass the ticker you want:

[code lang="python"]
get_data("nflx") # gets Netflix's data
get_data("aapl") # gets Apple's data
get_data("amzn") # gets Amazon's data
[/code]
…
Timing Python Processes

Python
Several packages make it possible to time Python processes. One of the most common approaches uses the standard library package time, which we'll demonstrate with an example. Another package that is very useful for timing a process, and particularly for showing how far along it is, is tqdm. As we'll show a little further down the post, tqdm prints a progress bar while a process is running.

Basic timing example

Suppose we want to scrape the HTML from some collection of links. In this case, we're going to get a collection of URLs from Bloomberg's homepage. To do this, we'll use BeautifulSoup to get a list of full-path URLs. From the code below, this gives us a list of around…
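As a rough illustration of timing with the standard library, here is a minimal sketch. It does not scrape anything: the helper names timed and slow_square are hypothetical, and time.sleep stands in for the slow work (such as downloading a page). tqdm is left out so the sketch depends only on the standard library.

```python
import time


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed


def slow_square(values, delay=0.01):
    """Hypothetical slow task: sleep briefly per item, then square it."""
    squares = []
    for value in values:
        time.sleep(delay)  # stand-in for slow work such as an HTTP request
        squares.append(value * value)
    return squares


result, elapsed = timed(slow_square, [1, 2, 3])
print(f"Result: {result}, took {elapsed:.3f} seconds")
```

time.perf_counter is used here because it is intended for measuring short durations; time.time would also work for coarse measurements.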
Word Frequency Analysis

Python, Web Scraping
In a previous article, we talked about using Python to scrape stock-related articles from the web. As an extension of this idea, we’re going to show you how to use the NLTK package to figure out how often different words occur in text, using scraped stock articles.

Initial Setup

Let's import the NLTK package, along with requests and BeautifulSoup, which we'll need to scrape the stock articles.

[code language="python" style="font-size: 8px"]
# load packages
import nltk
import requests
from bs4 import BeautifulSoup
[/code]

Pulling the data we'll need

Below, we're copying code from my scraping stocks article. This gives us a function, scrape_all_articles (along with two other helper functions), which we can use to pull the raw text from articles linked from NASDAQ's website.

[code language="python"]
def scrape_news_text(news_url):
    news_html…
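To give a feel for what a word frequency count looks like, here is a minimal sketch. Note the substitutions: it uses collections.Counter from the standard library rather than NLTK, and a made-up sample string rather than scraped article text, so it runs without any downloads.

```python
import re
from collections import Counter

# Made-up sample text standing in for scraped article text.
sample_text = (
    "Netflix shares rose today. Analysts say Netflix subscribers "
    "grew faster than expected, and shares may rise again."
)

# Lowercase the text and pull out runs of letters (and apostrophes) as words.
words = re.findall(r"[a-z']+", sample_text.lower())

# Count how many times each word occurs.
word_counts = Counter(words)

# Show the three most frequent words with their counts.
for word, count in word_counts.most_common(3):
    print(word, count)
```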
Running Python from the Task Scheduler

Python, System Administration
Running Python from the Windows Task Scheduler is a really useful capability. It allows you to run Python in production on a Windows system, and can save countless hours of work. For instance, running code like that in the previous article about scraping stock articles on an automated, regular basis could come in handy as new stock articles are posted. Before we go into how to schedule a Python script, you need to understand how to run Python from the command line. Press the Windows key and type cmd into the search box to bring up the command prompt. Suppose your Python script is called cool_python_script.py and is saved under C:\Users. You can run this script from the command prompt by typing the line below:

python C:\Users\cool_python_script.py

If…
RoboBrowser: Automating Online Forms

Python, Web Scraping
Background

RoboBrowser is a Python 3.x package for crawling through the web and submitting online forms. It works similarly to the older Python 2.x package, mechanize. This post gives a simple introduction to using RoboBrowser to submit a form on Wunderground for scraping historical weather data.

Initial setup

RoboBrowser can be installed via pip:

[code lang="python"]
pip install robobrowser
[/code]

Let's do the initial setup of the script by loading the RoboBrowser package. We'll also load pandas, as we'll be using that a little bit later.

[code lang="python"]
from robobrowser import RoboBrowser
import pandas as pd
[/code]

Create RoboBrowser Object

Next, we create a RoboBrowser object. This object functions similarly to an actual web browser. It allows you to navigate to different websites, fill in forms, and get…
Parsing Dates with Pandas

Pandas, Python
The pandas package is one of the most powerful Python packages available. One useful feature of pandas is its Timestamp class, which can convert strings in a variety of formats to dates. The problem we're trying to solve in this article is how to parse dates from strings that may contain additional text. We will look at this problem using pandas. In the first step, we'll load the pandas package.

[code lang="python"]
# Load pandas package
import pandas as pd
[/code]

Next, let's create a sample string containing a made-up date with other text. For now, assume the dates will not contain spaces (we will re-examine this later). Under this assumption, we use the split method, available for strings in Python, to create a list of…
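One way the split-then-parse idea could look in practice is sketched below. This is an assumption about the approach rather than the post's own code: the helper find_dates and the sample string are hypothetical, and the sketch relies on pd.Timestamp raising a ValueError (or a subclass of it) for tokens it cannot interpret as dates.

```python
import pandas as pd


def find_dates(text):
    """Return pandas Timestamps for the whitespace-separated tokens that parse as dates."""
    dates = []
    for token in text.split():
        try:
            dates.append(pd.Timestamp(token))
        except ValueError:
            # Token is not something pandas can interpret as a date; skip it.
            continue
    return dates


# Made-up sample string: one date (without spaces) surrounded by other words.
sample = "Payment received 10/15/2017 thanks"
found = find_dates(sample)
print(found)
```

Be aware that pandas also accepts a few special words such as "now" and "today", so free-form text containing them would produce extra matches with this approach.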
File Manipulation with Python

File Manipulation, Python, System Administration
Getting started

Python is great for automating file creation, deletion, and other types of file manipulation. Two of the primary packages used for these tasks are os and shutil. We'll be covering a few useful highlights from each.

[code lang="python"]
import os
import shutil
[/code]

How to get and change your current working directory

You can get your current working directory using os.getcwd:

[code lang="python"]
os.getcwd()
[/code]

Any action you take without specifying a directory is assumed to apply to your current working directory. That is, if you create or search for a file without specifying a directory, Python will look in the directory returned by os.getcwd(). To change your working directory, use os.chdir:

[code lang="python"]
os.chdir("C:/path/to/new/directory")
[/code]

How to merge a directory name…
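The effect of the working directory on relative paths can be seen in a short, self-contained sketch. It assumes nothing beyond the standard library: a temporary directory (created with tempfile, which the post does not mention) stands in for a real folder, and example.txt is a hypothetical file name.

```python
import os
import tempfile

# Remember where we started so we can return afterwards.
original_dir = os.getcwd()

with tempfile.TemporaryDirectory() as tmp_dir:
    try:
        # Switch the working directory to the temporary folder.
        os.chdir(tmp_dir)
        inside_dir = os.getcwd()

        # A relative file name is resolved against the current working directory.
        with open("example.txt", "w") as handle:
            handle.write("hello")

        # Confirm the file really landed inside the temporary folder.
        created_in_tmp = os.path.exists(os.path.join(tmp_dir, "example.txt"))
    finally:
        # Go back before the temporary folder is cleaned up.
        os.chdir(original_dir)

print("Working directory while inside:", inside_dir)
print("File created in temporary folder:", created_in_tmp)
```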
Scraping Articles About Stocks

Python, Web Scraping
The following article shows an example of how to scrape articles about stocks from the Web using Python 3. Specifically, we'll be looking at articles linked from http://www.nasdaq.com. If you're not familiar with list comprehensions, you may want to review them first, as we'll be using them in our code.

Initial, Specific Example

Let's start with a specific stock, say Netflix. Articles linked to a specific stock ticker on Nasdaq's website follow this pattern: http://www.nasdaq.com/symbol/TICKER/news-headlines, where TICKER is replaced with whatever ticker you want. In our case, we will start by dealing specifically with Netflix's (NFLX) stock. So our site of interest is:

http://www.nasdaq.com/symbol/nflx/news-headlines

The first step is to load the requests and BeautifulSoup packages. Here, we'll also set the variable site equal to…
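The URL pattern described above can be expressed as a small helper. This is only a sketch of the string construction, with no network requests: the function name news_headlines_url and the list of tickers are hypothetical, and the pattern itself is taken from the text above (the site's layout may have changed since).

```python
def news_headlines_url(ticker):
    """Build the news-headlines URL for a ticker, following the pattern described above."""
    return f"http://www.nasdaq.com/symbol/{ticker.lower()}/news-headlines"


# Example tickers; a list comprehension builds one URL per ticker.
tickers = ["NFLX", "AAPL", "AMZN"]
sites = [news_headlines_url(ticker) for ticker in tickers]

for site in sites:
    print(site)
```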