Data Analysis with Python Course: How to read, wrangle, and analyze data

Pandas, Python
For anyone in the NYC area, I will be holding an in-person training session December 3rd on doing data analysis with Python. We will be covering the pandas, pyodbc, and matplotlib packages. Please register at Eventbrite here: https://www.eventbrite.com/e/data-analysis-with-python-how-to-read-wrangle-and-analyze-data-tickets-51945542516. Overview Learn how to apply Python to read, wrangle, visualize, and analyze data!  This course provides a hands-on session where we'll walk through a prepared curriculum on doing data analysis with Python.  All code and practice exercises during the session will be made available after the course is complete.     About the course During this hands-on class, you will learn the fundamentals of doing data analysis in Python, the powerful pandas package, and pyodbc for connecting to databases. We will walk through using Python to analyze and answer key questions on sales…
Read More
Dpylthon…dplyr for Python!

Dpylthon…dplyr for Python!

Python
If you're an avid R user, you probably use the famous dplyr package. Python has a package meant to be similar to dplyr, called dplython. This article will give an introduction for how to use dplython. For the examples below, we'll use a sample dataset that comes with R giving attributes about the US states, including population, area, and income levels. You can see the dataset by clicking here. Initial setup dplython can be installed using pip:. pip install dplython Once the package is installed, let's load a few methods from it, and read in our dataset. [code lang="python"] # load packages from dplython import select, DplyFrame, X, arrange, count, sift, head, summarize, group_by, tail, mutate import pandas as pd # read in data state_df = pd.read_csv("state_info.txt") [/code] After we've…
Read More

How to get live stock prices with Python

Python, Web Scraping
In a previous post, I gave an introduction to the yahoo_fin package. The most updated version of the package includes new functionality allowing you to scrape live stock prices from Yahoo Finance (real-time). In this article, we'll go through a couple ways of getting real-time data from Yahoo Finance for stocks, as well as how to pull cryptocurrency price information. The get_live_price function First, we just need to load the stock_info module from yahoo_fin. [code lang="python"] # import stock_info module from yahoo_fin from yahoo_fin import stock_info as si [/code] Then, obtaining the current price of a stock is as simple as one line of code: [code lang="python"] # get live price of Apple si.get_live_price("aapl") # or Amazon si.get_live_price("amzn") # or any other ticker si.get_live_price(ticker) [/code] Note: Passing tickers is not…
Read More
How to download image files with RoboBrowser

How to download image files with RoboBrowser

Python, Web Scraping
In a previous post, we showed how RoboBrowser can be used to fill out online forms for getting historical weather data from Wunderground. This article will talk about how to use RoboBrowser to batch download collections of image files from Pexels, a site which offers free downloads. If you're looking to work with images, or want to build a training set for an image classifier with Python, this post will help you do that. In the first part of the code, we'll load the RoboBrowser class from the robobrowser package, create a browser object which acts like a web browser, and navigate to the Pexels website. [code lang="python"] # load the RoboBrowser class from robobrowser from robobrowser import RoboBrowser # define base site base = "https://www.pexels.com/" # create browser object,…
Read More
ICA on Images with Python

ICA on Images with Python

Machine Learning, Python
Click here to see my recommended reading list. What is Independent Component Analysis (ICA)? If you're already familiar with ICA, feel free to skip below to how we implement it in Python. ICA is a type of dimensionality reduction algorithm that transforms a set of variables to a new set of components; it does so such that that the statistical independence between the new components is maximized. This is similar to Principle Component Analysis (PCA), which maps a collection of variables to statistically uncorrelated components, except that ICA goes a step further by maximizing statistical independence rather than just developing components that are uncorrelated. Like other dimensionality reduction methods, ICA seeks to reduce the number of variables in a set of data, while retaining key information. In the example we…
Read More
Coding with the Yahoo_fin Package

Coding with the Yahoo_fin Package

Python, Web Scraping
Background on yahoo_fin The yahoo_fin package contains functions to scrape stock-related data from Yahoo Finance and NASDAQ. You can view the official documentation by clicking this link, but the below post will provide a few more in-depth examples. Also, please check out my yahoo_fin playlist on YouTube. The first video is below, which covers installation and getting historical / real-time stock prices. The functions in yahoo_fin are divided into two modules, stock_info and options. This post will focus on introducing stock_info. For more on using the options module, check out this post. Let's get started by importing the stock_info module from yahoo_fin. [code lang="python"] import yahoo_fin.stock_info as si [/code] Downloading price data One of the core functions available is called get_data, which retrieves historical price data for an individual stock.…
Read More
Timing Python Processes

Timing Python Processes

Python
Timing Python processes is made possible with several different packages. One of the most common ways is using the standard library package, time, which we'll demonstrate with an example. However, another package that is very useful for timing a process -- and particularly telling you how far along a process has come -- is tqdm. As we'll show a little further down the post, tqdm will actually print out a progress bar as a process is running. Basic timing example Suppose we want to scrape the HTML from some collection of links. In this case, we're going to get a collection of URLs from Bloomberg's homepage. To do this, we'll use BeautifulSoup to get a list of full-path URLs. From the code below, this gives us a list of around…
Read More
Word Frequency Analysis

Word Frequency Analysis

Python, Web Scraping
In a previous article, we talked about using Python to scrape stock-related articles from the web. As an extension of this idea, we’re going to show you how to use the NLTK package to figure out how often different words occur in text, using scraped stock articles. Initial Setup Let's import the NLTK package, along with requests and BeautifulSoup, which we'll need to scrape the stock articles. [code language="python" style="font-size: 8px"] '''load packages''' import nltk import requests from bs4 import BeautifulSoup [/code] Pulling the data we'll need Below, we're copying code from my scraping stocks article. This gives us a function, scrape_all_articles (along with two other helper functions), which we can use to pull the actual raw text from articles linked to from NASDAQ's website. [code language="python"] def scrape_news_text(news_url): news_html…
Read More
Running Python from the Task Scheduler

Running Python from the Task Scheduler

Python, System Administration
Background Running Python from the Windows Task Scheduler is a really useful capability. It allows you to run Python in production on a Windows system, and can save countless hours of work. For instance, running code like extracting data from a database on an automated, regular basis is a common need at many companies. How to run Python from the command line Before we go into how to schedule a Python script to run, you need to understand how to run Python from the command line. To open the command prompt (command line), press the windows key and type cmd into the search box. Next, suppose your python script is called cool_python_script.py, and is saved under C:\Users. You can run this script from the command prompt by typing the below…
Read More