Why defining constants is important – a Python example

Python
This post walks through an example of why defining a known constant can save a lot of computation time.

How to find the key with the maximum value in a Python dictionary

There are a few ways to get the key associated with the maximum value in a Python dictionary. The two ways we'll show each use a list comprehension. First, let's set the scene by creating a dictionary with 100,000 key-value pairs. The keys are the integers from 0 to 99,999, and we'll use the random package to assign each key a value drawn from the uniform distribution between 0 and 100,000.

[code lang="python"]
import random
import time

vals = [random.uniform(0, 100000) for x in range(100000)]
mapping = dict(zip(range(100000), vals))
[/code]

Now,…
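The excerpt stops before the comparison itself, so the following is only a sketch of the idea the title points at, not necessarily the post's own code. It assumes the two approaches differ in whether `max(...)` is recomputed inside the list comprehension or stored once as a constant. The function names and the smaller dictionary size (2,000 keys, so the slow version finishes quickly) are illustrative choices, not from the post.

```python
import random
import time

# Assumption: a smaller dictionary than the post's 100,000 keys,
# so that the slow version finishes quickly.
n = 2000
vals = [random.uniform(0, 100000) for _ in range(n)]
mapping = dict(zip(range(n), vals))


def max_keys_recomputed(d):
    # max(d.values()) is re-evaluated for every item, so the work grows
    # roughly with the square of the dictionary size.
    return [key for key, val in d.items() if val == max(d.values())]


def max_keys_constant(d):
    # Compute the maximum once, store it, and reuse it in the comprehension.
    max_value = max(d.values())
    return [key for key, val in d.items() if val == max_value]


start = time.perf_counter()
slow_result = max_keys_recomputed(mapping)
slow_seconds = time.perf_counter() - start

start = time.perf_counter()
fast_result = max_keys_constant(mapping)
fast_seconds = time.perf_counter() - start

print(f"recomputed max: {slow_seconds:.4f}s")
print(f"constant max:   {fast_seconds:.4f}s")
```

Both functions return the same keys; the only difference is how many times the maximum is computed, which is where the time goes as the dictionary grows.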
Read More
Scraping data from a JavaScript webpage with Python

Python, Web Scraping
This post will walk through how to use the requests_html package to scrape options data from a JavaScript-rendered webpage. requests_html serves as an alternative to Selenium and PhantomJS, and provides a clear syntax similar to the awesome requests package. The code we'll walk through is packaged into functions in the options module of the yahoo_fin package, but this article will show how to write the code from scratch using requests_html so that you can use the same idea to scrape other JavaScript-rendered webpages.

Note: requests_html requires Python 3.6+. If you don't have requests_html installed, you can install it using pip:

[cc]
pip install requests_html
[/cc]

Motivation

Let's say we want to scrape options data for a particular stock. As an example, let's look at Netflix (since it's well known). If…
Read More
2 packages for extracting dates from a string of text in Python

Pandas, Python
This post will cover two different ways to extract a date from a string of text in Python. The key point is that the strings we will parse contain additional text - not just the date. Scraping a date out of text can be useful in many different situations.

Option 1) dateutil

The first option we'll show is using the dateutil package. Here's an example:

[code lang="python"]
from dateutil.parser import parse

parse("Today is 12-01-18", fuzzy_with_tokens=True)
[/code]

Above, we use a function in dateutil called parse. The first parameter to this function is the string of text we want to search for a date. The second parameter, fuzzy_with_tokens, is set equal to True - this causes the function to return where the date is found in the…
Read More

Intro to Python Course

Python
For anyone in the NYC area, I am offering an in-person introductory Python course on January 7th, 2019. The description of the workshop is below. Please see here to register on Eventbrite.

Want to learn Python? Consider attending this workshop! This hands-on class will be a great introduction to coding in Python and to the important features of the language, and it will help you build a strong foundation for learning more in the future!

Overview

This course provides a workshop introducing you to Python. We'll walk through how to write and run Python programs, when to use particular data structures, how to handle different data types, and more. The class will be a great start in learning one of the most versatile, powerful programming languages being used today! All…
Read More
How to measure DNA similarity with Python and Dynamic Programming

Python
*Note: if you want to skip the background / alignment calculations and go straight to where the code begins, just click here.

Dynamic Programming and DNA

Dynamic programming has many uses, including identifying the similarity between two different strands of DNA or RNA, protein alignment, and various other applications in bioinformatics (in addition to many other fields). For anyone less familiar, dynamic programming is a coding paradigm that solves recursive problems by breaking them down into sub-problems and using some type of data structure to store the sub-problem results. In this way, recursive problems (like the Fibonacci sequence, for example) can be programmed much more efficiently, because dynamic programming allows you to avoid duplicate (and hence wasteful) calculations in your code. Click here to read more about dynamic programming.

Let's…
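As a small illustration of the Fibonacci example mentioned in the excerpt, here is a minimal sketch that contrasts plain recursion with a version that stores sub-problem results in a dictionary. The function names `fib_naive` and `fib_dp` are hypothetical and chosen for this sketch; they are not taken from the post, and the post's DNA alignment code is not shown here.

```python
def fib_naive(n):
    # Plain recursion: the same sub-problems are recomputed many times.
    if n < 2:
        return n
    return fib_naive(n - 1) + fib_naive(n - 2)


def fib_dp(n, memo=None):
    # Store each sub-problem result in a dictionary so it is computed once.
    if memo is None:
        memo = {}
    if n < 2:
        return n
    if n not in memo:
        memo[n] = fib_dp(n - 1, memo) + fib_dp(n - 2, memo)
    return memo[n]


# The naive version is only practical for small n; the stored-results
# version handles much larger inputs without repeated work.
print(fib_naive(20))
print(fib_dp(90))
```

The dictionary plays the role of the "data structure to store the sub-problem results": each Fibonacci number is computed once and then looked up.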
Read More

Data Analysis with Python Course: How to read, wrangle, and analyze data

Pandas, Python
For anyone in the NYC area, I will be holding an in-person training session December 3rd on doing data analysis with Python. We will be covering the pandas, pyodbc, and matplotlib packages. Please register at Eventbrite here: https://www.eventbrite.com/e/data-analysis-with-python-how-to-read-wrangle-and-analyze-data-tickets-51945542516.

Overview

Learn how to apply Python to read, wrangle, visualize, and analyze data! This course provides a hands-on session where we'll walk through a prepared curriculum on doing data analysis with Python. All code and practice exercises during the session will be made available after the course is complete.

About the course

During this hands-on class, you will learn the fundamentals of doing data analysis in Python, the powerful pandas package, and pyodbc for connecting to databases. We will walk through using Python to analyze and answer key questions on sales…
Read More
Dplython…dplyr for Python!

Python
If you're an avid R user, you probably use the famous dplyr package. Python has a package meant to be similar to dplyr, called dplython. This article will give an introduction to using dplython. For the examples below, we'll use a sample dataset that comes with R, giving attributes of the US states, including population, area, and income levels. You can see the dataset by clicking here.

Initial setup

dplython can be installed using pip:

pip install dplython

Once the package is installed, let's load a few methods from it, and read in our dataset.

[code lang="python"]
# load packages
from dplython import select, DplyFrame, X, arrange, count, sift, head, summarize, group_by, tail, mutate
import pandas as pd

# read in data
state_df = pd.read_csv("state_info.txt")
[/code]

After we've…
Read More

How to get live stock prices with Python

Python, Web Scraping
In a previous post, I gave an introduction to the yahoo_fin package. The most recent version of the package includes new functionality allowing you to scrape live (real-time) stock prices from Yahoo Finance. In this article, we'll go through a couple of ways of getting real-time data from Yahoo Finance for stocks, as well as how to pull cryptocurrency price information.

The get_live_price function

First, we just need to load the stock_info module from yahoo_fin.

[code lang="python"]
# import stock_info module from yahoo_fin
from yahoo_fin import stock_info as si
[/code]

Then, obtaining the current price of a stock is as simple as one line of code:

[code lang="python"]
# get live price of Apple
si.get_live_price("aapl")

# or Amazon
si.get_live_price("amzn")

# or any other ticker
si.get_live_price(ticker)
[/code]

Note: Passing tickers is not…
Read More
How to download image files with RoboBrowser

Python, Web Scraping
In a previous post, we showed how RoboBrowser can be used to fill out online forms for getting historical weather data from Wunderground. This article will talk about how to use RoboBrowser to batch download collections of image files from Pexels, a site which offers free downloads. If you're looking to work with images, or want to build a training set for an image classifier with Python, this post will help you do that.

In the first part of the code, we'll load the RoboBrowser class from the robobrowser package, create a browser object which acts like a web browser, and navigate to the Pexels website.

[code lang="python"]
# load the RoboBrowser class from robobrowser
from robobrowser import RoboBrowser

# define base site
base = "https://www.pexels.com/"

# create browser object,…
Read More
ICA on Images with Python

Python
Click here to see my recommended reading list.

What is Independent Component Analysis (ICA)?

If you're already familiar with ICA, feel free to skip below to how we implement it in Python. ICA is a type of dimensionality reduction algorithm that transforms a set of variables into a new set of components; it does so such that the statistical independence between the new components is maximized. This is similar to Principal Component Analysis (PCA), which maps a collection of variables to statistically uncorrelated components, except that ICA goes a step further by maximizing statistical independence rather than just developing components that are uncorrelated. Like other dimensionality reduction methods, ICA seeks to reduce the number of variables in a set of data, while retaining key information. In the example we…
Read More