How to hide a password in R with the keyring package

R
This post will introduce using the keyring package to hide a password.

Short background

The keyring package is a library designed to let you access your operating system's credential store. In essence, it lets you store and retrieve passwords in your operating system, which allows you to avoid having a password in plaintext in an R script.

Storing a password

Storing a password with keyring is really straightforward. First, we just need to load the keyring package. Then we call a function called key_set_with_value. In this function, we'll input three different parameters: service, username and password.

[code lang="R"]
# load keyring package
library(keyring)

# Store email username with password
key_set_with_value(service = "user_email", username = "your_address@example.com", password = "test password")
[/code]

The username and password stored are just that -…
Web Browsing and Parsing with RoboBrowser and requests_html

Python, Web Scraping
Background

So you've learned all about BeautifulSoup. What's next? Python is a great language for automating web operations. In a previous article we went through how to use BeautifulSoup and requests to scrape stock-related articles from Nasdaq's website. This post talks about a couple of alternatives to using BeautifulSoup directly.

One way of scraping and crawling the web is to use Python's RoboBrowser package, which is built on top of requests and BeautifulSoup. Because it's built on both of these packages, writing code to scrape the web is a bit simpler, as we'll see below. RoboBrowser works similarly to the older Python 2.x package mechanize in that it allows you to simulate a web browser.

A second option is using requests_html, which was also discussed here, and which we'll also…
Does “Sell in May, Go Away” really work?

R
If you follow the stock market, you've probably heard the expression "Sell in May, Go Away." This expression generally refers to the perceived idea that the stock market goes up between the end of October and the end of April, and that one should sell at the beginning of May to avoid losses. The general recommendation according to the theory is to hold money in a money market account during the "short period" of May through October, and then reinvest in the stock market in November.

But how does this myth hold up in reality? Let's use R to find out! Our analysis will look strictly at S&P 500 performance from 1970 to the present (so we won't dive into interest rate levels, money market accounts, etc.).

Getting started…
All about Python Sets

Python
See also my tutorials on lists and list comprehensions.

Background on sets

A set in Python is an unordered collection of unique elements. Sets are mutable and iterable (more on these properties later). Sets are useful when dealing with a unique collection of elements, e.g. finding the unique elements within a list to determine if there are any values which should be present.

The operations built around sets are also handy when you need to perform mathematical set-like operations. For example, how would you figure out the common elements between two lists? Or what elements are in one list, but not another? With sets, it's easy!

How to create a set

We can define a set using curly braces, similar to how we define dictionaries.

[code lang="python"]…
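The excerpt is cut off at its first code block. As a rough sketch (the two example lists are hypothetical, not taken from the original post), creating a set and running the operations mentioned above, common elements and elements in one list but not another, might look like this:

```python
# Hypothetical example lists (not from the original post)
list_a = [1, 2, 2, 3, 4]
list_b = [3, 4, 4, 5]

set_a = {1, 2, 3, 4}   # set literal with curly braces
set_b = set(list_b)    # built from a list; the duplicate 4 is dropped -> {3, 4, 5}

unique_a = set(list_a)        # unique elements of list_a: {1, 2, 3, 4}
common = set_a & set_b        # intersection (elements in both): {3, 4}
only_in_a = set_a - set_b     # difference (in set_a but not in set_b): {1, 2}
either = set_a | set_b        # union (elements in either set): {1, 2, 3, 4, 5}

# The method forms accept any iterable, so a list can be passed directly
assert set_a.intersection(list_b) == common

empty = set()  # {} would create an empty dict, not an empty set
```

The operator forms (`&`, `-`, `|`) require both operands to be sets, which is why the method form is handy when one side is still a list.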
3 ways to scrape tables from PDFs with Python

Python
This post will go through a few ways of scraping tables from PDFs with Python. To learn more about scraping tables and other data from PDFs with R, click here. Note, these options will only work for PDFs that are typed, not scanned-in images.

tabula-py

tabula-py is a very nice package that allows you to both scrape PDFs and convert PDFs directly into CSV files. tabula-py can be installed using pip:

[code]
pip install tabula-py
[/code]

If you have issues with installation, check this. Note that tabula-py is a wrapper around the Java library tabula-java, so a Java runtime must also be installed. Once installed, tabula-py is straightforward to use. Below we use it to scrape all the tables from a paper on classification regarding the Iris dataset (available here).

[code lang="python"]
import tabula

file = "http://lab.fs.uni-lj.si/lasin/wp/IMIT_files/neural/doc/seminar8.pdf"

# read every table on every page into a list of DataFrames
tables = tabula.read_pdf(file, pages="all", multiple_tables=True)
[/code]…
Four ways to reverse a string in R

R
R offers several ways to reverse a string, including some base R options. We go through a few of those in this post. We'll also compare the computational time for each method. Reversing a string can be especially useful in bioinformatics (e.g. finding the reverse complement of a DNA strand).

To get started, let's generate a random string of 10 million DNA bases (we can do this with the stringi package as well, but for our purposes here, let's just use base R functions).

[code lang="R"]
set.seed(1)
dna <- paste(sample(c("A", "T", "C", "G"), 10000000, replace = TRUE), collapse = "")
[/code]

1) Base R with strsplit and paste

One way to reverse a string is to use strsplit with paste. This is the slowest method that will be shown, but…
How to get options data with Python

Python, Web Scraping
In a previous post, we talked about how to get real-time stock prices with Python. This post will go through how to download financial options data with Python. We will be using the yahoo_fin package.

The yahoo_fin package comes with a module called options. This module allows you to scrape option chains and get option expiration dates. To get started we'll just import this module from yahoo_fin.

[code lang="python"]
from yahoo_fin import options
[/code]

How to get options expiration dates

Any option contract has an expiration date. To get all of the option expiration dates for a particular stock, we can use the get_expiration_dates function in the options module. This function is equivalent to scraping all of the date selection boxes on the options page for an individual stock (e.g.…

Don’t forget the “utils” package in R

R
With thousands of powerful packages, it's easy to gloss over the libraries that come preinstalled with R. Thus, this post will talk about some of the cool functions in the utils package, which comes with a standard installation of R. While utils comes with several familiar functions, like read.csv, write.csv, and help, it also contains over 200 other functions.

readClipboard and writeClipboard

One of my favorite pairs of functions from utils is readClipboard and writeClipboard. If you're doing some manipulation to get a quick answer between R and Excel, these functions can come in handy (note that both are available on Windows only). readClipboard reads in whatever is currently on the Clipboard. For example, let's copy a column of cells from Excel. We can now run readClipboard() in R. The result of running this command is a vector…
Speed Test: Sapply vs. Vectorization

R
The apply functions in R are awesome (see this post for some lesser known apply functions). However, if you can use pure vectorization, then you'll probably end up making your code run a lot faster than if you depend on functions like sapply and lapply. This is because apply functions like these still loop through the elements of a vector or list behind the scenes, one at a time, calling an R function for each element. Vectorized functions, on the other hand, do their looping in compiled code under the hood, which allows much faster computation. This post runs through a couple of such examples involving string substitution and fuzzy matching.

String substitution

For example, let's create a vector that looks like this: test1, test2, test3, test4, ..., test1000000, with one million elements. With sapply, the code to create this would…
Why defining constants is important – a Python example

Python
This post will walk through an example of why defining a known constant can save lots of computational time.

How to find the key with the maximum value in a Python dictionary

There are a few ways to go about getting the key associated with the max value in a Python dictionary. The two ways we'll show each involve using a list comprehension.

First, let's set the scene by creating a dictionary with 100,000 key-value pairs. We'll just make the keys the integers between 0 and 99,999, and we'll use the random module to randomly assign a value to each of these keys, drawn from the uniform distribution between 0 and 100,000.

[code lang="python"]
import random
import time

vals = [random.uniform(0, 100000) for x in range(100000)]
mapping = dict(zip(range(100000), vals))
[/code]

Now,…
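The excerpt stops before the two list-comprehension approaches are shown. The following is only a hedged sketch, assuming the comparison is between calling max(mapping.values()) inside the comprehension (so it is recomputed for every key) and computing the maximum once up front as a constant. The function names are hypothetical, and the dictionary is kept much smaller than 100,000 entries because the first version is quadratic.

```python
import random
import time


def max_keys_slow(mapping):
    # max(...) is re-evaluated for every key, so this is O(n^2) overall
    return [key for key, val in mapping.items() if val == max(mapping.values())]


def max_keys_fast(mapping):
    # compute the maximum once and reuse it as a constant: O(n) overall
    max_val = max(mapping.values())
    return [key for key, val in mapping.items() if val == max_val]


if __name__ == "__main__":
    random.seed(0)
    n = 5000  # hypothetical size, kept small because the slow version is quadratic
    vals = [random.uniform(0, 100000) for _ in range(n)]
    mapping = dict(zip(range(n), vals))

    start = time.perf_counter()
    slow = max_keys_slow(mapping)
    slow_secs = time.perf_counter() - start

    start = time.perf_counter()
    fast = max_keys_fast(mapping)
    fast_secs = time.perf_counter() - start

    assert slow == fast  # same answer, very different cost
    print(f"slow: {slow_secs:.3f}s, fast: {fast_secs:.5f}s")
```

Both versions return every key tied for the maximum. If only a single key is needed, max(mapping, key=mapping.get) finds it in one pass, returning the first maximal key it encounters.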