All about Python Sets

Python
See also my tutorials on lists and list comprehensions.

Background on sets

A set in Python is an unordered collection of unique elements. Sets are mutable and iterable (more on these properties later). Sets are useful when dealing with a unique collection of elements, e.g. finding the unique elements within a list to check for values that should be present. The operations built around sets are also handy when you need to perform mathematical set-like operations. For example, how would you figure out the common elements between two lists? Or which elements are in one list but not another? With sets, it's easy!

How to create a set

We can define a set using curly braces, similar to how we define dictionaries. [code lang="python"]…
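As a rough sketch of the questions raised in the excerpt above (unique elements, common elements, and elements in one list but not another), something like the following should work. The lists `first` and `second` are made-up illustration data, not taken from the original post.

```python
# Two example lists (hypothetical data, for illustration only)
first = [1, 2, 2, 3, 4]
second = [3, 4, 4, 5]

# Converting a list to a set drops the duplicates
unique_first = set(first)

# Intersection: elements common to both lists
common = set(first) & set(second)

# Difference: elements in the first list but not the second
only_in_first = set(first) - set(second)

# Sets are unordered, so sort before printing for a stable display
print(sorted(unique_first))   # [1, 2, 3, 4]
print(sorted(common))         # [3, 4]
print(sorted(only_in_first))  # [1, 2]
```

The `&` and `-` operators are shorthand for the `intersection` and `difference` methods; the method forms also accept any iterable, so `set(first).intersection(second)` would give the same result here.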
3 ways to scrape tables from PDFs with Python

Python
This post will go through a few ways of scraping tables from PDFs with Python. To learn more about scraping tables and other data from PDFs with R, click here. Note that these options will only work for PDFs that are typed, not scanned-in images.

tabula-py

tabula-py is a very nice package that allows you to both scrape PDFs and convert PDFs directly into CSV files. tabula-py can be installed using pip:

[code]
pip install tabula-py
[/code]

If you have issues with installation, check this. Once installed, tabula-py is straightforward to use. Below we use it to scrape all the tables from a paper on classification regarding the Iris dataset (available here).

[code lang="python"]
import tabula

# Read every table on every page of the PDF
file = "http://lab.fs.uni-lj.si/lasin/wp/IMIT_files/neural/doc/seminar8.pdf"
tables = tabula.read_pdf(file, pages="all", multiple_tables=True)
[/code]…
Four ways to reverse a string in R

R
R offers several ways to reverse a string, including some base R options. We go through a few of those in this post. We'll also compare the computational time for each method. Reversing a string can be especially useful in bioinformatics (e.g. finding the reverse complement of a DNA strand). To get started, let's generate a random string of 10 million DNA bases (we can do this with the stringi package as well, but for our purposes here, let's just use base R functions).

[code lang="R"]
set.seed(1)
dna <- paste(sample(c("A", "T", "C", "G"), 10000000, replace = TRUE), collapse = "")
[/code]

1) Base R with strsplit and paste

One way to reverse a string is to use strsplit with paste. This is the slowest method that will be shown, but…