Underrated R Functions

R
I wanted to write a post about a few handy R functions that don't always get the recognition they deserve. This article covers several functions that form part of R's core functional programming capabilities. R has thousands of functions, so this is only a short list; I'll probably write similar articles in the future to discuss other R functions.

Reduce

Let's start with the Reduce function (note the capital "R"). Reduce takes a list or vector as input and reduces it to a single element. It works by applying a function to the first two elements of the vector or list, then applying the same function to that result and the third element. This new result gets passed with…
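As a minimal sketch of the pairwise folding described in the Reduce excerpt above, the snippet below sums a small vector with base R's Reduce. The vector `nums` and its values are made up for illustration and are not taken from the original post.

```r
# Hypothetical input vector, chosen only for illustration
nums <- c(1, 2, 3, 4)

# Reduce applies `+` to the first two elements, then to that result
# and the third element, and so on: ((1 + 2) + 3) + 4
total <- Reduce(`+`, nums)

# accumulate = TRUE keeps every intermediate result instead of only the last
running <- Reduce(`+`, nums, accumulate = TRUE)

print(total)
print(running)
```

Any two-argument function can take the place of `+` here, which is what makes Reduce useful beyond simple sums.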
Vectorize Fuzzy Matching

R
One of the best things about R is its ability to vectorize code, which often runs much faster than an equivalent for or while loop. In this post, we'll show how to use vectorization to speed up fuzzy matching. We'll cover a little background first; if you're already familiar with vectorization and/or fuzzy matching, feel free to skip further down the post.

What is vectorization?

Vectorization works by performing operations on entire vectors (or, by extension, matrices) rather than iterating through a collection one element at a time. A basic example is adding two vectors together, which can be done like this:

[code lang="R"]
a <- c(3, 4, 5)
b <-…
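To make the contrast with looping concrete, here is a small sketch that compares a vectorized addition with an explicit for loop. The vectors `x` and `y`, their values, and the example strings are hypothetical. The last step uses base R's `adist` (from the utils package) as one readily available vectorized edit-distance function; that choice is an assumption for illustration, not necessarily the approach taken in the full post.

```r
# Hypothetical vectors, chosen only for illustration
x <- c(1, 2, 3)
y <- c(10, 20, 30)

# Vectorized: one operation over the whole vectors
vec_sum <- x + y

# Loop equivalent: one element at a time
loop_sum <- numeric(length(x))
for (i in seq_along(x)) {
  loop_sum[i] <- x[i] + y[i]
}

# Both approaches give the same result
same_result <- identical(vec_sum, loop_sum)
print(same_result)

# Assumption: adist() as a vectorized fuzzy-matching building block.
# It returns a matrix of edit distances for every pair of strings.
d <- adist(c("apple", "banana"), c("appel", "bananna"))
print(d)
```

Each row of `d` corresponds to a string in the first vector and each column to a string in the second, so all pairwise distances come back from a single call.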
Running R Code in Parallel

R
Background

Running R code in parallel can substantially speed up performance. Parallelization lets you run multiple processes simultaneously, rather than iterating over a list one element at a time or running a single process at a time. Thankfully, running R code in parallel is relatively simple using the parallel package, which provides parallelized versions of sapply, lapply, and apply. Parallelizing code works best when you need to call a function or perform an operation on each element of a list or vector, and the result for any one element has no impact on the evaluation of any other. This could be running a large number of models across different elements of a list, scraping…
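As a rough sketch of the independent-elements case described above, the snippet below squares each element of a list-like input across two worker processes using `parLapply` from the parallel package. The worker count of 2 and the toy squaring function are assumptions picked for illustration; a real workload would use something more expensive per element.

```r
library(parallel)

# Start a small cluster; 2 workers is an arbitrary choice for this sketch
cl <- makeCluster(2)

# Each element is processed independently, so the work can be split
# across workers without any coordination between them
squares <- parLapply(cl, 1:4, function(x) x^2)

# Always shut the cluster down when finished
stopCluster(cl)

result <- unlist(squares)
print(result)
```

For a task this small the cost of starting the workers outweighs any gain, so the benefit only shows up when each call does substantial work.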