Speed Test: Sapply vs. Vectorization

R
The apply functions in R are awesome (see this post for some lesser known apply functions). However, if you can use pure vectorization, your code will probably run a lot faster than it would with functions like sapply and lapply. This is because these apply functions still call an R function on each element of a vector or list behind the scenes, one at a time. A vectorized function, on the other hand, receives the whole vector in a single call and loops over it in compiled code, which avoids that per-element overhead and is usually much faster. This post runs through a couple of such examples involving string substitution and fuzzy matching.

String substitution

For example, let's create a vector with one million elements that looks like this: test1, test2, test3, test4, ..., test1000000. With sapply, the code to create this would…
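The excerpt cuts off before the post's own code, so the following is only a minimal sketch of the comparison it describes, not necessarily the author's implementation. It builds the test1, ..., test1000000 vector once with sapply and once with a single vectorized paste0 call. The variable names (n, via_sapply, via_vector, loop_time, vec_time) are chosen here for illustration.

```r
n <- 1e6  # one million elements, as in the text; lower this for a quicker run

# Element by element: sapply calls the anonymous function once per index
loop_time <- system.time(
  via_sapply <- sapply(1:n, function(i) paste0("test", i))
)

# Vectorized: paste0 recycles "test" across the whole integer vector in one call
vec_time <- system.time(
  via_vector <- paste0("test", 1:n)
)

# Both approaches should produce the same character vector
same_result <- identical(via_sapply, via_vector)

# Elapsed seconds for each approach (values depend on the machine)
print(rbind(sapply = loop_time["elapsed"], vectorized = vec_time["elapsed"]))
```

The exact timings vary by machine and R version, so no figures are quoted here; the point is only that both calls return the same vector while doing very different amounts of per-element work.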
How to build a logistic regression model from scratch in R

Machine Learning, R
Background

In a previous post, we showed how using vectorization in R can vastly speed up fuzzy matching. Here, we will show you how to use vectorization to efficiently build a logistic regression model from scratch in R. We could just use the caret or stats packages to create a model, but building algorithms from scratch is a great way to develop a better understanding of how they work under the hood.

Definitions & Assumptions

In developing our code for the logistic regression algorithm, we will consider the following definitions and assumptions:

x = A d x n matrix of d predictor variables, where each column x_i is the vector of predictors for one data point (there are n such columns, i.e. n data points)
d = The number of predictor…
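The excerpt ends before the post's algorithm, so the following is only a rough sketch of what a vectorized fit could look like under the definitions above, assuming plain gradient descent on the mean negative log-likelihood and no intercept term. The simulated data, true_w, the learning rate, and the step count are all hypothetical choices made for illustration, not taken from the post.

```r
set.seed(1)

d <- 2    # number of predictors
n <- 500  # number of data points

# x is d x n: each column holds the predictors for one data point
x <- matrix(rnorm(d * n), nrow = d)

# Hypothetical "true" weights used only to simulate 0/1 labels
true_w <- c(1.5, -2)
y <- rbinom(n, 1, plogis(as.vector(crossprod(x, true_w))))

sigmoid <- function(z) 1 / (1 + exp(-z))

w <- rep(0, d)  # start from zero weights
rate <- 0.5     # assumed learning rate
for (step in 1:2000) {
  # crossprod(x, w) is t(x) %*% w: all n linear predictors at once
  p <- sigmoid(as.vector(crossprod(x, w)))
  # Gradient of the mean negative log-likelihood, computed for all points in one product
  grad <- as.vector(x %*% (p - y)) / n
  w <- w - rate * grad
}

# Reference fit from the stats package, without an intercept, for comparison
xt <- t(x)
glm_w <- unname(coef(glm(y ~ xt - 1, family = binomial)))
print(rbind(gradient_descent = w, glm = glm_w))
```

Each iteration uses two matrix products rather than a loop over data points, which is the sense in which the fit is vectorized. Comparing against glm is a convenient sanity check that the hand-written update is converging to the same estimate.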