Really large numbers in R

Really large numbers in R

R
This post will discuss ways of handling huge numbers in R using the gmp package. The gmp package The gmp package provides us a way of dealing with really large numbers in R. For example, let's suppose we want to multiple 10250 by itself. Mathematically we know the result should be 10500. But if we try this calculation in base R we get Inf for infinity. [code lang="R"] num = 10^250 num^2 # Inf [/code] However, we can get around this using the gmp package. Here, we can convert the integer 10 to an object of the bigz class. This is an implementation that allows us to handle very large numbers. Once we convert an integer to a bigz object, we can use it to perform calculations with regular numbers…
Read More
Testing the Collatz Conjecture with R

Testing the Collatz Conjecture with R

R
Background The Collatz Conjecture is a famous unsolved problem in number theory. If you're not familiar with it - the conjecture is very simple to understand, yet, no one has been able to mathematically prove that the conjecture is true (though it's been shown to be true for an enormous number of cases). The conjecture states the following: Start with any whole number. If the number is even, divide by two. If the number is odd, multiply the number by three and add one. Then, repeat this logic with the result number. Eventually you'll end up with the number one. Effectively, this is can written like this: For any whole number n: If n mod 2 == 0, then n = n / 2 Else n = 3 * n…
Read More
How to build a logistic regression model from scratch in R

How to build a logistic regression model from scratch in R

Machine Learning, R
Background In a previous post, we showed how using vectorization in R can vastly speed up fuzzy matching. Here, we will show you how to use vectorization to efficiently build a logistic regression model from scratch in R. Now we could just use the caret or stats packages to create a model, but building algorithms from scratch is a great way to develop a better understanding of how they work under the hood. Definitions & Assumptions In developing our code for the logistic regression algorithm, we will consider the following definitions and assumptions: x = A dxn matrix of d predictor variables, where each column xi represents the vector of predictors corresponding to one data point (with n such columns i.e. n data points) d = The number of predictor…
Read More
Running R Code in Parallel

Running R Code in Parallel

R
Background Running R code in parallel can be very useful in speeding up performance. Basically, parallelization allows you to run multiple processes in your code simultaneously, rather than than iterating over a list one element at a time, or running a single process at a time. Thankfully, running R code in parallel is relatively simple using the parallel package. This package provides parallelized versions of sapply, lapply, and rapply. Parallelizing code works best when you need to call a function or perform an operation on different elements of a list or vector when doing so on any particular element of the list (or vector) has no impact on the evaluation of any other element. This could be running a large number of models across different elements of a list, scraping…
Read More