stringi Archives - Open Source Automation

17 May 2019 by Andrew Treadway

Four ways to reverse a string in R

R offers several ways to reverse a string, including some base R options. We go through a few of those in this post. We'll also compare the computational time for each method. Reversing a string can be especially useful in bioinformatics (e.g. finding the reverse complement of a DNA strand). To get started, let's generate a random string of 10 million DNA bases (we can do this with the stringi package as well, but for our purposes here, let's just use base R functions).

[code lang="R"]
set.seed(1)
# Sample 10 million bases with replacement and collapse them into a single string
dna <- paste(sample(c("A", "T", "C", "G"), 10000000, replace = TRUE), collapse = "")
[/code]

1) Base R with strsplit and paste

One way to reverse a string is to use strsplit with paste. This is the slowest method that will be shown, but…
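As a rough illustration of the strsplit-and-paste idea on a short input, here is a minimal base R sketch. The helper names `reverse_string` and `reverse_complement` are hypothetical and are not taken from the post; the use of `chartr` for the complement step is likewise an assumption about one reasonable way to do it.

```r
# Hypothetical helper: split into single characters, reverse them, rejoin.
reverse_string <- function(x) {
  chars <- strsplit(x, "")[[1]]      # character vector, one element per base
  paste(rev(chars), collapse = "")   # reverse the order and glue back together
}

# Hypothetical helper: reverse the strand, then swap each base for its pair
# (A <-> T, C <-> G) with base R's chartr().
reverse_complement <- function(x) {
  chartr("ATCG", "TAGC", reverse_string(x))
}

reverse_string("ATCG")      # "GCTA"
reverse_complement("ATCG")  # "CGAT"
```

The same functions can be applied to the 10-million-base `dna` string, although splitting it into individual characters is where most of the time goes.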

13 Mar 2019 by Andrew Treadway

Speed Test: Sapply vs. Vectorization

The apply functions in R are awesome (see this post for some lesser known apply functions). However, if you can use pure vectorization, then your code will probably run a lot faster than if you depend on functions like sapply and lapply. This is because apply functions like these still loop through the elements of a vector or list behind the scenes, one at a time, calling an R function on each. Vectorized functions, on the other hand, do their looping in compiled code under the hood, which allows much faster computation. This post runs through a couple of such examples involving string substitution and fuzzy matching.

String substitution

For example, let's create a vector that looks like this: test1, test2, test3, test4, ..., test1000000 with one million elements. With sapply, the code to create this would…
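To make the comparison concrete, here is a minimal sketch of both approaches for building the test1, test2, ... vector. It is not necessarily the post's own code: the variable names are hypothetical, and `n` is kept at 1,000 so it runs instantly (set it to 1000000 to match the example in the text).

```r
n <- 1000  # assumption: small size for a quick run; the text uses one million

# Element-by-element: sapply calls the anonymous R function once per index.
via_sapply <- sapply(1:n, function(i) paste0("test", i))

# Vectorized: paste0 takes the whole index vector in a single call.
via_vector <- paste0("test", 1:n)

# Both approaches build the same character vector.
identical(via_sapply, via_vector)

# Rough timing comparison; results vary by machine.
system.time(sapply(1:n, function(i) paste0("test", i)))
system.time(paste0("test", 1:n))
```

The gap between the two timings tends to widen as `n` grows, since the sapply version pays the cost of one R function call per element.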
