How to measure DNA similarity with Python and Dynamic Programming

How to measure DNA similarity with Python and Dynamic Programming

Python
*Note, if you want to skip the background / alignment calculations and go straight to where the code begins, just click here. Dynamic Programming and DNA Dynamic programming has many uses, including identifying the similarity between two different strands of DNA or RNA, protein alignment, and in various other applications in bioinformatics (in addition to many other fields). For anyone less familiar, dynamic programming is a coding paradigm that solves recursive problems by breaking them down into sub-problems using some type of data structure to store the sub-problem results. In this way, recursive problems (like the Fibonacci sequence for example) can be programmed much more efficiently because dynamic programming allows you to avoid duplicate (and hence, wasteful) calculations in your code. Click here to read more about dynamic programming. Let's…
Read More
How to build a logistic regression model from scratch in R

How to build a logistic regression model from scratch in R

Machine Learning, R
Background In a previous post, we showed how using vectorization in R can vastly speed up fuzzy matching. Here, we will show you how to use vectorization to efficiently build a logistic regression model from scratch in R. Now we could just use the caret or stats packages to create a model, but building algorithms from scratch is a great way to develop a better understanding of how they work under the hood. Definitions & Assumptions In developing our code for the logistic regression algorithm, we will consider the following definitions and assumptions: x = A dxn matrix of d predictor variables, where each column xi represents the vector of predictors corresponding to one data point (with n such columns i.e. n data points) d = The number of predictor…
Read More