Very excited to announce that the early-access preview (MEAP) of my upcoming book, Software Engineering for Data Scientists, is available now!…
In this post, we're going to cover how to plot XGBoost trees in R. XGBoost is a very popular machine…
This post will explore the mathematics behind information gain. We'll start with the base intuition behind information gain, but then…
This post will explore using R's MLmetrics package to evaluate machine learning models. MLmetrics provides several functions to calculate common metrics…
Background: AUC is an important metric in machine learning for classification. It is often used as a measure of a…
Background: In a previous post, we showed how using vectorization in R can vastly speed up fuzzy matching. Here, we…
What is Independent Component Analysis (ICA)? If you're already familiar with ICA,…