Python, Basket Analysis, and Pymining

Python
Background
Python's pymining package provides a collection of useful algorithms for item set mining, association mining, and more. We'll explore some of its functionality in this post by using it to apply basket analysis to tennis. When basket analysis is discussed, it's often in the context of retail: analyzing which combinations of products are typically bought together (or in the same "basket"). For example, in grocery shopping, milk and butter may frequently be purchased together. We can take ideas from basket analysis and apply them in many other scenarios. As an example, let's say we're looking at events like tennis tournaments, where each tournament has successive rounds, e.g. quarterfinals, semifinals, and finals. How would you figure out what combinations of players typically show up in the same…
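At its core, basket analysis starts by counting how often items appear together across baskets. As a rough illustration, here is a plain-Python sketch (standard library only, not pymining's API) that treats each tournament as one "basket" of players and counts how often each pair of players shows up together. The tournament data and player names are hypothetical placeholders, and `pair_support` is a hypothetical helper written for this example.

```python
from collections import Counter
from itertools import combinations

# Hypothetical data: each "basket" is the set of players who reached the
# semifinals of one tournament (placeholder names, not real results).
tournaments = [
    {"Player A", "Player B", "Player C", "Player D"},
    {"Player A", "Player B", "Player E", "Player F"},
    {"Player A", "Player B", "Player C", "Player G"},
    {"Player B", "Player C", "Player H", "Player I"},
]

def pair_support(baskets, min_support=2):
    """Count how many baskets contain each pair of items and keep only
    the pairs that appear in at least `min_support` baskets."""
    counts = Counter()
    for basket in baskets:
        # sorted() gives each pair a canonical order, so (A, B) and (B, A)
        # are counted as the same pair.
        for pair in combinations(sorted(basket), 2):
            counts[pair] += 1
    return {pair: n for pair, n in counts.items() if n >= min_support}

frequent = pair_support(tournaments, min_support=2)
# Print the most frequent pairs first.
for pair, n in sorted(frequent.items(), key=lambda kv: (-kv[1], kv[0])):
    print(pair, n)
```

This brute-force approach only looks at pairs; dedicated itemset-mining algorithms, such as those pymining provides, are designed to find frequent combinations of any size without enumerating every candidate.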
How to get an AUC confidence interval

Machine Learning, R
Background
AUC is an important metric for evaluating classification models in machine learning, and it is often used as a summary of a model's performance. AUC ranges from 0 to 1 and measures how well a model rank-orders its predictions: it equals the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. For a detailed explanation of AUC, see this link. Since AUC is widely used, being able to get a confidence interval around it is valuable both for demonstrating a model's performance more convincingly and for comparing two or more models. For example, if model A has a higher AUC than model B, but the 95% confidence intervals around the two AUC values overlap, then the models may not differ significantly in performance. We can get a confidence interval around AUC using R's pROC package, which…
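To make the idea of an AUC confidence interval concrete, here is a minimal Python sketch of one common approach, the percentile bootstrap: resample the data with replacement, recompute AUC each time, and take the middle 95% of the resulting values. This is an illustration under stated assumptions, not a reproduction of pROC (which offers several CI methods), and the helpers `auc` and `bootstrap_auc_ci` as well as the example labels and scores are hypothetical.

```python
import random

def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count as half), computed by comparing all pairs."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def bootstrap_auc_ci(labels, scores, n_boot=2000, level=0.95, seed=0):
    """Percentile bootstrap confidence interval for AUC."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        # AUC is undefined if a resample contains only one class, so skip it.
        if 0 not in y or 1 not in y:
            continue
        stats.append(auc(y, [scores[i] for i in idx]))
    stats.sort()
    # Drop the same number of resamples from each tail.
    k = int(round((1 - level) / 2 * n_boot))
    return stats[k], stats[n_boot - 1 - k]

# Hypothetical example data: 1 = positive class, 0 = negative class.
labels = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]
scores = [0.1, 0.2, 0.3, 0.35, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
print(auc(labels, scores))
print(bootstrap_auc_ci(labels, scores))
```

The pairwise AUC is easy to read but scales with the product of the class sizes; for large datasets, a rank-based formula or pROC's built-in methods would be more practical.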
Really large numbers in R

R
This post will discuss ways of handling huge numbers in R using the gmp package.
The gmp package
The gmp package provides a way of dealing with really large numbers in R. For example, suppose we want to multiply 10^250 by itself. Mathematically, we know the result should be 10^500. But if we try this calculation in base R, we get Inf (infinity), because 10^500 exceeds the largest value a double-precision number can hold (about 1.8e308).
[code lang="R"]
num <- 10^250
num^2 # Inf
[/code]
However, we can get around this using the gmp package. Here, we can convert the integer 10 to an object of the bigz class, an arbitrary-precision integer type that can represent very large numbers exactly. Once we convert an integer to a bigz object, we can use it to perform calculations with regular numbers…
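The Inf result is a property of 64-bit floating-point numbers rather than of R itself, so the same contrast can be reproduced elsewhere. As a hedged analogue in Python (not part of the gmp workflow): Python floats are doubles and overflow to inf on multiplication, while Python's built-in int is arbitrary precision, playing a role similar to gmp's bigz in this example.

```python
import sys

# Doubles top out around 1.8e308, so squaring 1e250 overflows to infinity,
# just like num^2 in base R.
x = 10.0 ** 250
squared_float = x * x
print(sys.float_info.max)  # 1.7976931348623157e+308
print(squared_float)       # inf

# Python's built-in int is arbitrary precision, similar in spirit to gmp's bigz.
n = 10 ** 250
big = n * n
print(big == 10 ** 500)    # True
print(len(str(big)))       # 501 (a 1 followed by 500 zeros)
```

In R, base numerics are doubles by default, which is why a separate class such as bigz is needed to get exact results at this scale.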