Background
AUC (area under the ROC curve) is a widely used metric for evaluating classification models. It is a value between 0 and 1 that measures how well a model rank-orders its predictions: the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. For a detailed explanation of AUC, see this link.
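As a quick illustration, here's a minimal sketch of computing AUC with pROC's auc function, using made-up labels and scores:

# toy example: AUC from true labels and predicted scores
library(pROC)
labels <- c(0, 0, 1, 1, 1, 0, 1, 0)
scores <- c(0.1, 0.4, 0.35, 0.8, 0.7, 0.2, 0.9, 0.5)
auc(labels, scores)  # area under the ROC curve for these toy values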
Since AUC is so widely used, getting a confidence interval around it is valuable, both to better characterize a single model's performance and to compare two or more models. For example, if model A has a higher AUC than model B, but the 95% confidence intervals around the two AUC values overlap, then the difference in performance may not be statistically significant. We can get a confidence interval around AUC using R's pROC package, which can compute the interval either with DeLong's method or by bootstrapping.
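As an aside, if you want a formal test rather than eyeballing overlapping intervals, pROC also offers roc.test, which by default runs DeLong's test on two paired ROC curves. A minimal sketch, where labels, pred_a, and pred_b stand in for the true test labels and the two models' predicted probabilities on the same test set:

# compare two models' AUCs on the same test set
library(pROC)
roc_a <- roc(labels, pred_a)
roc_b <- roc(labels, pred_b)
roc.test(roc_a, roc_b)  # DeLong's test for two correlated ROC curves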
Building a simple model to test
To demonstrate how to get an AUC confidence interval, let’s build a model using a movies dataset from Kaggle (you can get the data here).
Reading in the data
# load packages
library(pROC)
library(dplyr)
library(randomForest)

# read in dataset
movies <- read.csv("movie_metadata.csv")

# remove records with missing budget / gross data
movies <- movies %>% filter(!is.na(budget) & !is.na(gross))
Split into train / test
Next, let’s randomly select 70% of the records to be in the training set and leave the rest for testing.
# get random sample of rows
set.seed(0)
train_rows <- sample(1:nrow(movies), .7 * nrow(movies))

# split data into train / test
train_data <- movies[train_rows, ]
test_data <- movies[-train_rows, ]

# select only fields we need
train_need <- train_data %>%
  select(gross, duration, director_facebook_likes, budget,
         imdb_score, content_rating, movie_title)
test_need <- test_data %>%
  select(gross, duration, director_facebook_likes, budget,
         imdb_score, content_rating, movie_title)
Create the label
Lastly, we need to create our label, i.e. what we're trying to predict. Here, we're going to predict whether a movie's gross beats its budget (1 if so, 0 if not).
# create binary label: did the movie's gross beat its budget?
train_need$beat_budget <- as.factor(ifelse(train_need$gross > train_need$budget, 1, 0))
test_need$beat_budget <- as.factor(ifelse(test_need$gross > test_need$budget, 1, 0))
Train a random forest
Now, let’s train a simple random forest model with just 50 trees.
# drop rows with missing values before training
train_need <- train_need[complete.cases(train_need), ]

# train a random forest
forest <- randomForest(beat_budget ~ duration + director_facebook_likes +
                         budget + imdb_score + content_rating,
                       data = train_need, ntree = 50)
Getting an AUC confidence interval
Next, let’s use our model to get predictions on the test set.
# predicted probability that each movie beats its budget (class "1")
test_pred <- predict(forest, test_need, type = "prob")[, 2]
And now, we’re ready to get our confidence interval! We can do that in just one line of code using the ci.auc function from pROC. By default, this function calculates a 95% confidence interval using DeLong’s method (a bootstrap interval with 2,000 replicates is available through the method and boot.n arguments). As shown below, our 95% confidence interval for the AUC on the test set runs from 0.6198 to 0.6822.
ci.auc(test_need$beat_budget, test_pred) # 95% CI: 0.6198-0.6822 (DeLong)
We can adjust the confidence level using the conf.level parameter:
ci.auc(test_need$beat_budget, test_pred, conf.level = 0.9) # 90% CI: 0.6248-0.6772 (DeLong)
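If you'd rather have a bootstrap interval than a DeLong one, that's just one extra argument. A quick sketch (the exact bounds will vary a bit from run to run, since the method resamples the data):

# bootstrap CI with 2000 resamples (the boot.n default)
set.seed(0)
ci.auc(test_need$beat_budget, test_pred, method = "bootstrap", boot.n = 2000)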
That’s it for this post! Please click here to follow this blog on Twitter!
See here to learn more about the pROC package.
Hi! Does this work with the survival package?
Best wishes,
Sebastián
Great question. Are you asking about getting a confidence interval around a time-dependent AUC in a survival analysis? If so, this won’t work without modification: the method in this post assumes binary classification with no time dependency.
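For the time-dependent case, the timeROC package may be a better fit, though I haven’t tested this myself. As a rough sketch, assuming a hypothetical data frame surv_df with a follow-up time, an event status, and a risk score (iid = TRUE is what enables confint):

library(timeROC)
# time-dependent AUC at 1 and 2 years; iid = TRUE enables confint()
roc_t <- timeROC(T = surv_df$time, delta = surv_df$status,
                 marker = surv_df$score, cause = 1,
                 times = c(365, 730), iid = TRUE)
confint(roc_t, level = 0.95)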
Best,
Andrew
Hello,
This chunk does not work:
# train a random forest
forest <- randomForest(beat_budget ~ duration + director_facebook_likes + budget + imdb_score + content_rating,
train_need, ntree = 50, na.omit = TRUE)
I think you need to replace "na.omit = TRUE" with "na.action = na.exclude".
After that, things work well for me.
Best,
Fabien
Thanks. It doesn’t actually give me an error, but perhaps that’s a version difference. I meant to take that argument out anyway: the complete.cases line above already removes rows with missing values, so it isn’t necessary.