Faster data exploration with DataExplorer
Data exploration is an important part of the modeling process, and it can take up a fair amount of time. The awesome DataExplorer package in R aims to make this process easier. To get started with DataExplorer, install it from CRAN:

[code lang="R"]
install.packages("DataExplorer")
[/code]

Let's use DataExplorer to explore a dataset on diabetes. The raw CSV file has no header row, so we read it with header = FALSE and assign the column names ourselves.

[code lang="R"]
# load DataExplorer
library(DataExplorer)

# read in dataset (the file has no header row)
diabetes_data <- read.csv("https://raw.githubusercontent.com/jbrownlee/Datasets/master/pima-indians-diabetes.csv", header = FALSE)

# fix column names
names(diabetes_data) <- c("number_of_times_pregnant", "plasma_glucose_conc", "diastolic_bp", "triceps_skinfold_thickness", "two_hr_serum_insulin", "bmi", "diabetes_pedigree_function", "age", "label")

# create report
create_report(diabetes_data)
[/code]

Running the create_report line above generates an HTML report file containing a collection of useful information about the data. This includes:

Basic statistics, such as the number of rows and columns, number of columns with…