Automated EDA with Python
In this post, we will investigate the pandas_profiling and sweetviz packages, which can be used to speed up EDA (exploratory data analysis) with Python. In a previous article, we talked about an analogous package in R (see this link).

Getting started with pandas_profiling

pandas_profiling can be installed using pip, like this:

[code]
pip install "pandas-profiling[notebook]"
[/code]

Next, let's read in our dataset. The data we'll be using is a heart attack-related dataset, which can be found here.

[code lang="python"]
import pandas as pd

# Load the heart attack dataset (assumes heart.csv is in the working directory)
heart_data = pd.read_csv("heart.csv")

# Preview the first five rows
heart_data.head()
[/code]

Now, let's import ProfileReport from pandas_profiling and use it to generate a report.

[code lang="python"]
from pandas_profiling import ProfileReport

# Build a profiling report for the whole DataFrame
report = ProfileReport(heart_data, title="Sample Report")

# In Jupyter, leaving the report as the last expression renders it inline
report
[/code]

If you're running this code in Jupyter Notebook, you should see the report generated within your notebook file. The report shows…
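To get a feel for what a profiling report automates, here is a minimal sketch that computes a few of the per-column statistics such reports typically include (data type, missing values, distinct values, mean) using only pandas. The helper name `quick_profile` and the toy columns are hypothetical illustrations. They are not part of pandas_profiling or sweetviz, and the toy values are not taken from the heart dataset.

[code lang="python"]
import pandas as pd


def quick_profile(df: pd.DataFrame) -> pd.DataFrame:
    """Return one row per column with a few basic EDA statistics.

    quick_profile is a hypothetical helper for illustration only;
    it is not part of pandas_profiling or sweetviz.
    """
    summary = pd.DataFrame(
        {
            "dtype": df.dtypes.astype(str),         # column data type
            "n_missing": df.isna().sum(),           # count of missing values
            "pct_missing": df.isna().mean() * 100,  # share of missing values, in %
            "n_distinct": df.nunique(),             # distinct non-null values
        }
    )
    # The mean only applies to numeric columns; other columns get NaN
    # because assignment aligns on the column names in the index.
    summary["mean"] = df.mean(numeric_only=True)
    return summary


# Toy data for illustration only (not the heart dataset)
toy = pd.DataFrame(
    {
        "age": [63, 37, 41, None],
        "sex": ["M", "F", "F", "M"],
    }
)

profile = quick_profile(toy)
print(profile)
[/code]

A full ProfileReport goes much further (distributions, correlations, warnings), but the idea is the same: summarize every column so you don't have to write these checks by hand.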