Pump it Up: Data Prep
In this submodule, we’ll do a little EDA and data prep for the pumpitup project. This isn’t meant to be an exhaustive example, as our focus is really on classification modeling in Module 2. Nevertheless, there are some useful tips in here, including:
automated EDA tools for Python,
doing factor lumping with a port of the R package forcats,
creating a data prep script,
getting your data ready for use with sklearn for classification models.
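Factor lumping, mentioned in the list above, means collapsing the infrequent levels of a categorical variable into a single catch-all level so that a model is not handed dozens of rare categories. The sketch below shows only the idea, in plain Python. It is not the API of the forcats port used in the notebook, and the function name lump_rare_levels, its parameters, and the sample values are all made up for illustration.

```python
from collections import Counter


def lump_rare_levels(values, n_keep=2, other_label="Other"):
    """Keep the n_keep most frequent levels; relabel all others as other_label."""
    counts = Counter(values)
    # most_common(n_keep) returns the n_keep (level, count) pairs with the highest counts
    keep = {level for level, _ in counts.most_common(n_keep)}
    return [v if v in keep else other_label for v in values]


# Hypothetical category values, invented for this example only.
installer = ["DWE", "DWE", "DWE", "Gov", "Gov", "RWE", "Commu", "DWE"]
lumped = lump_rare_levels(installer, n_keep=2)
print(lumped)
```

Here the two most frequent levels are kept and the remaining ones are relabeled as "Other". Lumping by a fixed number of levels is just one option; lumping by a minimum count or proportion is a common alternative.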
You’ll be working in your newly created pumpitup project folder.
Start by opening the data_prep.ipynb notebook in Jupyter Lab.
Here is a screencast to help guide you through the notebook:
SCREENCAST: Pump it Up data prep (25:27)
Move on to the last submodule, Classification models for Pump it Up project.