Classification models for the Pump it Up project
In this submodule, we’ll build a number of classification models for the pumpitup project. In particular, we will explore:
using sklearn transformers and pipelines to streamline workflow,
logistic regression with regularization,
random forests, boosted trees, and other ensembles.
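The first two topics can be sketched together as a single sklearn pipeline. This is only a minimal illustration, not the notebook's code: the data is synthetic, the column names (num_a, num_b, cat_a) are hypothetical stand-ins for the project's features, and the value of C is an arbitrary choice.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data; column names are hypothetical, not the project's.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "num_a": rng.normal(size=n),
    "num_b": rng.normal(size=n),
    "cat_a": rng.choice(["x", "y", "z"], size=n),
})
signal = X["num_a"] + (X["cat_a"] == "x") + rng.normal(scale=0.5, size=n)
y = (signal > 0.5).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y
)

# Scale numeric columns and one-hot encode categorical columns.
preprocessor = ColumnTransformer(transformers=[
    ("num", StandardScaler(), ["num_a", "num_b"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["cat_a"]),
])

# LogisticRegression applies L2 regularization by default;
# a smaller C means a stronger penalty.
clf = Pipeline(steps=[
    ("preprocess", preprocessor),
    ("model", LogisticRegression(C=0.5, max_iter=1000)),
])

clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)
print(f"Test accuracy: {test_accuracy:.3f}")
```

Because preprocessing and model live in one object, the same fit and score calls work on raw dataframes, and swapping in a different final estimator only means changing the "model" step.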
A good, short article on avoiding data leakage when building ML models is this one by Kevin Markham at Data School.
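One common way to guard against leakage, sketched here under assumptions and not taken from the article itself, is to keep preprocessing inside the pipeline that gets cross-validated. That way the scaler is refit on the training portion of each fold and never sees the held-out rows. The data below is synthetic and the name leak_free_model is just illustrative.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data as a stand-in for real features.
X, y = make_classification(n_samples=400, n_features=8, random_state=0)

# The scaler is a pipeline step, so cross_val_score fits it on each
# training fold only, then applies it to the matching validation fold.
leak_free_model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

scores = cross_val_score(leak_free_model, X, y, cv=5)
print(f"Mean CV accuracy: {scores.mean():.3f}")
```

The leaky alternative would be to call StandardScaler().fit_transform(X) on the full dataset first and then cross-validate the model alone, which lets statistics from the validation rows influence training.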
You’ll be working in your newly created pumpitup project folder.
Start by opening the model_exploration.ipynb notebook in Jupyter Lab.
Here are some screencasts to help guide you through the notebook:
SCREENCAST: Logistic regression review and overview of regularization (10:49)
SCREENCAST: Creating a preprocessing and model estimation pipeline (3:17)
SCREENCAST: Random forests (6:10)
OPTIONAL ADVANCED MATERIAL
If you want to learn a bit about one more popular machine learning technique, gradient boosting machines, check out the short intro in the gradient_boosting.ipynb notebook. Just take a stroll through it to learn about one of the newer classification techniques available in sklearn.
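For a quick feel of how the two tree ensembles compare before opening the notebook, here is a rough sketch that fits a random forest and a gradient boosting classifier on the same synthetic three-class data. It assumes sklearn's GradientBoostingClassifier; the notebook may use a different estimator or different settings, and the hyperparameters below are arbitrary.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split

# Synthetic three-class data; not the project's dataset.
X, y = make_classification(
    n_samples=600, n_features=10, n_informative=5, n_classes=3, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

models = {
    # Random forest: many deep trees grown independently on bootstrap samples.
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    # Gradient boosting: shallow trees added one after another,
    # each fit to the errors of the ensemble built so far.
    "gradient_boosting": GradientBoostingClassifier(
        n_estimators=200, learning_rate=0.1, max_depth=3, random_state=0
    ),
}

test_scores = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    test_scores[name] = model.score(X_test, y_test)
    print(f"{name}: test accuracy {test_scores[name]:.3f}")
```

Both estimators share the usual fit and score interface, so either one can be dropped in as the final step of a preprocessing pipeline.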
And we’re done with Module 2
Next we’ll be using Python to do a bunch of analytics work that we’d usually do in Excel.