Classification models for Pump it Up project

In this submodule, we’ll build a number of classification models for the pumpitup project. In particular, we will explore:

  • using sklearn transformers and pipelines to streamline workflow,

  • logistic regression with regularization,

  • random forests, boosted trees, and other ensembles.
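As a rough preview of how these pieces fit together, here is a minimal sketch that wraps preprocessing and a classifier in a single sklearn pipeline, then swaps in two different models. The data is a synthetic stand-in, not the Pump it Up dataset, and the column layout, the value of C, and the number of trees are all assumptions chosen for illustration. The notebook may make different choices.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic stand-in data (NOT the Pump it Up dataset): two numeric
# columns, one integer-coded categorical column, and three classes.
rng = np.random.default_rng(0)
n = 600
X_num = rng.normal(size=(n, 2))
X_cat = rng.integers(0, 4, size=(n, 1))
X = np.hstack([X_num, X_cat])
score = X_num[:, 0] + 0.5 * X_cat[:, 0] + rng.normal(scale=0.5, size=n)
y = np.digitize(score, [0.3, 1.2])  # class labels 0, 1, 2

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the numeric columns and one-hot encode the categorical column.
preprocess = ColumnTransformer(
    [
        ("num", StandardScaler(), [0, 1]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), [2]),
    ]
)

# Hypothetical settings: a smaller C means stronger (L2) regularization.
models = {
    "logistic": LogisticRegression(C=0.5, max_iter=1000),
    "random_forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

scores = {}
for name, model in models.items():
    pipe = Pipeline([("preprocess", preprocess), ("model", model)])
    pipe.fit(X_train, y_train)  # transformers are fit on training data only
    scores[name] = pipe.score(X_test, y_test)  # accuracy on held-out data

print(scores)
```

Because the preprocessing lives inside the pipeline, trying another classifier only means changing the final step.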

A good, short article on avoiding data leakage when building ML models is this one by Kevin Markham at Data School.
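One common way to guard against leakage is to keep every preprocessing step inside a pipeline and cross-validate the pipeline as a whole, so that transformers are refit on the training portion of each fold. The sketch below illustrates that idea on synthetic data. It is not taken from the article, and the dataset and model settings are assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data, used only to make the example runnable.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)

# The scaler is part of the pipeline, so within each fold it is fit on
# the training rows only and then applied to the held-out rows.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))

cv_scores = cross_val_score(pipe, X, y, cv=5)  # one accuracy per fold
print(cv_scores.mean())
```

Scaling the full dataset before splitting would instead let statistics from the held-out rows influence training, which is the kind of leakage to avoid.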

You’ll be working in your newly created pumpitup project folder.

Start by opening the model_exploration.ipynb notebook in Jupyter Lab.

Here are screencasts to help guide you through the notebook:

OPTIONAL ADVANCED MATERIAL

If you want to learn a bit about one more popular machine learning technique, gradient boosting machines, check out the short intro in the gradient_boosting.ipynb notebook. Just take a stroll through it to learn about one of the newer classification techniques available in sklearn.
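For a quick taste before opening that notebook, here is a minimal sketch of fitting a gradient boosting classifier in sklearn. It uses GradientBoostingClassifier on synthetic three-class data. The estimator choice, the hyperparameter values, and the data are assumptions for illustration, and the notebook may use a different estimator or settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic three-class data standing in for a real dataset.
X, y = make_classification(
    n_samples=500,
    n_features=10,
    n_informative=5,
    n_classes=3,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Shallow trees are added one at a time, each fit to the errors of the
# ensemble so far; learning_rate shrinks each tree's contribution.
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
)
gbm.fit(X_train, y_train)

gbm_accuracy = gbm.score(X_test, y_test)  # accuracy on held-out data
print(gbm_accuracy)
```

Unlike a random forest, where trees are built independently, boosting builds its trees sequentially, so the number of trees and the learning rate are usually tuned together.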

And we’re done with Module 2

Next we’ll be using Python to do a bunch of analytics work that we’d usually do in Excel.