Classification models for Pump it Up project

In this submodule, we’ll build a number of classification models for the pumpitup project. In particular, we will explore:

using sklearn transformers and pipelines to streamline the workflow,
logistic regression with regularization, and
random forests, boosted trees, and other ensembles.
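As a rough illustration of how transformers, pipelines, and regularized logistic regression fit together, here is a minimal sketch. It assumes a toy DataFrame whose column names (`amount_tsh`, `gps_height`, `extraction_type`) are borrowed from the Pump it Up data purely for illustration; the values are random and the binary target is made up. The notebook’s actual columns, preprocessing steps, and hyperparameters may differ.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data: column names echo Pump it Up features, values are random.
rng = np.random.default_rng(0)
n = 200
X = pd.DataFrame({
    "amount_tsh": rng.exponential(100.0, n),
    "gps_height": rng.normal(1000.0, 300.0, n),
    "extraction_type": rng.choice(["gravity", "handpump", "motorpump"], n),
})
X.loc[X.index[::10], "gps_height"] = np.nan  # a few gaps for the imputer to fill
# Made-up binary target tied to one column so the model has a signal to find.
y = np.where(X["extraction_type"] == "gravity", "functional", "non functional")

numeric_cols = ["amount_tsh", "gps_height"]
categorical_cols = ["extraction_type"]

# Each column group gets its own transformer; the outputs are concatenated.
preprocessor = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric_cols),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
])

clf = Pipeline([
    ("preprocess", preprocessor),
    # The default penalty is L2; C is the inverse of regularization strength,
    # so a smaller C means a stronger penalty on the coefficients.
    ("model", LogisticRegression(C=1.0, max_iter=1000)),
])

clf.fit(X, y)
train_acc = clf.score(X, y)
coef_shape = clf.named_steps["model"].coef_.shape  # 2 scaled + 3 one-hot columns
print(f"training accuracy: {train_acc:.3f}, coef shape: {coef_shape}")
```

Because every step lives inside one `Pipeline`, a single `fit` call learns the imputation medians, scaling parameters, category levels, and model weights together, and `predict` replays the same transformations on new data.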

A good, short article on avoiding data leakage when building ML models is this one by Kevin Markham at Data School.
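One common source of leakage is fitting a preprocessing step on the full dataset before cross-validation, so information from the validation folds leaks into training. A minimal sketch of the contrast, assuming synthetic data from `make_classification` (not the Pump it Up data): with a plain scaler the two scores often come out close, and the gap tends to matter more for steps that learn more from the data, such as feature selection or target encoding.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data.
X, y = make_classification(n_samples=300, n_features=10, random_state=42)

# Leaky: the scaler sees every row, including rows that later act as
# validation folds, before cross-validation starts.
X_scaled_leaky = StandardScaler().fit_transform(X)
leaky_scores = cross_val_score(LogisticRegression(max_iter=1000), X_scaled_leaky, y, cv=5)

# Leak-free: the scaler is inside the pipeline, so cross_val_score refits it
# on the training portion of each fold only.
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
safe_scores = cross_val_score(pipe, X, y, cv=5)

print(f"leaky: {leaky_scores.mean():.3f}  pipeline: {safe_scores.mean():.3f}")
```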

You’ll be working in your newly created pumpitup project folder.

Start by opening the model_exploration.ipynb notebook in Jupyter Lab.

Here are screencasts to help guide you through the notebook:

SCREENCAST: Intro and data preprocessing (5:46)
SCREENCAST: Logistic regression review and overview of regularization (10:49)
SCREENCAST: Preprocessing with column transformers (8:10)
SCREENCAST: Logistic regression model and solvers (3:11)
SCREENCAST: Creating a preprocessing and model estimation pipeline (3:17)
SCREENCAST: Data partitioning and model fitting (17:39)
SCREENCAST: Cross validation and Predictions (7:41)
SCREENCAST: Automation and Model persistence (14:03)
SCREENCAST: Random forests (6:10)
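The later screencasts cover partitioning, cross-validation, predictions, model persistence, and random forests. A compressed sketch of that workflow, assuming synthetic three-class data (loosely standing in for the three Pump it Up status groups) and an arbitrary file name `rf_model.joblib`; the notebook’s own split sizes, hyperparameters, and file layout may differ.

```python
import tempfile
from pathlib import Path

import joblib
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Synthetic three-class stand-in data.
X, y = make_classification(n_samples=500, n_features=12, n_informative=6,
                           n_classes=3, random_state=1)

# Hold out a test set that stays untouched during model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1)

rf = RandomForestClassifier(n_estimators=100, random_state=1)

# Cross-validate on the training portion only (cross_val_score clones rf).
cv_scores = cross_val_score(rf, X_train, y_train, cv=5)
print(f"CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")

# Refit on all training data, then score once on the held-out test set.
rf.fit(X_train, y_train)
test_acc = rf.score(X_test, y_test)
print(f"Test accuracy: {test_acc:.3f}")

# Persist the fitted model, reload it, and check that predictions agree.
with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "rf_model.joblib"  # hypothetical file name
    joblib.dump(rf, model_path)
    reloaded = joblib.load(model_path)
    predictions_match = np.array_equal(reloaded.predict(X_test), rf.predict(X_test))

print("Reloaded model gives identical predictions:", predictions_match)
```

Pickle-based files like these should only be loaded from trusted sources, and they are generally expected to be reloaded with the same scikit-learn version that saved them.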

OPTIONAL ADVANCED MATERIAL

If you want to learn a bit about one more popular machine learning technique, gradient boosting machines, check out the short intro in the gradient_boosting.ipynb notebook. Just take a stroll through it to learn about one of the newer classification techniques available in sklearn.
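For a quick feel of the idea before opening the notebook, here is a minimal sketch using sklearn’s `GradientBoostingClassifier` on synthetic data. The hyperparameter values are arbitrary choices for illustration, and the notebook may use different settings or the faster histogram-based `HistGradientBoostingClassifier` instead. Boosting adds shallow trees one at a time, each fit to the errors of the ensemble so far, and `staged_predict` lets you watch test accuracy change as trees are added.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

# Synthetic stand-in data.
X, y = make_classification(n_samples=600, n_features=10, random_state=7)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=7)

gbm = GradientBoostingClassifier(
    n_estimators=150,   # number of trees added sequentially
    learning_rate=0.1,  # shrinks each tree's contribution
    max_depth=3,        # keeps each individual tree shallow
    random_state=7,
)
gbm.fit(X_train, y_train)

# Test accuracy after each boosting stage (1 tree, 2 trees, ..., 150 trees).
staged_acc = [np.mean(pred == y_test) for pred in gbm.staged_predict(X_test)]
best_stage = int(np.argmax(staged_acc)) + 1
print(f"final: {staged_acc[-1]:.3f}, best: {max(staged_acc):.3f} after {best_stage} trees")
```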

And we’re done with Module 2

Next we’ll be using Python to do a bunch of analytics work that we’d usually do in Excel.
