*************************************************
Classification models for Pump it Up project
*************************************************
In this submodule, we'll build a number of classification models for the ``pumpitup`` project. In particular, we will explore:
* using sklearn transformers and pipelines to streamline workflow,
* logistic regression with regularization,
* random forests and boosted trees and other ensembles.
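To preview the second topic above, here is a minimal sketch of regularized logistic regression inside an sklearn pipeline. It uses sklearn's built-in breast cancer dataset as a stand-in, since the ``pumpitup`` data isn't loaded here; the dataset choice and parameter values are illustrative assumptions, not the notebook's actual code.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# C is the inverse regularization strength: smaller C means a stronger L2 penalty.
# Scaling first matters, since the penalty treats all coefficients alike.
clf = Pipeline([
    ("scale", StandardScaler()),
    ("logreg", LogisticRegression(penalty="l2", C=1.0, max_iter=1000)),
])
clf.fit(X_train, y_train)
print(clf.score(X_test, y_test))
```

Tuning ``C`` (for example with ``GridSearchCV``) is the usual way to pick the amount of regularization rather than hard-coding it as done here.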
A good, short article on avoiding data leakage when building ML models
is `this one by Kevin Markham at Data School `_.
You'll be working in your newly created ``pumpitup`` project folder.
Start by opening the ``model_exploration.ipynb`` notebook in Jupyter Lab.
Here are screencasts to help guide you through the notebook:
* `SCREENCAST: Intro and data preprocessing `_ (5:46)
* `SCREENCAST: Logistic regression review and overview of regularization `_ (10:49)
* `SCREENCAST: Preprocessing with column transformers `_ (8:10)
* `SCREENCAST: Logistic regression model and solvers `_ (3:11)
* `SCREENCAST: Creating a preprocessing and model estimation pipeline `_ (3:17)
* `SCREENCAST: Data partitioning and model fitting `_ (17:39)
* `SCREENCAST: Cross validation and Predictions `_ (7:41)
* `SCREENCAST: Automation and Model persistence `_ (14:03)
* `SCREENCAST: Random forests `_ (6:10)
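The column transformer screencast covers preprocessing mixed-type data. Here's a small sketch of the idea, using a toy frame standing in for the pump data; the column names are real fields from the competition data, but the values and this exact setup are illustrative assumptions.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Toy frame standing in for the pump data.
df = pd.DataFrame({
    "amount_tsh": [0.0, 25.0, 500.0, 0.0],
    "gps_height": [1390, 686, 263, 0],
    "basin": ["Lake Victoria", "Pangani", "Pangani", "Rufiji"],
})

# Route numeric columns to scaling and categorical columns to one-hot encoding.
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["amount_tsh", "gps_height"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["basin"]),
])
X = preprocess.fit_transform(df)
print(X.shape)  # (4, 5): 2 scaled numeric columns + 3 one-hot columns
```

``handle_unknown="ignore"`` keeps the transformer from erroring on category values that appear only in the test partition, which ties back to the data leakage discussion above.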
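For the partitioning, cross validation, and prediction screencasts, the core pattern looks roughly like this. The iris dataset and parameter choices are stand-ins, not the notebook's actual code.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split

X, y = load_iris(return_X_y=True)
# Hold out a test partition; stratify keeps class proportions similar in both splits.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

model = LogisticRegression(max_iter=1000)
# 5-fold cross validation on the training data only, to avoid leaking test data.
scores = cross_val_score(model, X_train, y_train, cv=5)
print(scores.mean())

# Fit on the full training partition, then predict on the held-out test set.
model.fit(X_train, y_train)
preds = model.predict(X_test)
```

Keeping cross validation confined to the training partition is exactly the leakage-avoidance point made in the Data School article linked above.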
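For the model persistence screencast, the standard sklearn approach is ``joblib``. This sketch saves a fitted model to a temporary file and reloads it; the model and dataset are placeholders.

```python
import os
import tempfile

import joblib
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier

X, y = load_iris(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(X, y)

# Persist the fitted model to disk, then reload it.
path = os.path.join(tempfile.mkdtemp(), "model.joblib")
joblib.dump(model, path)
reloaded = joblib.load(path)

# The reloaded model makes identical predictions.
print((reloaded.predict(X) == model.predict(X)).all())
```

Persisting the whole fitted pipeline (preprocessing plus model) is generally preferable to persisting the model alone, so that new data gets exactly the same transformations at prediction time.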
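And for the random forests screencast, a minimal sketch using the wine dataset as a stand-in. Forests need no feature scaling and expose per-feature importances, which is part of their appeal for tabular problems like this one.

```python
from sklearn.datasets import load_wine
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_wine(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# An ensemble of 200 decision trees, each fit on a bootstrap sample.
rf = RandomForestClassifier(n_estimators=200, random_state=0)
rf.fit(X_train, y_train)
print(rf.score(X_test, y_test))

# Impurity-based importances, one value per feature, summing to 1.
print(rf.feature_importances_[:3])
```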
OPTIONAL ADVANCED MATERIAL
---------------------------
If you want to learn a bit about one more popular machine learning technique, *gradient boosting machines*, check out the short intro in the ``gradient_boosting.ipynb`` notebook. Take a stroll through it to learn about one of the newer classification techniques available in sklearn.
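As a taste of what the notebook covers, here's a minimal gradient boosting sketch; the dataset and hyperparameter values are illustrative assumptions, not the notebook's contents.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Boosting fits shallow trees sequentially, each one correcting the
# residual errors of the ensemble so far; learning_rate shrinks each
# tree's contribution.
gbm = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0)
gbm.fit(X_train, y_train)
print(gbm.score(X_test, y_test))
```

sklearn also offers ``HistGradientBoostingClassifier``, a faster histogram-based variant worth knowing about for larger datasets.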
And we're done with Module 2
-----------------------------
Next we'll be using Python to do a bunch of analytics work that we'd usually do in Excel.