Week 11 — More Modeling (10/31–11/4)#

This week, we’re going to learn more about model building, covering techniques that will be useful in Assignment 5:

  • Feature engineering

  • SciKit-Learn pipelines and workflows

  • Regularization

  • Analyzing model results

🧐 Content Overview#

Element                                                                           Length
🎥 Intro and Context                                                              4m39s
🎥 Feature Transforms                                                             21m3s
🎥 Workflow and Iteration                                                         14m29s
🎥 Pipelines                                                                      7m19s
🎥 Regularization                                                                 15m4s
🎥 Models and Depth                                                               7m23s
🎥 Inference and Ablation                                                         14m55s
📃 Statistical Significance Tests for Comparing Machine Learning Algorithms       3400 words
🎥 Dates                                                                          8m34s

This week has 1h33m of video and 3400 words of assigned readings. This week’s videos are available in a Panopto folder.

🎥 Intro & Context#

In this video, I review where we are conceptually, and recap the ideas of estimating conditional probability and expectation.

🎥 Feature Transforms#

What are some useful techniques for engineering features in an application?
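As a rough illustration of what feature engineering can look like in code, here is a minimal sketch of two common transforms: a log transform for a skewed numeric column and one-hot encoding for a categorical column. The data frame and its column names (`income`, `region`) are made up for this example and are not taken from the video.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are illustrative assumptions.
df = pd.DataFrame({
    "income": [20_000, 45_000, 120_000, 1_000_000],
    "region": ["north", "south", "south", "west"],
})

# A log transform compresses a heavily skewed numeric feature.
df["log_income"] = np.log10(df["income"])

# One-hot (dummy) encoding turns a categorical column into indicator columns.
dummies = pd.get_dummies(df["region"], prefix="region")

# Assemble the engineered features into a single frame for modeling.
features = pd.concat([df[["log_income"]], dummies], axis=1)
```

Which transforms are appropriate depends on the data and the model; these two are only common starting points.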

🎥 Workflow#

How do you do feature engineering and model selection in a machine learning workflow? What is the iterative process involved?

🎥 SciKit Pipelines#

In this video, I introduce SciKit-Learn pipelines, which chain multiple transformations together.
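A minimal sketch of the idea, assuming synthetic data generated on the spot: a `Pipeline` that standardizes the features and then fits a logistic regression. The step names (`scale`, `model`) and the data are illustrative choices, not the ones used in the video.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (an assumption for this sketch): three features on very
# different scales, with a label that depends on the first two.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 3)) * [1.0, 10.0, 100.0]
y = (X[:, 0] + X[:, 1] / 10.0 > 0).astype(int)

# Each step is a (name, estimator) pair; every step but the last transforms.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])

# fit() learns the scaler's parameters and the model from the same data;
# predict() re-applies the learned scaling before calling the model.
pipe.fit(X, y)
preds = pipe.predict(X)
```

Because the whole pipeline is a single estimator, it can be passed to tools such as cross-validation so that the scaling is re-learned on each training split.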

📃 SciKit Learn Pipelines#

Read the SciKit-Learn User Guide chapter on pipelines.

📃 SciKit Learn Preprocessing#

Read the SciKit-Learn User Guide chapter on pre-processing.
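To make the preprocessing chapter a little more concrete, here is a small sketch that applies different preprocessing to different columns with `ColumnTransformer`. The data frame and column names (`age`, `group`) are hypothetical, and `sparse_threshold=0.0` is set only so that the result is a plain dense array.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical mixed-type data; names and values are illustrative only.
df = pd.DataFrame({
    "age": [22, 35, 58, 41],
    "group": ["a", "b", "a", "c"],
})

# Standardize the numeric column and one-hot encode the categorical one.
prep = ColumnTransformer(
    [
        ("num", StandardScaler(), ["age"]),
        ("cat", OneHotEncoder(handle_unknown="ignore"), ["group"]),
    ],
    sparse_threshold=0.0,  # always return a dense array
)

# Result: one standardized column followed by three indicator columns.
X = prep.fit_transform(df)
```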

🎥 Regularization#

This video introduces regularization: ridge regression, lasso regression, and the elastic net. Lasso regression can help with (semi-)automatic feature selection.
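The following sketch fits all three regularized models to synthetic data in which only the first two of five features matter. The data, the penalty strengths (`alpha`), and the `l1_ratio` are arbitrary assumptions chosen to make the effect visible; in practice they would be tuned.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data (an assumption): features 2-4 have no effect on y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

# Ridge uses an L2 penalty: coefficients shrink but generally stay nonzero.
ridge = Ridge(alpha=1.0).fit(X, y)

# Lasso uses an L1 penalty: some coefficients can become exactly zero.
lasso = Lasso(alpha=0.5).fit(X, y)

# Elastic net mixes the L1 and L2 penalties; l1_ratio sets the balance.
enet = ElasticNet(alpha=0.5, l1_ratio=0.5).fit(X, y)

# Counting exact zeros shows which features lasso effectively dropped.
n_zero = int(np.sum(lasso.coef_ == 0.0))
```

Inspecting `lasso.coef_` next to `ridge.coef_` is one quick way to see the feature-selection behavior mentioned above.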

📓 Pipeline and Regularization#

This notebook demonstrates pipelines and \(L_2\)-regularized regression, and performs a significance test of classifier improvement.

It also shows how to train a decision tree (covered in the next video).

📓 Advanced Pipelines#

The Advanced Pipelines notebook demonstrates a much more advanced SciKit-Learn pipeline.

🎥 Models and Depth#

What does the world look like beyond logistic regression? Can a model output be a feature?
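As one small, hedged illustration of looking beyond logistic regression, the sketch below compares a logistic regression with a decision tree on synthetic XOR-style data, where no single linear boundary separates the classes. The data and the train/test split are assumptions made for this example, not material from the video.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier

# Synthetic XOR pattern (an assumption): the label is 1 when exactly one
# of the two features is positive.
rng = np.random.default_rng(3)
X = rng.uniform(-1, 1, size=(400, 2))
y = ((X[:, 0] > 0) ^ (X[:, 1] > 0)).astype(int)

# Simple holdout split: first 300 rows for training, the rest for testing.
X_train, X_test = X[:300], X[300:]
y_train, y_test = y[:300], y[300:]

# A linear model cannot represent the XOR boundary.
linear_acc = LogisticRegression().fit(X_train, y_train).score(X_test, y_test)

# A decision tree splits the space into axis-aligned regions instead.
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
tree_acc = tree.score(X_test, y_test)
```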

🎥 Inference and Ablation#

How do we understand, robustly, the performance of our system? What contributes to its performance?
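One simple way to ask what contributes to performance is a leave-one-feature-out ablation. The sketch below, on synthetic data, measures how much cross-validated accuracy drops when each feature is removed. The feature names (`f0`, `f1`, `f2`), the data, and the helper `cv_accuracy` are hypothetical and exist only for this example.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

# Synthetic data (an assumption): f0 matters most, f1 a little, f2 is noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = (2.0 * X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
names = ["f0", "f1", "f2"]


def cv_accuracy(columns):
    """Mean 5-fold cross-validated accuracy using only the given columns."""
    model = LogisticRegression()
    return cross_val_score(model, X[:, columns], y, cv=5).mean()


# Baseline: the model with every feature available.
full = cv_accuracy([0, 1, 2])

# Ablation: drop one feature at a time and record the loss in accuracy.
ablation = {
    name: full - cv_accuracy([j for j in range(3) if j != i])
    for i, name in enumerate(names)
}
```

A large drop suggests the model relies on that feature; a drop near zero suggests it contributes little beyond what the other features already provide.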

📃 Statistical Significance Tests#

Read Statistical Significance Tests for Comparing Machine Learning Algorithms.

Note

In the Week 9 activity, we used the paired t-test for comparing the output of two regression models. Our use of this test did not violate the guidance in this reading — why is that?

For further reading, you can also see Approximate Statistical Tests.
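For reference, a paired t-test like the one mentioned in the note above can be run with `scipy.stats.ttest_rel`. This is only a sketch on simulated data: the two "models" are just noisy copies of the true values, and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Simulated test set (an assumption): model B's predictions are noisier.
rng = np.random.default_rng(7)
y_true = rng.normal(size=100)
pred_a = y_true + rng.normal(scale=0.5, size=100)
pred_b = y_true + rng.normal(scale=1.0, size=100)

# Squared error of each model on each test observation. The two error
# vectors are paired because they refer to the same observations.
err_a = (y_true - pred_a) ** 2
err_b = (y_true - pred_b) ** 2

# Paired t-test on the per-observation errors.
result = stats.ttest_rel(err_a, err_b)
t_stat, p_value = result.statistic, result.pvalue
```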

🎥 Dates#

This video discusses how to work with dates in Pandas.
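A short sketch of some common date operations, using a made-up column of date strings (the column name `when` and the dates themselves are arbitrary): parsing with `pd.to_datetime`, extracting parts through the `.dt` accessor, and computing elapsed days.

```python
import pandas as pd

# Hypothetical data: dates stored as strings, as they often arrive from CSV.
df = pd.DataFrame({"when": ["2021-01-01", "2021-01-15", "2021-03-01"]})

# Parse the strings into a proper datetime column.
df["when"] = pd.to_datetime(df["when"])

# The .dt accessor exposes the components of each date.
df["year"] = df["when"].dt.year
df["month"] = df["when"].dt.month
df["weekday"] = df["when"].dt.day_name()

# Subtracting datetimes yields timedeltas; .dt.days converts to whole days.
df["days_since_start"] = (df["when"] - df["when"].min()).dt.days
```

Derived columns like these can then be used as model features in the same way as any other numeric or categorical column.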

🚩 Quiz 11#

Quiz 11 is in Canvas.

📩 Assignment 5#

Assignment 5 is due November 6.