Week 11 — More Modeling (11/1–5)

This week, we’re going to learn more about model building, covering techniques that will be useful in Assignment 5:

  • Feature engineering

  • SciKit-Learn pipelines and workflows

  • Regularization

  • Analyzing model results

🧐 Content Overview

Element | Length
🎥 Intro and Context | 4m39s
🎥 Feature Transforms | 21m3s
🎥 Workflow | 14m29s
🎥 SciKit Pipelines | 7m19s
🎥 Regularization | 15m4s
🎥 Models and Depth | 7m23s
🎥 Inference and Ablation | 14m55s
📃 Statistical Significance Tests for Comparing Machine Learning Algorithms | 3400 words
🎥 Dates | 8m34s

This week has 1h33m of video and 3400 words of assigned readings. This week’s videos are available in a Panopto folder and as a podcast.

🎥 Intro & Context

In this video, I review where we are conceptually and recap the ideas of estimating conditional probability and expectation.

🎥 Feature Transforms

What are some useful techniques for engineering features in an application?
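As a rough illustration of the kind of transforms this question refers to, here is a minimal sketch using Pandas and NumPy. The data frame and its column names are hypothetical, and the video's own examples may differ.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; the column names are made up for illustration.
df = pd.DataFrame({
    "income": [20_000, 45_000, 120_000, 1_000_000],
    "region": ["north", "south", "south", "west"],
    "age": [25, 40, 31, 58],
})

# Log transform: compresses a long right tail (log1p also handles zeros).
df["log_income"] = np.log1p(df["income"])

# Standardization: subtract the mean, divide by the sample standard deviation.
df["age_z"] = (df["age"] - df["age"].mean()) / df["age"].std()

# One-hot (dummy) encoding: one indicator column per category.
features = pd.get_dummies(df, columns=["region"])
```

In a real workflow, the mean and standard deviation should be computed from the training data only, which is one reason to put these steps in a pipeline.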

🎥 Workflow

How do you do feature engineering and model selection in a machine learning workflow? What is the iterative process involved?

🎥 SciKit Pipelines

In this video, I introduce SciKit pipelines that put multiple transformations together.
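A minimal sketch of such a pipeline, assuming a standardization step followed by logistic regression on synthetic data. The step names ("scale", "clf") and the dataset are illustrative choices, not taken from the video.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic binary classification data (assumption: stands in for real data).
X, y = make_classification(n_samples=200, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Each step is a (name, estimator) pair; all but the last must be transformers.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# fit() learns the scaling parameters from the training data only, then
# trains the classifier on the scaled training features.
pipe.fit(X_train, y_train)

# score() applies the same fitted scaling to the test data before predicting.
accuracy = pipe.score(X_test, y_test)
```

Because the scaler is fit inside the pipeline, test data never influences the transformation parameters.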

📃 SciKit Learn Pipelines

Read the SciKit-Learn User Guide chapter on pipelines.

📃 SciKit Learn Preprocessing

Read the SciKit-Learn User Guide chapter on pre-processing.

🎥 Regularization

This video introduces regularization: ridge regression, lasso regression, and the elastic net. Lasso regression can help with (semi-)automatic feature selection, because its \(L_1\) penalty can shrink some coefficients to exactly zero.
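To make the three penalties concrete, here is a hedged sketch that fits each model to the same synthetic regression data and counts how many coefficients are exactly zero. The data, the `alpha` values, and the `l1_ratio` are arbitrary assumptions chosen for illustration.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import ElasticNet, Lasso, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data: 10 features, only 3 of which actually affect y.
X, y = make_regression(
    n_samples=200, n_features=10, n_informative=3, noise=5.0, random_state=0
)

models = {
    "ridge": Ridge(alpha=1.0),                            # L2 penalty
    "lasso": Lasso(alpha=1.0),                            # L1 penalty
    "elastic net": ElasticNet(alpha=1.0, l1_ratio=0.5),   # mix of L1 and L2
}

n_zero = {}
for name, model in models.items():
    # Standardize first so the penalty treats all features on the same scale.
    pipe = make_pipeline(StandardScaler(), model)
    pipe.fit(X, y)
    coefs = pipe[-1].coef_
    n_zero[name] = int(np.sum(coefs == 0))

print(n_zero)
```

Ridge shrinks coefficients toward zero without eliminating them, while lasso will typically zero out some of the uninformative features. The exact counts depend on the data and on `alpha`.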

📓 Pipeline and Regularization

This notebook demonstrates pipelines and \(L_2\)-regularized regression, and performs a significance test of classifier improvement.

It also shows how to train a decision tree (covered in the next video).

📓 Advanced Pipelines

The Advanced Pipelines notebook demonstrates a much more advanced SciKit-Learn pipeline.
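The notebook's contents are not reproduced here. As one common pattern for more involved pipelines, the sketch below uses `ColumnTransformer` to apply different preprocessing to numeric and categorical columns. The data frame, column names, and step names are hypothetical.

```python
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical toy data with a missing numeric value.
df = pd.DataFrame({
    "age": [25, 40, None, 58, 33, 47],
    "region": ["north", "south", "south", "west", "north", "west"],
    "bought": [0, 1, 0, 1, 0, 1],
})

# Numeric columns: fill missing values with the median, then standardize.
numeric = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

# Route each group of columns to its own preprocessing.
prep = ColumnTransformer([
    ("num", numeric, ["age"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["region"]),
])

model = Pipeline([("prep", prep), ("clf", LogisticRegression())])
model.fit(df[["age", "region"]], df["bought"])
preds = model.predict(df[["age", "region"]])
```

Here `handle_unknown="ignore"` keeps prediction from failing when a category appears that was not present during training.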

🎥 Models and Depth

What does the world look like beyond logistic regression? Can a model output be a feature?

🎥 Inference and Ablation

How do we understand, robustly, the performance of our system? What contributes to its performance?

📃 Statistical Significance Tests

Read Statistical Significance Tests for Comparing Machine Learning Algorithms.

Note

In the Week 9 activity, we used the paired t-test for comparing the output of two regression models. Our use of this test did not violate the guidance in this reading — why is that?

For further reading, you can also see Approximate Statistical Tests.
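For reference, a paired t-test like the one mentioned in the note can be run with `scipy.stats.ttest_rel`. This is a minimal sketch on synthetic data, assuming two regression models evaluated on the same test points. It does not reproduce the Week 9 activity, and the variable names are illustrative.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for true values and two models' predictions
# on the same 100 test points.
rng = np.random.default_rng(0)
y_true = rng.normal(size=100)
pred_a = y_true + rng.normal(scale=1.0, size=100)
pred_b = y_true + rng.normal(scale=0.5, size=100)

# Per-test-point squared errors; the pairing is by test point.
err_a = (y_true - pred_a) ** 2
err_b = (y_true - pred_b) ** 2

# Paired t-test on the per-point error differences.
result = stats.ttest_rel(err_a, err_b)
t_stat, p_value = result.statistic, result.pvalue
print(f"t = {t_stat:.3f}, p = {p_value:.4g}")
```

The test statistic is the mean of the paired differences divided by its standard error, so it only makes sense when each pair of measurements comes from the same test point.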

🎥 Dates

This video discusses how to work with dates in Pandas.
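A short sketch of common date operations in Pandas, assuming a hypothetical data frame with a string date column named `when`. The video may cover different operations.

```python
import pandas as pd

# Hypothetical data: dates stored as strings.
df = pd.DataFrame({
    "when": ["2021-11-01", "2021-11-05", "2021-12-25"],
    "value": [1, 2, 3],
})

# Parse strings into datetime64 values.
df["when"] = pd.to_datetime(df["when"])

# The .dt accessor exposes date components for a whole column.
df["year"] = df["when"].dt.year
df["month"] = df["when"].dt.month
df["weekday"] = df["when"].dt.day_name()

# Subtracting datetimes gives timedeltas; .dt.days converts to whole days.
df["days_since_start"] = (df["when"] - df["when"].min()).dt.days

# With a datetime index, resample() aggregates by time period
# ("MS" labels each month by its first day).
monthly = df.set_index("when")["value"].resample("MS").sum()
```

Components such as month or weekday can then be used as model features, which ties date handling back to feature engineering.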

🚩 Quiz 11

Quiz 11 is in Canvas.

📩 Assignment 5

Assignment 5 is due November 7.