Week 11 — Evaluation
Activities:
There is no quiz this week.
This week's videos are also available as a Panopto playlist.
Intro & Context
In this video, I review where we are conceptually and recap the ideas of estimating conditional probability and expectation.
Feature Transforms
What are some useful techniques for engineering features in an application?
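As a small illustration, here is a sketch of two widely used feature transforms: a log transform for a skewed numeric column and one-hot encoding for a categorical column. The video's own list of techniques is not reproduced here, and the data frame and its column names (`income`, `city`) are made up for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical data: one heavily skewed numeric column, one categorical column.
df = pd.DataFrame({
    "income": [20_000, 45_000, 1_200_000],
    "city": ["Boise", "Nampa", "Boise"],
})

# A log transform compresses the long right tail; log1p computes log(1 + x).
df["log_income"] = np.log1p(df["income"])

# One-hot encoding turns each category into its own indicator column.
features = pd.get_dummies(df[["log_income", "city"]], columns=["city"])
print(features)
```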
Workflow
How do you do feature engineering and model selection in a machine learning workflow? What is the iterative process involved?
SciKit Pipelines
In this video, I introduce SciKit pipelines that put multiple transformations together.
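A minimal sketch of the idea, assuming synthetic data and an arbitrary choice of steps (standardization followed by logistic regression); the step names `scale` and `clf` are placeholders, not taken from the video:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (hypothetical): two features on very different scales.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 2)) * np.array([1.0, 100.0])
y = (X[:, 0] + X[:, 1] / 100.0 > 0).astype(int)

# Each step is a (name, estimator) pair; every step but the last must be a transformer.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("clf", LogisticRegression()),
])

# fit() fits the scaler, transforms X, then fits the classifier on the result.
pipe.fit(X, y)
train_acc = pipe.score(X, y)
print(f"training accuracy: {train_acc:.3f}")
```

Because the scaler lives inside the pipeline, it is re-fit on whatever data is passed to `fit`, which helps keep test data from leaking into preprocessing during cross-validation.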
SciKit Learn Pipelines
Read the SciKit-Learn User Guide chapter on pipelines.
SciKit Learn Preprocessing
Read the SciKit-Learn User Guide chapter on pre-processing.
Regularization
This video introduces regularization: ridge regression, lasso regression, and the elastic net. Lasso regression can help with (semi-)automatic feature selection.
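The following sketch compares the three penalties on synthetic data in which only two of ten features actually matter. The data, the `alpha` values, and the `l1_ratio` are arbitrary choices for illustration, not settings from the video.

```python
import numpy as np
from sklearn.linear_model import ElasticNet, Lasso, Ridge

# Synthetic data (hypothetical): only the first two features influence y.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=200)

ridge = Ridge(alpha=1.0).fit(X, y)                     # L2 penalty
lasso = Lasso(alpha=0.1).fit(X, y)                     # L1 penalty
enet = ElasticNet(alpha=0.1, l1_ratio=0.5).fit(X, y)   # mix of L1 and L2

# The L1 penalty can drive coefficients exactly to zero; the L2 penalty only shrinks them.
n_zero_ridge = int(np.sum(ridge.coef_ == 0))
n_zero_lasso = int(np.sum(lasso.coef_ == 0))
print("coefficients set to zero by ridge:", n_zero_ridge)
print("coefficients set to zero by lasso:", n_zero_lasso)
```

The features whose lasso coefficients are exactly zero are, in effect, dropped from the model, which is the sense in which lasso performs feature selection.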
Pipeline and Regularization
This notebook demonstrates pipelines and \(L_2\)-regularized regression, and performs a significance test of classifier improvement.
It also trains a decision tree (covered in the next video).
Models and Depth
What does the world look like beyond logistic regression? Can a model output be a feature?
Inference and Ablation
How do we understand, robustly, the performance of our system? What contributes to its performance?
Statistical Significance Tests
Read Statistical Significance Tests for Comparing Machine Learning Algorithms.
For further reading, you can also see Approximate Statistical Tests.
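One test often used to compare two classifiers evaluated on the same test set is McNemar's test, which looks only at the examples where the two classifiers disagree. The sketch below is an exact (binomial) version written with the standard library. The function name `mcnemar_exact` and the toy predictions are made up for illustration, and this is not necessarily the test used in the course notebook.

```python
from math import comb


def mcnemar_exact(y_true, pred_a, pred_b):
    """Exact two-sided McNemar test on paired predictions (illustrative helper)."""
    # Discordant pairs: examples where exactly one classifier is correct.
    a_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a == t and b != t)
    b_only = sum(1 for t, a, b in zip(y_true, pred_a, pred_b) if a != t and b == t)
    n = a_only + b_only
    if n == 0:
        return a_only, b_only, 1.0
    # Under the null hypothesis, each discordant pair favors A or B with probability 0.5.
    k = min(a_only, b_only)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return a_only, b_only, min(1.0, 2 * tail)


# Toy example (hypothetical predictions on 10 test items).
y_true = [1, 1, 1, 1, 1, 1, 1, 1, 1, 1]
pred_a = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0]
pred_b = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
a_only, b_only, p_value = mcnemar_exact(y_true, pred_a, pred_b)
print(a_only, b_only, p_value)
```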
Dates
This video discusses how to work with dates in Pandas.
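A short sketch of a few common date operations: parsing with a format code, extracting a component, taking a time delta, and shifting by a `DateOffset`. The column names and the second date are made up for the example.

```python
import pandas as pd

# Hypothetical data: dates stored as strings.
df = pd.DataFrame({"due": ["2020-11-11", "2020-12-02"]})

# Parse strings into datetimes using an explicit format code.
df["due"] = pd.to_datetime(df["due"], format="%Y-%m-%d")

# The .dt accessor exposes date components and helpers.
df["weekday"] = df["due"].dt.day_name()

# Subtracting datetimes yields time deltas; .dt.days gives whole days.
df["days_apart"] = (df["due"] - df["due"].min()).dt.days

# DateOffset shifts by calendar units such as months.
df["next_month"] = df["due"] + pd.DateOffset(months=1)
print(df)
```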
Links
- Date operations notebook
- Pandas time series / date functionality
- Pandas time deltas
- DateOffset
- Format codes
Assignment 5
Assignment 5 is due November 11, 2020.