Week 10 — Classification
- What is Classification?
- Log-Odds and Logistics
- Logistic Regression
- The Confusion Matrix
- Logistic Regression Demo
- Week 10 Quiz
- Floating Point
- StatsModels Documentation
- Log Likelihood
- SciKit-Learn Logistic Regression
- Receiver Operating Characteristic
- Biases and Assumptions
- Prediction-Based Decisions
- Abolish the #TechToPrison Pipeline
- Assignment 5
The videos are also available as a Panopto playlist.
What is Classification?
In this video, I introduce the week and what classification is.
Log-Odds and Logistics
In this video, I introduce log-odds, along with the logistic function and its inverse, logit.
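As a quick illustration of the two functions named above, here is a minimal sketch in plain Python. The function names `logit` and `logistic` are our own choices for illustration, not names taken from the videos or a particular library. The `logistic` function branches on the sign of its input so that very large or very small log-odds do not overflow `math.exp`.

```python
import math

def logit(p: float) -> float:
    """Log-odds of a probability p, which must lie strictly between 0 and 1."""
    return math.log(p / (1 - p))

def logistic(x: float) -> float:
    """Inverse of logit: maps any real-valued log-odds back to a probability."""
    if x >= 0:
        return 1 / (1 + math.exp(-x))
    # For negative x, rewrite as e^x / (1 + e^x) so math.exp never overflows.
    z = math.exp(x)
    return z / (1 + z)

# A probability of 0.8 corresponds to odds of 0.8 / 0.2 = 4 to 1.
lo = logit(0.8)         # log(4), about 1.386
p_back = logistic(lo)   # about 0.8 again
```

A probability of 0.5 corresponds to even odds, so its log-odds are 0; positive log-odds mean the event is more likely than not, negative log-odds mean it is less likely.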
Logistic Regression
We're now ready for our first classification model: logistic regression.
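For reference, the standard form of the model ties the two previous ideas together: logistic regression models the log-odds of the positive class as a linear function of the features, so the predicted probability is the logistic function of that linear combination. The notation below ($\beta_0$ for the intercept, $\beta_1$ through $\beta_k$ for coefficients on features $x_1$ through $x_k$) is our own and may differ from the videos.

```latex
\begin{aligned}
\log \frac{P(y = 1 \mid \mathbf{x})}{1 - P(y = 1 \mid \mathbf{x})}
  &= \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k \\
P(y = 1 \mid \mathbf{x})
  &= \frac{1}{1 + e^{-(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)}}
\end{aligned}
```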
The Confusion Matrix
The confusion matrix describes the outcomes of a classification model and is the basis for computing effectiveness metrics.
- The Wikipedia article has a very good diagram of the confusion matrix and its derived metrics.
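To make the four cells concrete, here is a minimal sketch that counts them from lists of actual and predicted labels and computes a few of the derived metrics. The labels are made-up toy data, and the helper names `confusion_matrix` and `metrics` are hypothetical, chosen for illustration. The sketch treats `True` as the positive class and does not guard against division by zero (for example, precision is undefined when nothing is predicted positive).

```python
from collections import Counter

def confusion_matrix(actual, predicted):
    """Count TP, FP, FN, TN for boolean labels, treating True as the positive class."""
    counts = Counter(zip(actual, predicted))
    return {
        "TP": counts[(True, True)],    # actually positive, predicted positive
        "FP": counts[(False, True)],   # actually negative, predicted positive
        "FN": counts[(True, False)],   # actually positive, predicted negative
        "TN": counts[(False, False)],  # actually negative, predicted negative
    }

def metrics(cm):
    """A few metrics derived from the confusion matrix (no zero-division guards)."""
    total = sum(cm.values())
    return {
        "accuracy": (cm["TP"] + cm["TN"]) / total,
        "precision": cm["TP"] / (cm["TP"] + cm["FP"]),
        "recall": cm["TP"] / (cm["TP"] + cm["FN"]),  # a.k.a. true positive rate, sensitivity
        "false_positive_rate": cm["FP"] / (cm["FP"] + cm["TN"]),
    }

# Toy labels, made up for illustration.
actual    = [True, True, True, False, False, False, False, True]
predicted = [True, False, True, False, True, True, False, True]

cm = confusion_matrix(actual, predicted)
scores = metrics(cm)
```

Here the classifier finds 3 of the 4 positives (recall 0.75), but only 3 of its 5 positive predictions are correct (precision 0.6), which is the kind of gap that a single accuracy number (0.625) hides.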
Logistic Regression Demo
This is the demo notebook for the videos in the first half of the week.
Week 10 Quiz
The Week 10 quiz will be posted to Blackboard.
Floating Point
This is provided for reference.
StatsModels Documentation
The following StatsModels page documents its logistic regression:
This is not an assigned reading; it is here for your reference.
Log Likelihood
This video describes the log likelihood, the objective function that logistic regression fits its coefficients to maximize.
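As a rough sketch of what is being maximized, the code below computes the Bernoulli log likelihood of 0/1 labels given a model's predicted log-odds. The one-feature data, the coefficient values, and the helper names (`log_sigmoid`, `log_likelihood`, `predicted_log_odds`) are all hypothetical, chosen for illustration. It works from log-odds rather than probabilities so that a probability rounding to exactly 0 or 1 in floating point never produces `log(0)`.

```python
import math

def log_sigmoid(z):
    """log(logistic(z)), computed without first forming a probability that could round to 0."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_likelihood(y, z):
    """Sum over rows of log P(observed label), for 0/1 labels y and predicted log-odds z.

    A row with y = 1 contributes log(logistic(z)); a row with y = 0 contributes
    log(1 - logistic(z)), which equals log(logistic(-z)).
    """
    return sum(log_sigmoid(zi) if yi == 1 else log_sigmoid(-zi) for yi, zi in zip(y, z))

def predicted_log_odds(b0, b1, xs):
    """Hypothetical one-feature model: log-odds = b0 + b1 * x."""
    return [b0 + b1 * x for x in xs]

# Made-up toy data.
xs = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
ys = [0, 0, 1, 0, 1, 1]

ll_flat = log_likelihood(ys, predicted_log_odds(0.0, 0.0, xs))   # every p = 0.5
ll_slope = log_likelihood(ys, predicted_log_odds(-3.5, 2.0, xs))  # p rises with x
```

The coefficients that put higher probability on the labels that actually occurred score a higher (less negative) log likelihood; fitting a logistic regression means searching for the coefficients that make this value as large as possible.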
SciKit-Learn Logistic Regression
This video introduces SciKit-Learn and shows how to use it for logistic regression.
The SciKit Logistic notebook demonstrates training and using a logistic regression classifier with SciKit-Learn.
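For a compact picture of the workflow, here is a minimal sketch using scikit-learn's `LogisticRegression` on synthetic data. It is not the course notebook: the data is generated from made-up true log-odds (2·x0 − 1·x1) so that we know what the fitted coefficients should roughly look like, and the split sizes and random seeds are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): true log-odds = 2*x0 - 1*x1.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 2))
true_log_odds = 2.0 * X[:, 0] - 1.0 * X[:, 1]
y = (rng.random(500) < 1 / (1 + np.exp(-true_log_odds))).astype(int)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

model = LogisticRegression()  # default settings include an L2 penalty with C=1.0
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]  # estimated P(y = 1) for each test row
preds = model.predict(X_test)              # class labels (positive when P(y = 1) > 0.5)
cm = confusion_matrix(y_test, preds)       # rows = actual class, columns = predicted class
acc = accuracy_score(y_test, preds)
```

`model.coef_` and `model.intercept_` hold the fitted coefficients on the log-odds scale. Because of the default regularization penalty, they are shrunk somewhat compared with an unpenalized maximum-likelihood fit such as StatsModels produces.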
Receiver Operating Characteristic
This video introduces the receiver operating characteristic (ROC) curve and its use in evaluating classifiers and choosing the tradeoff between true positive and false positive rates.
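The curve comes from sweeping the decision threshold: each threshold yields one confusion matrix, hence one (false positive rate, true positive rate) point. The minimal sketch below computes those points and the area under the curve (AUC) by the trapezoid rule in plain Python. The labels and scores are made up, and the helper names `roc_points` and `auc` are hypothetical; it recounts the data at every threshold, so it is quadratic and meant only for illustration.

```python
def roc_points(y_true, scores):
    """(FPR, TPR) pairs from sweeping the threshold over every distinct score.

    y_true holds 0/1 labels; scores are any values where higher means "more
    likely positive" (for example, predicted probabilities from a logistic regression).
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        # Predict positive when score >= t, then count the two positive-prediction cells.
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoid rule over consecutive points."""
    return sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(points, points[1:]))

# Made-up labels and scores.
y_true = [0, 0, 1, 1, 0, 1]
scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]
pts = roc_points(y_true, scores)
area = auc(pts)
```

For this toy data the area is 8/9: of the 9 (positive, negative) pairs, 8 have the positive scored higher, which is the usual probabilistic reading of AUC. For real use, `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score` compute the same quantities efficiently.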
Load the Penguin data and use logistic regression to try to classify each penguin as Gentoo or Chinstrap using various measurements. Remove the Adelie penguins first so that you have a binary classification problem.
Biases and Assumptions
This video revisits sources of bias and discusses the assumptions underlying prediction.
Prediction-Based Decisions
Read Sections 1 and 2 of the following paper:
Shira Mitchell, Eric Potash, Solon Barocas, Alexander D'Amour, Kristian Lum. 2018. Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions. arXiv:1811.07867 [stat.AP].
We'll come back to these ideas, but Sections 1 and 2 describe the assumptions underlying most classification problems.
If you would like to learn more, I recommend:
Abolish the #TechToPrison Pipeline
Read Abolish the #TechToPrison Pipeline (the Medium reading time estimate includes the thorough — and valuable — footnotes and list of 2435 signatories). This article probes in more detail the assumptions underlying classes of criminal justice data science applications.
Assignment 5
Assignment 5 is due November 11, 2020.