Week 10 — Classification
Activities:
- What is Classification?
- Log-Odds and Logistics
- Logistic Regression
- The Confusion Matrix
- Logistic Regression Demo
- Week 10 Quiz
- Floating Point
- StatsModels Documentation
- Log Likelihood
- SciKit-Learn
- SciKit-Learn Logistic Regression
- Receiver Operating Characteristic
- Practice
- Biases and Assumptions
- Prediction-Based Decisions
- Abolish the #TechToPrison Pipeline
- Assignment 5
The videos are also available as a Panopto playlist.
What is Classification?
In this video, I introduce the week's topic and explain what classification is.
Log-Odds and Logistics
In this video, I introduce log-odds, along with the logistic function and its inverse, logit.
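As a quick illustration of these two functions, here is a minimal sketch using only the Python standard library. The function names `logit` and `logistic` are my own choices for this example; libraries such as SciPy provide their own versions.

```python
import math

def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))

def logistic(x):
    """Inverse of logit: map a log-odds value back to a probability.

    This naive form can overflow for very large negative x; it is
    fine for the small values used in this illustration.
    """
    return 1 / (1 + math.exp(-x))

for p in [0.1, 0.5, 0.9]:
    x = logit(p)
    print(f"p={p:.1f}  odds={p / (1 - p):.3f}  "
          f"log-odds={x:+.3f}  back to p={logistic(x):.1f}")
```

Note that a probability of 0.5 corresponds to odds of 1 and log-odds of 0, and that the log-odds for 0.1 and 0.9 are equal in magnitude with opposite signs.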
Logistic Regression
We're now ready for our first classification model: logistic regression.
The Confusion Matrix
The confusion matrix describes the outcomes of a classification model and is the basis for computing effectiveness metrics.
Resources
- The Wikipedia article has a very good diagram of the confusion matrix and its derived metrics.
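To make the four cells of the confusion matrix concrete, the following sketch counts them for a small set of made-up labels and predictions (the data and the helper name `confusion_counts` are invented for illustration) and derives a few common metrics from the counts.

```python
def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    return tp, fp, fn, tn

# Made-up example: 4 actual positives, 6 actual negatives
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

accuracy = (tp + tn) / (tp + fp + fn + tn)   # fraction of all predictions that are correct
precision = tp / (tp + fp)                   # fraction of predicted positives that are correct
recall = tp / (tp + fn)                      # fraction of actual positives found (sensitivity)
specificity = tn / (tn + fp)                 # fraction of actual negatives correctly rejected

print(f"TP={tp} FP={fp} FN={fn} TN={tn}")    # TP=3 FP=1 FN=1 TN=5
print(f"accuracy={accuracy:.2f} precision={precision:.2f} "
      f"recall={recall:.2f} specificity={specificity:.2f}")
```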
Logistic Regression Demo
This is the demo notebook for the first-half videos.
Week 10 Quiz
The Week 10 quiz will be posted to Blackboard.
Floating Point
This is provided for reference.
StatsModels Documentation
The following StatsModels page documents its logistic regression:
This is not an assigned reading - it is here for your reference.
Log Likelihood
This video describes the log likelihood, the objective function used by logistic regression.
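As a rough sketch of the idea, the log likelihood of a set of binary outcomes under predicted probabilities can be computed directly. The outcomes and probabilities below are made up for illustration, and `log_likelihood` is a hypothetical helper name, not a library function.

```python
import math

def log_likelihood(y, p):
    """Sum of log P(observed outcome) for binary outcomes y and predicted P(y=1) values p."""
    return sum(math.log(pi) if yi == 1 else math.log(1 - pi)
               for yi, pi in zip(y, p))

# Made-up outcomes and two sets of predicted probabilities
y = [1, 0, 1, 1, 0]
informed = [0.9, 0.1, 0.8, 0.7, 0.2]   # leans toward the correct outcome each time
uninformed = [0.5] * 5                  # always predicts 50/50

ll_informed = log_likelihood(y, informed)
ll_uninformed = log_likelihood(y, uninformed)

print(f"informed:   {ll_informed:.3f}")
print(f"uninformed: {ll_uninformed:.3f}")
```

The log likelihood is never positive, and values closer to zero indicate predictions that assign more probability to what was actually observed. Fitting a logistic regression amounts to searching for the coefficients that make this quantity as large as possible.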
SciKit-Learn
This video introduces SciKit-Learn and shows how to use it for logistic regression.
SciKit-Learn Logistic Regression
The SciKit Logistic notebook demonstrates training and using a logistic regression classifier with SciKit-Learn.
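The following is a minimal sketch of that train-and-predict workflow, not the notebook itself. It assumes NumPy and scikit-learn are installed, and it uses synthetic one-feature data generated on the spot rather than any course data set; the seed values and class means are arbitrary choices.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data (an assumption for illustration): one feature, two classes
rng = np.random.default_rng(20201111)
n = 200
x_neg = rng.normal(loc=-2.0, scale=1.0, size=n)   # class 0
x_pos = rng.normal(loc=2.0, scale=1.0, size=n)    # class 1
X = np.concatenate([x_neg, x_pos]).reshape(-1, 1)  # scikit-learn expects a 2-D feature array
y = np.concatenate([np.zeros(n, dtype=int), np.ones(n, dtype=int)])

# Hold out a test set so the model is evaluated on data it did not see
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42, stratify=y)

# Note: LogisticRegression applies L2 regularization by default
model = LogisticRegression()
model.fit(X_train, y_train)

probs = model.predict_proba(X_test)[:, 1]   # estimated P(y = 1) for each test row
preds = model.predict(X_test)               # class labels using a 0.5 threshold
accuracy = model.score(X_test, y_test)      # fraction of test labels predicted correctly

print("coefficient:", model.coef_[0, 0])
print("intercept:  ", model.intercept_[0])
print("test accuracy:", accuracy)
```

Keeping the predicted probabilities (`probs`) rather than only the labels makes it possible to try decision thresholds other than 0.5.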
Receiver Operating Characteristic
This video introduces the receiver operating characteristic (ROC) curve and its use in evaluating classifiers and selecting tradeoffs.
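To show where the points on an ROC curve come from, here is a small pure-Python sketch that sweeps the decision threshold over a set of made-up scores and records the false positive rate and true positive rate at each step. The helper names `roc_points` and `auc` are hypothetical; in practice a library routine would normally be used instead.

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs obtained by lowering the threshold through each score."""
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]   # threshold above every score: nothing predicted positive
    for thresh in sorted(set(scores), reverse=True):
        tp = sum(1 for t, s in zip(y_true, scores) if s >= thresh and t == 1)
        fp = sum(1 for t, s in zip(y_true, scores) if s >= thresh and t == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points

def auc(points):
    """Area under the curve by the trapezoid rule."""
    return sum((x2 - x1) * (y1 + y2) / 2
               for (x1, y1), (x2, y2) in zip(points, points[1:]))

# Made-up labels and classifier scores (higher score = more likely positive)
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
for fpr, tpr in points:
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
print(f"AUC={auc(points):.3f}")   # AUC=0.889
```

Each point corresponds to one possible threshold, so choosing a point on the curve is the same as choosing how many false positives to accept in exchange for catching more true positives.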
Practice
Load the Penguin data and delete the Adelie penguins, so you have a binary classification problem. Then use a logistic regression to try to classify each remaining penguin as Gentoo or Chinstrap using various measurements.
Biases and Assumptions
This video revisits sources of bias and discusses the assumptions underlying prediction.
Prediction-Based Decisions
Read Sections 1 and 2 of the following paper:
Shira Mitchell, Eric Potash, Solon Barocas, Alexander D'Amour, Kristian Lum. 2018. Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions. arXiv:1811.07867 [stat.AP].
We'll come back to the ideas in this paper later; for now, Sections 1 and 2 describe the assumptions underlying most classification problems.
If you would like to learn more, I recommend:
Abolish the #TechToPrison Pipeline
Read Abolish the #TechToPrison Pipeline (the Medium reading time estimate includes the thorough — and valuable — footnotes and list of 2435 signatories). This article probes in more detail the assumptions underlying classes of criminal justice data science applications.
Assignment 5
Assignment 5 is due November 11, 2020.