Week 10 — Classification (10/25–29)
🧐 Content Overview
This week has 1h40m of video and 5650 words of assigned readings.
🎥 What is Classification?
In this video, I introduce the week and explain what classification is.
🎥 Log-Odds and Logistics
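As a minimal sketch of the two functions named in this heading (not taken from the video), the snippet below defines the log-odds (logit) transform and its inverse, the logistic function. The function names `logit` and `logistic` are my own choices for illustration.

```python
import math


def logit(p):
    """Log-odds of a probability p strictly between 0 and 1."""
    return math.log(p / (1 - p))


def logistic(x):
    """Inverse of logit: map a log-odds value back to a probability."""
    return 1 / (1 + math.exp(-x))


# Even odds (p = 0.5) correspond to log-odds of zero.
print(logit(0.5))

# The two functions undo each other, up to floating-point error.
print(logistic(logit(0.8)))
```

Log-odds can be any real number, while probabilities are confined to the interval from 0 to 1; that is what makes the logistic function useful for turning a linear model's output into a probability.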
🎥 Logistic Regression
We’re now ready for our first classification model: logistic regression.
🎥 The Confusion Matrix
The confusion matrix describes the outcomes of a classification model and is the basis for computing effectiveness metrics.
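To make this concrete, here is a small sketch that tallies the four cells of a binary confusion matrix and derives three common metrics from them. The labels and predictions are made-up illustration data, and `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(y_true, y_pred):
    """Tally the four confusion matrix cells for 0/1 labels and predictions."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == 1 and truth == 1:
            tp += 1  # true positive
        elif pred == 1 and truth == 0:
            fp += 1  # false positive
        elif pred == 0 and truth == 1:
            fn += 1  # false negative
        else:
            tn += 1  # true negative
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn}


# Made-up labels and predictions, for illustration only.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 1, 0, 1]

counts = confusion_counts(y_true, y_pred)

# Each metric is a ratio of confusion matrix cells.
# This sketch assumes the denominators are nonzero.
accuracy = (counts["tp"] + counts["tn"]) / len(y_true)
precision = counts["tp"] / (counts["tp"] + counts["fp"])
recall = counts["tp"] / (counts["tp"] + counts["fn"])

print(counts)
print(accuracy, precision, recall)
```

Accuracy uses all four cells, precision looks only at the predicted positives, and recall looks only at the actual positives, so the three can disagree about how well the same classifier is doing.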
🎥 Baseline Models
📃 StatsModels Documentation
The following StatsModels page documents its logistic regression:
This is not an assigned reading - it is here for your reference.
🎥 Log Likelihood
This video describes the log likelihood, the objective function that logistic regression maximizes.
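As a rough sketch of that objective for binary outcomes, the function below sums log(p) over the positive examples and log(1 - p) over the negative ones, where p is the predicted probability of the positive class. The function name and the example probabilities are assumptions made for illustration.

```python
import math


def log_likelihood(y_true, p_hat):
    """Bernoulli log likelihood of 0/1 outcomes under predicted probabilities.

    Assumes every probability is strictly between 0 and 1.
    """
    return sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, p_hat)
    )


# Made-up outcomes and two sets of made-up predictions.
y_true = [1, 0, 1, 1]
confident = [0.9, 0.1, 0.8, 0.7]  # leans toward the correct class each time
hedging = [0.5, 0.5, 0.5, 0.5]    # always predicts a coin flip

ll_confident = log_likelihood(y_true, confident)
ll_hedging = log_likelihood(y_true, hedging)

print(ll_confident, ll_hedging)
```

The log likelihood is never positive, and values closer to zero mean the predicted probabilities fit the observed outcomes better; here the confident predictions score higher than the coin-flip ones.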
This video introduces SciKit-Learn and shows how to use it to fit a logistic regression.
📓 SciKit-Learn Logistic Regression
🎥 Receiver Operating Characteristic
This video introduces the receiver operating characteristic (ROC) curve, and its use in evaluating classifiers and selecting tradeoffs.
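One way to see where the curve comes from is to sweep the decision threshold over a classifier's scores and record the false positive rate and true positive rate at each step. The sketch below does that in plain Python on made-up scores; `roc_points` and `auc` are hypothetical helpers written for this illustration, not library functions.

```python
def roc_points(y_true, scores):
    """Return (false positive rate, true positive rate) pairs, one per threshold.

    Assumes 0/1 labels with at least one positive and one negative example.
    """
    n_pos = sum(y_true)
    n_neg = len(y_true) - n_pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for threshold in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
        points.append((fp / n_neg, tp / n_pos))
    return points


def auc(points):
    """Area under the curve, using the trapezoid rule between adjacent points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Made-up labels and classifier scores, for illustration only.
y_true = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

points = roc_points(y_true, scores)
print(points)
print(auc(points))
```

Each point on the curve is a different tradeoff: lowering the threshold catches more true positives at the cost of more false positives. Choosing a threshold means choosing one of these points.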
Load the Penguin data and remove the Adelie penguins, so that you have a binary classification problem. Then use a logistic regression to try to classify each remaining penguin as Gentoo or Chinstrap using various measurements.
🎥 Biases and Assumptions
This video revisits sources of bias and discusses the assumptions underlying prediction.
📃 Prediction-Based Decisions
Read Sections 1 and 2 of the following paper:
Shira Mitchell, Eric Potash, Solon Barocas, Alexander D’Amour, Kristian Lum. 2018. Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions. arXiv:1811.07867 [stat.AP].
We’ll come back to the ideas in this paper later. Although its overall topic is fairness in prediction-based decisions, I am not assigning it because it is a fairness paper; rather, Sections 1 and 2 provide a succinct description of the assumptions that we make when we undertake most classification problems. Those assumptions apply no matter what properties of a classification problem or model we care about.
If you would like to learn more, I recommend:
🚩 Week 10 Quiz
The Week 10 quiz will be posted to Canvas.
📃 Abolish the #TechToPrison Pipeline
Read Abolish the #TechToPrison Pipeline (the Medium reading time estimate includes the thorough — and valuable — footnotes and list of 2435 signatories). This article probes in more detail the assumptions underlying classes of criminal justice data science applications.