# Week 10 — Classification (10/24–28)

This week we introduce **classification** as a prediction task, with methods for evaluating classifiers.

## 🧐 Content Overview

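| Element | Length |
|---|---|
| 🎥 What is Classification? | 6m39s |
| 🎥 Log-Odds and Logistics | 10m4s |
| 🎥 Logistic Regression | 9m7s |
| 🎥 The Confusion Matrix | 11m48s |
| 🎥 Baseline Models | 9m44s |
| 🎥 Log Likelihood | 16m54s |
| 🎥 Scikit-Learn | 6m42s |
| 🎥 Receiver Operating Characteristic | 7m25s |
| 🎥 Biases and Assumptions | 22m |
| 📃 Prediction-Based Decisions | 3650 words |
| 📃 Abolish the #TechToPrison Pipeline | 2000 words |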

This week has **1h40m** of video and **5650 words** of assigned readings.

## 🎥 What is Classification?

In this video, I introduce the week and what classification is.

## 🎥 Log-Odds and Logistics

In this video, I introduce log odds, along with the *logistic function* and its inverse,
the *logit function*.
Log odds are a useful concept in many situations!
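
As a minimal sketch of these two functions (an illustration, not code from the video), the logit maps a probability to its log odds and the logistic function maps log odds back to a probability. The helper names `logit` and `logistic` are defined here for illustration; SciPy provides equivalents as `scipy.special.logit` and `scipy.special.expit`.

```python
import numpy as np


def logit(p):
    """Log odds of a probability p strictly between 0 and 1."""
    p = np.asarray(p, dtype=float)
    return np.log(p / (1 - p))


def logistic(x):
    """Inverse of the logit: map log odds back to a probability."""
    x = np.asarray(x, dtype=float)
    return 1 / (1 + np.exp(-x))


probs = np.array([0.1, 0.5, 0.9])
log_odds = logit(probs)          # negative below 0.5, zero at 0.5, positive above
recovered = logistic(log_odds)   # round trip returns the original probabilities

print(log_odds)
print(recovered)
```

A probability of 0.5 corresponds to even odds, so its log odds are 0; probabilities near 0 or 1 map to large negative or positive log odds.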

## 🎥 Logistic Regression

We’re now ready for our first classification model: *logistic regression*.

## 🎥 The Confusion Matrix

The *confusion matrix* describes the outcomes of a classification model and is the basis for computing effectiveness metrics.
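
To make the four outcomes concrete, here is a small sketch that counts them by hand and derives a few common metrics. The labels and predictions are made up for illustration, and the choice of metrics shown is an assumption about which ones are most useful to see side by side.

```python
import numpy as np

# Made-up data for illustration: 1 = positive class, 0 = negative class.
y_true = np.array([1, 1, 1, 1, 0, 0, 0, 0, 0, 0])
y_pred = np.array([1, 1, 1, 0, 1, 0, 0, 0, 0, 0])

# The four cells of the confusion matrix.
tp = int(np.sum((y_true == 1) & (y_pred == 1)))  # true positives
fn = int(np.sum((y_true == 1) & (y_pred == 0)))  # false negatives
fp = int(np.sum((y_true == 0) & (y_pred == 1)))  # false positives
tn = int(np.sum((y_true == 0) & (y_pred == 0)))  # true negatives

# Metrics derived from those counts.
accuracy = (tp + tn) / len(y_true)
precision = tp / (tp + fp)    # of the predicted positives, how many are right
recall = tp / (tp + fn)       # of the actual positives, how many we found
specificity = tn / (tn + fp)  # of the actual negatives, how many we rejected

print(f"TP={tp} FN={fn} FP={fp} TN={tn}")
print(f"accuracy={accuracy:.3f} precision={precision:.3f} "
      f"recall={recall:.3f} specificity={specificity:.3f}")
```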

### Resources

The Wikipedia article on the confusion matrix has a very good diagram of the matrix and its derived metrics.

## 📓 Logistic Regression Demo

This is the demo notebook for our initial logistic regression videos.

## 🎥 Baseline Models

## 📃 Floating Point

This is provided for reference.

## 📃 StatsModels Documentation

The following StatsModels page documents its logistic regression:

This is **not** an assigned reading; it is here for your reference.
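
For orientation, a minimal sketch of fitting a logistic regression with StatsModels might look like the following. It assumes the formula interface (`statsmodels.formula.api.logit`) and uses synthetic data generated on the spot; the column names `x` and `y` and the coefficients used to simulate the data are made up for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data: the true log odds of y = 1 are -0.5 + 2.0 * x.
rng = np.random.default_rng(20221024)
n = 500
x = rng.normal(size=n)
p = 1 / (1 + np.exp(-(-0.5 + 2.0 * x)))
y = rng.binomial(1, p)
data = pd.DataFrame({"x": x, "y": y})

# Fit the model; disp=False silences the optimizer's progress output.
result = smf.logit("y ~ x", data=data).fit(disp=False)

# Coefficients are on the log-odds scale.
print(result.params)

# predict() returns estimated probabilities that y = 1.
pred = result.predict(data)
print(pred.head())
```

Because the coefficients are in log odds, a one-unit increase in `x` adds the `x` coefficient to the log odds of the positive class.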

## 🎥 Log Likelihood

This video describes the *log likelihood*, which is the objective function used by logistic regression.
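
As a rough sketch of the quantity involved, the log likelihood of binary outcomes under a set of predicted probabilities sums the log of the probability assigned to each observed outcome. The function name `log_likelihood` and the example numbers below are made up for illustration.

```python
import numpy as np


def log_likelihood(y, p):
    """Bernoulli log likelihood of outcomes y (0/1) given probabilities p.

    Each observation contributes log(p) if y = 1 and log(1 - p) if y = 0.
    """
    y = np.asarray(y, dtype=float)
    p = np.asarray(p, dtype=float)
    return float(np.sum(y * np.log(p) + (1 - y) * np.log(1 - p)))


# Made-up outcomes and two sets of predicted probabilities.
y = np.array([1, 0, 1, 1])
confident = np.array([0.9, 0.1, 0.8, 0.7])  # mostly agrees with the outcomes
coin_flip = np.full(4, 0.5)                 # knows nothing

ll_confident = log_likelihood(y, confident)
ll_coin_flip = log_likelihood(y, coin_flip)

print(ll_confident, ll_coin_flip)
```

Log likelihoods are never positive, and values closer to zero indicate predictions that fit the observed outcomes better, so the confident predictions score higher than the coin flip here.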

## 🎥 Scikit-Learn

This video introduces SciKit-Learn and shows how to use it for logistic regression.

## 📓 SciKit-Learn Logistic Regression

The SciKit Logistic notebook demonstrates training and using `sklearn.linear_model.LogisticRegression`.
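
For a quick picture of the workflow, here is a minimal sketch (not the notebook's code) of training and using `LogisticRegression`. The data is synthetic, and the split size and random seeds are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data: two features, with outcome probabilities from a logistic model.
rng = np.random.default_rng(42)
n = 400
X = rng.normal(size=(n, 2))
log_odds = 1.5 * X[:, 0] - 1.0 * X[:, 1]
y = rng.binomial(1, 1 / (1 + np.exp(-log_odds)))

# Hold out a test set so we evaluate on data the model has not seen.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Note: scikit-learn's LogisticRegression is regularized by default.
clf = LogisticRegression()
clf.fit(X_train, y_train)

test_probs = clf.predict_proba(X_test)[:, 1]  # estimated P(y = 1)
test_preds = clf.predict(X_test)              # hard 0/1 decisions
accuracy = clf.score(X_test, y_test)          # fraction classified correctly

print(clf.intercept_, clf.coef_)
print(f"test accuracy: {accuracy:.3f}")
```

`predict_proba` returns one column per class, so column 1 holds the estimated probability of the positive class.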

## 🎥 Receiver Operating Characteristic

This video introduces the *receiver operating characteristic* (ROC) curve, and its use in evaluating classifiers and selecting tradeoffs.
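
As a small sketch of how the curve is built, `sklearn.metrics.roc_curve` sweeps a decision threshold over classifier scores and reports the false positive rate and true positive rate at each one. The four labels and scores below are made up for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

# Made-up true labels and classifier scores (higher = more likely positive).
y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])

# One (FPR, TPR) point per threshold; lowering the threshold moves up and right.
fpr, tpr, thresholds = roc_curve(y_true, scores)
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")

# Area under the curve: the probability that a random positive
# scores higher than a random negative.
auc = roc_auc_score(y_true, scores)
print(f"AUC = {auc:.2f}")  # 0.75: three of the four positive/negative pairs are ordered correctly
```

Picking a point on the curve amounts to picking a threshold, which is how the curve supports choosing a tradeoff between false positives and false negatives.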

## ✅ Practice

Load the Penguin data, and use a logistic regression to try to classify a penguin as Gentoo or Chinstrap using various measurements. Delete the Adelie penguins first, so you have a binary classification problem.
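
One possible way to set up the data is sketched below; it stops before fitting a model so the exercise itself is left to you. The tiny data frame is a stand-in with made-up values, and the column names (`species`, `bill_length_mm`, `flipper_length_mm`) are an assumption based on the common Palmer Penguins naming; your data file may differ.

```python
import pandas as pd

# Stand-in rows with made-up values; replace this with the real penguin data.
penguins = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Gentoo", "Adelie", "Chinstrap"],
    "bill_length_mm": [39.0, 47.5, 49.0, 46.0, 38.5, 50.5],
    "flipper_length_mm": [181, 217, 196, 215, 186, 200],
})

# Drop the Adelie penguins so only two classes remain.
binary = penguins[penguins["species"] != "Adelie"].copy()

# Encode the outcome as 0/1: 1 = Gentoo, 0 = Chinstrap.
binary["is_gentoo"] = (binary["species"] == "Gentoo").astype(int)

print(binary)
```

From here, `is_gentoo` can serve as the outcome and the measurement columns as predictors.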

## 🎥 Biases and Assumptions

This video revisits sources of bias and discusses the assumptions underlying prediction.

## 📃 Prediction-Based Decisions

Read Sections 1 and 2 of the following paper:

Shira Mitchell, Eric Potash, Solon Barocas, Alexander D’Amour, Kristian Lum. 2018. Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions. arXiv:1811.07867 [stat.AP].

We’ll come back to the ideas in this paper later. While its overall topic is fairness in prediction-based decisions, I am not assigning it because it is a fairness paper; rather, Sections 1 and 2 provide a succinct description of the assumptions we make when we undertake most classification problems. Those assumptions apply no matter which properties of a classification problem or model we care about.

If you would like to learn more, I recommend:

## 🚩 Week 10 Quiz

The Week 10 quiz will be posted to Canvas.

## 📃 Abolish the #TechToPrison Pipeline

Read Abolish the #TechToPrison Pipeline (the Medium reading time estimate includes the thorough — and valuable — footnotes and list of 2435 signatories). This article probes in more detail the assumptions underlying classes of criminal justice data science applications.

## 📩 Assignment 5

Assignment 5 is due **November 6**.