# Week 10 — Classification

Activities:

- What is Classification?
- Log-Odds and Logistics
- Logistic Regression
- The Confusion Matrix
- Logistic Regression Demo
- Week 10 Quiz
- Floating Point
- StatsModels Documentation
- Log Likelihood
- Scikit-Learn
- SciKit-Learn Logistic Regression
- Receiver Operating Characteristic
- Practice
- Biases and Assumptions
- Prediction-Based Decisions
- Abolish the #TechToPrison Pipeline
- Assignment 5

The videos are also available as a Panopto playlist.

## What is Classification?

In this video, I introduce the week and explain what classification is.

## Log-Odds and Logistics

In this video, I introduce log-odds, along with the *logistic function* and its inverse, *logit*.
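A minimal Python sketch of the two functions, using only the standard library. The names `logistic` and `logit` here are our own helpers, not a library API (SciPy provides the same functions as `scipy.special.expit` and `scipy.special.logit`).

```python
import math

def logistic(x: float) -> float:
    """Map log-odds in (-inf, inf) to a probability in (0, 1)."""
    # Branch on the sign so math.exp never receives a large positive
    # argument, which would raise OverflowError.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def logit(p: float) -> float:
    """Map a probability in (0, 1) to its log-odds, log(p / (1 - p))."""
    return math.log(p / (1.0 - p))

print(logit(0.5))             # even odds (1:1) have log-odds 0.0
print(logistic(math.log(3)))  # odds of 3:1 give a probability of about 0.75

# The two functions undo each other (up to floating-point rounding).
for p in [0.1, 0.5, 0.9]:
    assert abs(logistic(logit(p)) - p) < 1e-12
```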

## Logistic Regression

We're now ready for our first classification model: *logistic regression*.
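Logistic regression models the log-odds of the positive class as a linear function of the features, then passes that through the logistic function to get a probability. A minimal sketch of the prediction step, assuming hypothetical, hand-picked coefficients (not fit to any data) and a 0.5 threshold for turning probabilities into class labels:

```python
import math

def logistic(x: float) -> float:
    """Numerically safe logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)

def predict_proba(intercept, coefs, features):
    """P(y = 1 | features) under a logistic regression model."""
    log_odds = intercept + sum(b * x for b, x in zip(coefs, features))
    return logistic(log_odds)

def predict_class(intercept, coefs, features, threshold=0.5):
    """Label 1 if the predicted probability reaches the threshold, else 0."""
    return 1 if predict_proba(intercept, coefs, features) >= threshold else 0

# Hypothetical coefficients for a single feature.
intercept, coefs = -4.0, [2.0]

print(predict_proba(intercept, coefs, [2.0]))  # log-odds 0 -> probability 0.5
print(predict_class(intercept, coefs, [3.0]))  # log-odds 2 -> about 0.88 -> 1
print(predict_class(intercept, coefs, [1.0]))  # log-odds -2 -> about 0.12 -> 0
```

Because the model is linear in the log-odds, each one-unit increase in the feature adds the coefficient (here 2.0) to the log-odds, which multiplies the odds by e<sup>2.0</sup>.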

## The Confusion Matrix

The *confusion matrix* describes the outcomes of a classification model and is the basis for computing effectiveness metrics.
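For a binary classifier, the matrix has four cells: true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). A small standard-library sketch that counts them and derives a few common metrics; the labels are made up purely for illustration, with 1 as the positive class:

```python
def confusion_matrix(y_true, y_pred):
    """Count TP, FP, FN, TN for 0/1 labels (1 = positive class)."""
    tp = fp = fn = tn = 0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            tp += 1
        elif t == 0 and p == 1:
            fp += 1
        elif t == 1 and p == 0:
            fn += 1
        else:
            tn += 1
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}

def _ratio(num, den):
    # Return NaN rather than crash when a metric is undefined.
    return num / den if den else float("nan")

def metrics(cm):
    """Derive common effectiveness metrics from the four counts."""
    total = cm["TP"] + cm["FP"] + cm["FN"] + cm["TN"]
    return {
        "accuracy": _ratio(cm["TP"] + cm["TN"], total),
        "precision": _ratio(cm["TP"], cm["TP"] + cm["FP"]),
        "recall": _ratio(cm["TP"], cm["TP"] + cm["FN"]),  # true positive rate
        "false_positive_rate": _ratio(cm["FP"], cm["FP"] + cm["TN"]),
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

cm = confusion_matrix(y_true, y_pred)
m = metrics(cm)
print(cm)  # {'TP': 3, 'FP': 1, 'FN': 1, 'TN': 3}
print(m)   # accuracy, precision, recall 0.75; false positive rate 0.25
```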

### Resources

- The Wikipedia article on the confusion matrix has a very good diagram of the matrix and its derived metrics.

## Logistic Regression Demo

This is the demo notebook for the videos in the first half of the week.

## Week 10 Quiz

The Week 10 quiz will be posted to Blackboard.

## Floating Point

This is provided for reference.
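One place floating point matters for this week's material: multiplying many probabilities underflows to zero, while summing their logarithms stays well within range. A minimal standard-library illustration (the 1,000 probabilities of 0.1 are arbitrary):

```python
import math

probs = [0.1] * 1000

# Direct product: the true value, 1e-1000, is far below the smallest
# positive double (about 5e-324), so the result underflows to 0.0.
product = 1.0
for p in probs:
    product *= p

# Sum of logs: the same quantity on a log scale, easily representable.
log_sum = sum(math.log(p) for p in probs)

print(product)   # 0.0
print(log_sum)   # about -2302.585
```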

## StatsModels Documentation

The following StatsModels page documents its logistic regression:

This is **not** an assigned reading; it is here for your reference.

## Log Likelihood

This video describes the *log likelihood*, the objective function that logistic regression maximizes to fit its coefficients.
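For binary labels, the log likelihood sums, over all examples, the log of the probability the model assigned to the label that actually occurred. A minimal standard-library sketch comparing two made-up sets of predicted probabilities for the same three labels:

```python
import math

def log_likelihood(y_true, probs):
    """Sum of log P(observed label) over all examples.

    probs[i] is the predicted probability that example i is positive (1).
    Assumes every probability is strictly between 0 and 1.
    """
    total = 0.0
    for y, p in zip(y_true, probs):
        # Probability the model gave to the label that actually occurred.
        total += math.log(p if y == 1 else 1.0 - p)
    return total

y_true = [1, 0, 1]
confident = [0.9, 0.2, 0.7]   # mostly right and fairly sure
uninformed = [0.5, 0.5, 0.5]  # always a coin flip

print(log_likelihood(y_true, confident))   # about -0.685
print(log_likelihood(y_true, uninformed))  # about -2.079 (3 * log 0.5)
```

Higher (closer to zero) is better. Fitting chooses the coefficients whose predicted probabilities make this sum as large as possible; many libraries equivalently minimize the negative log likelihood, often called log loss. The sketch requires probabilities strictly inside (0, 1) because `math.log(0)` raises an error.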

## Scikit-Learn

This video introduces SciKit-Learn and shows how to use it for logistic regression.

## SciKit-Learn Logistic Regression

The SciKit Logistic notebook demonstrates training and using a logistic regression classifier with SciKit-Learn.
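A minimal sketch of the basic SciKit-Learn workflow, assuming NumPy and scikit-learn are installed. The data are a made-up one-feature toy set, not the notebook's data. Note that SciKit-Learn's `LogisticRegression` applies L2 regularization by default (strength controlled by `C`), unlike the unpenalized StatsModels `Logit` model, so coefficients from the two libraries generally differ.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Made-up toy data: one feature; class 1 for the larger values.
X = np.array([[0], [1], [2], [3], [6], [7], [8], [9]], dtype=float)
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])

model = LogisticRegression()  # defaults, including an L2 penalty with C=1.0
model.fit(X, y)

X_new = np.array([[-1.0], [10.0]])
probs = model.predict_proba(X_new)[:, 1]  # column 1 = P(class 1), per model.classes_
labels = model.predict(X_new)             # equivalent to a 0.5 probability threshold

print("intercept:", model.intercept_, "coefficient:", model.coef_)
print("P(y=1):", probs, "labels:", labels)
```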

## Receiver Operating Characteristic

This video introduces the *receiver operating characteristic* (ROC) curve, and its use in evaluating classifiers and selecting tradeoffs.
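An ROC curve plots the true positive rate against the false positive rate as the decision threshold sweeps from strict to lenient. A standard-library sketch that computes the curve's points and the area under it (AUC) with the trapezoidal rule; the labels and scores are illustrative values only:

```python
def roc_points(y_true, scores):
    """(false positive rate, true positive rate) at each distinct threshold.

    An example is predicted positive when its score is >= the threshold.
    Assumes y_true contains both classes, coded as 1 (positive) and 0.
    """
    pos = sum(y_true)
    neg = len(y_true) - pos
    points = [(0.0, 0.0)]  # threshold above every score: nothing predicted positive
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

y_true = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]  # e.g., predicted probabilities of class 1

pts = roc_points(y_true, scores)
print(pts)       # [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
print(auc(pts))  # 0.75
```

Each point corresponds to one threshold, so picking a point on the curve picks a tradeoff between catching more positives and raising more false alarms. SciKit-Learn provides the same computations as `sklearn.metrics.roc_curve` and `sklearn.metrics.roc_auc_score`.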

## Practice

Load the Penguin data and delete the Adelie penguins, so that you have a binary classification problem. Then use logistic regression to try to classify each remaining penguin as Gentoo or Chinstrap from its measurements.
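A minimal pandas sketch of the setup step only, assuming the data has a `species` column with the values `Adelie`, `Chinstrap`, and `Gentoo` (as in the Palmer Penguins data). The few rows below are a stand-in; in the exercise you would load the course's penguin data file instead.

```python
import pandas as pd

# Stand-in rows with only the species column (not real data).
penguins = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie", "Gentoo"],
})

# Drop the Adelie penguins to leave a two-class problem.
binary = penguins[penguins["species"] != "Adelie"].copy()

# Encode the outcome as 1 = Gentoo, 0 = Chinstrap.
binary["is_gentoo"] = (binary["species"] == "Gentoo").astype(int)
print(binary)
```

From there, choose measurement columns as features; if some rows are missing measurements, drop them (for example with `dropna`) before fitting.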

## Biases and Assumptions

This video revisits sources of bias and discusses the assumptions underlying prediction.

## Prediction-Based Decisions

Read Sections 1 and 2 of the following paper:

Shira Mitchell, Eric Potash, Solon Barocas, Alexander D'Amour, Kristian Lum. 2018. Prediction-Based Decisions and Fairness: A Catalogue of Choices, Assumptions, and Definitions. arXiv:1811.07867 [stat.AP].

We'll come back to these ideas later, but Sections 1 and 2 describe the assumptions underlying most classification problems.

If you would like to learn more, I recommend:

## Abolish the #TechToPrison Pipeline

Read Abolish the #TechToPrison Pipeline (the Medium reading-time estimate includes the thorough, and valuable, footnotes and the list of 2435 signatories). This article examines in more detail the assumptions underlying several kinds of criminal justice data science applications.

## Assignment 5

Assignment 5 is due **November 11, 2020**.