# Week 5 — Filling In (9/20–24)

This week introduces one new statistical concept — the hypothesis test — and is otherwise about **practice** and **solidifying concepts**.
I’m also going to take a step back and give some more context to some of the things we’re talking about.

Our learning outcomes are:

- Compute and interpret a hypothesis test
- Understand how to read and interpret Python errors
- Understand how the quantitative techniques we are learning in this class fit in a broader landscape of epistemologies

## 🧐 Content Overview

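| Element | Length |
|---|---|
| 🎥 Comparing Distributions | 5m6s |
| 🎥 Testing Hypotheses | 14m51s |
| 🎥 T-tests | 12m24s |
| 🎥 Epistemology | 25m44s |
| 🎥 Python Errors | 7m28s |
| 🎥 Python Libraries | 3m43s |
| 🎥 Learning More | 5m10s |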

This week has **1h14m** of video and **0 words** of assigned readings. This week’s videos are available in a Panopto folder and as a podcast.

## 📅 Deadlines

- Week 5 Quiz is due on **Thursday** at 8 AM.
- Assignment 2 is due on **Sunday, Sep. 26** at 11:59 PM.
- Midterm A is next week, on **Tuesday, Sep. 28**.

## 📓 Assignment 1 Solution

I will post the Assignment 1 solution to Canvas (sorry, I’m not posting it to the entire Internet).

## 📃 Course Glossary

If you haven’t yet, I highly recommend consulting the course glossary. Please post on Piazza if you have suggested additions!

The midterm is also likely to be useful in studying for the exam.

## 📓 Writing Functions

I’ve used Python *functions* in a few of my example notebooks.
The function notebook talks more about them, how to write them, and how to use them.
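
As a quick reminder of the shape of a function, here is a minimal sketch. It is not taken from the function notebook; the name `standardize` and the example scores are hypothetical, chosen only to show a definition, a docstring, and a call.

```python
import numpy as np


def standardize(values):
    """Return z-scores: (x - mean) / sample standard deviation."""
    values = np.asarray(values, dtype=float)
    # ddof=1 gives the sample (n - 1) standard deviation
    return (values - values.mean()) / values.std(ddof=1)


# Hypothetical data, for illustration only
scores = [2.0, 4.0, 6.0]
z = standardize(scores)
print(z)
```

Wrapping the computation in a function lets you reuse it on several columns without copying the formula each time.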

## 🎥 Comparing Distributions

This video describes how to use Q-Q plots to compare data against a distribution.
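
The sketch below shows one way to get the points of a normal Q-Q plot, using `scipy.stats.probplot`. The sample is synthetic and the seed and parameters are arbitrary assumptions; the video may construct the plot differently.

```python
import numpy as np
from scipy import stats

# Synthetic sample, for illustration only
rng = np.random.default_rng(20210920)
sample = rng.normal(loc=50, scale=10, size=200)

# probplot pairs theoretical normal quantiles with the sorted observations
# and fits a least-squares line through the resulting points.
(theoretical_q, ordered_data), (slope, intercept, r) = stats.probplot(
    sample, dist="norm"
)

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")
```

If the data are close to normal, the points fall near a straight line and `r` is close to 1. To draw the plot rather than only compute it, pass a Matplotlib axes object (or `matplotlib.pyplot` itself) as the `plot` argument of `probplot`.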

## 🎥 Testing Hypotheses


## 💥 Cartoon

Read XKCD #882: Significant.

This is called **p-hacking**: running tests until we find one that is significant.
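
To see why this is a problem, here is a small simulation sketch. It assumes 20 independent two-sample *t*-tests (mirroring the 20 comparisons in the cartoon) on synthetic data where the null hypothesis is true by construction; the seed and sample sizes are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(882)
n_tests = 20
alpha = 0.05

# Every pair of samples comes from the same distribution,
# so any "significant" result is a false positive.
p_values = []
for _ in range(n_tests):
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    p_values.append(stats.ttest_ind(a, b).pvalue)

n_significant = sum(p < alpha for p in p_values)

# Chance of at least one false positive across independent tests
family_wise = 1 - (1 - alpha) ** n_tests

print(f"{n_significant} of {n_tests} tests had p < {alpha}")
print(f"P(at least one false positive) = {family_wise:.2f}")
```

With 20 tests at the 0.05 level, the chance of at least one false positive is about 64%, even though no real effect exists.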

## 🎥 T-tests

This video discusses the *t*-test in more detail, and the different kinds of *t*-tests that we can run.
It also introduces degrees of freedom.
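
As a rough sketch of how some of these variants look in SciPy, the example below runs a two-sample (Welch) test and a paired test. The data are synthetic and the variable names are hypothetical; which variants the video covers is not assumed here.

```python
import numpy as np
from scipy import stats

# Synthetic data, for illustration only
rng = np.random.default_rng(5)
before = rng.normal(loc=70, scale=8, size=25)
after = before + rng.normal(loc=2, scale=3, size=25)  # same subjects, measured again
other_group = rng.normal(loc=74, scale=8, size=30)    # different subjects

# Two independent samples; equal_var=False requests Welch's t-test
two_sample = stats.ttest_ind(before, other_group, equal_var=False)

# Paired samples: each "after" value matches one "before" value
paired = stats.ttest_rel(after, before)

# A paired test is a one-sample test on the differences,
# with n - 1 degrees of freedom.
on_diffs = stats.ttest_1samp(after - before, popmean=0)
paired_df = len(before) - 1

print(f"Welch:  t={two_sample.statistic:.2f}, p={two_sample.pvalue:.3f}")
print(f"Paired: t={paired.statistic:.2f}, p={paired.pvalue:.3g}, df={paired_df}")
```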

## 🎥 Epistemology

In this video, I talk about how the quantitative data science methods we are learning fit into a broader picture of sources of knowledge.

## 🚩 Week 5 Quiz

The Week 5 quiz is about material **through this point**.
The subsequent videos are to help you better understand and contextualize material.

## 📓 One Sample Notebook

The One Sample notebook demonstrates how to compute a one-sample *t*-test and draw a Q-Q plot to compare a distribution with the normal distribution.
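
For reference, here is a sketch of the one-sample *t*-test computed by hand and checked against `scipy.stats.ttest_1samp`. This is not the notebook's code; the sample is synthetic and the hypothesized mean `mu0` is an arbitrary assumption.

```python
import numpy as np
from scipy import stats

# Synthetic sample, for illustration only
rng = np.random.default_rng(91)
sample = rng.normal(loc=5.3, scale=1.2, size=40)
mu0 = 5.0  # hypothesized population mean (assumed for the example)

# t = (sample mean - mu0) / (s / sqrt(n)), with n - 1 degrees of freedom
n = len(sample)
t_manual = (sample.mean() - mu0) / (sample.std(ddof=1) / np.sqrt(n))
df = n - 1

# Two-sided p-value from the t distribution's survival function
p_manual = 2 * stats.t.sf(abs(t_manual), df)

result = stats.ttest_1samp(sample, popmean=mu0)
print(f"manual: t={t_manual:.3f}, p={p_manual:.4f}, df={df}")
print(f"scipy:  t={result.statistic:.3f}, p={result.pvalue:.4f}")
```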

### Resources

- NIST Handbook on quantitative measures (has info on 1-sample and 2-sample *t*-tests)

## 🎥 Python Errors

This video discusses common Python errors and how to read errors.
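
As a small illustration (not an example from the video), the sketch below triggers an error on purpose and captures the traceback so its parts can be inspected. The function `mean_rating` is hypothetical.

```python
import traceback


def mean_rating(ratings):
    """Average a list of ratings; fails on an empty list."""
    return sum(ratings) / len(ratings)


try:
    mean_rating([])
except ZeroDivisionError as err:
    # The full traceback text, as Python would print it
    tb_text = traceback.format_exc()
    # The last line of a traceback names the error type and message
    error_type = type(err).__name__
    message = str(err)
    print(tb_text)

print(f"type: {error_type}; message: {message}")
```

A useful habit is to read a traceback from the bottom up: the last line says what went wrong, and the lines above it show the chain of calls that led there, ending with the line in `mean_rating` that failed.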

## 🎥 Python Libraries

## 🎥 Learning More

In this video I talk about how I go about expanding my own data science knowledge and techniques, with the goal of giving you ideas for how you can continue learning beyond this class.

## ✅ Practice

There are a few things you can do to keep practicing the material:

- The HETREC data contains two data sets besides the movie data: Delicious bookmarks and Last.FM listening records. Download this data set and apply some of our exploratory techniques to it.
- Download the SBA data from Week 4’s activity and describe the distributions of more of the variables.
- Apply the inference techniques from Week 4 to statistically test the differences you observed in Assignment 1.

## 📓 More Examples

Some more examples from my own work (these are *not* all cleaned up to our checklist standards):

- Data summary from book gender paper - shows a number of descriptive things, including a stacked area chart; it also uses Plotnine.
- Linkage statistics from book data - shows some matplotlib things, and computing data linking statistics.

## 📓 Tutorials

The tutorial notebooks include many useful things, and have a couple of additions moved over from 📅 Week 4 — Inference (9/13–17).

## 📩 Assignment 2

Assignment 2 is due on **September 26**.