# Week 5 — Filling In (9/20–24)

This week introduces one new statistical concept — the hypothesis test — and is otherwise about practice and solidifying concepts. I’m also going to take a step back and give some more context to some of the things we’re talking about.

Our learning outcomes are:

• Compute and interpret hypothesis tests

• Avoid p-hacking and HARKing

• Understand how to read and interpret Python errors

• Understand how the quantitative techniques we are learning in this class fit in a broader landscape of epistemologies

## 🧐 Content Overview

| Element | Length |
| --- | --- |
| 🎥 Comparing Distributions | 5m6s |
| 🎥 Testing Hypotheses | 14m51s |
| 🎥 T-tests | 12m24s |
| 🎥 Epistemology | 25m44s |
| 🎥 Python Errors | 7m28s |
| 🎥 Python Libraries | 3m43s |
| 🎥 Learning More | 5m10s |

This week has 1h14m of video and no assigned readings. The videos are available in a Panopto folder and as a podcast.

• Week 5 Quiz is due on Thursday at 8AM.

• Assignment 2 is due on Sunday, Sep. 26 at 11:59 PM.

• Midterm A is next week, on Tuesday, Sep. 28.

## 📓 Assignment 1 Solution

I will post the Assignment 1 solution to Canvas (sorry, I’m not posting it to the entire Internet).

## 📃 Course Glossary

If you haven’t yet, I highly recommend consulting the course glossary. Please post on Piazza if you have suggested additions!

The glossary is also likely to be useful in studying for the midterm.

## 📓 Writing Functions

I’ve used Python functions in a few of my example notebooks. The function notebook talks more about them, how to write them, and how to use them.
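As a small illustration of the idea, here is a minimal sketch of a function that wraps a computation we might otherwise repeat by hand. The name `standard_error` and the sample values are made up for this example; they are not taken from the function notebook.

```python
import math
import statistics


def standard_error(values):
    """Return the standard error of the mean for a sequence of numbers."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values")
    # Sample standard deviation (n - 1 denominator) divided by sqrt(n)
    return statistics.stdev(values) / math.sqrt(n)


# Made-up measurements; the same function is reused for both groups
group_a = [2.0, 4.0, 4.0, 6.0]
group_b = [1.0, 1.5, 2.0, 2.5, 3.0]
se_a = standard_error(group_a)
se_b = standard_error(group_b)
print(se_a, se_b)
```

Putting the computation in a function means the formula is written once, so a fix or change applies everywhere it is used.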

## 🎥 Comparing Distributions

This video describes how to use Q-Q plots to compare data against a distribution.
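One possible way to compute the points of a Q-Q plot is with `scipy.stats.probplot`. This is a sketch on synthetic data (the seed, mean, and standard deviation are arbitrary assumptions), not necessarily the approach used in the video.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(20210920)  # arbitrary seed for reproducibility
sample = rng.normal(loc=10, scale=2, size=200)  # synthetic data, assumed for illustration

# probplot pairs each sorted observation with the matching normal quantile
# and fits a least-squares line through those points
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(sample, dist="norm")

print(f"slope={slope:.2f}, intercept={intercept:.2f}, r={r:.3f}")

# To draw the plot, pass Matplotlib's pyplot module (or an Axes object):
#   import matplotlib.pyplot as plt
#   stats.probplot(sample, dist="norm", plot=plt)
#   plt.show()
```

If the data are close to normal, the points fall near a straight line; the slope then roughly estimates the standard deviation and the intercept roughly estimates the mean.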

## 💥 Cartoon

This is called p-hacking: running tests until we find one that is significant.
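To see why this is a problem, the following sketch simulates many t-tests where the null hypothesis is true by construction. The seed, sample sizes, and number of tests are arbitrary choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(533)  # arbitrary seed
n_tests = 1000
alpha = 0.05

false_positives = 0
for _ in range(n_tests):
    # Both groups come from the same distribution, so there is no real difference
    a = rng.normal(size=30)
    b = rng.normal(size=30)
    result = stats.ttest_ind(a, b)
    if result.pvalue < alpha:
        false_positives += 1

print(f"{false_positives} of {n_tests} tests were 'significant' at alpha={alpha}")
```

We expect roughly 5% of these tests to come out significant even though nothing is going on, so if we keep testing until something is significant, we will eventually find something.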

## 🎥 T-tests

This video discusses the t-test in more detail, and the different kinds of t-tests that we can run. It also introduces degrees of freedom.
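As a rough sketch of how the common kinds of t-test look in SciPy, the example below runs each one on synthetic data and computes the matching degrees of freedom by hand. The data, seed, and variable names are assumptions for illustration, and the video may organize the tests differently.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)  # arbitrary seed
before = rng.normal(loc=50, scale=5, size=25)
after = before + rng.normal(loc=1, scale=2, size=25)  # same 25 subjects, measured again
other = rng.normal(loc=52, scale=8, size=40)          # an unrelated group

one_sample = stats.ttest_1samp(before, popmean=50)       # sample mean vs. a fixed value
paired = stats.ttest_rel(before, after)                  # paired measurements
pooled = stats.ttest_ind(before, other)                  # independent, equal variances assumed
welch = stats.ttest_ind(before, other, equal_var=False)  # independent, unequal variances

# Degrees of freedom for each test
n1, n2 = len(before), len(other)
v1, v2 = before.var(ddof=1) / n1, other.var(ddof=1) / n2
df_one_sample = n1 - 1    # also the df for the paired test (n pairs - 1)
df_pooled = n1 + n2 - 2
df_welch = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))  # Welch-Satterthwaite

for name, res in [("one-sample", one_sample), ("paired", paired),
                  ("pooled", pooled), ("Welch", welch)]:
    print(f"{name}: t={res.statistic:.3f}, p={res.pvalue:.4f}")
print(f"df: one-sample/paired={df_one_sample}, pooled={df_pooled}, Welch={df_welch:.1f}")
```

A paired t-test is equivalent to a one-sample t-test on the per-pair differences, which is why both use the number of pairs minus one as their degrees of freedom.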

## 🎥 Epistemology

In this video, I talk about how the quantitative data science methods we are learning fit into a broader picture of sources of knowledge.

## 🚩 Week 5 Quiz

The Week 5 quiz is about material through this point. The subsequent videos are to help you better understand and contextualize material.

## 📓 One Sample Notebook

The One Sample notebook demonstrates how to compute a one-sample t-test and how to draw a Q-Q plot to compare a sample's distribution with the normal distribution.
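For reference, a one-sample t-test can also be worked out step by step and checked against SciPy. This is a minimal sketch with made-up numbers and a made-up null value `mu0`; it is not the notebook's code.

```python
import numpy as np
from scipy import stats

sample = np.array([5.1, 4.9, 5.6, 5.8, 5.2, 5.4, 4.7, 5.5])  # made-up observations
mu0 = 5.0  # hypothesized population mean (an assumption for this example)

n = len(sample)
std_err = sample.std(ddof=1) / np.sqrt(n)    # standard error of the mean
t_stat = (sample.mean() - mu0) / std_err     # how many standard errors from mu0
df = n - 1                                   # degrees of freedom
p_value = 2 * stats.t.sf(abs(t_stat), df)    # two-sided p-value

# The library routine should agree with the manual computation
check = stats.ttest_1samp(sample, popmean=mu0)
print(f"manual: t={t_stat:.3f}, p={p_value:.4f}")
print(f"scipy:  t={check.statistic:.3f}, p={check.pvalue:.4f}")
```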

## 🎥 Python Errors

This video discusses common Python errors and how to read errors.
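As a small example of reading an error, the sketch below triggers an exception on purpose and captures the traceback text so we can look at its parts. The function `mean_rating` is hypothetical and exists only for this illustration.

```python
import traceback


def mean_rating(ratings):
    """Average a list of ratings (fails on an empty list)."""
    return sum(ratings) / len(ratings)


try:
    mean_rating([])  # empty list: len() is 0, so the division fails
except ZeroDivisionError as err:
    error_type = type(err).__name__      # the kind of error
    error_message = str(err)             # the human-readable explanation
    trace_text = traceback.format_exc()  # the full traceback as a string

print(trace_text)
print(f"type: {error_type}; message: {error_message}")
```

A traceback is usually easiest to read from the bottom up: the last line gives the exception type and message, and the lines above it list the chain of calls that led there, with the most recent call last.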

## 🎥 Learning More

In this video I talk about how I go about expanding my own data science knowledge and techniques, with the goal of giving you ideas for how you can continue learning beyond this class.

## ✅ Practice

There are a few things you can do to keep practicing the material:

• The HETREC data contains two data sets besides the movie data: Delicious bookmarks and Last.FM listening records. Download this data set and apply some of our exploratory techniques to it.

• Download the SBA data from Week 4’s activity and describe the distributions of more of the variables.

• Apply the inference techniques from Week 4 to statistically test the differences you observed in Assignment 1.

## 📓 More Examples

Some more examples from my own work (these are not all cleaned up to our checklist standards):

## 📓 Tutorials

The tutorial notebooks include many useful things, and have a couple of additions moved over from 📅 Week 4 — Inference (9/13–17).

## 📩 Assignment 2

Assignment 2 is due on September 26.