Week 5 — Filling In (9/20–24)

This week introduces one new statistical concept, the hypothesis test, and is otherwise about practice and solidifying concepts. I’m also going to step back and give more context for some of the things we have been discussing.

Our learning outcomes are:

  • Compute and interpret a hypothesis test

  • Avoid p-hacking and HARKing

  • Understand how to read and interpret Python errors

  • Understand how the quantitative techniques we are learning in this class fit in a broader landscape of epistemologies

🧐 Content Overview

  • 🎥 Comparing Distributions

  • 🎥 Testing Hypotheses

  • 🎥 T-tests

  • 🎥 Epistemology

  • 🎥 Python Errors

  • 🎥 Python Libraries

  • 🎥 Learning More

This week has 1h14m of video and no assigned readings. This week’s videos are available in a Panopto folder and as a podcast.

📅 Deadlines

  • The Week 5 quiz is due on Thursday at 8 AM.

  • Assignment 2 is due on Sunday, Sep. 26 at 11:59 PM.

  • Midterm A is next week, on Tuesday, Sep. 28.

📓 Assignment 1 Solution

I will post the Assignment 1 solution to Canvas (sorry, I’m not posting it to the entire Internet).

📃 Course Glossary

If you haven’t yet, I highly recommend consulting the course glossary. Please post on Piazza if you have suggested additions!

The glossary is also likely to be useful in studying for the midterm.

📓 Writing Functions

I’ve used Python functions in a few of my example notebooks. The function notebook talks more about them, how to write them, and how to use them.
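As a small illustration of the pattern (not taken from the function notebook itself), here is a sketch of a function with a docstring, an input check, and a return value. The name `standard_error` is hypothetical; it uses only the standard library so it runs anywhere.

```python
import math
import statistics


def standard_error(values):
    """Return the standard error of the mean of a sequence of numbers.

    Uses the sample standard deviation (n - 1 in the denominator).
    """
    n = len(values)
    if n < 2:
        # stdev needs at least two points, so fail early with a clear message
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(n)


# Once defined, the function is called just like a built-in one
se = standard_error([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(round(se, 4))
```

Wrapping a computation in a function like this lets you reuse it on every column or group you analyze instead of copying the same lines around.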

🎥 Comparing Distributions

This video describes how to use Q-Q plots to compare data against a distribution.
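The points on a Q-Q plot can be sketched by hand: sort the data, then pair each value with the quantile a normal distribution would put at the same rank. This is a minimal standard-library sketch, assuming the plotting positions (i - 0.5) / n (one common convention) and a reference normal fitted with the sample mean and standard deviation. The function names `qq_points` and `pearson_r` are hypothetical; in practice, `scipy.stats.probplot` or `statsmodels.api.qqplot` compute and draw this for you.

```python
import math
import random
from statistics import NormalDist, fmean, stdev


def qq_points(data):
    """Pair each sorted observation with the matching normal quantile."""
    ordered = sorted(data)
    n = len(ordered)
    ref = NormalDist(fmean(ordered), stdev(ordered))
    # (i - 0.5) / n keeps every probability strictly between 0 and 1
    theoretical = [ref.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def pearson_r(points):
    """Correlation between theoretical and observed quantiles (1.0 = perfectly straight)."""
    xs, ys = zip(*points)
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


random.seed(42)
normal_sample = [random.gauss(10, 2) for _ in range(500)]
skewed_sample = [random.expovariate(1.0) for _ in range(500)]

r_normal = pearson_r(qq_points(normal_sample))
r_skewed = pearson_r(qq_points(skewed_sample))
print(f"normal data: r = {r_normal:.3f}; skewed data: r = {r_skewed:.3f}")
```

To see the plot, scatter the pairs (for example `plt.scatter(*zip(*qq_points(normal_sample)))` with matplotlib) and add the line y = x. Normal data hugs the line; the exponential sample bends away from it in the upper tail, which is why its correlation is lower.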

🎥 Testing Hypotheses

💥 Cartoon

Read XKCD #882: Significant.

This is called p-hacking: running many tests until one of them comes out significant, then reporting only that one.
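A quick simulation shows why the jelly bean result is unsurprising. This sketch is my own illustration, not the video's: each "study" tests 20 colors where no color has any real effect, and for simplicity it uses a two-sample z-test with a known standard deviation of 1 rather than a t-test. The chance that at least one of 20 independent null tests gives p < 0.05 is 1 - 0.95^20, roughly 64%.

```python
import random
from statistics import NormalDist, fmean

STD_NORMAL = NormalDist()


def z_test_p_value(a, b, sigma=1.0):
    """Two-sided p-value for a difference in means when sigma is known."""
    se = sigma * (1 / len(a) + 1 / len(b)) ** 0.5
    z = (fmean(a) - fmean(b)) / se
    return 2 * (1 - STD_NORMAL.cdf(abs(z)))


def jelly_bean_study(n_colors=20, n_per_group=30, rng=random):
    """Test every color even though the null hypothesis is true for all of them."""
    p_values = []
    for _ in range(n_colors):
        # Both groups come from the same distribution: there is no real effect
        acne_with = [rng.gauss(0, 1) for _ in range(n_per_group)]
        acne_without = [rng.gauss(0, 1) for _ in range(n_per_group)]
        p_values.append(z_test_p_value(acne_with, acne_without))
    return p_values


rng = random.Random(882)
n_studies = 1000
# A study "finds something" if its smallest p-value is below 0.05
hits = sum(min(jelly_bean_study(rng=rng)) < 0.05 for _ in range(n_studies))
false_alarm_rate = hits / n_studies
analytic_rate = 1 - 0.95 ** 20  # P(at least one p < 0.05 among 20 independent null tests)
print(f"simulated: {false_alarm_rate:.2f}, analytic: {analytic_rate:.2f}")
```

Each individual test has a 5% false-positive rate, but reporting only the "winning" color hides the other 19 tests, and the family of tests as a whole produces a false alarm most of the time.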

🎥 T-tests

This video discusses the t-test in more detail, and the different kinds of t-tests that we can run. It also introduces degrees of freedom.
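To make degrees of freedom concrete, here is a standard-library sketch of two common two-sample t statistics: Student's pooled-variance test, with df = n1 + n2 - 2, and Welch's test, whose degrees of freedom come from the Welch-Satterthwaite formula. The video may present these differently; the function names and data are hypothetical. For p-values, `scipy.stats.ttest_ind(a, b, equal_var=False)` runs Welch's test, and `2 * scipy.stats.t.sf(abs(t), df)` converts a statistic into a two-sided p-value.

```python
import math
from statistics import fmean, variance


def student_t(a, b):
    """Pooled-variance (Student's) two-sample t statistic and its degrees of freedom."""
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    se = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (fmean(a) - fmean(b)) / se, df


def welch_t(a, b):
    """Welch's t statistic with Welch-Satterthwaite degrees of freedom."""
    v1 = variance(a) / len(a)
    v2 = variance(b) / len(b)
    t = (fmean(a) - fmean(b)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (len(a) - 1) + v2 ** 2 / (len(b) - 1))
    return t, df


group_a = [5.1, 4.9, 5.6, 5.8, 6.0, 5.3]
group_b = [4.2, 4.8, 4.5, 5.0, 4.1, 4.4, 4.6]
t_s, df_s = student_t(group_a, group_b)
t_w, df_w = welch_t(group_a, group_b)
print(f"Student: t = {t_s:.3f}, df = {df_s}")
print(f"Welch:   t = {t_w:.3f}, df = {df_w:.2f}")
```

Welch's degrees of freedom are usually fractional and fall between min(n1, n2) - 1 and n1 + n2 - 2; they shrink when the two groups have very different variances, which makes the test more cautious. Welch's test does not assume equal variances, which is why it is often a safer default.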

🎥 Epistemology

In this video, I talk about how the quantitative data science methods we are learning fit into a broader picture of sources of knowledge.

🚩 Week 5 Quiz

The Week 5 quiz covers material through this point. The subsequent videos are to help you better understand and contextualize the material.

📓 One Sample Notebook

The One Sample notebook demonstrates how to compute a one-sample t-test and how to draw a Q-Q plot comparing a distribution with the normal distribution.
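The arithmetic behind a one-sample t-test is short enough to write out. This is a sketch, not the notebook's code: the statistic is t = (x̄ - μ0) / (s / √n) with n - 1 degrees of freedom. The ratings and the null value of 3.0 are made up for illustration; `scipy.stats.ttest_1samp(ratings, 3.0)` would return the same statistic along with its p-value.

```python
import math
from statistics import fmean, stdev


def one_sample_t(sample, mu0):
    """t statistic and degrees of freedom for H0: the population mean equals mu0."""
    n = len(sample)
    se = stdev(sample) / math.sqrt(n)  # estimated standard error of the mean
    return (fmean(sample) - mu0) / se, n - 1


ratings = [3.5, 4.0, 2.5, 5.0, 4.5, 3.0, 4.0, 3.5]  # hypothetical data
t, df = one_sample_t(ratings, 3.0)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

Here the sample mean is 3.75, and the statistic measures how many standard errors that mean sits above the hypothesized 3.0.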


🎥 Python Errors

This video discusses common Python errors and how to read the error messages (tracebacks) that Python prints.
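Here is a small sketch of a typical mistake and the traceback it produces, captured with the standard `traceback` module so you can inspect it. The `average_rating` function and the movie record are hypothetical, and the bug is deliberate.

```python
import traceback


def average_rating(movie):
    # Deliberate bug: the dictionary key is "ratings", not "rating"
    return sum(movie["rating"]) / len(movie["ratings"])


movie = {"title": "Example", "ratings": [4, 5, 3]}
try:
    average_rating(movie)
except KeyError:
    report = traceback.format_exc()

print(report)
```

Read a traceback from the bottom up. The last line names the exception type and its message (here `KeyError: 'rating'`, meaning that key does not exist). The lines above it list the calls that led there, with the most recent call last, so the frame just above the message (inside `average_rating`) is where the failing line lives.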

🎥 Python Libraries

🎥 Learning More

In this video I talk about how I go about expanding my own data science knowledge and techniques, with the goal of giving you ideas for how you can continue learning beyond this class.

✅ Practice

There are a few things you can do to keep practicing the material:

  • The HETREC data contains two data sets besides the movie data: Delicious bookmarks and Last.FM listening records. Download this data set and apply some of our exploratory techniques to it.

  • Download the SBA data from Week 4’s activity and describe the distributions of more of the variables.

  • Apply the inference techniques from Week 4 to statistically test the differences you observed in Assignment 1.

📓 More Examples

Some more examples from my own work (these are not all cleaned up to our checklist standards):

📓 Tutorials

The tutorial notebooks include many useful things, and have a couple of additions moved over from 📅 Week 4 — Inference (9/13–17).

📩 Assignment 2

Assignment 2 is due on September 26.