Week 5 — Filling In (9/19–23)#

This week introduces one new statistical concept — the hypothesis test — and is otherwise about practice and solidifying concepts. I’m also going to take a step back and give some more context to some of the things we’re talking about.

Our learning outcomes are:

  • Compute and interpret a hypothesis test

  • Avoid p-hacking and HARKing

  • Understand how to read and interpret Python errors

  • Understand how the quantitative techniques we are learning in this class fit in a broader landscape of epistemologies

🧐 Content Overview#

Element                               Length
🎥 Comparing Distributions            5m6s
🎥 Testing Hypotheses                 14m51s
🎥 T-tests                            12m24s
📃 Cookbook 1                         2000 words
🎥 Epistemology                       25m44s
📃 One Sample T-test and Q-Q Plot     653 words
🎥 Python Errors                      7m28s
🎥 Python Libraries                   3m43s
🎥 Learning More                      5m10s

This week has 1h14m of video and 2653 words of assigned readings. This week’s videos are available in a Panopto folder.

📅 Deadlines#

  • The Week 5 Quiz is due on Thursday at 8:00 AM.

  • Assignment 2 is due on Sunday, September 25, 2022 at 11:59 PM.

  • Midterm A is next week, on September 28.

📓 Assignment 1 Solution#

The Assignment 1 solution is on Piazza.

📃 Course Glossary#

If you haven’t yet, I highly recommend consulting the course glossary. Please post on Piazza if you have suggested additions!

The glossary is also likely to be useful in studying for the exam next week.

📓 Writing Functions#

I’ve used Python functions in a few of my example notebooks. The function notebook talks more about them, how to write them, and how to use them.
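
As a small illustration of the idea (a sketch that is not taken from the function notebook), a function packages a computation so it can be reused on any input. The name `summarize` and the sample numbers below are made up for this example.

```python
import statistics


def summarize(values):
    """Return the count, mean, and sample standard deviation of a list of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "sd": statistics.stdev(values),  # sample SD (divides by n - 1)
    }


# Hypothetical measurements; the same function works for any list of numbers
ratings = [3.5, 4.0, 2.5, 5.0, 4.5]
summary = summarize(ratings)
print(summary)
```

Writing the logic once and calling it by name keeps a notebook shorter and makes it harder for copies of the same computation to drift apart.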

🎥 Comparing Distributions#

This video describes how to use Q-Q plots to compare data against a distribution.
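
One way to draw such a plot is sketched below, assuming SciPy and Matplotlib are installed. The data are simulated for this example, and `scipy.stats.probplot` is just one convenient tool for the job; the video may build the plot differently.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy import stats

# Simulated data standing in for a real variable (an assumption for this sketch)
rng = np.random.default_rng(20220919)
sample = rng.normal(loc=50, scale=10, size=200)

fig, ax = plt.subplots()
# probplot sorts the data and pairs each value with the matching normal quantile,
# then draws the points and a fitted reference line on the given axes
(theoretical_q, ordered_values), (slope, intercept, r) = stats.probplot(
    sample, dist="norm", plot=ax
)
ax.set_title("Q-Q plot against the normal distribution")

# r is the correlation between the points and the fitted line
print(f"correlation with the reference line: {r:.3f}")
```

Points that hug the reference line suggest the data are roughly normal; systematic curves at the ends suggest skew or heavy tails. In a notebook the figure displays automatically, and in a script you would call `plt.show()`.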

🎥 Testing Hypotheses#

💥 Cartoon#

Read XKCD #882: Significant.

The comic illustrates p-hacking: running test after test until one comes out significant, then reporting only that result.
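
To see why this is a problem, the following sketch simulates 20 tests in which the null hypothesis is true every time, mirroring the 20 jelly bean colors in the comic. The sample sizes, seed, and use of Welch's t-test are assumptions chosen for the illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(882)
n_tests = 20   # one test per jelly bean color
alpha = 0.05

p_values = []
for _ in range(n_tests):
    # Both groups come from the same distribution, so there is no real effect
    a = rng.normal(size=50)
    b = rng.normal(size=50)
    result = stats.ttest_ind(a, b, equal_var=False)
    p_values.append(result.pvalue)

n_significant = sum(p < alpha for p in p_values)
print(f"{n_significant} of {n_tests} null tests have p < {alpha}")

# Probability that at least one of 20 independent null tests is "significant"
p_any = 1 - (1 - alpha) ** n_tests
print(f"chance of at least one false positive: {p_any:.2f}")
```

With 20 independent tests at the 0.05 level, the chance of at least one false positive is about 64%, so finding one "significant" result after many attempts is weak evidence on its own.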

🎥 T-tests#

This video discusses the t-test in more detail, and the different kinds of t-tests that we can run. It also introduces degrees of freedom.
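
As a rough sketch of two of these variants using `scipy.stats` (the data are simulated and the variable names are made up for this example), an independent two-sample test compares separate groups, while a paired test compares repeated measurements on the same subjects.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = rng.normal(loc=70, scale=8, size=30)
after = before + rng.normal(loc=2, scale=4, size=30)   # same subjects, measured again
other_group = rng.normal(loc=74, scale=12, size=40)    # a separate, independent group

# Independent two-sample test; equal_var=False selects Welch's t-test,
# which does not assume the two groups share a variance
welch = stats.ttest_ind(before, other_group, equal_var=False)

# Paired test; equivalent to a one-sample test on the per-subject differences
paired = stats.ttest_rel(after, before)
diff_test = stats.ttest_1samp(after - before, popmean=0)

print(f"Welch:  t = {welch.statistic:.3f}, p = {welch.pvalue:.3f}")
print(f"Paired: t = {paired.statistic:.3f}, p = {paired.pvalue:.3f}, "
      f"df = {len(before) - 1}")
```

The paired test has n - 1 degrees of freedom, where n is the number of pairs, because it reduces to a one-sample test on the differences.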

📓 Tying It Together#

I will be adding a notebook reading here to tie together some Week 4 and 5 material.

🎥 Epistemology#

In this video, I talk about how the quantitative data science methods we are learning fit into a broader picture of sources of knowledge.

🚩 Week 5 Quiz#

The Week 5 quiz is about material through this point. The subsequent videos are to help you better understand and contextualize material.

📓 One Sample Notebook#

The One Sample notebook demonstrates how to compute a one-sample t-test and draw a Q-Q plot to compare a distribution against the normal distribution.
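
For orientation, here is a minimal sketch of the one-sample t-test (not the notebook's own code): it computes the statistic by hand and checks it against `scipy.stats.ttest_1samp`. The simulated data and the hypothesized mean `mu0` are assumptions for this example.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
sample = rng.normal(loc=5.3, scale=1.2, size=25)
mu0 = 5.0   # hypothesized population mean (H0: mean equals mu0)

# t = (sample mean - mu0) / standard error, with n - 1 degrees of freedom
n = len(sample)
se = sample.std(ddof=1) / np.sqrt(n)
t_manual = (sample.mean() - mu0) / se
# Two-sided p-value: probability of a t at least this extreme in either tail
p_manual = 2 * stats.t.sf(abs(t_manual), df=n - 1)

# SciPy performs the same computation in one call
result = stats.ttest_1samp(sample, popmean=mu0)
print(f"manual: t = {t_manual:.3f}, p = {p_manual:.3f}")
print(f"scipy:  t = {result.statistic:.3f}, p = {result.pvalue:.3f}")
```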

🎥 Python Errors#

This video discusses common Python errors and how to read error messages.
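
As a small illustration (a sketch with made-up function names and data, not an example from the video), the code below triggers two common errors and prints their tracebacks. A traceback is usually easiest to read from the bottom up: the last line names the exception type and message, and the lines above it show the chain of calls that led there, most recent call last.

```python
import traceback


def mean_rating(ratings):
    return sum(ratings) / len(ratings)   # fails if the list is empty


def report(data, key):
    return mean_rating(data[key])        # fails if the key is missing


# Hypothetical data: "drama" has no ratings and "comedy" is not present at all
data = {"action": [4, 5, 3], "drama": []}

captured = {}
for key in ["drama", "comedy"]:
    try:
        report(data, key)
    except Exception as err:
        # Record the exception type, which is what the last traceback line shows
        captured[key] = type(err).__name__
        print(f"--- {key} ---")
        traceback.print_exc()

print(captured)
```

The empty list produces a `ZeroDivisionError` inside `mean_rating`, while the missing key produces a `KeyError` inside `report`; in both cases the traceback points to the exact line that failed.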

🎥 Python Libraries#

🎥 Learning More#

In this video I talk about how I go about expanding my own data science knowledge and techniques, with the goal of giving you ideas for how you can continue learning beyond this class.

✅ Practice#

There are a few things you can do to keep practicing the material:

  • The HETREC data contains two data sets besides the movie data: Delicious bookmarks and Last.FM listening records. Download this data set and apply some of our exploratory techniques to it.

  • Download the SBA data from Week 4’s activity and describe the distributions of more of the variables.

  • Apply the inference techniques from Week 4 to statistically test the differences you observed in Assignment 1.

📓 More Examples#

Some more examples from my own work (these are not all cleaned up to our checklist standards):

📓 Tutorials#

The tutorial notebooks include many useful things, and have a couple of additions moved over from 📅 Week 4 — Inference (9/12–16).

📩 Assignment 2#

Assignment 2 is due on Sunday, September 25, 2022.