Week 5 – Filling In (9/19–23)#

This week introduces one new statistical concept, the hypothesis test, and is otherwise about practice and solidifying concepts. I’m also going to take a step back and give some more context to some of the things we’re talking about.

Our learning outcomes are:

  • Compute and interpret a hypothesis test

  • Avoid p-hacking and HARKing

  • Understand how to read and interpret Python errors

  • Understand how the quantitative techniques we are learning in this class fit in a broader landscape of epistemologies

🧐 Content Overview#

Element                              Length
🎥 Comparing Distributions
🎥 Testing Hypotheses
🎥 T-tests
📃 Cookbook 1                        2000 words
🎥 Epistemology
📃 One Sample T-test and Q-Q Plot    653 words
🎥 Python Errors
🎥 Python Libraries
🎥 Learning More

This week has 1h14m of video and 2653 words of assigned readings. This week’s videos are available in a Panopto folder.

📅 Deadlines#

  • Week 5 Quiz is due on Thursday at 8AM.

  • Assignment 2 is due on Sunday, September 25, 2022 at 11:59 PM.

  • Midterm A is next week, on September 28.

📓 Assignment 1 Solution#

The Assignment 1 solution is on Piazza.

📃 Course Glossary#

If you haven’t yet, I highly recommend consulting the course glossary. Please post on Piazza if you have suggested additions!

The glossary is also likely to be useful in studying for the exam next week.

📓 Writing Functions#

I’ve used Python functions in a few of my example notebooks. The function notebook talks more about them, how to write them, and how to use them.
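As a quick reminder of the shape of a function, here is a minimal sketch. The function name `describe`, its `digits` parameter, and the sample numbers are made up for illustration; they are not taken from the function notebook.

```python
from statistics import mean, stdev


def describe(values, digits=2):
    """Return the count, mean, and sample standard deviation of some numbers.

    `values` is any sequence of at least two numbers; `digits` controls rounding.
    """
    return {
        "n": len(values),
        "mean": round(mean(values), digits),
        "sd": round(stdev(values), digits),  # stdev() uses the n - 1 denominator
    }


# Calling the function: positional argument first, optional keyword argument second.
summary = describe([2, 4, 4, 4, 5, 5, 7, 9])
print(summary)
print(describe([2, 4, 4, 4, 5, 5, 7, 9], digits=4))
```

Wrapping a repeated computation in a function like this means you describe it once and can reuse it on every variable you explore.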

🎥 Comparing Distributions#

This video describes how to use Q-Q plots to compare data against a distribution.


🎥 Testing Hypotheses#


💥 Cartoon#

Read XKCD #882: Significant.

The cartoon illustrates p-hacking: running many tests until one of them happens to come out significant.
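A small simulation shows why this is a problem. In the sketch below, every comparison draws both groups from the same distribution, so there is no real effect to find. The number of tests (20, as in the cartoon), the sample sizes, the seed, and the use of Welch's t-test from SciPy are my choices for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(882)
n_tests = 20   # one test per "jelly bean color"
alpha = 0.05   # significance threshold

p_values = []
for _ in range(n_tests):
    # Both groups come from the same distribution: the null hypothesis is true.
    group_a = rng.normal(loc=0, scale=1, size=50)
    group_b = rng.normal(loc=0, scale=1, size=50)
    result = stats.ttest_ind(group_a, group_b, equal_var=False)
    p_values.append(result.pvalue)

n_significant = sum(p < alpha for p in p_values)
print(f"{n_significant} of {n_tests} null comparisons had p < {alpha}")

# Chance of at least one false positive across 20 independent tests.
family_wise = 1 - (1 - alpha) ** n_tests
print(f"P(at least one false positive) = {family_wise:.3f}")
```

With 20 independent tests at the 0.05 level, the chance of at least one "significant" result from pure noise is about 64%, so reporting only the test that worked is misleading.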

🎥 T-tests#

This video discusses the t-test in more detail, and the different kinds of t-tests that we can run. It also introduces degrees of freedom.
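For reference, here is a hedged sketch of the usual t-test variants in SciPy. I am assuming the video covers the one-sample, paired, and two-sample (Student and Welch) tests; the simulated data and variable names are invented for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
before = rng.normal(loc=70, scale=8, size=30)
after = before + rng.normal(loc=2, scale=3, size=30)  # same subjects, measured again
other = rng.normal(loc=74, scale=10, size=40)         # an unrelated group

# One-sample: is the mean of `before` different from a fixed value?
one_sample = stats.ttest_1samp(before, popmean=70)

# Paired: same subjects measured twice, so we test the per-subject differences.
paired = stats.ttest_rel(after, before)

# Two-sample: independent groups, with or without assuming equal variances.
student = stats.ttest_ind(before, other, equal_var=True)
welch = stats.ttest_ind(before, other, equal_var=False)

# Degrees of freedom for the tests with simple formulas.
df_one_sample = len(before) - 1
df_paired = len(before) - 1                # number of pairs minus 1
df_student = len(before) + len(other) - 2  # pooled-variance two-sample test
# Welch's test uses the Welch-Satterthwaite approximation, usually not a whole number.

for name, res in [("one-sample", one_sample), ("paired", paired),
                  ("Student", student), ("Welch", welch)]:
    print(f"{name:>10}: t = {res.statistic:.3f}, p = {res.pvalue:.4f}")
```

Which test applies depends on how the data were collected: one group against a reference value, the same units measured twice, or two independent groups.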

📓 Tying It Together#

I will be adding a notebook reading here to tie together some Week 4 and 5 material.

🎥 Epistemology#

In this video, I talk about how the quantitative data science methods we are learning fit into a broader picture of sources of knowledge.

🚩 Week 5 Quiz#

The Week 5 quiz is about material through this point. The subsequent videos are to help you better understand and contextualize material.

📓 One Sample Notebook#

The One Sample notebook demonstrates how to compute a one-sample t-test and draw a Q-Q plot to compare a distribution with the normal distribution.
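The sketch below is not the notebook itself. It is a small stand-in on simulated data, with a null value `mu0` that I made up, showing the t statistic computed by hand next to SciPy's result, plus the Q-Q fit from `scipy.stats.probplot`.

```python
import math

import numpy as np
from scipy import stats

rng = np.random.default_rng(104)
x = rng.normal(loc=5.3, scale=1.2, size=40)  # simulated sample (illustrative only)
mu0 = 5.0                                    # hypothesized population mean

# By hand: t = (sample mean - mu0) / standard error, with n - 1 degrees of freedom.
n = len(x)
std_err = x.std(ddof=1) / math.sqrt(n)
t_manual = (x.mean() - mu0) / std_err
df = n - 1
p_manual = 2 * stats.t.sf(abs(t_manual), df)  # two-sided p-value

# The same test with SciPy.
result = stats.ttest_1samp(x, popmean=mu0)
print(f"manual: t = {t_manual:.3f}, p = {p_manual:.4f}, df = {df}")
print(f"scipy:  t = {result.statistic:.3f}, p = {result.pvalue:.4f}")

# Q-Q check of the normality assumption: r near 1 means the points are close to a line.
(theoretical, ordered), (slope, intercept, r) = stats.probplot(x, dist="norm")
print(f"Q-Q line: slope = {slope:.2f}, intercept = {intercept:.2f}, r = {r:.3f}")
```

Passing `plot=plt` to `probplot` (with `matplotlib.pyplot` imported as `plt`) draws the Q-Q plot instead of only returning the points.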


🎥 Python Errors#

This video discusses common Python errors and how to read errors.
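As a small illustration of reading an error, the sketch below triggers two common exceptions and pulls out the pieces worth reading first: the exception type, the message, and the function where it was raised. The function names and the `ratings` data are invented for this example.

```python
import traceback


def average_rating(ratings, movie):
    scores = ratings[movie]            # KeyError if the movie is not in the dict
    return sum(scores) / len(scores)   # ZeroDivisionError if the list is empty


def describe_error(func, *args):
    """Call func(*args) and summarize any exception in one line."""
    try:
        return f"ok: {func(*args)}"
    except Exception as err:
        # The last traceback frame is where the error was actually raised;
        # in a printed traceback, that is the entry closest to the bottom.
        frame = traceback.extract_tb(err.__traceback__)[-1]
        return f"{type(err).__name__} in {frame.name}(): {err}"


ratings = {"Alien": [5, 4, 5], "Cats": []}

print(describe_error(average_rating, ratings, "Alien"))
print(describe_error(average_rating, ratings, "Up"))    # missing key
print(describe_error(average_rating, ratings, "Cats"))  # empty list
```

When Python prints a full traceback, read it from the bottom up: the last line names the exception and its message, and the frames above it show the chain of calls that led there.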

🎥 Python Libraries#

🎥 Learning More#

In this video I talk about how I go about expanding my own data science knowledge and techniques, with the goal of giving you ideas for how you can continue learning beyond this class.

✅ Practice#

There are a few things you can do to keep practicing the material:

  • The HETREC data contains two data sets besides the movie data: Delicious bookmarks and Last.FM listening records. Download this data set and apply some of our exploratory techniques to it.

  • Download the SBA data from Week 4’s activity and describe the distributions of more of the variables.

  • Apply the inference techniques from Week 4 to statistically test the differences you observed in Assignment 1.

📓 More Examples#

Some more examples from my own work (these are not all cleaned up to our checklist standards):

📓 Tutorials#

The tutorial notebooks include many useful things, and have a couple of additions moved over from 📅 Week 4 – Inference (9/12–16).

📩 Assignment 2#

Assignment 2 is due on Sunday, September 25, 2022.