Week 5 — Filling In
This week is primarily for practice and solidifying concepts. I'm also going to take a step back and give some more context to some of the things we're talking about.
This week's videos are also in a Panopto playlist.
Week 5 Quiz
The Week 5 quiz is about material through the end of Week 4. Nothing from this week will be on it, except that things here may clarify some of last week's material for you.
Assignment 1 Solution
I will post the Assignment 1 solution to Piazza (sorry, I'm not posting it to the entire Internet).
I have extensively updated the course glossary. Please post on Piazza if you have suggested additions!
I've used Python functions in a few of my example notebooks. The function notebook talks more about them, how to write them, and how to use them.
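As a small illustration of the idea (this is a sketch, not taken from the function notebook; the name `summarize` and its `digits` parameter are made up for the example), a function packages a computation so it can be reused with different inputs:

```python
def summarize(values, digits=2):
    """Return (mean, sample standard deviation) of a list of numbers.

    `digits` is an optional parameter with a default value; both results
    are rounded to that many decimal places.
    """
    n = len(values)
    mean = sum(values) / n
    # Sample variance divides by n - 1, not n.
    var = sum((x - mean) ** 2 for x in values) / (n - 1)
    return round(mean, digits), round(var ** 0.5, digits)


# Call the function and unpack the two values it returns.
mean, sd = summarize([2.0, 4.0, 6.0])
print(mean, sd)
```

Writing the computation once and calling it by name keeps notebooks shorter and makes it harder for two copies of the same logic to drift apart.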
This video describes how to use Q-Q plots to compare data against a distribution.
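To make the construction concrete, here is a minimal sketch of how the points of a normal Q-Q plot can be computed with only the standard library. The function name `normal_qq_points` and the plotting positions `(i + 0.5) / n` are assumptions chosen for this example; plotting libraries may use slightly different positions.

```python
from statistics import NormalDist, mean, stdev


def normal_qq_points(data):
    """Return (theoretical, observed) quantiles for a normal Q-Q plot."""
    xs = sorted(data)
    n = len(xs)
    m, s = mean(xs), stdev(xs)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    # Theoretical quantiles: where the i-th of n sorted values would fall
    # if the data were exactly standard normal.
    theoretical = [std_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    # Observed quantiles: the sorted data, standardized to z-scores.
    observed = [(x - m) / s for x in xs]
    return theoretical, observed


theoretical, observed = normal_qq_points([4.1, 5.0, 5.2, 5.9, 6.3, 7.4, 9.8])
for t, o in zip(theoretical, observed):
    print(f"{t:6.2f} {o:6.2f}")
```

Drawing the two lists as a scatter plot (for example with matplotlib's `plt.scatter(theoretical, observed)`) gives the Q-Q plot: points close to the line y = x suggest the data are roughly normal, and systematic curvature suggests skew or heavy tails.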
This video discusses the t-test in more detail, and the different kinds of t-tests that we can run. It also introduces degrees of freedom.
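The sketch below computes the t statistic and degrees of freedom by hand for three common variants: one-sample, paired, and two-sample with unequal variances (Welch's test). It is an illustration under my own naming (`one_sample_t`, `paired_t`, `welch_t`), not necessarily how the video or notebooks do it, and it stops short of the p-value, which comes from the t distribution with the reported degrees of freedom.

```python
from math import sqrt
from statistics import mean, variance


def one_sample_t(xs, mu0):
    """t statistic and degrees of freedom for H0: population mean == mu0."""
    n = len(xs)
    t = (mean(xs) - mu0) / sqrt(variance(xs) / n)
    return t, n - 1


def paired_t(xs, ys):
    """Paired t-test: a one-sample test on the pairwise differences."""
    diffs = [x - y for x, y in zip(xs, ys)]
    return one_sample_t(diffs, 0.0)


def welch_t(xs, ys):
    """Two-sample t-test without assuming equal variances (Welch)."""
    n1, n2 = len(xs), len(ys)
    v1, v2 = variance(xs) / n1, variance(ys) / n2
    t = (mean(xs) - mean(ys)) / sqrt(v1 + v2)
    # Welch-Satterthwaite approximation; usually not a whole number.
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


print(one_sample_t([2, 4, 6], 2))
print(paired_t([3, 5, 7], [1, 1, 1]))
print(welch_t([1, 2, 3], [2, 3, 4]))
```

In practice you would normally let a library do this: SciPy's `scipy.stats.ttest_1samp`, `ttest_rel`, and `ttest_ind` (with `equal_var=False` for Welch's test) return both the statistic and the p-value.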
This video discusses common Python errors and how to read errors.
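As a small example of reading an error (a sketch of my own, with a hypothetical `average` function), the code below deliberately triggers an exception and captures the traceback as text. The most useful parts are usually the last line, which names the error type and message, and the lines just above it, which show where in your code the error happened.

```python
import traceback


def average(values):
    # Fails on an empty list: sum([]) is 0 and len([]) is 0.
    return sum(values) / len(values)


try:
    average([])
except ZeroDivisionError as err:
    # Capture what Python would normally print when the error is uncaught.
    report = traceback.format_exc()
    error_type = type(err).__name__
    message = str(err)

print(report)
print("Type:", error_type)
print("Message:", message)
```

Read a traceback from the bottom up: start with the error type and message, then work upward through the call stack until you reach a line of code you wrote.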
- NIST Handbook on quantitative measures (has info on 1-sample and 2-sample t-tests)
One Sample Notebook
The One Sample notebook demonstrates how to compute a one-sample t-test and how to draw a Q-Q plot to compare a sample's distribution with the normal distribution.
In this video, I talk about how the quantitative data science methods we are learning fit into a broader picture of sources of knowledge.
There are a few things you can do to keep practicing the material:
- The HETREC data contains two data sets besides the movie data: Delicious bookmarks and Last.FM listening records. Download this data set and apply some of our exploratory techniques to it.
- Download the SBA data from Week 4's activity and describe the distributions of more of the variables.
- Apply the inference techniques from Week 4 to statistically test the differences you observed in Assignment 1.
Some more examples from my own work (these are not all cleaned up to our checklist standards):
- Data summary from book gender paper - shows a number of descriptive things, including a stacked area chart.
- Linkage statistics from book data - shows some matplotlib things, and computing data linking statistics.
The indexing notebook is now up!
Assignment 2 is due on September 27.