This is a collection of notebooks with tips and consolidated references for the various Python and Pandas topics that we are discussing.
This semester, I am teaching with plotnine instead of Seaborn, but
Python and Data
These notebooks are on basic Python data manipulation:
Building Data — building up arrays and data series
Movie Score Charting Examples — example charts used in several videos in 📅 Week 3 — Presentation (9/5–9)
Charts from the Ground Up — notebook for 🎥 Charts from the Ground Up.
Probability and Statistics
Empirical Probabilities (demonstration of using boolean series to compute probabilities with empirical data)
Probability Distributions (from 📅 Week 4 — Inference (9/12–16))
Sampling Distributions (from 📅 Week 4 — Inference (9/12–16))
Magic Numbers (demonstration of where various “magic” numbers come from)
Regressions (goes with Week 8)
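
As a quick reminder of the boolean-series idea behind the Empirical Probabilities notebook, here is a minimal sketch. The `movies` data frame and its columns are invented for illustration and are not taken from the notebook. It assumes only that Pandas treats `True` as 1 when averaging, so the mean of a boolean series is the fraction of rows where the condition holds.

```python
import pandas as pd

# Hypothetical data, invented for this sketch: one row per movie.
movies = pd.DataFrame({
    "genre": ["comedy", "drama", "comedy", "action", "drama", "comedy"],
    "rating": [3.5, 4.2, 2.8, 3.9, 4.5, 4.1],
})

# Each comparison yields a boolean series (one True/False per row).
is_comedy = movies["genre"] == "comedy"
is_high = movies["rating"] >= 4.0

# True counts as 1, so the mean is the empirical probability.
p_comedy = is_comedy.mean()                      # P(comedy)
p_high = is_high.mean()                          # P(rating >= 4)
p_joint = (is_comedy & is_high).mean()           # P(comedy and rating >= 4)

# Conditioning: keep only the comedy rows, then average the other series.
p_high_given_comedy = is_high[is_comedy].mean()  # P(rating >= 4 | comedy)

print(p_comedy, p_high, p_joint, p_high_given_comedy)
```

Filtering one boolean series by another gives the conditional probability directly, and it agrees with dividing `p_joint` by `p_comedy`.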
SciKit-Learn and ML Models
SciKit-Learn Pipelines and Regularization — also includes a significance test for difference in classifier accuracy, and a decision tree
Linear Regression with SciKit-Learn — also uses a pipeline and applies standardization
Dummy-Coding and Feature Combination with SciKit-Learn Pipelines
Another advanced SciKit-Learn pipeline and logistic regression example (on Towards Data Science)
K-Means Example (uses the chi-papers data from Week 13)
Specific Data Set Examples
These are more advanced examples of data manipulation and collection:
Sessionization (demonstrates some more advanced aggregation and time-based operations)
Spam Filter demonstrates building a spam filter.
Using the Census describes how to access census data.
Fetching CHI Papers creates the chi-papers.csv file from Internet sources.
This example demonstrates a complete Git-based workflow: