Tutorials

This is a collection of notebooks with tips and consolidated references for the Python and pandas topics we discuss in class.

Note

This semester, I am teaching with plotnine instead of Seaborn, but

Python and Data

These notebooks are on basic Python data manipulation:

Fun with Numbers
Types and Operations
Writing and Using Functions
Selecting Data
Reshaping Data
Building Data — building up arrays and data series
Indexing
Missing Data
Tricks with Boolean Series
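Several of these notebooks lean on boolean series. The following is a minimal sketch of the core idea on a small hypothetical penguin-style table. The column names `species` and `mass_g` are made up for illustration. Comparisons yield boolean masks, and masks combine with `&`, `|`, and `~`. Because `True` counts as 1, `sum()` and `mean()` turn a mask into a count or a fraction.

```python
import pandas as pd

# Hypothetical toy data standing in for a real data set
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo"],
    "mass_g": [3750.0, 5000.0, None, 3700.0, 5400.0],
})

# Comparisons produce a boolean series aligned with the frame's index;
# the missing mass compares as False
heavy = df["mass_g"] > 4000

# Combine masks with & (and), | (or), ~ (not); the parentheses are required
gentoo_heavy = df[(df["species"] == "Gentoo") & heavy]

# True counts as 1 and False as 0, so sum() counts and mean() gives a fraction
n_heavy = heavy.sum()
frac_missing = df["mass_g"].isna().mean()

print(len(gentoo_heavy), n_heavy, frac_missing)
```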

Visualization

Drawing Charts
Movie Score Charting Examples — example charts used in several videos in 📅 Week 3 — Presentation (9/5–9)
Charts from the Ground Up — notebook for 🎥 Charts from the Ground Up.
Chart Finishing Touches

Probability and Statistics

Penguin Inference (from 📅 Week 4 — Inference (9/12–16))
Empirical Probabilities (demonstration of using boolean series to compute probabilities with empirical data)
Probability Distributions (from 📅 Week 4 — Inference (9/12–16))
Sampling Distributions (from 📅 Week 4 — Inference (9/12–16))
Magic Numbers (demonstration of where various “magic” numbers come from)
One Sample T-test and Distribution Comparison
Confidence
Correlation
Regressions (goes with Week 8)
Random Numbers
Logistic Regression
Sampling and Testing the Penguins
Linear Models with scipy minimize
Overfitting Simulation example
Random Sampling
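The empirical-probability and "magic number" ideas in this section come down to a few small operations. The following sketch uses hypothetical toy data, not the notebooks' own data. The mean of a boolean series is an empirical probability. A conditional probability is the same mean taken over a filtered subset. The familiar 1.96 is the 97.5th percentile of the standard normal distribution. The sketch uses the standard library's `statistics.NormalDist`; `scipy.stats.norm.ppf(0.975)` gives the same value.

```python
from statistics import NormalDist

import pandas as pd

# Hypothetical toy observations; the notebooks use real data instead
df = pd.DataFrame({
    "island": ["Biscoe", "Dream", "Biscoe", "Torgersen", "Biscoe", "Dream"],
    "species": ["Gentoo", "Adelie", "Adelie", "Adelie", "Gentoo", "Chinstrap"],
})

# P(Gentoo): the mean of a boolean series is the fraction of True values
p_gentoo = (df["species"] == "Gentoo").mean()

# P(Gentoo | Biscoe): restrict to rows meeting the condition, then take the mean
on_biscoe = df["island"] == "Biscoe"
p_gentoo_given_biscoe = (df.loc[on_biscoe, "species"] == "Gentoo").mean()

# The "magic" 1.96: the standard normal quantile leaving 2.5% in the upper tail
z_975 = NormalDist().inv_cdf(0.975)

print(p_gentoo, p_gentoo_given_biscoe, round(z_975, 2))
```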

SciKit-Learn and ML Models

SciKit-Learn Logistic Regression
SciKit-Learn Pipelines and Regularization — also includes a significance test for difference in classifier accuracy, and a decision tree
Linear Regression with SciKit-Learn — also uses a pipeline and applies standardization
Advanced SciKit-Learn Pipeline
Dummy-Coding and Feature Combination with SciKit-Learn Pipelines
Another advanced SciKit-Learn pipeline and logistic regression example (on Towards Data Science)
Movie Decomposition from 🎥 Decomposing Matrices
PCA demo from 🎥 Decomposing Matrices
K-Means Example (uses the chi-papers data from Week 13)
Tuning Hyperparameters
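Several of the notebooks above combine standardization, a model, and hyperparameter search in one pipeline. The following is a minimal sketch of that pattern. It assumes synthetic data from `make_classification` and an arbitrary grid of regularization strengths. It is not the code from any specific notebook.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data so the sketch runs without downloading anything
X, y = make_classification(n_samples=300, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Standardization and the classifier live in one pipeline
pipe = make_pipeline(StandardScaler(), LogisticRegression())

# C is the inverse regularization strength; the "logisticregression__" prefix
# comes from the step name that make_pipeline generates
grid = {"logisticregression__C": [0.01, 0.1, 1.0, 10.0]}
search = GridSearchCV(pipe, grid, cv=5)
search.fit(X_train, y_train)

print(search.best_params_)
print(search.score(X_test, y_test))  # accuracy on the held-out test set
```

Keeping the scaler inside the pipeline matters because `GridSearchCV` refits the whole pipeline on each training fold. As a result, the validation fold never influences the standardization.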

Specific Data Set Examples

These are more advanced examples of data manipulation and collection:

MovieLens Time Series
Sessionization (demonstrates some more advanced aggregation and time-based operations)
Spam Filter demonstrates building a spam filter
Using the Census describes how to access census data.
Fetching CHI Papers creates the chi-papers.csv file from Internet sources.
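Sessionization splits each user's event stream into sessions wherever the gap between consecutive events exceeds some threshold. The following is a minimal pandas sketch of the idea. It assumes a hypothetical event log with `user` and `time` columns and a 30-minute inactivity threshold. It is not necessarily how the Sessionization notebook does it.

```python
import pandas as pd

# Hypothetical event log: one row per user action
events = pd.DataFrame({
    "user": [1, 1, 1, 2, 2, 1],
    "time": pd.to_datetime([
        "2024-01-01 10:00", "2024-01-01 10:05", "2024-01-01 11:00",
        "2024-01-01 09:00", "2024-01-01 09:20", "2024-01-01 11:10",
    ]),
})

# Assumed inactivity threshold; real analyses choose this from the data
GAP = pd.Timedelta(minutes=30)

events = events.sort_values(["user", "time"]).reset_index(drop=True)

# Time since the same user's previous event (NaT for each user's first event)
gap = events.groupby("user")["time"].diff()

# A new session starts at a user's first event or after a long gap
new_session = gap.isna() | (gap > GAP)

# A running count of session starts numbers the sessions within each user
events["session"] = new_session.astype(int).groupby(events["user"]).cumsum()

# Per-session aggregates: number of events, start, end, and duration
sessions = events.groupby(["user", "session"])["time"].agg(["count", "min", "max"])
sessions["duration"] = sessions["max"] - sessions["min"]
print(sessions)
```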

Workflow Example

This example demonstrates a complete Git-based workflow:

Git repo & workflow example