This is a collection of notebooks with tips and consolidated references for the various Python and Pandas topics that we are discussing.


This semester, I am teaching with plotnine instead of Seaborn, but

Python and Data

These notebooks are on basic Python data manipulation:

  1. Fun with Numbers

  2. Types and Operations

  3. Writing and Using Functions

  4. Selecting Data

  5. Reshaping Data

  6. Building Data (building up arrays and data series)

  7. Indexing

  8. Missing Data

  9. Tricks with Boolean Series


  1. Drawing Charts

  2. Movie Score Charting Examples: example charts used in several videos in 📅 Week 3 - Presentation (9/5–9)

  3. Charts from the Ground Up (notebook for 🎥 Charts from the Ground Up)

  4. Chart Finishing Touches

Probability and Statistics

  1. Penguin Inference (from 📅 Week 4 - Inference (9/12–16))

  2. Empirical Probabilities (demonstration of using boolean series to compute probabilities with empirical data)

  3. Probability Distributions (from 📅 Week 4 - Inference (9/12–16))

  4. Sampling Distributions (from 📅 Week 4 - Inference (9/12–16))

  5. Magic Numbers (demonstration of where various “magic” numbers come from)

  6. One Sample T-test and Distribution Comparison

  7. Confidence

  8. Correlation

  9. Regressions (goes with Week 8)

  10. Random Numbers

  11. Logistic Regression

  12. Sampling and Testing the Penguins

  13. Linear Models with scipy minimize

  14. Overfitting Simulation example

  15. Random Sampling
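
The boolean-series technique named in the Empirical Probabilities entry can be sketched briefly. This is a minimal illustration, not taken from the notebook itself: the `penguins` frame, its values, and the 4000 g threshold are invented for the example. The key idea is that the mean of a boolean series is the fraction of `True` values, which is the empirical probability of the event.

```python
import pandas as pd

# Hypothetical toy data, invented for illustration only
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Adelie"],
    "body_mass_g": [3700, 3800, 5000, 5200, 3600, 4100],
})

# A comparison yields a boolean series: True where the condition holds
is_heavy = penguins["body_mass_g"] > 4000
is_adelie = penguins["species"] == "Adelie"

# The mean of a boolean series is the fraction of True values,
# i.e. the empirical probability of the event
p_heavy = is_heavy.mean()

# Joint probability P(heavy and Adelie): combine the series with &
p_heavy_and_adelie = (is_heavy & is_adelie).mean()

# Conditional probability P(heavy | Adelie): keep only the Adelie rows,
# then take the mean of what remains
p_heavy_given_adelie = is_heavy[is_adelie].mean()

print(p_heavy, p_heavy_and_adelie, p_heavy_given_adelie)
```

Using one boolean series as a mask on another keeps the conditional probability a one-liner, and it agrees with dividing the joint probability by `is_adelie.mean()`.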

SciKit-Learn and ML Models

  1. SciKit-Learn Logistic Regression

  2. SciKit-Learn Pipelines and Regularization (also includes a significance test for difference in classifier accuracy, and a decision tree)

  3. Linear Regression with SciKit-Learn (also uses a pipeline and applies standardization)

  4. Advanced SciKit-Learn Pipeline

  5. Dummy-Coding and Feature Combination with SciKit-Learn Pipelines

  6. Another advanced SciKit-Learn pipeline and logistic regression example (on Towards Data Science)

  7. Movie Decomposition from 🎥 Decomposing Matrices

  8. PCA demo from 🎥 Decomposing Matrices

  9. K-Means Example (uses the chi-papers data from Week 13)

  10. Tuning Hyperparameters
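
As a rough idea of what "a pipeline that applies standardization" looks like, here is a minimal sketch. It is not drawn from the linked notebooks: the synthetic data, the coefficients, and the step names `standardize` and `regress` are assumptions made for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, noise-free data (illustrative assumption): two features on
# very different scales and an exactly linear response
rng = np.random.default_rng(42)
X = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(100, 2))
y = 3.0 * X[:, 0] + 0.5 * X[:, 1] + 1.0

# The pipeline standardizes each feature (zero mean, unit variance) and
# then fits an ordinary least-squares regression on the scaled features
model = Pipeline([
    ("standardize", StandardScaler()),
    ("regress", LinearRegression()),
])

# fit() learns the scaling parameters and the regression together;
# predict() reapplies the same scaling before predicting
model.fit(X, y)
preds = model.predict(X)
r2 = model.score(X, y)
```

Bundling the scaler into the pipeline means the scaling parameters are learned only from whatever data is passed to `fit`, which helps avoid leaking test-set information when the pipeline is used with a train/test split or cross-validation.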

Specific Data Set Examples

These are more advanced examples of data manipulation and collection:

  1. MovieLens Time Series

  2. Sessionization (demonstrates some more advanced aggregation and time-based operations)

  3. Spam Filter (demonstrates building a spam filter)

  4. Using the Census (describes how to access census data)

  5. Fetching CHI Papers (creates the chi-papers.csv file from Internet sources)

Workflow Example

This example demonstrates a complete Git-based workflow:

  1. Git repo & workflow example