Tutorials
This is a collection of notebooks with tips and consolidated references for the various Python and Pandas topics that we are discussing.
Python and Data
These notebooks are on basic Python data manipulation:
Building Data — building up arrays and data series
Visualization
Movie Score Charting Examples — example charts used in several videos in 📅 Week 3 — Presentation (9/6–10)
Probability and Statistics
Penguin Inference (from 📅 Week 4 — Inference (9/13–17))
Empirical Probabilities (demonstration of using boolean series to compute probabilities with empirical data)
Probability Distributions (from 📅 Week 4 — Inference (9/13–17))
Sampling Distributions (from 📅 Week 4 — Inference (9/13–17))
Regressions (goes with Week 8)
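As a quick illustration of the boolean-series idea behind the Empirical Probabilities notebook, here is a minimal sketch. The toy data frame and its column names are made up for this example and are not taken from the notebook. The mean of a boolean series is the fraction of rows where it is true, which serves as an empirical probability.

```python
import pandas as pd

# Hypothetical toy data, invented for this illustration
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap"],
    "body_mass_g": [3700, 4100, 5000, 5400, 3600],
})

# A comparison yields a boolean series; True counts as 1 and False as 0
is_heavy = penguins["body_mass_g"] > 4000

# Empirical P(heavy): the fraction of rows where the condition holds
p_heavy = is_heavy.mean()

# Empirical P(heavy | Gentoo): restrict to Gentoo rows, then take the mean
is_gentoo = penguins["species"] == "Gentoo"
p_heavy_given_gentoo = is_heavy[is_gentoo].mean()

print(p_heavy, p_heavy_given_gentoo)
```

Indexing one boolean series with another is what turns a marginal probability into a conditional one.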
SciKit-Learn and ML Models
SciKit-Learn Pipelines and Regularization — also includes a significance test for difference in classifier accuracy, and a decision tree
Linear Regression with SciKit-Learn — also uses a pipeline and applies standardization
Dummy-Coding and Feature Combination with SciKit-Learn Pipelines
Another advanced SciKit-Learn pipeline and logistic regression example (on Towards Data Science)
K-Means Example (uses the chi-papers data from Week 13)
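For orientation, a pipeline that standardizes features before fitting a linear regression might look like the sketch below. The synthetic data, step names, and coefficients are assumptions made for this example. The notebooks above may structure their pipelines differently.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data, invented for this illustration: y depends linearly on two features
rng = np.random.default_rng(42)
X = rng.normal(loc=50, scale=10, size=(100, 2))
y = 3.0 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=100)

# The pipeline applies each step in order: standardize, then regress
model = Pipeline([
    ("standardize", StandardScaler()),
    ("regress", LinearRegression()),
])
model.fit(X, y)

# R^2 on the training data (only a sanity check, not a proper evaluation)
r2 = model.score(X, y)
print(r2)
```

Wrapping the scaler in the pipeline means the scaling parameters are learned only from the data passed to `fit`, which helps avoid leaking test data into preprocessing.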
Specific Data Set Examples
These are more advanced examples of data manipulation and collection:
Sessionization (demonstrates some more advanced aggregation and time-based operations)
Spam Filter demonstrates building a spam filter.
Using the Census describes how to access census data.
Fetching CHI Papers creates the chi-papers.csv file from Internet sources.
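To give a sense of what sessionization involves, here is one common approach, which is not necessarily the one the Sessionization notebook uses: sort events by user and time, then start a new session whenever the gap since that user's previous event exceeds a threshold. The event data, column names, and 30-minute threshold are assumptions made for this sketch.

```python
import pandas as pd

# Hypothetical event log, invented for this illustration
events = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "time": pd.to_datetime([
        "2021-09-01 10:00", "2021-09-01 10:10", "2021-09-01 11:30",
        "2021-09-01 09:00", "2021-09-01 09:05",
    ]),
})

# Order events so each user's activity is contiguous and chronological
events = events.sort_values(["user", "time"]).reset_index(drop=True)

# Time since the same user's previous event (NaT for a user's first event)
gap = events.groupby("user")["time"].diff()

# A session starts at a user's first event or after a gap over 30 minutes
new_session = gap.isna() | (gap > pd.Timedelta(minutes=30))

# The running count of session starts gives each row a session number
events["session"] = new_session.cumsum()
n_sessions = events["session"].nunique()

print(events)
```

The cumulative sum works because each session start contributes 1 and every other row contributes 0, so rows in the same session share a number.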
Workflow Example
This example demonstrates a complete Git-based workflow: