Tutorials#
This is a collection of notebooks with tips and consolidated references for the various Python and Pandas topics that we are discussing.
Note
This semester, I am teaching with plotnine instead of Seaborn, but
Python and Data#
These notebooks are on basic Python data manipulation:
Building Data: building up arrays and data series
Visualization#
Movie Score Charting Examples: example charts used in several videos in Week 3 - Presentation (9/5–9)
Charts from the Ground Up: notebook for the Charts from the Ground Up video.
Probability and Statistics#
Penguin Inference (from Week 4 - Inference (9/12–16))
Empirical Probabilities (demonstration of using boolean series to compute probabilities with empirical data)
Probability Distributions (from Week 4 - Inference (9/12–16))
Sampling Distributions (from Week 4 - Inference (9/12–16))
Magic Numbers (demonstration of where various “magic” numbers come from)
Regressions (goes with Week 8)
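As a quick taste of the boolean-series idea mentioned for Empirical Probabilities, here is a minimal sketch. It is not taken from the notebook itself: the data frame, its column names (`rating`, `is_comedy`), and the values are made up for illustration. The mean of a boolean series is the fraction of `True` values, which is the empirical probability of the event.

```python
import pandas as pd

# Toy data, invented for this sketch (not from the course data sets)
movies = pd.DataFrame({
    "rating": [3.5, 4.0, 2.0, 4.5, 1.5, 5.0, 3.0, 4.0],
    "is_comedy": [True, True, False, False, True, True, False, True],
})

# A comparison yields a boolean series: True where the event occurs
high = movies["rating"] >= 4.0

# The mean of a boolean series is the fraction of True values,
# i.e. the empirical probability P(high)
p_high = high.mean()

# Conditional probability P(high | comedy): restrict to comedies, then take the mean
p_high_given_comedy = high[movies["is_comedy"]].mean()

print(f"P(high) = {p_high:.2f}")
print(f"P(high | comedy) = {p_high_given_comedy:.2f}")
```

Filtering before taking the mean is equivalent to dividing `(high & movies["is_comedy"]).mean()` by `movies["is_comedy"].mean()`.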
SciKit-Learn and ML Models#
SciKit-Learn Pipelines and Regularization: also includes a significance test for difference in classifier accuracy, and a decision tree
Linear Regression with SciKit-Learn: also uses a pipeline and applies standardization
Dummy-Coding and Feature Combination with SciKit-Learn Pipelines
Another advanced SciKit-Learn pipeline and logistic regression example (on Towards Data Science)
K-Means Example (uses the chi-papers data from Week 13)
Specific Data Set Examples#
These are more advanced examples of data manipulation and collection:
Sessionization (demonstrates some more advanced aggregation and time-based operations)
Spam Filter demonstrates building a spam filter
Using the Census describes how to access census data.
Fetching CHI Papers creates the chi-papers.csv file from Internet sources.
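For readers unfamiliar with the term, sessionization usually means grouping each user's events into sessions separated by gaps of inactivity. The sketch below shows one common way to do that in pandas. It is an illustration under assumptions, not the method used in the Sessionization notebook: the event data, the column names, and the 30-minute timeout are all hypothetical.

```python
import pandas as pd

# Toy event log, invented for this sketch
events = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b"],
    "time": pd.to_datetime([
        "2022-09-01 10:00", "2022-09-01 10:10", "2022-09-01 11:00",
        "2022-09-01 09:00", "2022-09-01 09:05",
    ]),
})

# Assumed inactivity timeout; real analyses pick this based on the data
TIMEOUT = pd.Timedelta(minutes=30)

# Sort so that each user's events are in time order
events = events.sort_values(["user", "time"])

# Time since the same user's previous event (NaT for each user's first event)
gap = events.groupby("user")["time"].diff()

# A session starts at a user's first event or after a gap longer than the timeout
new_session = (gap.isna() | (gap > TIMEOUT)).astype(int)

# A running count of session starts within each user gives a session number
events["session"] = new_session.groupby(events["user"]).cumsum()

# Aggregate to one row per session: start, end, and number of events
sessions = events.groupby(["user", "session"])["time"].agg(["min", "max", "count"])
print(sessions)
```

With this toy data, user `a` ends up with two sessions (the 50-minute gap exceeds the timeout) and user `b` with one.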
Workflow Example#
This example demonstrates a complete Git-based workflow: