# Tutorials

This is a collection of notebooks with tips and consolidated references for the various Python and Pandas topics that we are discussing.

Note

This semester, I am teaching with plotnine instead of Seaborn, but

## Python and Data

These notebooks are on basic Python data manipulation:

Building Data: building up arrays and data series

## Visualization

Movie Score Charting Examples: example charts used in several videos in Week 3 - Presentation (9/5-9)

Charts from the Ground Up: notebook for the Charts from the Ground Up video.

## Probability and Statistics

Penguin Inference (from Week 4 - Inference (9/12-16))

Empirical Probabilities (demonstration of using boolean series to compute probabilities with empirical data)

Probability Distributions (from Week 4 - Inference (9/12-16))

Sampling Distributions (from Week 4 - Inference (9/12-16))

Regressions (goes with Week 8)
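
As a quick illustration of the boolean-series idea mentioned for the Empirical Probabilities notebook, here is a minimal sketch. The data frame, its column names, and the 4000 g threshold are made up for this example and are not taken from the notebook itself.

```python
import pandas as pd

# Hypothetical toy data (illustrative only, not the course data set)
penguins = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo", "Chinstrap", "Gentoo"],
    "mass_g": [3700, 3900, 5100, 4800, 3600, 5400],
})

# A comparison produces a boolean series. Its mean is the fraction of
# True values, which is the empirical probability of the event.
is_gentoo = penguins["species"] == "Gentoo"
is_heavy = penguins["mass_g"] > 4000  # assumed threshold for "heavy"

p_gentoo = is_gentoo.mean()                         # P(Gentoo)
p_heavy = is_heavy.mean()                           # P(heavy)
p_joint = (is_gentoo & is_heavy).mean()             # P(Gentoo and heavy)
p_heavy_given_gentoo = is_heavy[is_gentoo].mean()   # P(heavy | Gentoo)

print(p_gentoo, p_heavy, p_joint, p_heavy_given_gentoo)
```

Combining masks with `&` gives joint events, and indexing one mask by another restricts the sample, which yields a conditional probability.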

## SciKit-Learn and ML Models

SciKit-Learn Pipelines and Regularization: also includes a significance test for difference in classifier accuracy, and a decision tree

Linear Regression with SciKit-Learn: also uses a pipeline and applies standardization

Dummy-Coding and Feature Combination with SciKit-Learn Pipelines

Another advanced SciKit-Learn pipeline and logistic regression example (on Towards Data Science)

Movie Decomposition from `week13:decomp`

PCA demo from `week13:decomp`

K-Means Example (uses the chi-papers data from Week 13)

## Specific Data Set Examples

These are more advanced examples of data manipulation and collection:

Sessionization (demonstrates some more advanced aggregation and time-based operations)

Spam Filter demonstrates building a spam filter.

Using the Census describes how to access census data.

Fetching CHI Papers creates the `chi-papers.csv` file from Internet sources.
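
To give a flavor of the time-based aggregation that the Sessionization example refers to, here is a minimal sketch of one common approach: start a new session whenever a user has been inactive for longer than a timeout. The event log, the column names, and the 30-minute timeout are assumptions for illustration; the notebook may use different data and a different rule.

```python
import pandas as pd

# Hypothetical click log (illustrative only)
events = pd.DataFrame({
    "user": ["a", "a", "a", "b", "b", "a"],
    "time": pd.to_datetime([
        "2022-09-01 10:00", "2022-09-01 10:10", "2022-09-01 11:30",
        "2022-09-01 10:05", "2022-09-01 10:20", "2022-09-01 11:40",
    ]),
})

TIMEOUT = pd.Timedelta(minutes=30)  # assumed inactivity threshold

# Order events within each user so consecutive rows are consecutive actions
events = events.sort_values(["user", "time"]).reset_index(drop=True)

# Time since the same user's previous event (NaT for a user's first event)
gap = events.groupby("user")["time"].diff()

# A session starts at a user's first event or after a long gap
new_session = gap.isna() | (gap > TIMEOUT)

# Running count of session starts per user gives a per-user session number
events["session"] = new_session.astype(int).groupby(events["user"]).cumsum()

# Summarize each session: start time, end time, and number of events
sessions = events.groupby(["user", "session"])["time"].agg(["min", "max", "count"])
print(sessions)
```

The per-group `diff` followed by a cumulative sum of "session start" flags is a general pattern for splitting ordered data into runs.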

## Workflow Example

This example demonstrates a complete Git-based workflow: