# Tutorials

This is a collection of notebooks with tips and consolidated references for the various Python and Pandas topics that we are discussing.

## Python and Data

These notebooks are on basic Python data manipulation:

Building Data - building up arrays and data series

## Visualization

Movie Score Charting Examples - example charts used in several videos in 📅 Week 3 - Presentation (9/6–10)

## Probability and Statistics

Penguin Inference (from 📅 Week 4 - Inference (9/13–17))

Empirical Probabilities (demonstration of using boolean series to compute probabilities with empirical data)

Probability Distributions (from 📅 Week 4 - Inference (9/13–17))

Sampling Distributions (from 📅 Week 4 - Inference (9/13–17))

Regressions (goes with Week 8)
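As a quick reminder of the idea behind the Empirical Probabilities notebook, here is a minimal sketch of using boolean series to estimate probabilities. The data frame, its column names, and its values are hypothetical illustrations, not taken from the course data. The mean of a boolean series is the fraction of `True` values, which serves as an empirical probability.

```python
import pandas as pd

# Hypothetical toy data (assumed for illustration only).
df = pd.DataFrame({
    "species": ["Adelie", "Adelie", "Gentoo", "Gentoo",
                "Chinstrap", "Gentoo", "Adelie", "Chinstrap"],
    "body_mass_g": [3700, 3800, 5000, 5400, 3600, 4700, 4100, 3900],
})

# Comparisons produce boolean series, one True/False per row.
is_gentoo = df["species"] == "Gentoo"
is_heavy = df["body_mass_g"] > 4000

# The mean of a boolean series is the fraction of True values.
p_gentoo = is_gentoo.mean()                # P(Gentoo)
p_heavy = is_heavy.mean()                  # P(heavy)
p_joint = (is_gentoo & is_heavy).mean()    # P(Gentoo and heavy)

# Conditional probability: restrict to the rows where the condition holds.
p_heavy_given_gentoo = is_heavy[is_gentoo].mean()  # P(heavy | Gentoo)

print(p_gentoo, p_heavy, p_joint, p_heavy_given_gentoo)
```

Filtering with a boolean mask before taking the mean gives the same result as dividing the joint probability by the probability of the condition.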

## SciKit-Learn and ML Models

SciKit-Learn Pipelines and Regularization - also includes a significance test for difference in classifier accuracy, and a decision tree

Linear Regression with SciKit-Learn - also uses a pipeline and applies standardization

Dummy-Coding and Feature Combination with SciKit-Learn Pipelines

Another advanced SciKit-Learn pipeline and logistic regression example (on Towards Data Science)

K-Means Example (uses the chi-papers data from Week 13)
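For orientation, the pipeline-plus-standardization pattern that several of these notebooks use can be sketched roughly as follows. This is not the notebooks' code: the synthetic data, the coefficients, and the variable names are assumptions made up for illustration. The sketch chains `StandardScaler` and `LinearRegression` with `make_pipeline`, so the scaling parameters are learned only from the data passed to `fit`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic, noiseless data with two features on very different scales
# (assumed for illustration only).
rng = np.random.default_rng(42)
X = rng.normal(loc=[10.0, 200.0], scale=[2.0, 50.0], size=(100, 2))
y = 3.0 * X[:, 0] - 0.1 * X[:, 1] + 5.0

# The pipeline standardizes each feature, then fits a linear regression.
model = make_pipeline(StandardScaler(), LinearRegression())
model.fit(X, y)

# predict() applies the fitted scaler before the regression step.
preds = model.predict(X)
r2 = model.score(X, y)  # coefficient of determination on the training data
print(preds[:3], r2)
```

Wrapping the steps in one pipeline object means the same preprocessing is applied automatically at prediction time, which helps avoid leaking test data into the scaler when used with a train/test split or cross-validation.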

## Specific Data Set Examples

These are more advanced examples of data manipulation and collection:

Sessionization (demonstrates some more advanced aggregation and time-based operations)

Spam Filter demonstrates building a spam filter.

Using the Census describes how to access census data.

Fetching CHI Papers creates the `chi-papers.csv` file from Internet sources.

## Workflow Example

This example demonstrates a complete Git-based workflow: