
Week 14 — Workflow

This week, we talk more about workflows: what does it take to build a practical data science pipeline?

This week's videos are also available as a Panopto playlist.

From Notebooks to Workflows

In this video, we move beyond notebooks and introduce broader ways to structure our Python projects.

Scripts and Modules

This video introduces Python scripts and modules, and how to organize Python code outside of a notebook.
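As a rough illustration of the idea, here is a small file that works both as an importable module and as a runnable script. The file name `names.py` and its functions are hypothetical; the pattern being shown is keeping reusable logic in functions and putting command-line behavior behind an `if __name__ == "__main__":` guard, so importing the module does not run anything.

```python
"""names.py: a hypothetical module of small, reusable cleaning functions.

Import it from a notebook or another script (from names import normalize_name),
or run it directly:  python names.py "  ada   lovelace " "grace HOPPER"
"""
import argparse


def normalize_name(name: str) -> str:
    """Strip surrounding whitespace, collapse inner runs of spaces, and title-case."""
    return " ".join(name.split()).title()


def normalize_all(names):
    """Return normalized versions of the non-blank names, in their original order."""
    return [normalize_name(n) for n in names if n.strip()]


def main(argv=None):
    """Command-line entry point; argv defaults to sys.argv[1:] when None."""
    parser = argparse.ArgumentParser(description="Normalize personal names.")
    parser.add_argument("names", nargs="*", help="names to normalize")
    args = parser.parse_args(argv)
    for name in normalize_all(args.names):
        print(name)


# This block runs only when the file is executed as a script, not when imported.
if __name__ == "__main__":
    main()
```

A notebook can then `import names` and call `names.normalize_name(...)` directly, while the same code is available from the shell without copying cells around.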

Resources

Introducing Git

This video introduces version control with Git.

Resources

Weekly Quiz 14

Take Quiz 14 in Blackboard.

Git for Data Science

How do you use Git effectively in a data science project?

Resources

Extract, Transform, Load

The Extract, Transform, Load (ETL) pipeline is a common design pattern for data ingestion. A common variant reorders the steps as Extract, Load, Transform (ELT): the raw data is loaded first and transformed inside the target system.
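A minimal ETL sketch using only the standard library, assuming a hypothetical CSV export of temperature readings and an in-memory SQLite database as the load target. Each stage is a separate function, so it can be tested and rerun on its own.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: temperatures in Fahrenheit, with one malformed row.
RAW_CSV = """station,date,temp_f
BOI,2020-12-01,41.0
BOI,2020-12-02,not recorded
SEA,2020-12-01,50.0
"""


def extract(text):
    """Extract: read the raw records as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(records):
    """Transform: parse numbers, convert to Celsius, and drop unparseable rows."""
    clean = []
    for rec in records:
        try:
            temp_f = float(rec["temp_f"])
        except ValueError:
            continue  # skip rows whose temperature is not a number
        temp_c = round((temp_f - 32) * 5 / 9, 1)
        clean.append((rec["station"], rec["date"], temp_c))
    return clean


def load(rows, conn):
    """Load: write the cleaned rows into a database table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS temps (station TEXT, date TEXT, temp_c REAL)"
    )
    conn.executemany("INSERT INTO temps VALUES (?, ?, ?)", rows)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT * FROM temps ORDER BY station").fetchall())
# [('BOI', '2020-12-01', 5.0), ('SEA', '2020-12-01', 10.0)]
```

In the ELT variant, `load` would store the raw string columns as-is, and the parsing and unit conversion would be written as SQL queries against that table.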

Resources

Split, Apply, Combine

We've seen group-by operations this semester; they're a specific form of a general paradigm called split, apply, combine.
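To make the three stages explicit, here is a plain-Python sketch over hypothetical (genre, rating) records. In pandas the same computation would be `df.groupby("genre")["rating"].mean()`; writing it by hand shows where the split, the apply, and the combine each happen.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (group, value) records, e.g. book ratings by genre.
records = [
    ("fiction", 4.0),
    ("history", 3.5),
    ("fiction", 5.0),
    ("history", 4.5),
    ("poetry", 3.0),
]


def split_apply_combine(rows, func):
    # Split: partition the values by their group key.
    groups = defaultdict(list)
    for key, value in rows:
        groups[key].append(value)
    # Apply func to each group independently, then combine the
    # per-group results into a single mapping.
    return {key: func(values) for key, values in groups.items()}


print(split_apply_combine(records, mean))
# {'fiction': 4.5, 'history': 4.0, 'poetry': 3.0}
```

Because `func` is a parameter, the same skeleton computes counts (`len`), maxima (`max`), or any other per-group summary.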

Resources

Tuning Hyperparameters

How can we move beyond GridSearchCV in our quest to tune hyperparameters?
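One common alternative to an exhaustive grid is random search: sample hyperparameter values from a distribution and keep the one with the best cross-validated score. scikit-learn provides this as `RandomizedSearchCV`; the dependency-free sketch below spells out the same loop. The model (a no-intercept ridge regression on one feature), the synthetic data, and the log-uniform range for the penalty `lam` are all hypothetical choices for illustration.

```python
import random


def make_data(n=60, seed=0):
    """Hypothetical noisy linear data: y = 2x + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1, 1) for _ in range(n)]
    ys = [2 * x + rng.gauss(0, 0.5) for x in xs]
    return xs, ys


def fit_ridge(xs, ys, lam):
    """Closed-form slope of a one-feature, no-intercept ridge regression."""
    return sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + lam)


def cv_mse(xs, ys, lam, k=5):
    """Mean held-out squared error over k contiguous folds (data are i.i.d.)."""
    n = len(xs)
    fold_errors = []
    for i in range(k):
        lo, hi = i * n // k, (i + 1) * n // k
        w = fit_ridge(xs[:lo] + xs[hi:], ys[:lo] + ys[hi:], lam)
        errs = [(y - w * x) ** 2 for x, y in zip(xs[lo:hi], ys[lo:hi])]
        fold_errors.append(sum(errs) / len(errs))
    return sum(fold_errors) / k


def random_search(xs, ys, n_iter=20, seed=1):
    """Sample lam log-uniformly from [1e-3, 1e3]; keep the lowest CV error."""
    rng = random.Random(seed)
    best_lam, best_err = None, float("inf")
    for _ in range(n_iter):
        lam = 10 ** rng.uniform(-3, 3)
        err = cv_mse(xs, ys, lam)
        if err < best_err:
            best_lam, best_err = lam, err
    return best_lam, best_err


xs, ys = make_data()
lam, err = random_search(xs, ys)
print(f"best lambda = {lam:.4g}, CV MSE = {err:.3f}")
```

Sampling on a log scale spreads the trials evenly across orders of magnitude, which is usually what matters for regularization strengths; a grid with the same budget would spend most points on values that behave almost identically.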

Resources

Tuning Example

The Tuning Example notebook demonstrates hyperparameter tuning by cross-validation with multiple techniques.

Reproducible Pipelines

I provide very brief pointers to additional tools you may want for workflow management in more advanced projects.
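Workflow tools earn their keep by rerunning only the steps whose inputs have changed. `make` decides this by comparing file modification times, while DVC records content hashes of each stage's dependencies. The sketch below imitates the simpler `make`-style rule in plain Python; `is_stale`, `run_step`, and the file names are hypothetical and are not any tool's API.

```python
import os
import tempfile
import time
from pathlib import Path


def is_stale(inputs, outputs):
    """True if any output is missing or older than the newest input."""
    outs = [Path(p) for p in outputs]
    if not all(p.exists() for p in outs):
        return True
    newest_input = max(Path(p).stat().st_mtime for p in inputs)
    oldest_output = min(p.stat().st_mtime for p in outs)
    return newest_input > oldest_output


def run_step(name, func, inputs, outputs):
    """Run func only when its outputs are out of date; report whether it ran."""
    if is_stale(inputs, outputs):
        print(f"{name}: running")
        func()
        return True
    print(f"{name}: up to date, skipped")
    return False


with tempfile.TemporaryDirectory() as tmp:
    raw, clean = Path(tmp, "raw.csv"), Path(tmp, "clean.csv")
    raw.write_text("name\nada\n")
    # Backdate the input so the demo does not depend on timestamp resolution.
    earlier = time.time() - 60
    os.utime(raw, (earlier, earlier))

    def clean_step():
        # Hypothetical transformation: upper-case the raw file.
        clean.write_text(raw.read_text().upper())

    ran = [
        run_step("clean", clean_step, [raw], [clean]),  # output missing: runs
        run_step("clean", clean_step, [raw], [clean]),  # nothing changed: skips
    ]
    # Simulate editing the input by pushing its modification time forward.
    later = time.time() + 60
    os.utime(raw, (later, later))
    ran.append(run_step("clean", clean_step, [raw], [clean]))  # input newer: runs
```

Hash-based tracking, as in DVC, avoids spurious reruns when a file is touched but not actually changed, at the cost of reading the file contents.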

Resources

Some software that supports data and/or workflow management:

More Examples

My book author gender project is an example of an advanced workflow with DVC.

Assignment 7

Assignment 7 is due December 13, 2020.