
Week 14 — Workflow

This week, we take a closer look at workflows. What does it look like to build a practical data science pipeline?

This week's videos are also available as a Panopto playlist.

From Notebooks to Workflows

In this video, we look at moving beyond notebooks to broader structures for our Python projects.

Scripts and Modules

This video introduces Python scripts and modules, and how to organize Python code outside of a notebook.
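As a rough illustration of the idea, here is a minimal sketch of a file that works both as a script and as an importable module. The file name `summarize.py`, the function names, and the sample data are all hypothetical, chosen only for this example.

```python
"""A hypothetical module, imagined as saved in summarize.py."""
import statistics


def summarize(values):
    """Return the count, mean, and median of a sequence of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def main():
    """Entry point used when the file is run as a script."""
    sample = [3.0, 1.0, 4.0, 1.0, 5.0]  # illustrative data only
    for name, value in summarize(sample).items():
        print(f"{name}: {value}")


# This guard is true when the file is run directly (python summarize.py)
# and false when the file is imported, so importing does not run main().
if __name__ == "__main__":
    main()
```

Assuming the file is saved as `summarize.py`, a notebook or another script could reuse the function with `from summarize import summarize` without triggering the script behavior.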


Introducing Git

This video introduces version control with Git.


Weekly Quiz 14

Take Quiz 14 in Blackboard.

Git for Data Science

How do you use Git effectively in a data science project?


Extract, Transform, Load

The Extract, Transform, Load (ETL) pipeline is a common design pattern for data ingest. Sometimes it is adjusted to Extract, Load, Transform (ELT), in which the raw data is loaded first and transformed afterward.
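The three stages can be sketched with the standard library alone. This is a toy example under stated assumptions: the CSV text, the `people` table, and the function names are hypothetical, and an in-memory SQLite database stands in for a real destination.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a file or an API response.
RAW_CSV = """name,height_cm
Ada,170
Grace,
Linus,180
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with a missing height and convert cm to m."""
    cleaned = []
    for row in rows:
        if not row["height_cm"]:
            continue  # skip records with no height recorded
        cleaned.append((row["name"], float(row["height_cm"]) / 100))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS people (name TEXT, height_m REAL)")
    conn.executemany("INSERT INTO people VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute("SELECT name, height_m FROM people ORDER BY name").fetchall()
print(rows)
```

Keeping each stage in its own function makes it easy to reorder them: an Extract, Load, Transform variant would store the raw rows first and do the cleaning with queries inside the database.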


Split, Apply, Combine

We've seen group-by operations this semester; they're a specific form of a general paradigm called split, apply, combine.
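To make the three steps explicit, here is a small sketch in plain Python. The records and function names are hypothetical; a pandas group-by followed by an aggregate such as `mean()` performs the same three steps in a single expression.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (group key, value) records.
records = [("a", 1.0), ("b", 10.0), ("a", 3.0), ("b", 20.0), ("c", 5.0)]


def split(pairs):
    """Split: partition the values by their group key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def summarize_group(values):
    """Apply: compute one summary per group (here, the mean)."""
    return mean(values)


def combine(group_results):
    """Combine: gather the per-group results into one sorted table."""
    return sorted(group_results.items())


groups = split(records)
applied = {key: summarize_group(values) for key, values in groups.items()}
table = combine(applied)
print(table)
```

Because the apply step only ever sees one group at a time, it can be swapped for any other per-group computation without touching the split or combine steps.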


Tuning Hyperparameters

How can we move beyond GridSearchCV in our quest to tune hyperparameters?
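One widely used alternative to an exhaustive grid is randomized search, sketched here with scikit-learn's `RandomizedSearchCV`. This may not be the technique the video emphasizes; the synthetic dataset, the logistic regression model, and the search range for `C` are illustrative assumptions.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, used only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Instead of listing every candidate value, describe a distribution to
# sample from. Regularization strengths are drawn on a log scale.
param_distributions = {"C": loguniform(1e-3, 1e3)}

search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions,
    n_iter=20,       # number of sampled candidates, not a full grid
    cv=5,            # 5-fold cross-validation for each candidate
    random_state=0,  # makes the sampled candidates repeatable
)
search.fit(X, y)

print(search.best_params_)
print(search.best_score_)
```

The cost is controlled by `n_iter` rather than by the size of a grid, so adding more hyperparameters to `param_distributions` does not multiply the number of fits.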


Tuning Example

The Tuning Example notebook demonstrates hyperparameter tuning by cross-validation with multiple techniques.

Reproducible Pipelines

I provide very brief pointers to additional tools you may want for workflow management in more advanced projects.


Some software that supports data and/or workflow management:

More Examples

My book author gender project is an example of an advanced workflow with DVC.

Assignment 7

Assignment 7 is due December 13, 2020.