Week 14 — Workflow (11/29–12/3)

This week, we are going to talk more about workflows. What does it look like to build a practical data science pipeline?

🧐 Content Overview

Element                          Length
🎥 From Notebooks to Workflows   3m44s
🎥 Scripts and Modules           15m33s
🎥 Introducing Git               12m2s
🎥 Git for Data Science          6m52s
🎥 ETL                           6m46s
🎥 Split Apply Combine           6m45s
🎥 Tuning Hyperparameters        10m49s
🎥 Reproducible Pipelines        8m28s
📃 Software Environments         1068 words
📃 Yay Reproducibility           1250 words

This week has 1h11m of video and 2318 words of assigned readings. This week’s videos are available in a Panopto folder and as a podcast.

📅 Deadlines

  • Quiz 14, December 2

  • Assignment 7, December 12

🎥 From Notebooks to Workflows

This video introduces how to move beyond notebooks to broader structures for our Python projects.

🎥 Scripts and Modules

This video introduces Python scripts and modules, and how to organize Python code outside of a notebook.
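As a minimal sketch of the idea, the file below works both as a module that other code can import and as a script that can be run from the command line. The file name `stats_tools.py`, the function `standardize`, and the command-line interface are all hypothetical, chosen only for illustration.

```python
"""stats_tools.py: a hypothetical module that can also be run as a script."""
import argparse
import statistics


def standardize(values):
    """Return z-scores: subtract the mean, divide by the sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


def main(argv=None):
    # Argument parsing lives in a function so importing the module has no side effects.
    parser = argparse.ArgumentParser(description="Standardize a list of numbers.")
    parser.add_argument(
        "numbers", nargs="*", type=float, default=[1.0, 2.0, 3.0],
        help="values to standardize (defaults to a small example)",
    )
    args = parser.parse_args(argv)
    for z in standardize(args.numbers):
        print(f"{z:.3f}")


if __name__ == "__main__":
    # True when run as `python stats_tools.py 4 8 15`, False on `import stats_tools`.
    main()
```

From another file or a notebook, `from stats_tools import standardize` reuses the function without triggering the command-line code.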

🎥 Introducing Git

This video introduces version control with Git.

🎥 Git for Data Science

How do you use Git effectively in a data science project?

🎥 Extract, Transform, Load

The Extract, Transform, Load (ETL) pipeline is a common design pattern for data ingestion. Sometimes the last two stages are swapped, giving Extract, Load, Transform (ELT).
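The three stages can be sketched as three small functions. This is only an illustration using the standard library: the sample data, the `books` table, and the function names are hypothetical, and a real pipeline would read from files or services rather than a string.

```python
import csv
import io
import sqlite3

# Hypothetical raw input; in practice this would be a file or an API response.
RAW_CSV = """title,year,rating
Dune,1965,4.3
Emma,1815,
Kindred,1979,4.5
"""


def extract(text):
    """Extract: read raw records from the source without changing them."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: fix types and drop records with a missing rating."""
    return [
        (row["title"], int(row["year"]), float(row["rating"]))
        for row in rows
        if row["rating"] != ""
    ]


def load(records, conn):
    """Load: write the cleaned records to the destination table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS books (title TEXT, year INTEGER, rating REAL)"
    )
    conn.executemany("INSERT INTO books VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
n_books = conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]
print(n_books)
```

In the Extract, Load, Transform variant, the raw rows would be loaded into the database first and the cleanup would run there afterwards.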

🎥 Split, Apply, Combine

We’ve seen group-by operations this semester; they’re a specific form of a general paradigm called split, apply, combine.
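To make the three steps visible, here is a sketch that spells them out in plain Python instead of relying on a group-by call. The records are made up for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (genre, rating) records, made up for illustration.
records = [
    ("fantasy", 4.0),
    ("mystery", 3.0),
    ("fantasy", 5.0),
    ("mystery", 4.0),
    ("poetry", 2.5),
]

# Split: partition the records into groups by key.
groups = defaultdict(list)
for genre, rating in records:
    groups[genre].append(rating)

# Apply: compute a summary independently within each group.
applied = [(genre, mean(ratings)) for genre, ratings in groups.items()]

# Combine: assemble the per-group results into one structure.
mean_rating = dict(applied)
print(mean_rating)
```

With a pandas data frame holding the same two columns, `df.groupby("genre")["rating"].mean()` performs all three steps in one call.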

🎥 Tuning Hyperparameters

How can we move beyond GridSearchCV in our quest to tune hyperparameters?
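One common step beyond an exhaustive grid is randomized search, which samples hyperparameter values from distributions. The sketch below uses scikit-learn's `RandomizedSearchCV`; the synthetic data, the choice of logistic regression, and the search range for `C` are assumptions made for illustration, not necessarily what the video or notebook uses.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic data, generated only so the example is self-contained.
X, y = make_classification(n_samples=200, n_features=10, random_state=42)

# Sample the regularization strength C from a log-uniform distribution
# instead of listing every candidate value in a grid.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e2)},
    n_iter=20,        # number of sampled configurations
    cv=5,             # 5-fold cross-validation for each one
    random_state=42,  # fixed seed so the search is repeatable
)
search.fit(X, y)

print(search.best_params_)
print(round(search.best_score_, 3))
```

The cost is controlled by `n_iter` rather than by the size of a grid, which helps when there are several hyperparameters to tune at once.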

📓 Tuning Example

The Tuning Example notebook demonstrates hyperparameter tuning by cross-validation with multiple techniques.

🎥 Reproducible Pipelines

I provide very brief pointers to additional tools you may want for workflow management in more advanced projects.

Resources

Some software that supports data and/or workflow management:

📃 Software Environments

Read Software Environments.

📃 Reproducibility Case Study

Read my case study on reproducibility and bug-hunting.

🚩 Weekly Quiz 14

Take Quiz 14 in Canvas.

📓 More Examples

My book author gender project is an example of an advanced workflow with DVC.

📩 Assignment 7

Assignment 7 is due Sunday, December 12, 2021.