Week 14 — Workflow (11/28–12/2)

This week, we talk more about workflows: what does it look like to build a practical data science pipeline?

🧐 Content Overview

Element                           Length
🎥 From Notebooks to Workflows
🎥 Scripts and Modules
🎥 Introducing Git
🎥 Git for Data Science
🎥 Split Apply Combine
🎥 Tuning Hyperparameters
🎥 Reproducible Pipelines
📃 Software Environments          1068 words
📃 Yay Reproducibility            1250 words

This week has 1h11m of video and 2318 words of assigned readings. The videos are available in a Panopto folder.

📅 Deadlines

  • Quiz 14, December 1

  • Assignment 7, December 11

🎥 From Notebooks to Workflows

This video introduces how to move beyond notebooks to broader structures for our Python projects.

🎥 Scripts and Modules

This video introduces Python scripts and modules, and how to organize Python code outside of a notebook.
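As a minimal sketch of the script-and-module pattern, the hypothetical file `summarize.py` below can be imported from a notebook (`from summarize import summarize`) or run from the command line. The file name, function names, and arguments are illustrative assumptions, not taken from the video.

```python
"""summarize.py: a hypothetical module that also works as a script."""
import argparse
import statistics


def summarize(values):
    """Return basic summary statistics for a non-empty sequence of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def main(argv=None):
    """Command-line entry point; argv=None means read from sys.argv."""
    parser = argparse.ArgumentParser(description="Summarize a list of numbers.")
    parser.add_argument("values", nargs="*", type=float, help="numbers to summarize")
    args = parser.parse_args(argv)
    if not args.values:
        print("No values given; pass one or more numbers.")
        return
    for name, value in summarize(args.values).items():
        print(f"{name}: {value}")


# This block runs only when the file is executed as a script
# (for example, python summarize.py 1 2 3), not when it is imported.
if __name__ == "__main__":
    main()
```

Keeping the reusable logic in `summarize` and the command-line handling in `main` means the same code can serve a notebook, a test, and a pipeline step.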


🎥 Introducing Git

This video introduces version control with Git.


🎥 Git for Data Science

How do you use Git effectively in a data science project?


🎥 Extract, Transform, Load

The Extract, Transform, Load (ETL) pipeline is a common design pattern for data ingest. Sometimes it is adjusted to Extract, Load, Transform.
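To make the three stages concrete, here is a small sketch using only the Python standard library. The data, column names, and table name are made up for illustration; a real pipeline would extract from files, APIs, or databases.

```python
import csv
import io
import sqlite3

# Hypothetical raw input standing in for a downloaded CSV file.
RAW_CSV = """city,temp_f
Boise,68
Nampa,
Meridian,75
"""


def extract(text):
    """Extract: parse raw CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows with missing readings and convert °F to °C."""
    cleaned = []
    for row in rows:
        if not row["temp_f"]:
            continue  # skip missing measurements
        temp_c = (float(row["temp_f"]) - 32) * 5 / 9
        cleaned.append((row["city"], round(temp_c, 1)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS temps (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO temps VALUES (?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")
load(transform(extract(RAW_CSV)), conn)
print(conn.execute("SELECT city, temp_c FROM temps ORDER BY city").fetchall())
# prints [('Boise', 20.0), ('Meridian', 23.9)]
```

In the Extract, Load, Transform variant, the raw rows would be loaded into the database first, and the cleaning and unit conversion would happen afterwards inside the database.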


🎥 Split, Apply, Combine

We’ve seen group-by operations this semester; they’re a specific form of a general paradigm called split, apply, combine.
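The three steps can be spelled out by hand. This sketch uses plain Python and made-up ratings data so that each step is visible on its own.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical data: (genre, rating) pairs.
ratings = [
    ("drama", 4.0),
    ("comedy", 3.0),
    ("drama", 5.0),
    ("comedy", 4.0),
    ("horror", 2.0),
]

# Split: partition the records into groups by key.
groups = defaultdict(list)
for genre, rating in ratings:
    groups[genre].append(rating)

# Apply: compute a result independently within each group.
applied = {genre: mean(values) for genre, values in groups.items()}

# Combine: assemble the per-group results into a single structure.
combined = sorted(applied.items())
print(combined)
# prints [('comedy', 3.5), ('drama', 4.5), ('horror', 2.0)]
```

With the same data in a pandas data frame `df`, the expression `df.groupby("genre")["rating"].mean()` performs all three steps at once.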


🎥 Tuning Hyperparameters

How can we move beyond GridSearchCV in our quest to tune hyperparameters?


There is an error on slide 9. Where it says “≤ 0.5” it should say “≤ 0.05”.
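One common step beyond an exhaustive grid is random search, the idea behind scikit-learn's `RandomizedSearchCV`: sample candidate hyperparameter values from a distribution and score each by cross-validation. The sketch below implements that idea from scratch on made-up data with a one-dimensional ridge regression. The data, model, and search range are assumptions for illustration, and the video may cover other techniques.

```python
import random

rng = random.Random(42)

# Hypothetical toy data: y is roughly 2 * x plus noise.
xs = [rng.uniform(-1, 1) for _ in range(60)]
ys = [2 * x + rng.gauss(0, 0.3) for x in xs]


def fit_ridge(x, y, alpha):
    """Closed-form 1-D ridge regression without an intercept."""
    return sum(a * b for a, b in zip(x, y)) / (sum(a * a for a in x) + alpha)


def cv_mse(alpha, k=5):
    """Mean squared error of ridge(alpha), averaged over k folds."""
    fold_errors = []
    for fold in range(k):
        # Every k-th point (offset by the fold number) is held out for testing.
        test_idx = set(range(fold, len(xs), k))
        train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in test_idx]
        test = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i in test_idx]
        w = fit_ridge([x for x, _ in train], [y for _, y in train], alpha)
        fold_errors.append(sum((y - w * x) ** 2 for x, y in test) / len(test))
    return sum(fold_errors) / k


# Random search: sample alpha log-uniformly from [1e-3, 1e2]
# instead of walking a fixed grid of values.
candidates = [10 ** rng.uniform(-3, 2) for _ in range(20)]
scores = {alpha: cv_mse(alpha) for alpha in candidates}
best_alpha = min(scores, key=scores.get)
print(f"best alpha = {best_alpha:.4g}, CV MSE = {scores[best_alpha]:.4f}")
```

Sampling on a log scale spreads the candidates evenly across orders of magnitude, which suits regularization strengths better than uniform sampling does.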


📓 Tuning Example

The Tuning Example notebook demonstrates hyperparameter tuning by cross-validation with multiple techniques.

🎥 Reproducible Pipelines

I provide very brief pointers to additional tools you may want for workflow management in more advanced projects.


Some software that supports data and/or workflow management:

📃 Software Environments

Read Software Environments.

📃 Reproducibility Case Study

Read my case study on reproducibility and bug-hunting.

📓 Example Script and Notebook

You can find an example, with a walkthrough of how to run it from the command line on GitHub Codespaces, in this example repo.

🚩 Weekly Quiz 14

Take Quiz 14 in Canvas.

📓 More Examples

My book author gender project is an example of an advanced workflow with DVC.

📩 Assignment 7

Assignment 7 is due Sunday, December 11, 2022.