Week 14 — Workflow (11/29–12/3)
This week, we are going to talk more about workflows. What does it look like to build a practical data science pipeline?
🧐 Content Overview
Quiz 14, December 2
Assignment 7, December 12
🎥 From Notebooks to Workflows
In this video, we introduce going beyond notebooks to broader structures for our Python projects.
🎥 Scripts and Modules
This video introduces Python scripts and modules, and how to organize Python code outside of a notebook.
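As a rough illustration of the idea (not taken from the video), a single file can serve as both an importable module and a runnable script. The file name `summarize.py` and the functions `summarize` and `main` are hypothetical names chosen for this sketch.

```python
"""summarize.py: a hypothetical module that also works as a script."""
import statistics
import sys


def summarize(values):
    """Return the count, mean, and median of a sequence of numbers."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
    }


def main(argv=None):
    """Script entry point: summarize numbers given on the command line."""
    args = sys.argv[1:] if argv is None else argv
    values = [float(a) for a in args]
    if not values:
        print("usage: python summarize.py NUMBER [NUMBER ...]")
        return 1
    print(summarize(values))
    return 0


if __name__ == "__main__":
    # This branch runs only when the file is executed as a script,
    # not when it is imported from a notebook or another module.
    main()
```

From a notebook or another file in the same directory, `from summarize import summarize` reuses the function without triggering the script behavior; from a terminal, `python summarize.py 1 2 3` runs it as a script.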
🎥 Introducing Git
This video introduces version control with Git.
🎥 Git for Data Science
How do you use Git effectively in a data science project?
🎥 Extract, Transform, Load
The Extract, Transform, Load (ETL) pipeline is a common design pattern for data ingest. Sometimes it is adjusted to Extract, Load, Transform.
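A minimal sketch of the three ETL stages, using only the standard library. The data, the column names, the `temps` table, and the function names are all made up for illustration; a real pipeline would read from actual files or services and load into a persistent database.

```python
import csv
import io
import sqlite3

# Hypothetical raw data standing in for a source file or API response.
RAW_CSV = """city,temp_f
Springfield,41
Shelbyville,
Ogdenville,50
"""


def extract(source):
    """Extract: read raw rows from a CSV source as dictionaries."""
    return list(csv.DictReader(source))


def transform(rows):
    """Transform: drop rows with missing readings, convert Fahrenheit to Celsius."""
    cleaned = []
    for row in rows:
        if not row["temp_f"]:
            continue  # skip rows with no temperature reading
        temp_c = (float(row["temp_f"]) - 32) * 5 / 9
        cleaned.append((row["city"], round(temp_c, 1)))
    return cleaned


def load(records, conn):
    """Load: write the cleaned records into a database table."""
    conn.execute("CREATE TABLE IF NOT EXISTS temps (city TEXT, temp_c REAL)")
    conn.executemany("INSERT INTO temps VALUES (?, ?)", records)
    conn.commit()


# An in-memory SQLite database keeps the sketch self-contained.
conn = sqlite3.connect(":memory:")
load(transform(extract(io.StringIO(RAW_CSV))), conn)
result = conn.execute("SELECT city, temp_c FROM temps ORDER BY city").fetchall()
print(result)
```

In the Extract, Load, Transform variant, the raw rows would be loaded into the database first and the cleaning step would run afterwards, typically as queries inside the database.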
🎥 Split, Apply, Combine
We’ve seen group-by operations this semester; they’re a specific form of a general paradigm called split, apply, combine.
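To make the three steps concrete, here is a small sketch that spells them out by hand and then compares the result with the equivalent pandas group-by. The data frame and its `species` and `mass` columns are invented for this example.

```python
import pandas as pd

# Hypothetical data: a grouping column and a numeric column.
df = pd.DataFrame({
    "species": ["a", "a", "b", "b", "b"],
    "mass": [10.0, 14.0, 3.0, 5.0, 7.0],
})

# Split: break the frame into one piece per group key.
pieces = {key: group for key, group in df.groupby("species")}

# Apply: compute the same summary independently on each piece.
means = {key: group["mass"].mean() for key, group in pieces.items()}

# Combine: assemble the per-group results into a single structure.
manual = pd.Series(means, name="mass").rename_axis("species")

# The usual group-by aggregation performs all three steps in one expression.
direct = df.groupby("species")["mass"].mean()

print(manual)
print(direct)
```

Both `manual` and `direct` hold one mean per species; the group-by form is simply the paradigm with the splitting and combining handled by pandas.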
🎥 Tuning Hyperparameters
How can we move beyond GridSearchCV in our quest to tune hyperparameters?
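One commonly used alternative to an exhaustive grid is randomized search, which samples hyperparameter values from distributions instead of enumerating every combination. The sketch below uses scikit-learn's `RandomizedSearchCV`; the synthetic data set, the choice of logistic regression, and the search range for `C` are assumptions made for illustration, and the video may cover other techniques.

```python
from scipy.stats import loguniform
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RandomizedSearchCV

# Synthetic classification data, used only to keep the example self-contained.
X, y = make_classification(n_samples=200, n_features=10, random_state=0)

# Sample the regularization strength C from a log-uniform distribution
# rather than listing a fixed grid of candidate values.
search = RandomizedSearchCV(
    LogisticRegression(max_iter=1000),
    param_distributions={"C": loguniform(1e-3, 1e3)},
    n_iter=10,        # number of sampled configurations to evaluate
    cv=5,             # 5-fold cross-validation for each configuration
    random_state=0,   # make the sampled configurations repeatable
)
search.fit(X, y)

print(search.best_params_, search.best_score_)
```

The `n_iter` parameter sets the search budget directly, so the cost does not grow multiplicatively as more hyperparameters are added, which is the main practical difference from a grid.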
📓 Tuning Example
The Tuning Example notebook demonstrates hyperparameter tuning by cross-validation with multiple techniques.
🎥 Reproducible Pipelines
I provide very brief pointers to additional tools you may want for workflow management in more advanced projects.
🚩 Weekly Quiz 14
Take Quiz 14 in Canvas.