Week 3 — Presentation
These are the learning outcomes for this week:
- Create plots for data
- Identify the appropriate type of plot for data in question
- Read and interpret a plot
- Refine a plot to more clearly show data
- Write a well-organized notebook to present data analysis with text and visuals
Here's the overview:
- Presentation Goals and Audiences
- Statistical Data Presentation
- Introducing Statistical Graphics
- Manipulating Data
- Types of Charts
- Week 3 Quiz
- Week 3 Sync Notebook
- Metrics and Differences
- Charts from the Ground Up
- Plots in the Wild
- Finishing Touches
- Organizing and Formatting Notebooks
- Notebook Formatting Checklist
- Textbook
- Further Reading
- Practice
- Assignment 1
Videos are also available as a Panopto playlist.
We will primarily be using Seaborn and Matplotlib for our graphics, because they are easy to get fully working for both notebook and document-ready graphics in any Anaconda environment, and they handle very large data sets efficiently. Several other packages are useful for Python data visualization, and in some cases they are easier to use. I personally use plotnine for most of my graphics, and plotly is a very capable package with particularly strong support for interactive graphics. The core graphics principles we study in this module will apply to most packages you may use in the future.
Presentation Goals and Audiences
Statistical Data Presentation
Read Statistical Data Presentation by Junyong In and Sangseok Lee.
Introducing Statistical Graphics
This video introduces basic principles of statistical graphics.
Manipulating Data
This video goes over the core Pandas data selection and manipulation operations. Arguably it belonged in last week's material, but we'll cover it this week!
The video is primarily a tour guide — the technical content is in the notebooks.
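The course notebooks hold the actual technical content. As a rough, non-authoritative preview, the sketch below exercises a few standard Pandas operations (column selection, boolean masks, `.loc`, `assign`, and `groupby`) on a small ratings table. The `ratings` data and its column names are made up for illustration and are not the course data.

```python
import pandas as pd

# A small, made-up ratings table (hypothetical data, for illustration only).
ratings = pd.DataFrame({
    "user": [1, 1, 2, 2, 3],
    "movie": ["A", "B", "A", "C", "B"],
    "rating": [4.0, 3.5, 5.0, 2.0, 4.5],
})

# Column selection: one name gives a Series, a list of names gives a DataFrame.
scores = ratings["rating"]
subset = ratings[["movie", "rating"]]

# Row selection with a boolean mask: keep only ratings of 4 or higher.
high = ratings[ratings["rating"] >= 4]

# Label-based selection with .loc: rows matching a condition, specific columns.
user2 = ratings.loc[ratings["user"] == 2, ["movie", "rating"]]

# Derive a new column; assign returns a new frame and leaves ratings unchanged.
ratings_c = ratings.assign(centered=ratings["rating"] - ratings["rating"].mean())

# Group and aggregate: mean rating and number of ratings per movie.
movie_stats = ratings.groupby("movie")["rating"].agg(["mean", "count"])
print(movie_stats)
```

Each of these operations returns a new object rather than modifying `ratings` in place, which makes it easier to rerun notebook cells out of order without corrupting the data.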
Resources
Types of Charts
In this video, I discuss several common types of charts for statistical graphics, and how to choose an appropriate one. It complements the “Statistical Data Presentation” reading.
Resources
- Notebook
- Seaborn gallery
- Seaborn tutorial — organized topically, very good resource
- Matplotlib gallery
- Plotnine gallery
Week 3 Quiz
Submit the Week 3 quiz by Monday night.
Week 3 Sync Notebook
Here is my solution to the Week 3 activities.
Metrics and Differences
We talked about the notion of “relative” differences, but what are they?
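One common definition, assumed here because this page does not spell it out: for a baseline value a and a new value b, the absolute difference is b - a, in the original units, while the relative difference is (b - a) / a, a unitless fraction of the baseline (multiply by 100 for a percent change). Relative differences depend on which value is the baseline: going from 100 to 50 is a 50% decrease, but going from 50 to 100 is a 100% increase. The helper names below are hypothetical.

```python
def absolute_difference(new, baseline):
    """Difference in the original units of the measurement."""
    return new - baseline


def relative_difference(new, baseline):
    """Difference as a fraction of the baseline (multiply by 100 for percent)."""
    if baseline == 0:
        raise ValueError("relative difference is undefined for a zero baseline")
    return (new - baseline) / baseline


# Hypothetical numbers: an error rate that drops from 0.20 to 0.15.
old_err, new_err = 0.20, 0.15
print(round(absolute_difference(new_err, old_err), 4))  # -0.05: five percentage points lower
print(round(relative_difference(new_err, old_err), 4))  # -0.25: a 25% relative decrease
```

When reporting a change in a rate or percentage, it helps to say which kind of difference you mean, since "5 points lower" and "25% lower" describe the same change here.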
Charts from the Ground Up
In this video, I discuss how to design a chart from your questions, goals, and data.
Resources
Plots in the Wild
Find a data presentation (plot, table, etc.) in a recent online publication, and create a post on Piazza (in the 'discuss' category) with a link, a copy of the image, and 3 observations about things it does well and/or needs to correct or improve. This can be from a journal paper, a newspaper article, a blog post, or another source the class can all access.
Don't spend more than 30 minutes on this assignment. Completing it by the end of the weekend counts toward Participation points.
Finishing Touches
The Finishing Touches notebook describes how to apply some finishing touches to your plots and save them to files.
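The notebook's own steps may differ. As a generic Matplotlib sketch, the following adds a title, axis labels, and a legend, then saves the figure as both a high-resolution PNG and a vector PDF with `Figure.savefig`. The data, title, and file names are made up, and the files go to a temporary directory so the example does not clutter the working folder.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch also runs outside a notebook
import matplotlib.pyplot as plt
import numpy as np

# Made-up data: two slowly growing functions.
x = np.linspace(0, 10, 50)
fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(x, np.sqrt(x), label="sqrt(x)")
ax.plot(x, np.log1p(x), label="log(1 + x)")

# Finishing touches: a descriptive title, labeled axes, and a legend.
ax.set_title("Two slowly growing functions")
ax.set_xlabel("x")
ax.set_ylabel("f(x)")
ax.legend()
fig.tight_layout()  # keep labels from being clipped at the figure edges

# Save to files: PNG (raster) for web pages and slides, PDF (vector) for documents.
out_dir = tempfile.mkdtemp()
png_path = os.path.join(out_dir, "growth.png")
pdf_path = os.path.join(out_dir, "growth.pdf")
fig.savefig(png_path, dpi=300)
fig.savefig(pdf_path)
plt.close(fig)  # free the figure once it has been saved
```

Vector formats such as PDF stay sharp at any zoom level in a document, while `dpi` mainly matters for raster output such as PNG.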
Organizing and Formatting Notebooks
How should you organize your notebook? What makes a good notebook? In this video we talk about that!
Resources
- GitHub Markdown guide — most of this syntax works in Jupyter as well
- Jupyter's Markdown docs
Notebook Formatting Checklist
The notebook checklist will help you make sure your notebooks are well-organized.
Textbook
This week primarily uses Chapter 9 of the textbook, with some material from Chapters 8 and 10.
Further Reading
For further study on these topics, see:
- The Seaborn and Matplotlib galleries
- The Visual Display of Quantitative Information by Edward R. Tufte
- W. E. B. Du Bois's Data Portraits: Visualizing Black America, edited by Whitney Battle-Baptiste and Britt Rusert
Practice
Doing this work well takes a lot of practice. Create some notebooks and experiment with drawing interesting charts from some of the data sets we have been exploring, or new data you find! The HETREC data has a number of variables of different types that are useful for practicing manipulations and visualizations.
Assignment 1
Assignment 1 is due on Sunday, Sep. 13 at the end of the day (11:59 pm).
The tutorial notebooks will be very useful for this assignment.