
Week 3 — Presentation

These are the learning outcomes for this week:

  • Create plots for data
  • Identify the appropriate type of plot for data in question
  • Read and interpret a plot
  • Refine a plot to more clearly show data
  • Write a well-organized notebook to present data analysis with text and visuals

Here's the overview:

Videos also available as a Panopto playlist.

We will primarily be using Seaborn and Matplotlib for our graphics, because they are easy to get fully working for both notebook and document-ready graphics in any Anaconda environment, and they handle very large data sets efficiently. There are several other packages that are useful for Python data visualization, and some of them are easier to use in certain cases. I personally use plotnine for most of my graphics, and plotly is a very capable package with particularly strong support for interactive graphics. The core graphics principles we study in this module will apply to most packages you may use in the future.

Presentation Goals and Audiences

Statistical Data Presentation

Read Statistical Data Presentation by Junyong In and Sangseok Lee.

Introducing Statistical Graphics

This video introduces basic principles of statistical graphics.

Manipulating Data

This video goes over the core Pandas data selection and manipulation operations. Arguably it should have been last week, but we'll do it this week!

The video is primarily a tour guide — the technical content is in the notebooks.
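
As a rough preview of the kind of operations the notebooks cover, here is a minimal sketch of selecting, filtering, deriving, grouping, and sorting with Pandas. The `movies` table and its column names are invented for illustration and are not taken from the course notebooks.

```python
import pandas as pd

# Hypothetical toy data; the table and column names are made up.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": ["Drama", "Comedy", "Drama", "Action"],
    "rating": [3.5, 4.0, 4.5, 2.5],
})

# Select one column (a Series) or several columns (a DataFrame).
ratings = movies["rating"]
subset = movies[["title", "rating"]]

# Select rows with a boolean mask.
dramas = movies[movies["genre"] == "Drama"]

# Add a derived column computed from an existing one.
movies["above_3"] = movies["rating"] > 3

# Group by a categorical column and aggregate a numeric one.
genre_means = movies.groupby("genre")["rating"].mean()

# Sort rows by rating, highest first.
ranked = movies.sort_values("rating", ascending=False)

print(genre_means)
print(ranked[["title", "rating"]])
```

The notebooks remain the reference for the full set of operations; this only shows the general shape of the code.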

Resources

Types of Charts

In this video, I discuss several common types of charts for statistical graphics, and how to choose an appropriate one. It complements the “Statistical Data Presentation” reading.

Resources

Week 3 Quiz

Submit the Week 3 quiz by Monday night.

📓 Week 3 Sync Notebook

Here is my solution to the Week 3 activities.

Metrics and Differences

We talked about the notion of “relative” differences, but what are they?
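
As a starting point, one common definition (assumed here; the definition used in class may differ in detail) measures the change as a fraction of the baseline value, whereas the absolute difference is the plain subtraction. The function names below are hypothetical.

```python
def absolute_difference(new, old):
    """Plain difference, in the same units as the data."""
    return new - old


def relative_difference(new, old):
    """Difference as a fraction of the baseline `old` (assumed definition)."""
    if old == 0:
        raise ValueError("relative difference is undefined for a zero baseline")
    return (new - old) / old


# Two changes with the same absolute difference but different relative ones.
small_base = (60, 50)    # baseline 50 grows to 60
large_base = (510, 500)  # baseline 500 grows to 510

for new, old in (small_base, large_base):
    print(old, "->", new,
          "absolute:", absolute_difference(new, old),
          "relative:", relative_difference(new, old))
```

Both changes have an absolute difference of 10, but the first is a 20% increase over its baseline while the second is only 2%.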

Charts from the Ground Up

In this video, I discuss how to design a chart from your questions, goals, and data.

Resources

Plots in the Wild

Find a data presentation (plot, table, etc.) in a recent online publication, and create a post on Piazza (in the 'discuss' category) with a link, a copy of the image, and 3 observations about things it does well and/or needs to correct or improve. This can be from a journal paper, a newspaper article, a blog post, or another source the class can all access.

Don't spend more than 30 minutes on this assignment. Completing it by the end of the weekend will count toward Participation points.

Finishing Touches

The Finishing Touches notebook describes how to apply some finishing touches to your plots and save them to files.
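
For a sense of what this looks like in code, here is a minimal Matplotlib sketch that adds a title and axis labels and then saves the figure in two formats. The data, labels, and file names are placeholders invented for this example; the notebook itself may use different steps.

```python
import os

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs as a script; omit in a notebook
import matplotlib.pyplot as plt

# Placeholder data for illustration only.
categories = ["A", "B", "C"]
counts = [3, 7, 5]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(categories, counts)

# Finishing touches: a descriptive title and labeled axes.
ax.set_title("Items per category (example data)")
ax.set_xlabel("Category")
ax.set_ylabel("Number of items")

# Tighten the layout so labels are not clipped in the saved file.
fig.tight_layout()

# Save a raster version for the web and a vector version for documents.
png_path = "example_plot.png"  # hypothetical file name
pdf_path = "example_plot.pdf"  # hypothetical file name
fig.savefig(png_path, dpi=300)
fig.savefig(pdf_path)
plt.close(fig)

print(os.path.getsize(png_path) > 0)
```

Vector formats such as PDF scale cleanly in papers and reports, while PNG is usually the easier choice for web pages and slides.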

Organizing and Formatting Notebooks

How should you organize your notebook? What makes a good notebook? In this video we talk about that!

Resources

Notebook Formatting Checklist

The notebook checklist will help you make sure your notebooks are well-organized.

Textbook

This week primarily uses Chapter 9 of the textbook, with some material from chapters 8 and 10.

Further Reading

For further study on these topics, see:

  • The Seaborn and Matplotlib galleries
  • The Visual Display of Quantitative Information by Edward R. Tufte
  • W. E. B. Du Bois's Data Portraits: Visualizing Black America, edited by Whitney Battle-Baptiste and Britt Rusert

Practice

Doing this work well takes a lot of practice. Create some notebooks and experiment with drawing interesting charts from some of the data sets we have been exploring, or new data you find! The HETREC data has a number of variables of different types that are useful for practicing manipulations and visualizations.

Assignment 1

Assignment 1 is due on Sunday, Sep. 13 at the end of the day (11:59 pm).

The tutorial notebooks are going to be very useful for this assignment.