Week 3 — Presentation (9/6–10)

These are the learning outcomes for this week:

  • Create plots for data

  • Identify the appropriate type of plot for data in question

  • Read and interpret a plot

  • Refine a plot to more clearly show data

  • Write a well-organized notebook to present data analysis with text and visuals

We will primarily be using Seaborn and Matplotlib for our graphics, because it is easy to get them fully working for both notebook and document-ready graphics in any Anaconda environment, and they efficiently handle very large data sets. There are several other packages that are useful for Python data visualization, and some of them are easier to use in certain cases. I personally use plotnine for most of my graphics, and plotly is a very capable package with particularly strong support for interactive graphics. The core graphics principles we study in this module will apply to most packages you may use in the future.

Tip

I do not recommend that you use Plotly for this course. While it is very good for interactive graphics, its support for static graphics to render in printable documents is rather new.

Seaborn upgrades

Seaborn is changing some of its syntax. In the old syntax, we pass the x and y parameters as positional parameters to a plotting function:

sns.lineplot('time', 'price', data=stocks)

In the new syntax, which will be required in a future Seaborn release, we use named parameters for everything:

sns.lineplot(data=stocks, x='time', y='price')

All new material going forward will use the new syntax, but it takes time to update all of the slides and videos. You may see the old syntax. It still works, but it issues a warning to let you know the future syntax is changing.

🧐 Content Overview

Element                          Length
🎥 Goals and Audiences            9m55s
📃 Statistical Data Presentation  4300 words
🎥 Statistical Graphics           14m15s
🎥 Manipulating Data              9m18s
📃 Missing Data                   3850 words
🎥 Types of Charts                13m30s
🎥 Metrics and Differences        6m50s
🎥 Charts from the Ground Up      22m14s
🎥 Organizing Notebooks           16m50s

This week has 1h33m of video and 8150 words of assigned readings. This week’s videos are available in a Panopto folder and as a podcast.

📅 Deadlines

  • Finding a plot before class on Thursday

  • Week 3 quiz at 8am on Thursday

  • Assignment 1 at midnight on Sunday

🎥 Presentation Goals and Audiences

📓 Data and Notebook

These resources are used throughout many of the videos in this class:

📃 Statistical Data Presentation

Read Statistical Data Presentation by Junyong In and Sangseok Lee.

🎥 Introducing Statistical Graphics

This video introduces basic principles of statistical graphics.

🎥 Manipulating Data

This video goes over the core Pandas data selection and manipulation operations. It is primarily a tour guide — the technical content is in the notebooks.
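For orientation, here is a rough sketch of the kinds of selection and manipulation operations involved. The `movies` data frame and its columns are hypothetical, made up for illustration; the notebooks remain the reference for the actual material.

```python
import pandas as pd

# Hypothetical data, made up for illustration only.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "genre": ["drama", "comedy", "drama", "comedy"],
    "year": [1999, 2005, 2010, 2012],
    "rating": [3.5, 4.0, 4.5, 2.0],
})

# Select one column (a Series) or several columns (a DataFrame).
ratings = movies["rating"]
subset = movies[["title", "rating"]]

# Keep only the rows that satisfy a condition (boolean mask).
recent = movies[movies["year"] >= 2005]

# Add a derived column.
movies["decade"] = (movies["year"] // 10) * 10

# Group and aggregate: mean rating per genre.
genre_means = movies.groupby("genre")["rating"].mean()

# Sort rows by a column, highest rating first.
ranked = movies.sort_values("rating", ascending=False)
```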

📓 Missing Data

Read the 📓 Missing Data tutorial notebook. I encourage you to read relevant tutorial notebooks throughout the semester, and I link to them when appropriate; this one is specifically an assigned reading.
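As a small preview of the topic, the sketch below shows a few standard Pandas tools for missing values. The `scores` data is invented for illustration, and the notebook may take a different approach.

```python
import numpy as np
import pandas as pd

# Hypothetical data with two missing scores.
scores = pd.DataFrame({
    "student": ["a", "b", "c", "d"],
    "score": [90.0, np.nan, 70.0, np.nan],
})

# Count the missing values in a column.
n_missing = scores["score"].isna().sum()

# Pandas aggregates skip missing values by default,
# so this is the mean of the observed scores only.
mean_observed = scores["score"].mean()

# Drop the rows where the score is missing.
complete = scores.dropna(subset=["score"])

# Or fill missing values with a constant. This changes the answer:
# treating "missing" as 0 pulls the mean down.
mean_filled = scores["score"].fillna(0).mean()
```

Here `mean_observed` is 80.0 while `mean_filled` is 40.0, which is why the choice of how to handle missing data needs to be deliberate and documented.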

🎥 Types of Charts

In this video, I discuss several common types of charts for statistical graphics, and how to choose an appropriate one. It complements the “Statistical Data Presentation” reading.

Resources

🎥 Metrics and Differences

We talked about the notion of “relative” differences, but what are they?
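As a quick illustration, one common definition (assumed here; the video may frame it differently) divides the change by the starting value, so the same absolute change can be a large or small relative change. The function names are hypothetical.

```python
def absolute_difference(old, new):
    """Change in the original units."""
    return new - old


def relative_difference(old, new):
    """Change as a fraction of the starting value (assumes old != 0)."""
    return (new - old) / old


# The same absolute change of 10 ...
small_base = (absolute_difference(50, 60), relative_difference(50, 60))
large_base = (absolute_difference(500, 510), relative_difference(500, 510))
# ... is a 20% increase from 50, but only a 2% increase from 500.
```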

🎥 Charts from the Ground Up

In this video, I discuss how to design a chart from your questions, goals, and data.

Resources

✅ Plots in the Wild

In preparation for Thursday’s class, find a data presentation (plot, table, etc.) in a recent online publication, and share it with your team through a post on Piazza (in the ‘discuss’ category) with a link and a copy of the image. This can be from a journal paper, a newspaper article, a blog post, or another source that everyone in the class can access.

In class we will discuss these plots!

Tip

Don’t spend more than 30 minutes on this assignment.

📓 Finishing Touches

The Finishing Touches notebook describes how to apply some finishing touches to your plots and save them to files.
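A minimal sketch of what such finishing touches can look like in Matplotlib, assuming made-up data and a hypothetical output file name; the notebook itself is the authoritative guide.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import matplotlib.pyplot as plt

# Hypothetical data, for illustration only.
times = [1, 2, 3, 4, 5]
prices = [10.0, 10.5, 10.2, 11.0, 11.4]

fig, ax = plt.subplots(figsize=(6, 4))
ax.plot(times, prices)

# Label the axes (with units) and give the chart a title.
ax.set_xlabel("Time (days)")
ax.set_ylabel("Price (USD)")
ax.set_title("Price over time")

# Tighten the layout so labels are not clipped, then save to a file.
fig.tight_layout()
out_path = Path("price-over-time.png")  # hypothetical file name
fig.savefig(out_path, dpi=300)
plt.close(fig)
```

Saving with a `.pdf` extension instead produces vector output, which usually scales better in printed documents.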

🎥 Organizing and Formatting Notebooks

How should you organize your notebook? What makes a good notebook? In this video we talk about that!

Resources

📃 Notebook Formatting Checklist

The notebook checklist will help you make sure your notebooks are well-organized.

🚩 Week 3 Quiz

The Week 3 quiz covers all of the assigned material for this week and is in Canvas.

The sections below this are for your further study and practice.

📖 Textbook

This week primarily uses Chapter 9 of 📖 Python for Data Analysis, with some material from chapters 8 and 10.

📚 Further Reading

For further study on these topics, see:

  • The Seaborn and Matplotlib galleries

  • The Visual Display of Quantitative Information by Edward R. Tufte

  • W. E. B. Du Bois’s Data Portraits: Visualizing Black America, edited by Whitney Battle-Baptiste and Britt Rusert

✅ Practice

Doing this work well takes a lot of practice. Create some notebooks and experiment with drawing interesting charts from some of the data sets we have been exploring, or new data you find! The HETREC data has a number of variables of different types that are useful for practicing manipulations and visualizations.

📩 Assignment 1

Assignment 1 is due on Sunday, Sep. 12 at the end of the day (11:59 pm).

The tutorial notebooks are going to be very useful for this assignment.