Week 3 — Presentation (9/5–9)#

These are the learning outcomes for this week:

  • Create plots for data

  • Identify the appropriate type of plot for data in question

  • Read and interpret a plot

  • Refine a plot to more clearly show data

  • Write a well-organized notebook to present data analysis with text and visuals

We will primarily be using Seaborn and Matplotlib for our graphics, because they are easy to get fully working for both notebook and document-ready graphics in any Anaconda environment, and they handle very large data sets efficiently. There are several other packages that are useful for Python data visualization, and some of them are easier to use. I personally use plotnine for most of my graphics, and Plotly is a very capable package with particularly strong support for interactive graphics. The core graphics principles we study in this module will apply to most packages you may use in the future.

Tip

I do not recommend that you use Plotly for this course. While it is very good for interactive graphics, its support for static graphics to render in printable documents is rather new.

Seaborn upgrades

Seaborn is undergoing some changes in its syntax. In the old syntax, we pass x and y as positional arguments to a plotting function:

sns.lineplot('time', 'price', data=stocks)

In the new syntax, which will be required in a future Seaborn release, we use named parameters for everything:

sns.lineplot(data=stocks, x='time', y='price')

All new material going forward will use the new syntax, but it takes time to update all of the slides and videos. You may see the old syntax. It still works, but it issues a warning to let you know the future syntax is changing.

🧐 Content Overview#

Element | Length
--- | ---
🎥 Goals and Audiences | 9m55s
📃 Statistical Data Presentation | 4300 words
🎥 Statistical Graphics | 14m15s
🎥 Manipulating Data | 9m18s
📃 Selecting Data | 1866 words
📃 Reshaping Data | 2363 words
📃 Missing Data | 3850 words
🎥 Types of Charts | 13m30s
🎥 Metrics and Differences | 6m50s
🎥 Charts from the Ground Up | 22m14s
🎥 Organizing Notebooks | 16m50s

This week has 1h33m of video and 12379 words of assigned readings. This week’s videos are available in a Panopto folder.

📅 Deadlines#

  • Finding a plot before class on Thursday

  • Week 3 quiz at 8am on Thursday

  • Assignment 1 at midnight on Sunday

🎥 Presentation Goals and Audiences#

📓 Data and Notebook#

These resources are used throughout many of the videos in this class:

📃 Statistical Data Presentation#

Read Statistical Data Presentation by Junyong In and Sangseok Lee.

🎥 Introducing Statistical Graphics#

This video introduces basic principles of statistical graphics.

🎥 Manipulating Data#

This video goes over the core Pandas data selection and manipulation operations. It is primarily a guided tour; the technical content is in the notebooks that follow.

📓 Selecting Data#

Read the 📓 Selecting Data tutorial notebook to learn how to select data from a data frame.

I encourage you to read relevant tutorial notebooks throughout the semester, and I link to them when appropriate; this week, I am making three of them assigned readings.
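As a small taste of the kind of selection operations involved, here is a minimal sketch using standard Pandas indexing. The `movies` frame and its values are hypothetical and are not taken from the tutorial notebook.

```python
import pandas as pd

# Made-up data for illustration only.
movies = pd.DataFrame({
    "title": ["A", "B", "C", "D"],
    "year": [1999, 2005, 2010, 2015],
    "rating": [3.5, 4.2, 2.8, 4.6],
})

# Select one column (a Series) or several columns (a DataFrame).
ratings = movies["rating"]
subset = movies[["title", "rating"]]

# Select rows with a boolean mask.
recent = movies[movies["year"] >= 2005]

# Select rows and a column together with .loc.
good_titles = movies.loc[movies["rating"] > 4.0, "title"]
```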

📓 Reshaping Data#

Read the 📓 Reshaping Data tutorial notebook to learn how to manipulate the shape of data frames in various ways, including merging two data frames into one.
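To make "reshaping" and "merging" concrete, here is a minimal sketch with made-up data. The frame names, column names, and values are all hypothetical; the notebook itself may use different operations and examples.

```python
import pandas as pd

# Wide layout: one row per store, one column per year (made-up values).
wide = pd.DataFrame({
    "store": ["A", "B"],
    "2020": [10, 4],
    "2021": [12, 5],
})

# Wide to long: one row per (store, year) pair.
long = wide.melt(id_vars="store", var_name="year", value_name="sales")

# Long back to wide: years become columns again.
back = long.pivot(index="store", columns="year", values="sales")

# Merge: attach a region to every row by matching on the store column.
regions = pd.DataFrame({"store": ["A", "B"], "region": ["North", "South"]})
merged = long.merge(regions, on="store", how="left")
```

The long layout is usually the more convenient one for plotting, since each plotted variable lives in its own column.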

📓 Missing Data#

Read the 📓 Missing Data tutorial notebook.
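As a rough preview of the topic, the sketch below shows a few standard Pandas tools for finding and handling missing values. The `scores` frame is made up, and filling with the mean is only one option chosen for illustration, not a recommendation from the notebook.

```python
import numpy as np
import pandas as pd

# Made-up data with two missing scores.
scores = pd.DataFrame({
    "student": ["a", "b", "c", "d"],
    "score": [90.0, np.nan, 75.0, np.nan],
})

# Count the missing values in a column.
n_missing = scores["score"].isna().sum()

# Drop the rows where the score is missing.
complete = scores.dropna(subset=["score"])

# Aggregates such as mean() skip missing values by default.
mean_score = scores["score"].mean()

# Fill the missing values with a chosen replacement (here, the mean).
filled = scores["score"].fillna(mean_score)
```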

🎥 Types of Charts#

In this video, I discuss several common types of charts for statistical graphics, and how to choose an appropriate one. It complements the “Statistical Data Presentation” reading.

Resources#

🎥 Metrics and Differences#

We talked about the notion of “relative” differences, but what are they?
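As a small worked example, one common definition of a relative difference is the change divided by the starting value. The sketch below assumes that definition (the video may frame it differently) and uses hypothetical helper names and made-up numbers.

```python
def absolute_difference(old, new):
    """Change measured in the original units."""
    return new - old


def relative_difference(old, new):
    """Change measured as a fraction of the starting value."""
    return (new - old) / old


# Two made-up changes with the same absolute difference.
for old, new in [(10, 15), (1000, 1005)]:
    abs_diff = absolute_difference(old, new)
    rel_diff = relative_difference(old, new)
    print(f"{old} -> {new}: absolute {abs_diff}, relative {rel_diff:.1%}")
```

Both changes have an absolute difference of 5, but the relative difference is 50% in the first case and 0.5% in the second, which is why the two metrics can tell very different stories about the same data.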

🎥 Charts from the Ground Up#

In this video, I discuss how to design a chart from your questions, goals, and data.

Resources#

✅ Plots in the Wild#

In preparation for Thursday’s class, find a data presentation (plot, table, etc.) in a recent online publication, and share it with your team through a post on Piazza (in the ‘discuss’ category) with a link and a copy of the image. This can be from a journal paper, a newspaper article, a blog post, or another source the whole class can access.

In class we will discuss these plots!

Tip

Don’t spend more than 30 minutes on this assignment.

📓 Finishing Touches#

The Finishing Touches notebook describes how to apply some finishing touches to your plots and save them to files.
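For a sense of what such finishing touches can look like, here is a minimal Matplotlib sketch that labels a chart and saves it to a file. The data, the labels, and the output file name are all made up for illustration, and the notebook may use different techniques.

```python
import tempfile
from pathlib import Path

import matplotlib.pyplot as plt

# Made-up data for illustration only.
years = [2018, 2019, 2020, 2021]
counts = [12, 15, 9, 18]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(years, counts)

# Finishing touches: a title, axis labels, and one tick per bar.
ax.set_title("Items per year (made-up data)")
ax.set_xlabel("Year")
ax.set_ylabel("Number of items")
ax.set_xticks(years)
fig.tight_layout()

# Save to a file; the temporary directory and file name are placeholders.
out_path = Path(tempfile.gettempdir()) / "example-plot.png"
fig.savefig(out_path, dpi=300)
```

A higher `dpi` gives a sharper raster image for printed documents; saving with a `.pdf` or `.svg` extension instead produces vector output.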

🎥 Organizing and Formatting Notebooks#

How should you organize your notebook? What makes a good notebook? In this video we talk about that!

Resources#

📃 Notebook Formatting Checklist#

The notebook checklist will help you make sure your notebooks are well-organized.

🚩 Week 3 Quiz#

The Week 3 quiz covers all of the assigned material for this week, and is in Canvas.

The sections below this are for your further study and practice.

📖 Textbook#

This week primarily uses Chapter 9 of 📖 Python for Data Analysis, with some material from chapters 8 and 10.

📚 Further Reading#

For further study on these topics, see:

  • The Seaborn and Matplotlib galleries

  • The Visual Display of Quantitative Information by Edward R. Tufte

  • W. E. B. Du Bois’s Data Portraits: Visualizing Black America, edited by Whitney Battle-Baptiste and Britt Rusert

✅ Practice#

Doing this work well takes a lot of practice. Create some notebooks and experiment with drawing interesting charts from some of the data sets we have been exploring, or new data you find! The HETREC data has a number of variables of different types that are useful for practicing manipulations and visualizations.

📩 Assignment 1#

Assignment 1 is due on Sunday, Sep. 12 at the end of the day (11:59 pm).

The tutorial notebooks are going to be very useful for this assignment.