Week 7 — Getting Data (Oct. 4–8)

This week has the following learning outcomes:

  • Locate data sources

  • Integrate data from multiple sources

  • Reason about bias and social effects of data

🧐 Content Overview

Element                                               Length
🎥 Data                                                8m22s
🎥 Finding Data                                        7m25s
🎥 Data Formats                                        13m55s
🎥 Integrating Data                                    11m40s
🎥 Values and Types                                    8m15s
📃 Working with Text Data                              4200 words
🎥 Ethics                                              14m10s
📃 The Belmont Report                                  5500 words
📃 ACM Code of Ethics and Professional Responsibility  3500 words
🎥 Real Example                                        48m57s
🎥 Workflow Advice                                     5m24s

This week has 1h58m of video and 13200 words of assigned readings. This week’s videos are available in a Panopto folder and as a podcast.

📅 Deadlines

  • Week 7 Quiz: Thursday, Oct. 7 at 8 AM

  • Assignment 3: Sunday, Oct. 10 at 11:59 PM

🎥 Introduction

What are we talking about this week? In this video I introduce the week's topics and discuss the general principles that drive its material.

🎥 Finding Data

Where do we go to find data?

Resources

🎥 Data Formats

In this video I describe different formats in which you may find data.

Resources

  • Pandas IO tools describes Pandas support for reading and writing various data formats
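As a small illustration of the IO tools, the sketch below reads the same three records from CSV text and from JSON text. The sample data is made up, and io.StringIO stands in for a file on disk; in practice you would usually pass a file path to read_csv or read_json.

```python
import io

import pandas as pd

# Made-up sample data: the same three records in two formats.
CSV_TEXT = """id,name,score
1,Ada,91.5
2,Grace,88.0
3,Alan,79.25
"""

JSON_TEXT = """[
  {"id": 1, "name": "Ada", "score": 91.5},
  {"id": 2, "name": "Grace", "score": 88.0},
  {"id": 3, "name": "Alan", "score": 79.25}
]"""

# read_csv splits delimited text into columns and infers each column's type.
csv_frame = pd.read_csv(io.StringIO(CSV_TEXT))

# orient="records" tells read_json to expect a list of row objects.
json_frame = pd.read_json(io.StringIO(JSON_TEXT), orient="records")

# Both readers should infer an integer id column and a float score column.
print(csv_frame.dtypes)
print(json_frame.dtypes)
```

Checking the inferred dtypes right after loading is a cheap way to catch a column that was read as text when you expected numbers.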

🎥 Integrating Data

This video talks about the key ideas of integrating multiple data sources.
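A minimal pandas sketch of joining two sources on a shared key, assuming two hypothetical tables (a roster and a score export) that name the key differently. It is one way to check an integration for duplicate keys and unmatched records, not necessarily the procedure shown in the video.

```python
import pandas as pd

# Hypothetical source 1: a roster keyed by student_id.
roster = pd.DataFrame({
    "student_id": [101, 102, 103, 104],
    "name": ["Ada", "Grace", "Alan", "Edsger"],
})

# Hypothetical source 2: scores exported by a different system, which
# names the same identifier "sid".
scores = pd.DataFrame({
    "sid": [101, 103, 104, 105],
    "quiz_score": [9, 7, 10, 6],
})

# A left join keeps every roster row. validate="one_to_one" raises a
# MergeError if either key is duplicated, and indicator=True adds a
# _merge column saying whether each row matched ("both") or not ("left_only").
merged = roster.merge(
    scores,
    how="left",
    left_on="student_id",
    right_on="sid",
    validate="one_to_one",
    indicator=True,
)

# Roster entries with no score in the second source.
missing = merged.loc[merged["_merge"] == "left_only", ["student_id", "name"]]
print(missing)

# Scores whose key matches no roster entry; the left join drops these silently.
orphans = scores[~scores["sid"].isin(roster["student_id"])]
print(orphans)
```

The left join is a choice, not a default: how="inner" or how="outer" change which unmatched rows survive, so it is worth deciding which records each analysis should keep.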

🎥 Values and Types

This video discusses how to handle and clean up values of various data types.
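As a rough sketch of this kind of cleanup, the example below converts a hypothetical all-text export into numeric, datetime, and categorical columns. The column names, the "n/a" marker, and the date format are assumptions made for the example.

```python
import pandas as pd

# Hypothetical raw export: every column arrives as text, and missing
# values are written inconsistently ("n/a", empty string).
raw = pd.DataFrame({
    "age": ["34", "29", "n/a", " 41 "],
    "joined": ["2021-03-01", "2020-11-15", "", "2019-07-30"],
    "plan": ["basic", "Premium", "basic", "premium"],
})

clean = raw.assign(
    # Strip stray whitespace, then parse; errors="coerce" turns
    # unparseable entries such as "n/a" into NaN instead of raising.
    age=pd.to_numeric(raw["age"].str.strip(), errors="coerce"),
    # An explicit format avoids guessing day/month order; "" becomes NaT.
    joined=pd.to_datetime(raw["joined"], format="%Y-%m-%d", errors="coerce"),
    # Normalize case so "Premium" and "premium" are one value, then
    # store the column as a categorical.
    plan=raw["plan"].str.lower().astype("category"),
)

print(clean.dtypes)
print(clean)
```

Coercing to missing values keeps the pipeline running, but it also hides bad inputs, so it is worth counting the NaN/NaT values it produces.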

Resources

In addition to the next reading, you may find these useful:

📃 Pandas Text Operations

Read Working with Text Data in the pandas user guide.
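A short sketch of chaining the .str methods the reading covers, using made-up course codes. The normalization rules here (uppercase, drop spaces and hyphens) are assumptions for this example, not a general recipe.

```python
import pandas as pd

# Hypothetical course codes entered inconsistently by hand.
codes = pd.Series(["  data 101 ", "DATA-101", "Data101", "Stat 210"])

# Chain .str methods: trim, uppercase, then remove spaces and hyphens
# so that variants of the same code become identical strings.
normalized = (
    codes.str.strip()
    .str.upper()
    .str.replace(r"[\s-]+", "", regex=True)
)
print(normalized.tolist())  # ['DATA101', 'DATA101', 'DATA101', 'STAT210']

# str.extract returns one column per named regex capture group.
parts = normalized.str.extract(r"(?P<dept>[A-Z]+)(?P<number>\d+)")
print(parts)
```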

🎥 Ethical Issues in Data

This video provides a very brief overview of some of the ethical issues in data collection and use.

📃 The Belmont Report

Read the Belmont Report.

Additional information, including a video, is available on the website of the HHS Office for Human Research Protections.

📃 The ACM Code of Ethics

Read the ACM Code of Ethics and Professional Responsibility.

🚩 Week 7 Quiz

Take the Week 7 quiz in Canvas.

🎥 A Real Example

This video describes the data cleaning and integration in a real example from my own research group. I am providing it so you can see the principles in this week’s material applied to an actual problem; details of this specific data set will not be on exams.

🎥 Workflow Advice

This video talks about general principles for processing and integration workflows.

📃 Further Reading

These aren’t part of the assigned reading, but are for you to learn more.

📩 Assignment 3

Assignment 3 is due Sunday, Oct. 10 at 11:59 PM.