Week 7 — Getting Data

This week has the following learning outcomes:

  • Locate data sources
  • Integrate data from multiple sources
  • Reason about bias and social effects of data

Pursued through the following activities:

This week's videos are also in a Panopto playlist.

Introduction

This video introduces the week's topics. I also discuss the general principles that drive the week's material.

Finding Data

Where do we go to find data?

Data Formats

In this video I describe different formats in which you may find data.

Resources

  • Pandas IO tools describes Pandas support for reading and writing various data formats
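
As a rough illustration of what those IO tools look like in use, the sketch below reads the same kind of table from CSV and from JSON. The sample data and column names are made up for this example, and the text is read from in-memory strings rather than real files so the snippet runs on its own.

```python
import io

import pandas as pd

# Hypothetical sample data; the column names are invented for illustration.
csv_text = "item_id,label\n1,apple\n2,pear\n"
json_text = '[{"item_id": 1, "color": "red"}, {"item_id": 2, "color": "green"}]'

# read_csv and read_json accept file paths or file-like objects.
items_csv = pd.read_csv(io.StringIO(csv_text))
items_json = pd.read_json(io.StringIO(json_text))

# Each reader has a matching writer, e.g. DataFrame.to_csv.
csv_out = items_csv.to_csv(index=False)

print(items_csv.dtypes)
print(items_json.head())
```

With a real data set you would pass a file name or URL in place of the `io.StringIO` wrapper.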

Integrating Data

This video talks about the key ideas of integrating multiple data sources.
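
One common way to integrate two sources in Pandas is to join them on a shared key. The sketch below is only an example under assumed data: the two tables, their column names, and the choice of an outer join are all hypothetical. The `indicator=True` option adds a `_merge` column that shows which rows matched, which helps spot records that exist in only one source.

```python
import pandas as pd

# Two hypothetical sources that share a person_id key.
people = pd.DataFrame({"person_id": [1, 2, 3], "name": ["Ana", "Ben", "Cy"]})
orders = pd.DataFrame({"person_id": [1, 1, 4], "amount": [10.0, 5.0, 7.5]})

# An outer join keeps unmatched rows from both sides;
# indicator=True records where each row came from.
merged = people.merge(orders, on="person_id", how="outer", indicator=True)

# Count how many rows matched in both sources vs. only one.
match_counts = merged["_merge"].value_counts().to_dict()
print(match_counts)
```

Checking the match counts before going further is a cheap way to catch keys that are missing or formatted differently between sources.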

Week 7 Quiz

Take the Week 7 quiz in Blackboard.

Values and Types

This video discusses how to deal with and clean up various data types.
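
A minimal sketch of this kind of cleanup, assuming a hypothetical table whose numeric and date columns arrived as strings: `pd.to_numeric` and `pd.to_datetime` with `errors="coerce"` convert the values they can parse and mark the rest as missing instead of raising an error.

```python
import pandas as pd

# Hypothetical raw data where every column was read as text.
raw = pd.DataFrame({
    "price": ["12.50", "n/a", " 7 "],
    "sold_on": ["2021-10-01", "2021-10-02", "not recorded"],
})

clean = raw.copy()
# Strip stray whitespace, then convert; unparseable values become NaN.
clean["price"] = pd.to_numeric(raw["price"].str.strip(), errors="coerce")
# Parse dates with an explicit format; unparseable values become NaT.
clean["sold_on"] = pd.to_datetime(raw["sold_on"], format="%Y-%m-%d", errors="coerce")

print(clean.dtypes)
print(clean.isna().sum())
```

Counting the missing values after conversion shows how many entries failed to parse, which is worth checking before dropping or imputing them.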

Resources

In addition to the next reading, you may find these useful:

Pandas Text Operations

Read Working with Text Data.
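
As a small taste of what that reading covers, the sketch below uses the `.str` accessor to tidy a column of names. The sample values are made up, and the particular cleanup steps are just one plausible choice.

```python
import pandas as pd

# Hypothetical messy name column.
names = pd.Series(["  Alice SMITH", "bob jones ", "Carol  Lee"])

# Trim the ends, collapse repeated whitespace, and normalize capitalization.
tidy = (
    names.str.strip()
    .str.replace(r"\s+", " ", regex=True)
    .str.title()
)

# Split on spaces and take the last piece as a rough "last name".
last_names = tidy.str.split(" ").str[-1]

print(tidy.tolist())
print(last_names.tolist())
```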

Ethical Issues in Data

This video provides a very brief overview of some of the ethical issues in data collection and use.

The Belmont Report

Read the Belmont Report.

Additional information, including a video, is available at the HHS Office of Human Research Protections.

The ACM Code of Ethics

Read the ACM Code of Ethics and Professional Conduct.

A Real Example

This video describes the data cleaning and integration in a real example from my own research group. I am providing it so you can see the principles in this week's material applied to an actual problem; details of this specific data set will not be on exams.

Workflow Advice

This video talks about general principles for processing and integration workflows.

Further Reading

These aren't part of the assigned reading, but are for you to learn more.

Assignment 3

Assignment 3 is due on Oct. 11 at the end of the day.