Week 7 — Getting Data (10/3–7)

This week has the following learning outcomes:

  • Locate data sources

  • Integrate data from multiple sources

  • Reason about bias and social effects of data

🧐 Content Overview

Element                                                   Length
🎥 Data                                                   8m22s
🎥 Finding Data                                           7m25s
🎥 Data Formats                                           13m55s
🎥 Integrating Data                                       11m40s
🎥 Values and Types                                       8m15s
📃 Working with Text Data                                 4200 words
🎥 Ethics                                                 14m10s
📃 The Belmont Report                                     5500 words
📃 ACM Code of Ethics and Professional Conduct            3500 words
🎥 Real Example                                           48m57s
🎥 Workflow Advice                                        5m24s

This week has 1h58m of video and 13200 words of assigned readings. This week’s videos are available in a Panopto folder.

The long video does not require as careful a study as the rest. It is here so you can see a worked, real-world example; I do not expect you to reconstruct its details.

📅 Deadlines

  • Week 7 Quiz: Thursday, Oct. 6 at 8 AM

  • Assignment 3: Sunday, Oct. 9 at 11:59 PM

🎥 Introduction

What are we talking about this week? I also discuss general principles that will drive the week’s material.

🎥 Finding Data

Where do we go to find data?

Resources

🎥 Data Formats

In this video I describe different formats in which you may find data.

Resources

  • Pandas IO tools describes Pandas support for reading and writing various data formats
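As a minimal sketch of what those IO tools look like in practice, the snippet below reads the same kind of table from CSV text and from JSON text. The column names and values are made up for illustration, and `io.StringIO` stands in for a real file or URL.

```python
import io

import pandas as pd

# Toy inputs (hypothetical data); in real work these would be file paths or URLs.
csv_text = "item,price\nwidget,2.50\ngadget,4.00\n"
json_text = '[{"item": "widget", "color": "red"}, {"item": "gadget", "color": "blue"}]'

# read_csv infers column types from the text (price becomes a float column).
from_csv = pd.read_csv(io.StringIO(csv_text))

# read_json turns a list of records into one row per record.
from_json = pd.read_json(io.StringIO(json_text))

print(from_csv.dtypes)
print(from_json)
```

Whatever the source format, the result is an ordinary data frame, so the rest of the pipeline does not need to care where the data came from.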

🎥 Integrating Data

This video talks about the key ideas of integrating multiple data sources.
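One common way to integrate two sources in Pandas is to join them on a shared key. The sketch below is an illustration rather than the method used in the video: the two frames and their column names are hypothetical, and it uses `merge` with `indicator=True` to show which rows failed to find a match.

```python
import pandas as pd

# Two hypothetical sources that share an item_id key.
ratings = pd.DataFrame({"item_id": [1, 2, 3], "rating": [4.0, 3.5, 5.0]})
items = pd.DataFrame({"item_id": [1, 2, 4], "title": ["A", "B", "D"]})

# A left join keeps every rating; indicator=True adds a _merge column
# recording whether each row matched a row in the other table.
merged = ratings.merge(items, on="item_id", how="left", indicator=True)

# Rows with no matching item: worth inspecting before any analysis.
unmatched = merged[merged["_merge"] == "left_only"]

print(merged)
print(unmatched["item_id"].tolist())
```

Checking the unmatched rows is a cheap way to catch key mismatches between sources before they silently turn into missing values.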

🎥 Values and Types

This video discusses how to deal with and clean up various data types.
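As a small illustration of type cleanup, the sketch below converts string columns to numeric and date types. The data is invented, and `errors="coerce"` is one possible policy (turn unparseable values into missing values); whether that is appropriate depends on the data set.

```python
import pandas as pd

# Hypothetical raw data: everything arrives as strings, some of it bad.
raw = pd.DataFrame({
    "price": ["10.5", "n/a", "7"],
    "date": ["2022-10-03", "2022-10-07", "not a date"],
})

# errors="coerce" replaces values that cannot be parsed with NaN / NaT
# instead of raising an exception.
clean = raw.assign(
    price=pd.to_numeric(raw["price"], errors="coerce"),
    date=pd.to_datetime(raw["date"], errors="coerce"),
)

print(clean.dtypes)
print(clean.isna().sum())  # how many values failed to parse, per column
```

Counting the missing values after conversion shows how much data the cleanup discarded.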

Resources

In addition to the next reading, you may find these useful:

📃 Pandas Text Operations

Read Working with Text Data.
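To give a flavor of what the reading covers, here is a brief sketch of the `.str` accessor on a Pandas series. The names and course codes are hypothetical examples, not data from this class.

```python
import pandas as pd

# Hypothetical messy names, including a missing value.
names = pd.Series(["  Alice SMITH ", "bob jones", None])

# String methods are applied element-wise; missing values stay missing.
tidy = names.str.strip().str.title()

# Hypothetical codes; named regex groups become columns in the result.
codes = pd.Series(["CS-533", "math-275"])
parts = codes.str.extract(r"(?P<dept>[A-Za-z]+)-(?P<num>\d+)")
parts["dept"] = parts["dept"].str.upper()

print(tidy)
print(parts)
```

Note that the extracted `num` column still holds strings; converting it to a number is a separate step.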

🎥 Ethical Issues in Data

This video provides a very brief overview of some of the ethical issues in data collection and use.

📃 The Belmont Report

Read the Belmont Report.

Additional information, including a video, is available from the HHS Office for Human Research Protections.

📃 The ACM Code of Ethics

Read the ACM Code of Ethics and Professional Conduct.

🚩 Week 7 Quiz

Take the Week 7 quiz in Canvas.

🎥 A Real Example

This video describes the data cleaning and integration in a real example from my own research group. I am providing it so you can see the principles in this week’s material applied to an actual problem; details of this specific data set will not be on exams.

Resources

🎥 Workflow Advice

This video talks about general principles for processing and integration workflows.

📃 Further Reading

These aren’t part of the assigned reading, but are for you to learn more.

📩 Assignment 3

Assignment 3 is due on Sunday, October 9, 2022 at the end of the day.