Week 7 — Getting Data (10/3–7)
This week has the following learning outcomes:
Locate data sources
Integrate data from multiple sources
Reason about bias and social effects of data
🧐 Content Overview
This week has 1h58m of video and 13200 words of assigned readings. This week’s videos are available in a Panopto folder.
The long video does not require as careful a study as the rest — it is here for you to see a worked, real-world example, but I will not be expecting you to reconstruct its various details.
Week 7 Quiz Thursday Oct. 7 at 8AM
Assignment 3 Sunday Oct. 9 at 11:59 PM
What are we talking about this week? I also discuss general principles that will drive the week’s material.
🎥 Finding Data
Where do we go to find data?
🎥 Data Formats
In this video I describe different formats in which you may find data.
The Pandas IO tools documentation describes Pandas' support for reading and writing various data formats.
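As a minimal sketch of what those IO tools look like in practice: `pd.read_csv` and `pd.read_json` both accept a file-like object, so the hypothetical data below is inlined with `io.StringIO` rather than read from files. The column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical sample data, inlined so the sketch runs without any files.
csv_text = """item,title,year
1,Toy Story,1995
2,Jumanji,1995
"""
json_text = '[{"item": 1, "rating": 4.0}, {"item": 2, "rating": 3.5}]'

# read_csv accepts any file-like object; StringIO stands in for a file path.
movies = pd.read_csv(io.StringIO(csv_text))

# read_json parses a list of records into one row per record.
ratings = pd.read_json(io.StringIO(json_text))

print(movies.dtypes)
print(ratings)
```

In real use you would pass a path or URL instead of a `StringIO`; checking `dtypes` right after loading is a quick way to see whether each column was parsed as the type you expected.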
🎥 Integrating Data
This video talks about the key ideas of integrating multiple data sources.
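One common way to integrate two sources in Pandas is a join on a shared key. The sketch below assumes two hypothetical tables keyed by an `isbn` column; `DataFrame.merge` with `how="outer"` and `indicator=True` keeps every key and records which source each row came from, and `validate="one_to_one"` raises an error if either side has duplicate keys.

```python
import pandas as pd

# Two hypothetical sources that share a key column, "isbn".
books = pd.DataFrame({
    "isbn": ["0001", "0002", "0003"],
    "title": ["A", "B", "C"],
})
sales = pd.DataFrame({
    "isbn": ["0002", "0003", "0004"],
    "copies": [10, 25, 7],
})

# Outer join keeps keys from both sides; indicator=True adds a "_merge"
# column, and validate checks that each key appears at most once per side.
combined = books.merge(
    sales, on="isbn", how="outer", indicator=True, validate="one_to_one"
)
print(combined)

# How many keys matched in both sources vs. only one of them.
print(combined["_merge"].value_counts())
```

Counting the `left_only` and `right_only` rows is a cheap sanity check on how well the two sources actually line up before you rely on the joined result.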
🎥 Values and Types
This video discusses how to deal with and clean up various data types.
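A small sketch of one type-cleaning pattern, assuming hypothetical raw columns that arrived as strings: `pd.to_numeric` and `pd.to_datetime` with `errors="coerce"` convert what they can and turn unparseable values into missing values (`NaN` / `NaT`) instead of raising.

```python
import pandas as pd

# Hypothetical raw values as they often arrive from a CSV: all strings.
raw = pd.DataFrame({
    "age": ["34", "27", "unknown", ""],
    "joined": ["2021-03-04", "2020-11-30", "not a date", None],
})

# errors="coerce" turns unparseable entries into NaN / NaT instead of failing.
clean = pd.DataFrame({
    "age": pd.to_numeric(raw["age"], errors="coerce"),
    "joined": pd.to_datetime(raw["joined"], errors="coerce", format="%Y-%m-%d"),
})

print(clean.dtypes)
# Count how many values could not be converted in each column.
print(clean.isna().sum())
```

Coercion hides bad values rather than fixing them, so it is worth checking the missing-value counts afterward to see how much data the conversion discarded.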
In addition to the next reading, you may find these useful:
📃 Pandas Text Operations
Read Working with Text Data.
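For a quick taste of the `.str` accessor covered in that reading, here is a sketch on a hypothetical column of hand-typed names: chain `strip` and `lower` to normalize, use `contains` with `na=False` so missing values count as non-matches, and use `replace` with `regex=True` for pattern-based cleanup.

```python
import pandas as pd

# Hypothetical messy text column, e.g. names typed by hand in a survey.
names = pd.Series(["  Boise State ", "boise state", "BOISE STATE University", None])

# Trim surrounding whitespace, then normalize case.
normalized = names.str.strip().str.lower()

# na=False makes missing values count as "no match" instead of NaN.
mentions_boise = normalized.str.contains("boise", na=False)

# Regex replacement drops a trailing " university" suffix.
short = normalized.str.replace(r"\s+university$", "", regex=True)

print(mentions_boise.tolist())
print(short.tolist())
```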
🎥 Ethical Issues in Data
This video provides a very brief overview of some of the ethical issues in data collection and use.
📃 The Belmont Report
Read the Belmont Report.
Additional information, including a video, is available at the HHS Office of Human Research Protections.
📃 The ACM Code of Ethics
Read the ACM code of ethics.
🚩 Week 7 Quiz
Take the Week 7 quiz in Canvas.
🎥 A Real Example
This video describes the data cleaning and integration in a real example from my own research group. I am providing it so you can see the principles in this week’s material applied to an actual problem; details of this specific data set will not be on exams.
🎥 Workflow Advice
This video talks about general principles for processing and integration workflows.
📃 Further Reading
These aren’t part of the assigned reading, but are for you to learn more.
Social Data: Biases, Methodological Pitfalls, and Ethical Boundaries — strongly recommended
CITI Training on Human-Subjects Research — free for Boise State students, faculty, and staff; this training is required if you are involved in carrying out human-subjects research at Boise State
📩 Assignment 3
Assignment 3 is due on Sunday, October 9, 2022 at the end of the day.