The purpose of this course is for students to learn how to engage in the scientific process using data-centric concepts and methods and to think like a data scientist by critically analyzing their own work and the work of others.

Learning Outcomes

It is my goal that after completing this course successfully, you will be able to:

  1. Explore a data set to determine whether and how it might illuminate questions of interest.

  2. Define and operationalize a research question such that a data analysis could produce meaningful knowledge.

  3. Use best practices to carry out analyses in a documented, reproducible, and efficient fashion.

  4. Present the results of a data analysis with appropriate visuals and written argument.

  5. Identify weaknesses in a data analysis and assess their impact on the correctness and utility of the results.

  6. Assess ethical implications of an analysis in terms of both classical human subject research ethics and contemporary concerns such as fairness and bias.

  7. Understand the space of data science techniques and applications, and relate future learning to this framework.

Course Logistics

Course Title

CS 533: Introduction to Data Science



In-Person Schedule

Tuesdays and Thursdays 9:00–10:15 AM in CCP 221

Course Website

Canvas (private links, grades, and assignments)

Course Discussion



I am Michael Ekstrand, an assistant professor in the Dept. of Computer Science.


CCP 255

E-mail (but please use Piazza for non-grade class questions)

Office Hours

Tue 10:30–11:30 AM

Fri 10–11 AM

Online by appointment

I generally respond to course questions during normal hours (9a–5p M–F). I may occasionally reply to a question in an evening or on the weekend, but do not plan on it.

COVID-19 syllabus notice


This section is a required syllabus notice provided by Boise State University. For information about the plans and accommodations for COVID that I have designed into this specific class, see Coronavirus.

Many Boise State classes have resumed face-to-face meetings in the midst of a global pandemic and a recent local surge of infections. Our goal is to have a successful academic year while keeping our students, faculty, and local community healthy and safe. Public health requirements are in place to achieve that goal, the primary mechanism for which includes the mandatory use of facial coverings that protect all of us.

We have taken health precautions on campus so that you can have the option of a face-to-face course. However, there is still inherent risk associated with face-to-face courses during a pandemic because of proximity to others and length of potential exposure to the virus. Therefore, as members of this learning community, it is imperative that we all engage in behaviors that protect the overall public health.

You have enrolled in a face-to-face course, and this format offers a number of benefits that appeal to many students. In order to preserve your access to this face-to-face option you are required to

  1. sit in the same seat all semester (for purposes of contact tracing) and

  2. wear facial coverings in all face-to-face learning environments. You must keep your mouth and nose covered at all times throughout class — facial coverings cannot be pulled up or down. As a health precaution, eating and drinking are NOT permitted in the classroom.

By enrolling in an in-person course, you agree to comply with Boise State’s rules and precautions which include, but are not limited to, facial coverings, frequent hand washing, hand sanitizing, and sitting in the same seat all semester. Failing to comply with these rules and precautions is a violation of Boise State’s Student Code of Conduct and will subject you to university sanctions and discipline.

Seating assignment will be based on learning teams, and will be effective the second day of class after learning teams are formed.

University policy states that I am not allowed to begin/continue with instruction unless and until everyone present has a facial covering in place.

This course is designed to be accessible to all students. A very small percentage of people cannot wear facial coverings for reasons related to medical conditions or disabilities. If this is your experience, please contact the Educational Access Center to document your condition so that we may determine the best accommodation for you. Until an accommodation is in place, you will need to participate remotely. If you need to read lips or facial expressions to understand what people are saying, please let the Educational Access Center and me know via email.

If you are unwilling to wear a facial covering, you cannot participate in person. If this is the case, please dismiss yourself and either inquire whether you may participate in the class fully remotely, or contact the Registrar’s Office (208-426-4249) to pursue your learning experience in a different remote or online section. Should you refuse to cover your mouth and nose and also refuse to leave the classroom, I have been directed to dismiss the class and you will be reported to and contacted by the Dean of Students Office.

Mutual Guidelines for Safe Learning Environments

While these public health measures are essential to protecting our individual and communal health, they also complicate how we engage in teaching and learning. The following guidelines should ease our comfort and communication with one another:

  • In the classroom, we must wear a facial covering that covers our mouth and nose at all times. If you or I let our facial coverings slip, we will politely remind one another to secure our masks.

  • Facial coverings muffle voices. I will use the classroom microphone to amplify my voice through my mask. In addition, I will repeat your questions and summarize comments to ensure we all can follow any discussion.

Resources and Readings

The Resources page on the course web site has a more complete list of resources that I will update throughout the semester.


Our primary textbooks are:

Python for Data Analysis (2nd Edition) by Wes McKinney (O’Reilly, ISBN 978-1491957660)

Think Like a Data Scientist by Brian Godsey (Manning, ISBN 978-1633430273)

If you want a more thorough treatment of the core Python language in a traditional book format, I recommend:

Learn Python the Hard Way by Zed Shaw

Online Readings

Throughout the semester, I will assign various readings from the Internet and research papers. These will be posted to the course web site.


We will be using Python with the PyData tools (Pandas, NumPy, SciPy, matplotlib, Seaborn, etc.). The easiest way to install the required software is to install Anaconda Python. The various Python libraries we use each have their own documentation.

I will not provide support for debugging Python installations other than Anaconda (and other Conda distributions, like miniconda and miniforge).

Further information about software, and links to documentation, can be found in the course resources.
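As a quick self-check after installing, the sketch below reports which of the libraries named above are not importable in the current environment. It is only an illustration: the function name `missing_packages` is hypothetical, and the package list is taken from the tools mentioned in this section, not from an official course requirements file.

```python
from importlib.util import find_spec

# Import names of the PyData tools named in this syllabus (an assumed list,
# not an official requirements file).
PACKAGES = ["pandas", "numpy", "scipy", "matplotlib", "seaborn"]


def missing_packages(names=PACKAGES):
    """Return the names in `names` that cannot be found in this environment."""
    return [name for name in names if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Not installed:", ", ".join(missing))
    else:
        print("All listed packages are installed.")
```

Run this with the Python interpreter from your Anaconda (or other Conda) environment, so that it checks the environment you will actually use for the course.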

Course Structure

This class uses a flipped-classroom design. For the most part, I will not lecture in class; instead, content delivery is asynchronous through the following resources:

  • Video lectures

  • Accompanying notes (on the course web site)

  • The textbook

  • Other readings linked from the lecture notes and course web page

Each week has approximately 75–90 minutes of video material, plus some reading. There will be a short quiz before each Thursday’s class as an initial check on your understanding of the material. Weeks with exams will have lower video and reading loads.

Our in-person class time will be for discussing the course material and topics, additional mini-lectures to supplement your understanding of the course, and team-based exercises to practice the material with ready access to peer and instructor support.

Putting these together, along with the larger assignments, results in the following components of the class:

  • Reading and study

  • Work in class

  • Assignments

  • Exams (in-class)


Your final grade will be computed from the course components as follows:



In-Class Work


Individual Quizzes


Team Quizzes








The standard 70/80/90 scale determines the minimum grade you will receive (that is, if you have 80 total course points, you will receive at least a B-).

Class Sessions

I expect you to actively participate in our class sessions. Since these are interactive working sessions, if you have a laptop please bring it to class.

The in-person class sessions are based on the principles of team-based learning, adapted to this class’s needs and role in the graduate curriculum. You will be working with a group of your peers throughout the semester helping each other practice the material, discuss and apply it to small exercises and examples, and identify places where you need and want to learn more.

On Thursdays, our class will look like a “normal” team-based learning class — a team quiz (the Readiness Assessment Process), supplementary content discussion as needed, and an application exercise. The TBL readiness assessment process normally consists of two parts: an individual quiz, followed by re-taking the quiz as a team to discuss and improve your answers. In this class, we will be implementing that with an online individual quiz due before class and an in-person team quiz (see Quizzes).

Tuesdays will be more varied. The first Tuesday will be the class introduction and overview, and our exams will be on Tuesdays. On weeks when an assignment is due (see the schedule), we will use the class period for collaborative problem-solving about the assignment. Other Tuesdays will be for more extended discussion and application exercises depending on where we are in the semester.

In-class work will contribute to your grade.

One final note on classes: taking care of our health, individually and as a class, is top priority. While I aim for every class to be a meaningful, can’t-miss learning experience, I also want us to have a general expectation that if we’re ill, we stay home, both to recover ourselves and to protect our colleagues’ health. I’ve designed the grading policies to help with this (see Coronavirus), but if we need to further adjust to accommodate the semester’s health demands, we will. If you need to miss class, I encourage you to phone in and have a teammate put you on speakerphone during your team’s activities, if you are feeling well enough; this will allow you to contribute to the team quiz and work.


Quizzes

There is a short weekly individual quiz in Canvas, due before class on Thursday (at 8 AM, so I can look at results before class), on the readings and videos. The purpose of these quizzes is to help make sure you are prepared for applying the material, and to give both you and me early and frequent checks on your understanding.

In class on Thursdays, we will take the team quiz, which will usually be the same as or very similar to the individual quiz. This is an opportunity to refine your understanding of the material and collaboratively fill in gaps you may have missed. Individual and team quizzes are weighted equally in your final grade.

For both individual and team quizzes, only the 10 highest scores will contribute to your grade.
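To make the "10 highest scores" rule concrete, here is a minimal sketch. The function name `quiz_total` and the sample scores are hypothetical, and the sketch assumes a missed quiz is recorded as 0; it is not the actual gradebook computation.

```python
def quiz_total(scores, keep=10):
    """Sum the `keep` highest scores; any remaining lower scores are dropped."""
    return sum(sorted(scores, reverse=True)[:keep])


# Hypothetical semester: 12 quizzes, two of them missed (recorded as 0).
scores = [8, 10, 0, 9, 7, 10, 6, 9, 10, 8, 0, 9]
print(quiz_total(scores))  # prints 86: the two zeros do not count
```

Under these assumptions, missing a couple of quizzes costs nothing as long as at least 10 scores remain.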


Assignments

There will be 8 homework assignments practicing data science techniques in Python. Each assignment is due at midnight on Sunday of the week in which it is due.

I have scheduled this due date to give you the weekend to finish the assignment if that works best with your schedule. However, as documented above, I do not commit to checking Piazza on weekends, so that I can protect my own time. You should therefore work on the assignment early enough that you can raise questions and have them resolved before the weekend is over.

The first assignment (A0) is a warm-up assignment to make sure that you can install the software and run Python notebooks. You must complete this assignment individually.

The other assignments (A1–7) are full assignments doing data science with Python. You may do up to 3 of these assignments with a partner, and must complete at least 4 individually. You may choose which assignments you solo and which are a group effort. When doing an assignment with a partner, submit one copy for both of you, and indicate your partner’s name in the Canvas submission comments.

I will drop the lowest assignment grade.

Exams and Final

There will be two midterm exams and a final.

There will be a makeup exam available the last week of class. If you turn in the makeup exam, your grade on it will replace your lowest normal exam grade.
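The replacement rule can be sketched as follows. This is an illustration only: the function name `apply_makeup` and the 0–100 score scale are assumptions, and the sketch reads the rule literally, so a submitted makeup replaces the lowest score even if the makeup score is lower.

```python
def apply_makeup(exam_scores, makeup=None):
    """Return the exam scores after a makeup replaces the lowest one.

    `makeup=None` means no makeup exam was turned in, so nothing changes.
    """
    adjusted = list(exam_scores)
    if makeup is None or not adjusted:
        return adjusted
    # Replace the first occurrence of the lowest score with the makeup score.
    adjusted[adjusted.index(min(adjusted))] = makeup
    return adjusted


# Hypothetical scores: a missed midterm (0) is replaced by the makeup.
print(apply_makeup([70, 0, 85], makeup=80))  # prints [70, 80, 85]
```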

Course Policies

Our work within this structure is governed by the following policies, in addition to applicable university policies and regulations and general principles of academic and scholarly integrity.

Web Site and Announcements

I will use Piazza for all course communication, including announcements. Please make sure your Piazza notifications are set correctly, so that you are notified of important announcements.

I will sometimes need to update assignments after I have issued them. When this is necessary, I will include a revision log at the top of the assignment describing the changes, and will make a course announcement regarding the change. I will also state whether the revision changes a requirement (this is rare), or clarifies a requirement.

Late Work

For the assignments, you have a budget of 4 late days to use throughout the semester, at your discretion. Each late day extends an assignment deadline by 24 hours with no penalty; late days are indivisible, so submitting an assignment 12 hours late requires an entire late day. You may use up to 3 late days on a single assignment [1]. When submitting an assignment using a late day, state with your submission the number of days you are using. I appreciate it if you notify the TA and me (via a Piazza private message) prior to the deadline that you are planning to submit late, but I do not require you to do so.

This policy, combined with dropping the lowest assignment grade, is designed to accommodate most ordinary need for extensions or late submissions. Therefore, exceptions beyond this policy will not generally be granted; any requests for individual exceptions must be submitted in writing (by e-mail or Piazza) so that I have a record of the request and my response.
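The late-day arithmetic can be sketched as below. The constants come from the policy above (4 days per semester, at most 3 on one assignment, whole days only); the function names are hypothetical, and this is a planning aid, not the official record of late days.

```python
import math

TOTAL_LATE_DAYS = 4      # semester budget stated in the policy
MAX_PER_ASSIGNMENT = 3   # cap on late days for a single assignment


def late_days_needed(hours_late):
    """Whole late days needed for a submission `hours_late` hours past the deadline."""
    return max(0, math.ceil(hours_late / 24))


def can_submit_late(hours_late, days_already_used):
    """True if the submission fits both the per-assignment cap and the remaining budget."""
    needed = late_days_needed(hours_late)
    remaining = TOTAL_LATE_DAYS - days_already_used
    return needed <= MAX_PER_ASSIGNMENT and needed <= remaining


print(late_days_needed(12))     # prints 1: late days are indivisible
print(can_submit_late(30, 3))   # prints False: 2 days needed, 1 left
```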

Exams will be at the published times. The makeup exam is the ordinary accommodation for not being able to take the exam when scheduled.

Cheating and Academic Integrity

As both a scientist and a student, you are expected to do your own work, attribute sources, and respect the legal and moral rights of others with respect to their work; as a student, you are also required to abide by the Boise State University Student Code of Conduct. While I aim to allow you to make reasonable use of resources, cheating (including copying code, using unauthorized resources during tests, etc.) is not OK. If you are found to be cheating, the penalty may range from an F on the assignment to an F in the course and will generally be reported to the university.


I expect you to behave in a civil, respectful manner in all class interactions and to contribute to a constructive learning environment.

The Recurse Center Social Rules are a good source of guidance on how to maintain a constructive and educational environment.

If you experience or witness harassment of any form, please let me know.

Disability Accommodations

If you need particular accommodations or support to be able to fully participate in this course, please talk with me as soon as possible by e-mail or in office hours. If you have documentation from Disability Services authorizing specific accommodations, please bring it; however, a documented disability is not necessary for me to be willing to talk with you about how to make the course work for you.


Coronavirus

While we are partly back to “normal”, this semester is difficult, with an unknown set of risks hanging over us and massive disruption in our work, study, and outside lives. I have designed the content, assessment plan, and policies of this class to try to be as flexible and accommodating of the various pressures and difficulties as possible while providing our intended learning outcomes. If there are things that are not working, or challenges arise that affect how you can engage with the class, please let me know; I want to make this work for you.

One specific risk is that you or I will contract COVID-19 or another serious illness during the semester. I hope this does not happen to any of us, but I have attempted to account for this risk by setting the design parameter that you should be able to miss two full weeks of the semester (or perhaps even three) with minimal impact on learning outcomes and grades, possibly with the need to do some catch-up work. The following policies are specifically designed to accommodate this:

  • Asynchronous content delivery through video lectures and readings, so you can watch them at your convenience and catch up with course content later.

  • Dropping the lowest assignment score, so you can miss an entire assignment window with no effect on grade. If you need to do this, I encourage you to still do the missed assignment for your own study purposes, but you can do that on a timetable that works with your schedule and health needs.

  • Grading in-class work and quizzes on 10 out of 15 weeks.

  • A makeup exam to replace your lowest midterm grade, so you can miss a midterm and compensate with the makeup.

If you need to miss more than 2–3 weeks of class due to illness, it is likely that you will need to take an incomplete in the course. I am happy to work with you on this if it becomes necessary.

While COVID-19 is the direct impetus for much of this design, these class features are not limited to that specific illness. I want us to take care of our own and each other’s health, which may include staying home from class, and while doing so will result in missed practice and discussion (and the design of in-class activities is not amenable to recording), I am designing the grading policies to avoid formal penalties for taking care of yourself.

Communication will be the key to making this semester work. I welcome your feedback in general, and particularly around things that are or are not working for the course structure, both generally and for your particular situation.


[1] The purpose of the 3-day limit on late days for a single assignment is so that we can discuss assignment solutions on Thursday after they are due.