Syllabus

The purpose of this course is for students to learn how to engage in the scientific process using data-centric concepts and methods and to think like a data scientist by critically analyzing their own work and the work of others.

Final as of Aug. 20

Since the semester is now underway, this syllabus is policy-frozen. Any further updates will be accompanied by revision notes in a Revision Log section.

Learning Outcomes

It is my goal that after completing this course successfully, you will be able to:

  1. Explore a data set to determine whether and how it might illuminate questions of interest.
  2. Define and operationalize a research question such that a data analysis could produce meaningful knowledge.
  3. Use best practices to carry out analyses in a documented, reproducible, and efficient fashion.
  4. Present the results of a data analysis with appropriate visuals and written argument.
  5. Identify weaknesses in a data analysis and assess their impact on the correctness and utility of the results.
  6. Assess ethical implications of an analysis in terms of both classical human subject research ethics and contemporary concerns such as fairness and bias.
  7. Understand the space of data science techniques and applications, and relate future learning to this framework.

Course Logistics

Course Title: CS 533: Introduction to Data Science
Credits: 3
Schedule: Tuesdays 9:00–10:15 AM or 7:15–8:30 PM (synchronous activity); Thursdays 9:00–10:15 AM (open Q&A / discussion)
Course Website: https://cs533.ekstrandom.net/ and Blackboard (private links, grades, and assignments)
Course Discussion: Piazza

Instructor

I am Michael Ekstrand, an assistant professor in the Dept. of Computer Science.

Office: CCP 255 (I wish)
E-mail: michaelekstrand@boisestate.edu (but please use Piazza for non-grade class questions)
Office Hours: By appointment (since we’re online)

I generally respond to course questions during normal hours (9a–5p M–F). I may occasionally reply to a question in an evening or on the weekend, but do not plan on it.

Resources and Readings

The Resources page on the course web site has a more complete list of resources that I will update throughout the semester.

Textbook

Our primary textbook is:

Python for Data Analysis, 2nd Edition by Wes McKinney (O’Reilly, ISBN 978-1491957660)

The following text is a useful resource, and influences a lot of my teaching:

Think Like a Data Scientist by Brian Godsey (Manning, ISBN 978-1633430273)

And if you want a more thorough treatment of the core Python language in a traditional book format:

Learn Python the Hard Way by Zed Shaw

Online Readings

Throughout the semester, I will assign various readings from the Internet and research papers. These will be posted to the course web site.

Software

Throughout this class, we will be using Python with the PyData tools (Pandas, Numpy, Scipy, matplotlib, Seaborn, etc.). The easiest way to install the required software is to install Anaconda Python. On Onyx, you can install the Linux version in your home directory.

I will not provide support for debugging Python installations other than Anaconda.
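Once Anaconda is installed, a quick way to confirm that the libraries named above are available is to try importing each one. The following is a minimal sketch, not an official course tool; the function name `check_packages` and the `REQUIRED` list are hypothetical, chosen for illustration.

```python
import importlib

# Import names of the libraries listed in the Software section.
REQUIRED = ["pandas", "numpy", "scipy", "matplotlib", "seaborn"]


def check_packages(names):
    """Map each package name to its version string, or None if it cannot be imported."""
    found = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except Exception:
            # Any failure during import means the package is not usable here.
            found[name] = None
        else:
            # Most PyData packages expose __version__; fall back if one does not.
            found[name] = getattr(module, "__version__", "unknown")
    return found


if __name__ == "__main__":
    for name, version in check_packages(REQUIRED).items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

If any library is reported as not installed, that usually means the script ran under a different Python than the Anaconda one.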

The various Python libraries we use each have their own documentation.

Course Structure

Content delivery for this course is asynchronous, through the following resources:

  • Video lectures
  • Accompanying notes (on the course web site)
  • The textbook
  • Other readings linked from the lecture notes and course web page

Synchronous class time (via Zoom) will be used for discussion and group exercises. Generally, I plan the following structure:

  • Tuesdays we will have interactive activities and exercises that count towards your grade.
  • Thursdays I will be in Zoom during the scheduled time for open discussion on the week's class topics.

Grading

Your final grade will be computed from the course components as follows:

Category       Weight
Participation  5%
Quizzes        10%
Assignments    50%
Exams          20%
Final          15%

The standard 70/80/90 scale determines the minimum grade you will receive (that is, if you have 80 total course points, you will receive at least a B-).
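The weighting above amounts to a weighted average of your category scores. Here is a minimal sketch of that arithmetic, assuming each category score is a percentage from 0 to 100. The function names are hypothetical. The syllabus only states that 80 points earns at least a B-; the A- and C- cutoffs below are my reading of "standard 70/80/90" and should be treated as an assumption.

```python
# Category weights in percent, taken from the grading table.
WEIGHTS = {
    "Participation": 5,
    "Quizzes": 10,
    "Assignments": 50,
    "Exams": 20,
    "Final": 15,
}
assert sum(WEIGHTS.values()) == 100


def course_points(category_scores):
    """Weighted total (0-100) from per-category percentage scores."""
    return sum(WEIGHTS[c] * category_scores[c] for c in WEIGHTS) / 100


def minimum_letter(points):
    """Minimum letter grade for a point total.

    Only the 80 -> B- cutoff is stated in the syllabus; the 90 and 70
    cutoffs are assumed by analogy. Below 70 nothing is specified, so
    this returns None.
    """
    for cutoff, letter in [(90, "A-"), (80, "B-"), (70, "C-")]:
        if points >= cutoff:
            return letter
    return None


example = {"Participation": 100, "Quizzes": 90, "Assignments": 80,
           "Exams": 70, "Final": 60}
print(course_points(example))  # 77.0
```

Note that the scale sets a floor, not a ceiling: the actual letter grade may be higher than what `minimum_letter` returns.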

Participation

I expect you to actively participate in:

  • The weekly synchronous activity sessions (with the lowest 5 weeks dropped)
  • Online Q&A and discussion

This will earn the Participation section of your grade.

Quizzes

There will be a short weekly quiz in Blackboard, due Monday night, on the readings and videos. The lowest 5 quizzes will be dropped. I will clearly indicate in each week's module page and in the quiz which of the readings and videos are required before the quiz. The purpose of these quizzes is to help make sure you are prepared for Tuesday's activity.

Assignments

There will be 8 homework assignments practicing data science techniques in Python. Each assignment is due at midnight on Sunday of the week in which it is due.

I have scheduled this due date to give you the weekend to finish the assignment if that works best with your schedule. However, as noted above, I do not commit to checking Piazza on weekends, so that I can protect my own time. You should therefore start the assignment early enough that you can raise questions and have them resolved before the weekend is over.

The first assignment (A0) is a warm-up assignment to make sure that you can install the software and run Python notebooks. You must complete this assignment individually.

The other assignments (A1–7) are full assignments doing data science with Python. You may do up to 4 of these assignments with a partner, and must complete 3 individually. You may choose which assignments you solo and which are a group effort. When doing an assignment with a partner, submit one copy for both of you, and indicate your partner's name in the Blackboard submission comments.

I will drop the lowest 2 assignment grades.
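To see how dropping low scores affects a category average, here is a minimal sketch. It assumes every assignment carries equal weight within the category, which this syllabus does not state, and the function name `average_after_drops` is hypothetical.

```python
def average_after_drops(scores, n_drop):
    """Mean of the scores after discarding the n_drop lowest ones.

    Assumes all scores are equally weighted (an assumption, not course policy).
    """
    if n_drop < 0:
        raise ValueError("n_drop must be non-negative")
    if n_drop >= len(scores):
        raise ValueError("cannot drop every score")
    kept = sorted(scores)[n_drop:]  # ascending sort, so the lowest come first
    return sum(kept) / len(kept)


# Eight assignment scores (A0-A7), one of them missed entirely.
scores = [100, 90, 80, 0, 50, 95, 85, 70]
print(round(average_after_drops(scores, 2), 2))
```

The same function applies to the quiz category, where the lowest 5 are dropped.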

Exams and Final

There will be two midterm exams and a final. All exams will be take-home. You will have 48 hours to complete each midterm exam, and 72 hours to complete the final. No collaboration is allowed on any exam, except for public clarification questions on Piazza.

There will be a makeup exam available the last week of class. If you turn in the makeup exam, your grade on it will replace your lowest normal exam grade.

Course Policies

Web Site and Announcements

I will use Piazza for all course communication, including announcements. Please make sure your Piazza notifications are set correctly, so that you are notified of important announcements.

I will sometimes need to update assignments after I have issued them. When this is necessary, I will include a revision log at the top of the assignment describing the changes, and will make a course announcement regarding the change.

Late Work

For the assignments, you have a budget of 4 late days to use throughout the semester, at your discretion. Each late day extends an assignment deadline by 24 hours with no penalty; late days are indivisible, so submitting an assignment 12 hours late requires an entire late day. You may use up to 3 late days on a single assignment (see note 1 at the end of this syllabus). When submitting an assignment using a late day, state with your submission the number of days you are using. I appreciate it if you notify me (via a Piazza private message) prior to the deadline that you are planning to submit late, but do not require you to do so.
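The late-day rules above reduce to a little arithmetic: round the lateness up to whole days, then check both the per-assignment cap and the semester budget. A minimal sketch follows; the function names are hypothetical and this is an illustration of the policy as written, not a tool I use for grading.

```python
import math

LATE_DAY_BUDGET = 4      # total late days for the semester
MAX_PER_ASSIGNMENT = 3   # cap on late days for any one assignment


def late_days_needed(hours_late):
    """Whole late days consumed; late days are indivisible, so round up."""
    if hours_late <= 0:
        return 0
    return math.ceil(hours_late / 24)


def can_submit(hours_late, days_used_so_far):
    """True if the submission fits both the per-assignment cap and the budget."""
    needed = late_days_needed(hours_late)
    within_cap = needed <= MAX_PER_ASSIGNMENT
    within_budget = days_used_so_far + needed <= LATE_DAY_BUDGET
    return within_cap and within_budget


print(late_days_needed(12))   # 1: twelve hours late costs a full late day
print(can_submit(48, 2))      # True: 2 days needed, 2 already used, 4 total
print(can_submit(73, 0))      # False: 4 days needed exceeds the 3-day cap
```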

This policy, combined with dropping the lowest assignment grades, is designed to accommodate most ordinary need for extensions or late submissions. Therefore, exceptions beyond this policy will not generally be granted; any requests for individual exceptions must be submitted in writing (by e-mail or Piazza) so that I have a record of the request and my response.

Exams will be at the published times.

Cheating and Academic Integrity

As both a scientist and a student, you are expected to do your own work, attribute sources, and respect the legal and moral rights of others with respect to their work; as a student, you are also required to abide by the Boise State University Student Code of Conduct. While I aim to allow you to make reasonable use of resources, cheating (including copying code, using unauthorized resources during tests, etc.) is not OK. If you are found to be cheating, the penalty may range from an F on the assignment to an F in the course.

Conduct

I expect you to behave in a civil, respectful manner in all class interactions and to contribute to a constructive learning environment.

The Recurse Center Social Rules are a good source of guidance on how to maintain a constructive and educational environment.

If you experience or witness harassment of any form, please let me know.

Disability Accommodations

If you need particular accommodations or support to be able to fully participate in this course, please talk with me as soon as possible.

Coronavirus

This semester is difficult, with an unknown and dangerous set of risks hanging over us and massive disruption in our work, study, and outside lives. I have designed the content, assessment plan, and policies of this class to be as flexible and accommodating of the various pressures and difficulties as possible while still achieving our intended learning outcomes. If there are things that are not working, or difficulties arise that affect how you can engage with the class, please let me know; I want to make this work for you.

One specific risk, obviously, is that one of you (or I) will contract COVID-19 during the semester. I hope this does not happen to any of us, but I have attempted to account for this risk by setting the design parameter that you should be able to miss two full weeks of the semester (or perhaps even three) with minimal impact on learning outcomes and grades, possibly with the need to do some catch-up work. The following policies are specifically designed to accommodate this:

  • Asynchronous content delivery through video lectures and readings, so you can watch them at your convenience and catch up with course content later.
  • Dropping the lowest assignment scores, so you can miss an entire assignment window with no effect on your grade. If you need to do this, I encourage you to still do the missed assignment for your own study purposes, but you can do that on a timetable that works with your schedule and health needs.
  • Grading class participation on 10 out of 15 weeks.
  • A makeup exam to replace your lowest midterm grade, so you can miss a midterm and compensate with the makeup.

If you need to miss more than 2–3 weeks of class due to illness, it is likely that you will need to take an incomplete in the course. I am happy to work with you on this if it becomes necessary.

Communication will be the key to making this semester work. I welcome your feedback in general, and particularly around things that are or are not working for the course structure, both generally and for your particular situation.


  1. The purpose of this rule is so that we can discuss assignment solutions during the first Q&A after they are due.