Week 13 — Unsupervised Learning
This week, we are going to talk more about unsupervised learning — learning without labels.
We will not have time to investigate these techniques very deeply, but I want you to know about them, and you will experiment with them in Assignment 6.

This week's content is lighter, since we just had a large assignment and a midterm, and another assignment is due on Sunday.

There is no quiz this week due to the cluster of deliverables and upcoming assignment.
I will double-check that this is not a grading problem for anyone.

This week's videos are also available as a Panopto playlist.

No Supervision
In this video, we review the idea of supervised learning and contrast it with unsupervised learning.

CS 533
INTRO TO DATA SCIENCE
Michael Ekstrand
NO SUPERVISION
Learning Outcomes (Week)
Distinguish between supervised and unsupervised learning.
Project data into lower-dimensional space with matrix factorization.
Cluster data points.
Photo by Benedikt Geyer on Unsplash
Learning So Far
We learn to predict a label
Categorical label → classification
Continuous label → regression
This is called supervised learning
We have ground truth for outcome
Sometimes called supervision signal
Unsupervised Learning
What can we do without a supervision signal?
Group instances together (clustering)
Learn vector spaces for items
Learn relationships between items
Learn relationships between features
Middle Ground: Self-Supervised Learning
Sometimes we can extract supervision signals from data
Word embeddings: predict if two words appear together
Why?
Exploring data
Reducing data complexity
For visualization
For learning (“curse of dimensionality”)
Inputs into other models
Sometimes it’s all we have
Wrapping Up
Unsupervised learning learns patterns from data without labels.
It’s useful for grouping items together, exploration, and as input to other models.
Photo by Fran Jacquier on Unsplash
Decomposing Matrices
This video introduces the idea of matrix decomposition, which we can use to reduce the dimensionality of data points.

CS 533
INTRO TO DATA SCIENCE
Michael Ekstrand
DECOMPOSING MATRICES
Learning Outcomes
Review matrix multiplication
Decompose a matrix into a lower-rank approximation
Photo by Carissa Weiser on Unsplash
What Is a Matrix?
Matrix Multiplication
Sparse Matrix
A matrix is sparse (mathematically) if most values are 0.
Sparse matrix representations only store nonzero values
scipy.sparse
np.ndarray is our dense matrix
DataFrame and Series cannot be sparse 😔 (they store 0s)
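To make the storage difference concrete, here is a small sketch using `scipy.sparse` with a synthetic mostly-zero matrix (the matrix contents are illustrative, not from the course data):

```python
import numpy as np
from scipy import sparse

# A mostly-zero matrix: dense storage keeps every entry,
# sparse storage keeps only the nonzero values and their positions.
dense = np.zeros((1000, 1000))
dense[0, 0] = 1.0
dense[500, 250] = 2.5

sp = sparse.csr_matrix(dense)   # compressed sparse row format
print(sp.nnz)                   # number of stored (nonzero) values: 2
print(sp.toarray()[500, 250])   # convert back to dense to inspect: 2.5
```

The dense array stores a million floats; the CSR version stores just the two nonzero entries and their coordinates.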
Dimensionality Reduction
Why?
Compact representation
Remove noise from original matrix
Plot high-dimensional data to show relationships
SVD preserves distance
SVD can improve distance
Find relationships between features
Principal Component Analysis – find vectors of highest variance
How?
Principal Component Analysis
Use Case 1: Compression & Denoising
Use Case 2: Visualization
Low-dimensional vectors can be visualized!
See example notebooks
Use Case 3: Better Neighborhoods
High-dimensional spaces have 2 problems for distance:
Distance more expensive to compute
Points approach equidistant in high-dimensional space
Decomposed matrices can improve this!
k-NN classification
k-means clustering
Use Case 4: Categorical Interactions
Wrapping Up
Matrix decomposition (also called matrix factorization or dimensionality reduction) breaks a high-dimensional matrix into a low-dimensional one.
It preserves distance and, in some configurations, finds the direction of maximum variance.
Photo by Thomas Willmott on Unsplash
Resources
Movie Decomposition
The Movie Decomposition notebook demonstrates matrix decomposition with movie data.

Clustering
This video introduces the concept of clustering , another useful unsupervised learning technique.

CS 533
INTRO TO DATA SCIENCE
Michael Ekstrand
CLUSTERING
Learning Outcomes
Understand the idea of ‘clustering’
Interpret the results of clustering with k-means
Photo by Markus Winkler on Unsplash
Grouping Things Together
What if we want to find groups in our data points?
We don’t know the groups (or we would classify)
Find them from the data
This is clustering
Membership Kinds
Mixed-membership: point can be in more than one cluster
Matrix factorization can be a kind of clustering
Single-membership: point is in precisely one cluster
Centroid-Based Clustering
K-Means Algorithm
Clustering in SKlearn
KMeans class
fit(X) learns cluster centers (can take y but will ignore)
predict(X) maps data points to cluster numbers
cluster_centers_ has cluster centers (in input space)
Other clustering algorithms have similar interface.
Evaluating Clusters
Look at them
Seriously. Look at them.
If you have labels, compare
Useful for understanding behavior
Quality scores
E.g. silhouette compares inter- and intra-cluster distances
Can be used to compare clusterings, no absolute quality values
Wrapping Up
Clustering allows us to identify groups of items from the data.
May or may not make sense.
Cluster quality depends on features, metric, cluster count, and more.
Photo by Igor Milicevic on Unsplash
Resources
Clustering Example
The clustering example notebook shows how to use the `KMeans`

class.

Vector Spaces
This video talks about vector spaces and transforms.

CS 533
INTRO TO DATA SCIENCE
Michael Ekstrand
VECTOR SPACES
Learning Outcomes
Introduce more formally the concept of a vector space
Understand vector space transformations
Photo by Markus Winkler on Unsplash
Vector Spaces
Vector Operations
Addition (and subtraction)
Scalar multiplication
Inner products (sum of elementwise products)
Distance (inner product of subtraction with itself)
Matrix of Data Points
What Is A Matrix?
A collection of row vectors
A collection of column vectors
A linear map from one vector space to another
A few matrix ops:
Addition
Multiplication (by scalar or compatible matrix or vector)
Transpose
Special Matrices
Matrix-Vector Multiplication
Transformations
All by multiplying by a matrix:
Reduce (or increase) dimensionality
Translate
Scale
Skew
Rotate
Any linear transformation (this is actually what linear means)
Linear Systems
Wrapping Up
Vectors represent data points in a vector space.
These can be manipulated and transformed.
Linear algebra teaches much more.
Photo by Jurica Koletić on Unsplash
Practice: SVD on Paper Abstracts
The Week 13 Exercise notebook demonstrates latent semantic analysis on paper abstracts and has an exercise to classify text into new or old papers.

It requires the chi-papers.csv file, which is derived from the HCI Bibliography .
It is the abstracts from papers published at the CHI conference (the primary conference for human-computer interaction) over a period of nearly 40 years.

If you want to see how to create this file, see the Fetch CHI Papers example .

Assignment 6
Assignment 6 is due November 22, 2020 .