Week 13 – Unsupervised (11/14–18)
Week 13 – Unsupervised (11/14–18)#
This week, we are going to talk more about unsupervised learning – learning without labels. We are not going to have time to investigate these techniques very deeply, but I want you to know about them, and you will be experimenting with them in Assignment 6.
This week's content is lighter, since we just had a large assignment and a midterm, and another assignment is due on Sunday.
🧐 Content Overview#
Element | Length |
---|---|
No Supervision | 2m51s |
Decomposing Matrices | 17m22s |
Clustering | 6m56s |
Vector Spaces | 7m27s |
Information and Entropy | 10m31s |
This week has 0h45m of video and 0 words of assigned readings. This week's videos are available in a Panopto folder.
📅 Deadlines#
Quiz 13, November 17
Assignment 6, November 20
🎥 No Supervision#
In this video, we review the idea of supervised learning and contrast it with unsupervised learning.
- In this video, I'm going to introduce you to the idea of unsupervised learning.
- This week we're going to learn about the difference between supervised and unsupervised learning.
- You're going to learn how to project data into lower-dimensional spaces with matrix factorization, and how to cluster data points.
- So far, we've been focusing on learning where we're trying to predict a label.
- We may have a categorical label we're trying to predict; this is classification, where we're trying to classify things as spam or not spam, fraud or not fraud.
- Those are a couple of the examples we've been using.
- We can have a continuous label we're trying to predict, in which case it's regression.
- We can also try to predict ordinal variables, et cetera.
- But this is all called supervised learning. The key idea here is that we have a ground truth for the outcome.
- We have observed outcomes for our training data. This is sometimes called a supervision signal.
- And we're trying to learn to predict these known outcomes.
- That's the heart of what it means to do supervised learning.
- But we can do things without having a supervision signal.
- One of the things we can do without access to a supervision signal is try to group instances together, in what's called clustering,
- where we try to find related groups. Clustering and multiclass classification are related,
- because if you've got multiple class labels, then you're trying to divide instances among those classes.
- Clustering is where you're trying to divide them up, but you don't have the class labels.
- You can also try to learn vector spaces for items, in order to, say,
- learn the relationships between items, and in some cases also to learn the relationships between features of items.
- There's also a middle ground called self-supervised learning, where you don't have labels in the sense that we use them in supervised learning,
- but you extract something that looks like a label from the data and use that as a supervision signal.
- Word embeddings are one example of self-supervised learning.
- So why do we want to do this unsupervised learning? There are a few reasons.
- One is that it can be useful as a data exploration tool.
- If you can find clusters in the data, then that can help guide your investigation to understand what's going on in your data.
- It can help to reduce data complexity for either visualization or for subsequent learning tasks.
- You can use the results as inputs into other models, and sometimes it's all we have:
- we don't have access to labels and we're trying to make sense of our data source, and unsupervised learning techniques can be helpful for doing that.
- So to wrap up: unsupervised learning learns patterns from data when we don't have labels available.
- It's useful for grouping items together, for exploration, and as input into other models.
🎥 Decomposing Matrices#
This video introduces the idea of matrix decomposition, which we can use to reduce the dimensionality of data points.
- Hello. In this video, I want to introduce the idea of matrix decompositions.
- There's a couple of notebooks that go with this to demonstrate the concepts more and to give you some additional readings.
- This video is going to explain what's going on. The goals here are to review matrix multiplication and to decompose a matrix into a lower-rank approximation.
- So if you've taken a linear algebra class, you may have seen a matrix.
- A matrix is just a two-dimensional array of numbers.
- We say its dimension is m by n; rows always go first when we're notating matrices, and this is also the convention used by NumPy.
- Its rows are n-dimensional row vectors,
- and its columns are m-dimensional column vectors.
- We can also compute its transpose by swapping the rows and columns. NumPy exposes that as the capital-T (.T) attribute.
- So a matrix is this two-dimensional array of numbers, and we can do a few things with matrices.
- We can add them together, we can subtract them, and another thing we can do is multiply them.
- So if we have two matrices A and B, where A is m by k and B is k by n
- (this is important: the inner dimensions of the two matrices have to match),
- then we can compute the matrix product, and it's going to be m by n.
- Unlike multiplication of scalars,
- matrix multiplication is not commutative: you can't switch A and B and get the same result.
- You have to have the same dimensionality on the inside,
- and what you get as the result is the dimensionality of the outside.
- The product is defined so that c(i, j), the entry in row i and column j, is the sum,
- across row i of A and down column j of B, of the pairwise products.
- So it is the dot product of row i of A
- and column j of B.
- C is the dot product of every row of A with every column of B.
- In NumPy, you can compute this with the A @ B operation; @ is the Python matrix multiplication operator (there's a short sketch of this after the transcript).
- So this is a fundamental operation for matrices: you can multiply them together.
- We can also have what we call sparse matrices, and a matrix is sparse
- if most of its values are zero; that's what it means mathematically for it to be sparse.
- Computationally, a sparse matrix is a matrix in which the zero values are not stored.
- SciPy provides a number of sparse matrix classes that we can use.
- The NumPy ndarray is a dense matrix, and a Pandas data frame is also a dense matrix
- (its series have some sparse support),
- but if we need to do sparse computations, we can use the scipy.sparse package to give us sparse matrices.
- This is what scikit-learn does under the hood:
- when you tokenize text with its CountVectorizer or its TF-IDF vectorizer,
- it gives you SciPy sparse matrices as a result.
- Now, another thing we can do with a matrix is what's called dimensionality reduction.
- This follows from a theorem that if we have an m-by-n matrix X,
- then we can compute a decomposition into the product of three matrices: P, Sigma, and Q transpose.
- We can break down any matrix into this, and SciPy provides us with functions to compute this decomposition: given an X,
- it will compute P, Sigma, and Q (or Q transpose). We can also then truncate this.
- Sigma contains what are called the singular values.
- We can truncate this matrix, only keeping the k largest singular values and setting the rest to zero (or just cutting them out), so that
- we get a narrower P and a shorter Q transpose.
- And this gives us an approximation of the original matrix X.
- There are a few useful properties. The rows of P correspond to rows of X.
- So what P gives us is a k-dimensional representation of the rows of X, and if X has a lot of columns,
- this is super useful, because if k is much smaller than the number of columns of X, then we get a smaller,
- more compact representation of the original rows of the matrix.
- Also, it approximately preserves distances: things are approximately as far apart in P as they are in the original X.
- So why do we want to do this? One reason is for a compact representation: as I said, we get this k-dimensional representation of our values X.
- It can be useful to remove noise; I'll talk more about that in a little bit.
- It can be useful for plotting high-dimensional data to show relationships. If you've got 50 columns of X,
- you can plot just two columns of it, or you can take the SVD to project it into another
- vector space so that you can show it in two dimensions that maximize
- the extent to which the data points are spread out in those two dimensions.
- It can also improve our ability to compute distances,
- and it can be helpful for finding relationships between features.
- So if we have correlated features, meaning multiple features that are partially measuring a similar thing,
- so they're correlated with each other, principal component analysis is an application of matrix decomposition that allows us to find those relationships,
- combine those correlated features, and extract uncorrelated components out of these correlated observations.
- So how do you actually do this? scikit-learn provides a TruncatedSVD class.
- It's a transformer: if you call fit, it learns Q transpose.
- If you call transform, it returns the rows of P for the instances you pass in; the instances
- you pass in don't have to be the same instances that you gave to fit.
- fit_transform does the whole thing at once. I'm giving you example notebooks so you can see this in action (and there's a brief sketch after this transcript).
- And then there's also the svds function in SciPy that computes the SVD of a sparse matrix.
- So one of the applications of SVD, as I said, is something called principal component analysis.
- If you mean-center your features (you can also standardize them), and then you compute the SVD,
- what you get is that the columns of P are what we call principal components.
- Column zero is the position of the data point along the vector that has the maximum variance,
- and you can go over to Q and find that vector in the original data space.
- So what it does is this: you've got this data in a space, and it finds a line through the data
- that explains more variance, a line along which there is more
- variance than there is along any other line
- you could draw through that space. That can be an axis in a new vector space.
- And then, once you project all of your points onto that line, you can find another line that explains
- more of the remaining variance than any other.
- So here I have data plotted in two-dimensional space (it's actually three-dimensional data).
- There's some correlation, and we get this line here that runs through it.
- If you look along the axes, there's a fair amount of variance along x and a fair amount of variance along y,
- but there's more variance along this line.
- So it gives us this line: the line along which there's the most variance.
- We could transform the data so that this line is now our x-axis,
- and then we could look at where the rest of the variance is. So here
- I'm showing the vectors for the first and the second principal components.
- The first one is along the line I showed you first. It can go either way:
- PCA does not guarantee which direction the sign is going to go, and it can do a sign flip,
- so you can point the arrow in the other direction, but you've got this vector here,
- this line along which there's more variance than any other.
- But then there's this second line; it's orthogonal to the first, and it says: where's the next chunk of variance?
- What direction do I go to find the next amount of variance? The notebook that generated these plots is online, along with the simulation,
- and you can play with it a little bit (there's also a brief PCA sketch in code after this transcript). So why do we want to use this? There are a few different useful use cases.
- One is to compress and denoise our data. As I said, we can truncate the SVD:
- we keep the k largest singular values. This means that P and Q are much smaller; in particular,
- P is much smaller than X.
- The result is that when we multiply them back together, it approximates X, and it is the best rank-k approximation.
- The rank of a matrix is basically a measure of how complex it is;
- what it is, is the number of non-zero singular values in the singular value decomposition.
- And so if we zero out the smallest singular values, what we get is this:
- if least-squares error is our measure of how good an approximation of the original matrix is,
- there is no better rank-k approximation than the truncated SVD.
- Another thing that happens is that if there's noise in X, if X is some strong signal plus a bunch of noise,
- the largest singular values and singular vectors are probably going to pick up the signal and not the noise,
- at least for the most part. The noise will mostly be learned in the smaller singular vectors,
- and so if you drop the smaller vectors, then you're dropping a bunch of the noise.
- So it can be useful to clean up data for various purposes.
- If X is sparsely observed,
- you can also use this to impute values, if you're careful about how you set up the decomposition (because ordinarily you have to have
- the full matrix), or if you use an alternative means of learning the decomposition that can deal with missing data.
- Then you can multiply the factors back together to predict the values of X you weren't able to observe.
- It's a really useful technique for imputing data
- and for filling in unobserved values.
- This is actually how a lot of recommender systems work: if we observe your preferences for some movies,
- we can use a singular value decomposition, or a derivative of it, to fill back in and estimate your preferences for the movies you haven't seen.
- And then, if X is a document-term matrix, where the rows are documents and the columns are terms, and we take the SVD,
- this is what's called latent semantic analysis, or latent semantic indexing.
- It's a way of understanding
- what we call the topics in a corpus, because the dimensions in the reduced-dimensionality space, this inner vector space
- (I'll talk a little bit more in another video about what that means),
- correspond theoretically to different kinds of topics.
- So a document, rather than being represented by its words, becomes represented as a vector over topics.
- Each document is a mixture of these topics, and words correspond to topics as well.
- The model there is that a document produces a word, or contains a word, because the document is about a topic and the word is relevant to that topic.
- And so you learn these topics,
- and this lets you compare documents even if they don't have many words in common, because you can establish that
- these words are on the same topic, so if one document uses some of them, it's on that topic,
- and if another document uses other ones, it's also on that topic. And we can learn the topic relationships by doing a matrix decomposition (there's a small LSA sketch after this transcript).
- Another use is for visualization. Low-dimensional vectors can be visualized,
- and I show this in the example notebooks.
- If we take an SVD, then we can use the first two columns from the SVD to visualize our data points in a space.
- The space is not human-interpretable, but it lets us see how spread out the points are.
- We can also use it to get better neighborhoods. There are a couple of problems with high-dimensional spaces
- when we're trying to compute distances.
- One is that distance is more expensive to compute, because the more dimensions you have, the more compute you need to do.
- But also, as the dimensionality of a space increases (the number of features,
- the number of columns in your matrix), points start to look about the same distance from each other.
- This is called the curse of dimensionality. Decomposed matrices can help with this.
- So doing an SVD can help make either a k-NN classifier or a k-means clustering approach work better:
- if you do the k-NN or the k-means clustering (which we're going to talk about in the next video)
- on top of the data transformed using an SVD, it can sometimes be more effective than if you just use it on the raw data.
- The fourth case is to model categorical interactions. Say we want to model the likelihood of words appearing together,
- like: what's the likelihood that "apple" and "fish" appear within three words of each other in a sentence?
- We can think about this as a probability, but there are n-squared of them, because we have a probability for every pair of words.
- That's a lot to learn.
- If you want to learn a matrix that maps the probability between every pair of words in the English language, that's a very, very large matrix.
- So instead, what we can do is learn a reduced-dimensionality space. We usually don't do this by actually taking the SVD;
- we do it with approximation methods that directly optimize these vectors.
- But we can learn vectors for words so that,
- basically using a logistic model of the probabilities, the dot product between them is the log odds of the two words appearing together.
- And so words that appear together are going to have similar vectors,
- and words that appear far apart are going to have very different vectors. This is called a word embedding.
- This is what word embeddings such as word2vec and GloVe do.
- And more sophisticated versions of this are at the heart of a lot of machine learning models.
- A lot of neural architectures, a lot of deep learning
- models, have various embeddings, and all an embedding is, is a vector representation of something.
- They're often learned through these kinds of dimensionality reduction techniques, or approximations of them,
- so that you get these low-dimensional vectors that live in a space, like a 10-dimensional vector,
- and the 10 dimensions don't mean anything; they're just dimensions that are useful for explaining
- this instance's relationship to whatever we're trying to do with it.
- They take you a long way in a lot of machine learning,
- and they're a core piece of a lot of different models.
- So to wrap up: matrix decomposition, which is also called matrix factorization or dimensionality reduction,
- breaks a high-dimensional matrix into lower-dimensional ones. It's useful for compressing data,
- so you've got a more compact representation, and for making it more well-behaved numerically:
- we can compute better distances, we can compute distances more efficiently, and we can reduce noise in the data.
- There are a lot of different purposes for which decomposing data into this lower-dimensional space is super useful.
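To make the matrix operations from the video concrete, here is a minimal NumPy/SciPy sketch; the array values are made up for illustration.

```python
import numpy as np
from scipy import sparse

# A is m-by-k, B is k-by-n; the inner dimensions (k) must match.
A = np.array([[1.0, 2.0, 3.0],
              [4.0, 5.0, 6.0]])        # 2 x 3
B = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [2.0, 2.0]])             # 3 x 2

C = A @ B                              # matrix product, 2 x 2
At = A.T                               # transpose, 3 x 2

# Each entry of C is the dot product of a row of A with a column of B.
assert np.isclose(C[0, 1], A[0, :] @ B[:, 1])

# A sparse matrix stores only the non-zero values.
S = sparse.csr_matrix(np.array([[0.0, 0.0, 3.0],
                                [0.0, 1.0, 0.0]]))
print(S.nnz)                           # number of stored (non-zero) entries: 2
```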
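Here is a hedged sketch of the truncated SVD itself, using both scikit-learn's TruncatedSVD transformer and SciPy's svds; the random data and the choice of k = 2 are just for illustration.

```python
import numpy as np
from scipy.sparse.linalg import svds
from sklearn.decomposition import TruncatedSVD

rng = np.random.default_rng(42)
X = rng.normal(size=(100, 20))          # 100 rows, 20 features (toy data)

# scikit-learn: fit learns Q transpose; transform returns the reduced rows
# (roughly, the rows of P scaled by the singular values).
svd = TruncatedSVD(n_components=2)
P = svd.fit_transform(X)                # 100 x 2 reduced representation
print(P.shape, svd.components_.shape)   # (100, 2) (2, 20)

# SciPy: svds works on sparse (or dense) matrices and returns the factors directly.
P2, sigma, Qt = svds(X, k=2)            # note: singular values come back in ascending order
X_approx = P2 @ np.diag(sigma) @ Qt     # best rank-2 approximation of X
```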
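The PCA discussion can also be sketched in a few lines: mean-center, take the SVD, and compare against scikit-learn's PCA. The synthetic data and the sign-insensitive comparison are my additions, not from the video's notebook.

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Correlated 2-D data: the second column is mostly a rescaled copy of the first.
x = rng.normal(size=200)
X = np.column_stack([x, 0.5 * x + 0.1 * rng.normal(size=200)])

# PCA via the SVD of the mean-centered data.
Xc = X - X.mean(axis=0)
P, sigma, Qt = np.linalg.svd(Xc, full_matrices=False)
scores = P * sigma                  # positions of the points along the principal components

# scikit-learn's PCA does the same thing, up to sign flips (which PCA does not guarantee).
pca = PCA(n_components=2)
scores_skl = pca.fit_transform(X)
print(np.allclose(np.abs(scores), np.abs(scores_skl)))
```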
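And here is a small latent semantic analysis sketch on a made-up corpus; the documents, the pipeline, and the choice of two components are all hypothetical. TF-IDF gives a sparse document-term matrix, and TruncatedSVD reduces it to "topic" dimensions that can be compared across documents.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.pipeline import make_pipeline
from sklearn.metrics.pairwise import cosine_similarity

docs = [                                  # tiny made-up corpus for illustration
    "the cat sat on the mat",
    "dogs and cats make good pets",
    "stock markets fell sharply today",
    "investors worried about the stock market",
]

# TfidfVectorizer produces a SciPy sparse matrix; TruncatedSVD reduces it.
lsa = make_pipeline(TfidfVectorizer(), TruncatedSVD(n_components=2, random_state=0))
topic_vectors = lsa.fit_transform(docs)   # one low-dimensional vector per document

# Documents on similar topics end up with similar vectors,
# even when they share few exact words.
print(cosine_similarity(topic_vectors).round(2))
```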
Resources#
The next notebook
The PCADemo, demonstrating the PCA plots
📓 Movie Decomposition#
The Movie Decomposition notebook demonstrates matrix decomposition with movie data.
🎥 Clustering#
This video introduces the concept of clustering, another useful unsupervised learning technique.
- In this video, I want to introduce clustering. The learning outcomes are
- to understand the idea of clustering and to interpret the results of clustering with k-means.
- The idea of clustering is to group things together: we want to find groups in our data points, but we don't know what the groups are.
- Many clustering techniques require us to know how many groups there are, but we don't know what the groups themselves are;
- if we did, we would just use a multiclass classifier to find them. We want to find them from the data.
- This is what we call clustering. There are a couple of different kinds of clustering in terms of the membership of the clusters.
- One is mixed membership, where a point can be in more than one cluster,
- and it has a different degree of affinity for the different clusters. Matrix factorization can be seen as a kind of mixed-membership clustering,
- where the values in the decomposed, lower-dimensional space are
- how strongly the data point is associated with each cluster. In single-membership clustering, though,
- we want to find clusters and put each point in exactly one cluster. So we might have movie types, different types of movies,
- and want to put each movie in one type. These might align with genres; they might align with something else.
- One technique is to do this based on what we call centroids, and a centroid is just the center of a cluster.
- To do this, we typically need a distance function between two data points, between two vectors.
- Often this is the Euclidean distance, but we have to define the vector space properly.
- We need to do the feature engineering, and have the features appropriately normalized and standardized, so that
- the distance between their vectors actually reflects how far apart the instances are with regard to our clustering goal.
- Items that are more similar, in terms of whatever we hope the clustering is going to uncover,
- need to have a smaller distance between each other than their distance to less similar items,
- again along whatever it is that we hope the clustering is going to uncover.
- We can also do clustering after a dimensionality reduction:
- we can work in a lower-dimensional space, and sometimes that will make our distances better behaved.
- So the goal is to find the centroids of these clusters. Then, when an item comes in, we'll find which of our clusters it belongs to:
- say we have 10 clusters; we're going to measure the item's distance from the centers of all of the clusters,
- and we're going to say it's in the closest one. The k-means algorithm does this as follows.
- We tell it how many clusters we want (we want five clusters, 10 clusters),
- and it picks that many points and says, these are my cluster centers.
- It then figures out which cluster all of the data points are in.
- Now that it's got all the data points clustered, it takes each cluster and recomputes the new center:
- it takes all the data points in the cluster, computes the center of that set of points, and that's the new cluster center.
- It then does this again, because once you move the cluster centers,
- it might be that some points on the edge between one cluster and another switch clusters.
- Once points have switched clusters, you compute the centroids again, and you repeat this several times until what we call convergence.
- This is an example (we've seen a couple of others) of what's called an iterative method;
- an iterative method is a method where you start somewhere and you incrementally improve your result.
- So we start with some cluster centers, cluster the data points, move the centers to reflect the data points,
- and try again. Convergence basically means it stops moving:
- we've run another round of it, and our cluster centers haven't moved very much.
- This does require us to know k; the algorithm can't figure out how many clusters there are supposed to be.
- And there are optimizations that can improve it in various ways, particularly in how the initial points are picked. The simple way to do it
- is to pick k points completely at random; there are more sophisticated ways to pick those points that can result in better clustering behavior.
- To do this in scikit-learn, you can use the KMeans class to do k-means clustering: fit learns the cluster centers, and predict
- will map a data point to a cluster number, so if you give predict some data it will give you cluster numbers (there's a short sketch after this transcript).
- If you go in and get the cluster_centers_ attribute off the fitted scikit-learn object, those are the cluster centroids.
- Other clustering algorithms in scikit-learn have a similar interface.
- Now we've got these clusters; how do we see how well they work?
- Well, we look at them. The purpose here is that we want to uncover connections and groupings in the data,
- but we don't have labels, so one thing you really have to do with clustering is just look at it.
- Do the clusters seem to be finding coherent sets of the things that we're clustering?
- Sometimes we will have labels, perhaps for a little bit of the data,
- and we can use them to compare clustering behaviors, clustering systems, or clustering results.
- Also, when we're experimenting with clustering techniques, it can be useful to cluster data
- where we do have the labels, to see how good a job the technique does at recovering the labels
- when we do have them, in order to get some idea of how it might do when we don't have them.
- And then there are some quality scores.
- There's a score called silhouette that compares the distances within a cluster to the distances between items and the items in the closest other cluster (scikit-learn's silhouette_score is shown in the sketch after this transcript).
- If things tend to be closer to each other than they are to other clusters, then you've got a better clustering.
- These can be used to compare clusterings, but there's not an absolute quality value, like,
- "oh, a silhouette of 0.5 means I've got a good clustering." Evaluating
- clustering is a really imprecise thing. It basically comes down to whether the clustering is useful for what you're trying to do with it.
- So to wrap up: clustering allows us to identify groups of items in the data. These clusters may or may not make sense;
- you really have to look at them. Cluster quality depends on a number of things.
- Your features and your metric are super important, because if you don't have a feature space and a
- metric such that things that are similar to each other are close together under your metric,
- then clustering is not going to be able to find the relationships you're looking for. The cluster count is also super important:
- if there are eight natural groups and you try to find five clusters, the clusters might not work so well.
- That said, the natural groupings and the cluster count do not necessarily need to match;
- sometimes you can get good clusterings with an extra cluster, or with not quite as many clusters.
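A minimal sketch of k-means in scikit-learn; the two-dimensional blobs and the choice of three clusters are made up for illustration.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three made-up blobs of points in 2-D.
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (3, 3), (0, 3)]])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0)
kmeans.fit(X)                       # fit learns the cluster centers
labels = kmeans.predict(X)          # predict maps each point to a cluster number
print(kmeans.cluster_centers_)      # the learned centroids (3 x 2)
```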
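And a sketch of using the silhouette score to compare different cluster counts on the same toy data; as the video notes, the score is only useful relative to other clusterings, not as an absolute quality measure.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.3, size=(50, 2))
               for c in [(0, 0), (3, 3), (0, 3)]])   # same toy blobs as above

# Try a few cluster counts; a higher silhouette is better, relatively speaking.
for k in (2, 3, 4, 5):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
    print(k, round(silhouette_score(X, labels), 3))
```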
Resources#
📓 Clustering Example#
The clustering example notebook shows how to use the KMeans class.
🎥 Vector Spaces#
This video talks about vector spaces and transforms.
- In this video, I want to talk a little bit more about vector spaces. We've talked about them a little,
- but we're going to talk about the concept in a little more detail now.
- I want to more formally introduce the concept of a vector space and have you understand the idea of a vector space transformation.
- We're only going to scratch the surface; for a lot more, I recommend that you read a good linear algebra book.
- So remember, a vector is a sequence of numbers, basically a one-dimensional array:
- x = (x_1, x_2, ..., x_n). The reals to the n, R^n, is an n-dimensional vector space
- over the real numbers (you can have vector spaces over other things: over integers, over
- complex numbers, over weirder things). It's an n-dimensional vector space, and we can do a few operations with vectors.
- We can add and subtract, we can multiply by a scalar (that's a real number), and we can compute the inner product.
- Just multiplying two vectors together is not an operation:
- if you multiply two vectors in NumPy, what you're actually getting is the pairwise
- multiplication; it multiplies the elements together if they're compatible sizes.
- That is not actually a linear algebra operation. There is the inner product, which is the sum of the element-wise products.
- And then there's a distance, which we can get from the inner product of a difference with itself (these operations are sketched in code after this transcript).
- So we can have a matrix: we've got a sample matrix X whose rows are instances; each instance is a row vector,
- a row of this matrix X, and we can do all the vector things with these rows.
- A matrix, as we said in an earlier video, is this two-dimensional array
- of numbers. It's a collection of row vectors, and it's a collection of column vectors.
- It's also a linear map from one vector space to another. The other one might be the same vector space:
- it might be n by n, so it'll map an n-dimensional vector to another n-dimensional vector,
- but with some transformation applied to it. There's a bunch of things we can do with matrices.
- We can add them, and multiply them by either a scalar or by a compatible matrix or vector;
- we saw that earlier. You can transpose them, et cetera. There are a number of special matrices.
- So we have column vectors, which are m by 1 (remember, we always put rows first),
- so m rows by one column is an m-by-1 column vector.
- One row by n columns is a 1-by-n row vector.
- We can have a square matrix, where the two dimensions are the same.
- We can have a diagonal matrix, which is zero everywhere except the diagonal:
- you've got your big matrix, the diagonal is non-zero,
- and everything else is zero. You can have an identity matrix, which is a diagonal matrix where all the non-zero values are one.
- You can have a triangular matrix, where either the upper-right or the lower-left part is non-zero and the other side is zero:
- everything above and to the right of the diagonal is zero for a lower triangular matrix,
- and everything below and to the left of the diagonal is zero for an upper triangular matrix.
- There's also a symmetric matrix, which is a square matrix that's equal to its transpose,
- so the top-right corner is equal to the bottom-left corner:
- you flip the rows and the columns and you get the same matrix back out. You can also have what's called an orthogonal matrix:
- if A transpose times A is equal to the identity matrix, then you have an orthogonal matrix.
- Matrix-vector multiplication is a super useful operation.
- If we've got an m-by-n matrix A and we've got an n-dimensional column vector x, then we can compute y = Ax,
- and this is going to be an m-dimensional column vector.
- What we've done here is we have mapped x into another vector space, or we've transformed it.
- Even if A is square, so it maps R^n to R^n,
- applying it can transform the vector so that it's still in the same space,
- but its relationships to other vectors have changed.
- It's effectively a different organization of the same space, for lack of a better term.
- I'm trying to avoid getting deep into the linear algebra terms, like change of basis and things,
- because I'm trying to give you the intuition for it.
- A linear algebra class (either a class, a textbook, or an online course) is going
- to help you shore up a lot of the details that you're going to need to dive deeper into linear algebra.
- So multiplying by a matrix can give us a bunch of different transforms.
- We can reduce dimensionality: we can project, and we can do other transformations.
- A projection is when you just strip off dimensions. So if we have x,
- say x is (1, 7), the projection of it onto the first dimension is just 1.
- But you can also do some additional transformations, besides just projection, to get it down to a lower-dimensional space.
- One of the things you can do is translate: if we've got some vectors here, we can translate them,
- so we just shift them; they keep the same relationship to each other,
- they're just moved. We can scale them, so a vector here
- gets stretched or shrunk.
- You can skew the space, you can also rotate within the space, and you can do any combination of these; you can do any linear transformation (a couple of these are sketched in code after this transcript).
- Actually, this is what it means for something to be a linear transformation: a linear
- transformation is a transformation you can express through a matrix multiplication.
- We also have linear systems, and linear systems are written as matrix-vector operations.
- We can solve y = Xβ for β: β = X⁻¹y.
- Ordinarily, this is the direct, exact solution to the linear equations.
- But if they don't have a solution, we can get the least-squares solution by solving a different system:
- Xᵀy = XᵀXβ.
- (I missed an X on the slide there, so I just wrote it in.)
- One particular note, though: I wrote a matrix inverse here, and the matrix inverse is an operation, but you usually don't actually want to perform it.
- Matrix inverses are almost always used for solving a system of linear equations,
- and solving the system directly is usually a better solution than actually inverting a matrix (see the sketch after this transcript).
- So to wrap up: vectors represent data points in a vector space. These can be manipulated and transformed, particularly by multiplying them by a matrix.
- I recommend that you consult some linear algebra learning resources to learn a lot more.
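A small NumPy sketch of the vector operations discussed above; the vectors are made up, and the point is the difference between elementwise multiplication and the inner product.

```python
import numpy as np

x = np.array([1.0, 7.0, 2.0])
y = np.array([3.0, 0.0, 1.0])

print(x + y)        # vector addition
print(2.5 * x)      # multiplication by a scalar
print(x * y)        # elementwise product: NOT a linear algebra operation
print(x @ y)        # inner (dot) product: the sum of the elementwise products
print(np.sqrt((x - y) @ (x - y)))   # Euclidean distance via the inner product of a difference
```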
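Here is a sketch of transforming points by multiplying by a matrix; the rotation and scaling matrices are standard textbook examples, not anything specific from the video's slides.

```python
import numpy as np

# Each row is a 2-D data point (made up for illustration).
X = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [1.0, 1.0]])

theta = np.pi / 4
rotate = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])   # rotate by 45 degrees
scale = np.diag([2.0, 0.5])                            # stretch x, shrink y

# For row vectors, apply a transform T to each row as X @ T.T
print((X @ rotate.T).round(3))
print(X @ scale.T)
```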
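Finally, a sketch of solving a linear system instead of inverting the matrix; the synthetic data and coefficients are mine, and np.linalg.lstsq gives the least-squares solution when there is no exact one.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))                     # 50 observations, 3 features
beta_true = np.array([2.0, -1.0, 0.5])
y = X @ beta_true + 0.01 * rng.normal(size=50)   # y = X beta plus a little noise

# Least-squares solution to y = X beta, without ever forming a matrix inverse.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta_hat.round(3))

# If X were square and invertible, np.linalg.solve(X, y) would still be
# preferred over np.linalg.inv(X) @ y.
```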
Resources#
Linear Algebra Done Right by Sheldon Axler
Handbook of Linear Algebra (terse and comprehensive reference)
🎥 Information and Entropy#
This video introduces the idea of entropy as a way to quantify information. It's something I want to make sure you've seen at least once by the end of the class.
Resources#
An Introduction to Information Theory: Symbols, Signals & Noise by John R. Pierce
Entropy (information theory) on Wikipedia
📩 Week 13 Quiz#
Take the Week 13 quiz on Canvas.
📓 Practice: SVD on Paper Abstracts#
The Week 13 Exercise notebook demonstrates latent semantic analysis on paper abstracts and has an exercise to classify text into new or old papers.
It requires the chi-papers.csv file, which is derived from the HCI Bibliography.
It is the abstracts from papers published at the CHI conference (the primary conference for human-computer interaction) over a period of nearly 40 years.
If you want to see how to create this file, see the Fetch CHI Papers example.
📩 Assignment 6#
Assignment 6 is due November 20.