1:07

This algorithm is very easy and fast to use, which makes it a very popular algorithm. It's often one of the first algorithms people get exposed to. We've seen a similar algorithm with K-nearest neighbors, where we calculated the distances to the K nearest points. Here, however, we have K clusters. And the algorithm works in an unsupervised manner, in that we don't know ahead of time whether a data point belongs to a cluster. We simply have the features, and the values of those features, for all the different instances in the data set.

2:09

The first reading is a blog article that talks about the K-means clustering algorithm and how you can use it. There's also an interactive demo for the K-means algorithm, and I really like this demo a lot, so I'm going to walk you through it. And then, lastly, there is the notebook.

So first, the blog article that talks about K-means clustering. This article introduces the idea of K-means clustering at a very basic level, so it should give you a pretty good feel for how things are working.

One point of emphasis I should make: K-means is called "means" because we find the cluster center, or rather we define the cluster center, by taking the mean of the data points. You could also take the median of the data points, in which case the algorithm is called K-medians. There are other statistical quantities you might use; that just changes the name of the algorithm slightly. It doesn't change the way the algorithm works; it's just the statistic that's used to determine the cluster centers.
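For instance, here's a tiny sketch of the difference between the two update rules (the points are made up for illustration):

```python
import numpy as np

# Hypothetical 2-D points assigned to one cluster; the last one is an outlier.
points = np.array([[1.0, 2.0],
                   [2.0, 3.0],
                   [3.0, 4.0],
                   [10.0, 12.0]])

# K-means update: the cluster center is the mean of the assigned points.
center_mean = points.mean(axis=0)      # -> [4.0, 5.25]

# K-medians update: the same step, but with the median, which is
# much less sensitive to the outlier at (10, 12).
center_median = np.median(points, axis=0)  # -> [2.5, 3.5]

print(center_mean, center_median)
```

Notice the mean gets dragged toward the outlier while the median stays near the bulk of the points; everything else in the algorithm is identical.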

So this article is nice; it talks about all of this, and it talks a little bit about how you can choose K to try to get the best results for your algorithm or your data set. I wanted to show you the demo, though, because it's a really nice page that actually gives you a way of picking different data sets.

So here you can see a data set, and we can just start adding cluster centers and see how the algorithm starts clustering things. Let's try one more. Now we just click GO and Update Centroids, and you can see how it moves the centroids around. We keep clicking until there's no more movement. You can notice that right now there's only a little bit of movement here between the red and black cells. As I keep going, there's really not much movement, and we're pretty much done. This is the end of the algorithm, right? So you could see that maybe these really were just one cluster here; maybe I added too many centroids. But you can go in and try this with different data sets and different numbers of centroids, and really see visually what's happening.

So let me go ahead and do that. Let's choose one randomly. Here we go, we've got 3 clear clusters, so let's put 3 centroids down. Now, as I click GO, you can see what happens. The very first thing is that it assigns points to the nearest cluster center, so these points over here get split between green and blue. Now, as I click Update Centroids, you notice that, very quickly, that green center moves over. So literally within the second iteration of this algorithm, we've already got our 3 centers on the data they should be assigned to. Now I reassign points, and notice the points immediately go back. I update my centroids, and we're almost done; we could keep doing this. And look, now I'm getting no change in the cluster centers. So, very quickly, within basically 3 iterations, this algorithm found the 3 clusters.

Now, this was easy, because I could see there were 3 clusters at the beginning. One way we might do this in practice is to try 2 clusters and see how well it does, then try 3, try 4, try 5. And we could try to figure out some sort of metric indicating whether that number of clusters was the right value or not.

5:17

The last item here is the K-means spatial clustering notebook. We're going to talk a little about the Iris data and use it with the K-means algorithm. I'm going to show you the Elbow method, which is a way to determine K empirically. And then we're going to use the handwritten digit data to understand how the K-means algorithm performs on a larger, higher-dimensional data set. So we'll see that performance, we'll see how it does with dimension reduction, and we'll also look at it with manifold learning.

So first, when we're applying our K-means algorithm, there are 2 steps: the assignment step, where we assign each data point to a cluster, and the update step, where we compute the new cluster centers, and then we iterate back. You saw this on the previous website, where we computed our initial cluster centers, assigned the points to the clusters, updated, and so on.
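As a rough sketch of those two steps in code (a minimal NumPy version for illustration, not the notebook's implementation; it assumes no cluster goes empty during the loop):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal K-means: alternate the assignment and update steps."""
    rng = np.random.default_rng(seed)
    # Initialize centers by picking k distinct data points at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest center.
        dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        # (this assumes every cluster keeps at least one point).
        new_centers = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        if np.allclose(new_centers, centers):  # converged: no more movement
            break
        centers = new_centers
    return centers, labels

# Two well-separated blobs as a quick check.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1],
              [10, 10], [10, 11], [11, 10], [11, 11]], dtype=float)
centers, labels = kmeans(X, k=2)
print(centers)  # one center near (0.5, 0.5), the other near (10.5, 10.5)
```

Just like in the interactive demo, the centers stop moving after only a few iterations on data this clean.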

6:15

Then we apply this, and we're going to see our 3 clusters; I said K was 3 earlier. So now we have 3 clusters, and we can visualize them. When I scroll down, you can see there's the first cluster center, there's the second, and there's the third. This is a very useful thing to do.

But let's try the Elbow method, which will allow us to compute, for different numbers of clusters, what the score (the total distance, in this case) is. So basically you say, well, if there's only one cluster, and I sum up all the points' distances to the nearest cluster center, that total is going to be really big. As the number of clusters increases, each data point has a cluster center closer to it, and so the total distance will be smaller. And the idea of the Elbow method is that the optimal value is somewhere at the break point in this curve. It's where you change from the very steep part, where the distances are changing very rapidly, to the more shallow part of the curve. So here we would look at this curve, and we would say it looks like 3 is really the elbow. The idea is that the curve should look like somebody's arm: the arm is hanging down, and then at the elbow it bends. That's the idea of the Elbow method.
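That computation can be sketched with scikit-learn, whose KMeans exposes the summed squared distance of points to their nearest center as `inertia_` (using the Iris data here is my assumption for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

X = load_iris().data

# Total within-cluster distance for each candidate K; as K grows,
# every point has a nearer center, so this keeps shrinking.
inertias = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    inertias.append(km.inertia_)

# The "elbow" is where the curve bends from steep to shallow;
# plotting k against inertia for Iris puts that bend near k = 3.
for k, v in zip(range(1, 9), inertias):
    print(k, round(v, 1))
```

In practice you'd plot this curve and pick K by eye at the bend, exactly as described above.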

7:21

Now we can look at the digit data. Remember, the handwritten digits: as I look across the screen, you can see 0 through 9. We can cluster these images, and when we look at this, first of all, this is a confusion matrix showing how we're doing. You look at this and you go, this is horrible: we have 0 matched to 5, not 0, etc. Well, keep in mind, this is an unsupervised technique. Thus, when something comes in as cluster 0, that doesn't mean it actually is the handwritten digit labeled 0; it's just a group of points. And so when we come in and look at the groups, it actually turns out we did quite well: almost all of them were marked together. That's the key thing, how well are they marked together.

Just looking at this confusion matrix, we see that 6 and 2 seem not to be done quite right. So, as I scroll down, we can see, yup, sure enough, there is cluster 2; it's a little ambiguous. And cluster 6 is very ambiguous; it looks just like a vertical line. But some of these others look pretty clear, right? We have a 4, a 0, a 9, a 6, a 7; we have a 2, a 3, and a 5. And even this one kind of looks like a 1. And if we scroll up here, you can see that that one is sort of spread out, and yeah, it sure is; it actually is a 1, so the algorithm is doing a pretty good job of pulling that out. But, in general, this technique does quite well.
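One way to put a number on "marked together" is to relabel each cluster with the most common true digit inside it and then measure accuracy. A sketch, assuming scikit-learn's bundled digits data rather than the notebook's exact code:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target

km = KMeans(n_clusters=10, n_init=10, random_state=0).fit(X)

# Cluster IDs are arbitrary, so map each one to the majority
# true digit among its members before scoring.
mapped = np.zeros_like(km.labels_)
for cluster_id in range(10):
    mask = km.labels_ == cluster_id
    mapped[mask] = np.bincount(y[mask]).argmax()

accuracy = (mapped == y).mean()
print(f"majority-vote accuracy: {accuracy:.2f}")
```

This is the same reasoning as reading the confusion matrix: we only care that each digit's images land in the same group, not which arbitrary ID that group got.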

8:41

As I scroll down, you can sort of see other examples of what we're trying to do. We can transform the data, and now we can get the actual typical image values for each cluster. So if you think about each cluster center being an actual image in this space, this is what that image would correspond to. And, again, you can see that cluster 2 and cluster 6 are a little ambiguous.

We can also ask, what's the score? Is there a metric for determining how well we're doing? There are a number of them, and I show some of them here; we can see better what they do down below, where we actually compute these scores and show them. Typically, a value of 1 for most of these metrics is best, and you can see they generally do quite well. And this, again, is for the digit data.
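A few of scikit-learn's clustering metrics can be sketched like this (these particular metrics are my choice for illustration, not necessarily the ones in the notebook; each reaches 1.0 for a perfect clustering):

```python
from sklearn import metrics
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits

digits = load_digits()
X, y = digits.data, digits.target
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)

# Each score compares the cluster assignments to the true digit labels.
ari = metrics.adjusted_rand_score(y, labels)
hom = metrics.homogeneity_score(y, labels)    # clusters contain one class
com = metrics.completeness_score(y, labels)   # each class stays in one cluster
vme = metrics.v_measure_score(y, labels)      # harmonic mean of the two above
print(ari, hom, com, vme)
```

These are label-aware scores, so they're only available here because the digits happen to come with ground truth; with truly unlabeled data you'd fall back on internal measures instead.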

And then we could say, well, that was a pretty big dimensionality for our data set. What if we cut the number of dimensions down? We start with 64 features; if we cut that to, say, 20, how well do we perform? And you can see here's the confusion matrix, here are the actual results, and here's one of our scores, one of our performance metrics for finding clusters. And you can see it's still quite high. So it turns out that the cluster finding still does rather well on the reduced-dimensionality data set, even though you've thrown some of the information away; enough is still there to be able to do realistic clustering. So this is an important point. Sometimes you'll be faced with a very high-dimensional data set. You can perform dimension reduction or some sort of feature selection, reduce the large number of features down to a more manageable set, and still get very good accuracy.
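That experiment can be sketched like this, assuming PCA for the dimension reduction and the adjusted Rand index as the performance metric (the notebook may use different choices):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA
from sklearn.metrics import adjusted_rand_score

digits = load_digits()
X, y = digits.data, digits.target  # 64 features per image

# Cut the 64 dimensions down to 20 before clustering.
X20 = PCA(n_components=20, random_state=0).fit_transform(X)

full = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X)
reduced = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X20)

# The reduced data keeps most of the clustering quality.
score_full = adjusted_rand_score(y, full)
score_reduced = adjusted_rand_score(y, reduced)
print(score_full, score_reduced)
```

The two scores land close together, which is the point being made: 20 well-chosen dimensions retain enough structure for realistic clustering.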

If you think about it, that's kind of what manifold learning does, in that we start with this really high-dimensional space, and we find groups of points that are nearby each other in this high-dimensional space. And we map them down to a manifold in two- or three-dimensional space, and try to keep those points close together.

So you could ask, well, what happens when we combine manifold learning with K-means? The first thing we do is compute our manifold. Here's our t-SNE digit data; you can see it there. And then we can actually find clusters in this same data set and ask, well, how well did they map down? And again, you can see that generally the clusters marked out here on the outside are done quite well, and only really in the middle do we start getting mixing. It's the same idea we saw before, where some of those clusters were a little bit confused in terms of what they actually were.
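A sketch of that last experiment, assuming t-SNE for the manifold and the adjusted Rand index as the score (again my choices for illustration):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE
from sklearn.metrics import adjusted_rand_score

digits = load_digits()
X, y = digits.data, digits.target

# Map the 64-dimensional digit images down to a 2-D manifold.
X2 = TSNE(n_components=2, random_state=0).fit_transform(X)

# Then find 10 clusters in the 2-D embedding and score them
# against the true digit labels.
labels = KMeans(n_clusters=10, n_init=10, random_state=0).fit_predict(X2)
score = adjusted_rand_score(y, labels)
print(score)
```

Because t-SNE keeps nearby points together, clustering in the 2-D embedding tends to score at least as well as clustering the raw 64-dimensional data, with the residual confusion concentrated in the mixed region in the middle of the plot.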
