0:00

I've noticed that almost all the really good machine learning practitioners tend to be very sophisticated in their understanding of bias and variance. Bias and variance is one of those concepts that's easy to learn but difficult to master. Even if you think you've seen the basic concepts of bias and variance, there's often more nuance to it than you'd expect. In the deep learning era, another trend is that there's been less discussion of what's called the bias-variance trade-off. You might have heard of the bias-variance trade-off, but in the deep learning era there's less of a trade-off: we still try to reduce bias, and we still try to reduce variance, but we just talk less about trading one off against the other. Let's see what this means.

Consider a data set that looks like this. If you fit a straight line to the data, maybe a logistic regression fit, it's not a very good fit to the data. This is a classifier with high bias, and we say that it is underfitting the data. On the opposite end, if you fit an incredibly complex classifier, maybe a deep neural network, or a neural network with lots of hidden units, maybe you can fit the data perfectly, but that doesn't look like a great fit either. That's a classifier with high variance, and it is overfitting the data. And there might be some classifier in between, with a medium level of complexity, that fits the data reasonably well. That looks like a much more sensible fit, so we call that "just right"; it's somewhere in between.
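To make the underfit / just-right / overfit picture concrete, here is a small 1-D regression sketch of my own (the lecture's example is 2-D classification, so this is an analogue, not the lecturer's code): fitting polynomials of increasing degree to noisy quadratic data. Training error alone keeps falling as the model gets more flexible, which is exactly why you can't judge overfitting from training error in isolation.

```python
import numpy as np

# Noisy data whose true shape is quadratic (illustrative, not from the lecture).
rng = np.random.default_rng(0)
x = np.linspace(-1.0, 1.0, 20)
y = x**2 + rng.normal(scale=0.1, size=x.size)

def train_mse(degree):
    """Fit a degree-d polynomial and return its error on the training data."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

mse_line = train_mse(1)   # "high bias": a straight line underfits the curve
mse_quad = train_mse(2)   # "just right": matches the true quadratic shape
mse_flex = train_mse(9)   # "high variance": flexible enough to fit the noise too
```

On the training set, the flexible model always wins (`mse_line > mse_quad > mse_flex`); it's the held-out dev set, discussed next, that exposes the overfitting.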

So in a 2-D example like this, with just two features, x1 and x2, you can plot the data and visualize bias and variance. In high-dimensional problems, you can't plot the data and visualize the decision boundary. Instead, there are a couple of different metrics that we'll look at to try to understand bias and variance.

So continuing our example of cat picture classification, where that's a positive example and that's a negative example, the two key numbers to look at to understand bias and variance are the training set error and the dev set (development set) error. For the sake of argument, let's say that recognizing cats in pictures is something that people can do nearly perfectly. So suppose your training set error is 1% and your dev set error is, for the sake of argument, 11%. In this example, you're doing very well on the training set, but relatively poorly on the development set. It looks like you might have overfit the training set, and are somehow not generalizing well to the held-out cross-validation set, the development set. In an example like this, we would say the algorithm has high variance. So by looking at the training set error and the development set error, you can render a diagnosis of your algorithm having high variance.

Now, let's say you measure your training set and your dev set error and get a different result. Say your training set error is 15% (I'm writing the training set error in the top row) and your dev set error is 16%. In this case, assuming that humans achieve roughly 0% error, that humans can look at these pictures and just tell whether it's a cat or not, it looks like the algorithm is not doing very well even on the training set. If it's not even fitting the training data that well, then it is underfitting the data, and so this algorithm has high bias. In contrast, it is actually generalizing at a reasonable level to the dev set: performance on the dev set is only 1% worse than performance on the training set. So this algorithm has a problem of high bias, because it is not even fitting the training set well. This is similar to the leftmost plot we had on the previous slide.

Now, here's another example. Let's say you have 15% training set error, so that's pretty high bias, but when you evaluate on the dev set it does even worse, maybe 30%. In this case, I would diagnose the algorithm as having high bias, because it's not doing that well on the training set, and also high variance. So this has really the worst of both worlds. And one last example: if you have 0.5% training set error and 1% dev set error, then maybe your users are quite happy with a cat classifier that is wrong only 1% of the time, and you have low bias and low variance.
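The four cases above can be captured in a small helper (my own sketch, assuming Bayes error is approximately 0%; the 5-point thresholds are purely illustrative, not rules from the lecture):

```python
def diagnose(train_error, dev_error, bayes_error=0.0):
    """Label an algorithm's problem from train/dev set error, in percent.

    Assumes the optimal (Bayes) error is near 0 unless stated otherwise.
    The 5-point gaps used as thresholds are illustrative, not canonical.
    """
    high_bias = (train_error - bayes_error) > 5.0      # not fitting the training set
    high_variance = (dev_error - train_error) > 5.0    # not generalizing to the dev set
    if high_bias and high_variance:
        return "high bias and high variance"
    if high_bias:
        return "high bias"
    if high_variance:
        return "high variance"
    return "low bias and low variance"

d1 = diagnose(1, 11)    # overfits: "high variance"
d2 = diagnose(15, 16)   # underfits: "high bias"
d3 = diagnose(15, 30)   # worst of both worlds
d4 = diagnose(0.5, 1)   # "low bias and low variance"
```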

One subtlety that I'll just briefly mention, and leave to a later video to discuss in detail, is that this analysis is predicated on the assumption that human-level performance gets nearly 0% error or, more generally, that the optimal error, sometimes called the Bayes error, is nearly 0%. I don't want to go into detail on this in this particular video, but it turns out that if the optimal or Bayes error were much higher, say 15%, then for this classifier, 15% training set error would actually be perfectly reasonable, and you wouldn't call it high bias; the variance is also pretty low. So there is the question of how to analyze bias and variance when no classifier can do very well, for example if you have really blurry images, such that even a human, indeed no system, could possibly do very well. Then the Bayes error is much higher, and some details of this analysis change.
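A quick arithmetic sketch of this point (my own framing, anticipating the later video): compare errors to the Bayes error rather than to 0%.

```python
# Suppose the task is so hard (very blurry images) that even the optimal
# classifier errs 15% of the time. The value is assumed for illustration.
bayes_error = 15.0
train_error, dev_error = 15.0, 16.0

# Measured against Bayes error, not 0%, neither gap looks alarming.
avoidable_bias = train_error - bayes_error   # 0.0 -> no real bias problem
variance = dev_error - train_error           # 1.0 -> variance is small too
```

The same 15%/16% pair that signaled high bias under a 0% Bayes-error assumption now looks perfectly reasonable.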

But leaving aside this subtlety for now, the takeaway is that by looking at your training set error you can get a sense of how well you are fitting at least the training data, and that tells you whether you have a bias problem. Then, looking at how much higher your error goes when you move from the training set to the dev set gives you a sense of how bad the variance problem is, that is, how well you are generalizing from the training set to the dev set. All of this is under the assumption that the Bayes error is quite small and that your training and dev sets are drawn from the same distribution. If those assumptions are violated, there's a more sophisticated analysis you could do, which we'll talk about in a later video.

Now, on the previous slide you saw what high bias and what high variance look like, and I guess you have a sense of what a good classifier looks like. What does high bias together with high variance look like? This is kind of the worst of both worlds. Recall that we said a classifier like this has high bias, because it underfits the data. So this would be a classifier that is mostly linear and therefore underfits the data; I'm drawing this in purple. But if somehow your classifier does some weird things, then it is actually overfitting parts of the data as well. So the classifier I drew in purple has both high bias and high variance. It has high bias because, being a mostly linear classifier, it doesn't fit this quadratic-shaped boundary that well; but by having too much flexibility in the middle, it bends around this example and this example, overfitting those two examples as well. So this classifier has high bias because it is mostly linear, when you really need a curved or quadratic function, and it has high variance because it had too much flexibility to fit those two mislabeled, or outlier, examples in the middle as well.

In case this seems contrived, well, this example is a little bit contrived in two dimensions, but with very high-dimensional inputs you actually do get classifiers with high bias in some regions and high variance in other regions. So it is possible to get classifiers like this with high-dimensional inputs, where it seems less contrived.

So to summarize, you've seen how by looking at your algorithm's error on the training set and its error on the dev set you can try to diagnose whether it has a problem of high bias or high variance, or maybe both, or maybe neither. And depending on whether your algorithm suffers from bias or variance, there are different things you could try. So in the next video I want to present to you what I call a basic recipe for machine learning, which lets you more systematically improve your algorithm depending on whether it has high bias or high variance issues. So let's go on to the next video.
