0:00

If you run a learning algorithm and it doesn't do as well as you were hoping, almost all the time it will be because you have either a high bias problem or a high variance problem; in other words, either an underfitting problem or an overfitting problem. In this case it's very important to figure out which of these two problems, bias or variance or a bit of both, you actually have, because knowing which of these is happening will give you a very strong indicator of the most promising ways to try to improve your algorithm. In this video, I'd like to delve more deeply into this bias and variance issue and understand it better, as well as figure out how to look at a learning algorithm and evaluate, or diagnose, whether we might have a bias problem or a variance problem, since this is critical for figuring out how to improve the performance of a learning algorithm that you may implement.

So you've already seen this figure a few times: if you fit too simple a hypothesis, like a straight line, it underfits the data. If you fit too complex a hypothesis, it might fit the training set perfectly but overfit the data. And maybe a hypothesis of some intermediate level of complexity, say a degree-two polynomial, with a degree not too low and not too high, is just right and gives you the best generalization error of all of these options.

Now that we're armed with the notions of training, validation, and test sets, we can understand the concepts of bias and variance a little bit better. Concretely, let's let our training error and cross-validation error be defined as in the previous videos: the average squared error as measured on the training set, or as measured on the cross-validation set. Now let's plot the following figure. On the horizontal axis I'm going to plot the degree of polynomial d, so as I go to the right I'm going to be fitting higher and higher order polynomials. Way on the left of this figure, where maybe d equals 1, we're going to be fitting very simple functions, whereas way on the right of the horizontal axis we have much larger values of d, so a much higher degree of polynomial, and that's going to correspond to fitting much more complex functions to your training set. Let's look at the training error and the cross-validation error and plot them on this figure.

Let's start with the training error. As we increase the degree of the polynomial, we're going to be able to fit the training set better and better. So if d equals 1, we may have a relatively high training error, and if we have a very high degree polynomial, our training error is going to be really low, maybe even zero, because we'll fit the training set really well. So as we increase the degree of polynomial, we find that the training error typically decreases. I'm going to write J_train(theta) there, because our training error tends to decrease with the degree of the polynomial that we fit to the data.
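This training-error behavior is easy to see numerically. Here is a small sketch, using synthetic data that I'm making up for illustration (a quadratic trend plus noise, not anything from the lecture), which fits polynomials of increasing degree with numpy and measures the average squared error on the training set:

```python
import numpy as np

rng = np.random.default_rng(0)
# Illustrative synthetic training set (an assumption, not lecture data):
# a quadratic trend plus Gaussian noise.
x = np.linspace(0.0, 1.0, 20)
y = 1.0 + 2.0 * x - 3.0 * x**2 + rng.normal(scale=0.1, size=x.shape)

def j_train(degree):
    """Average squared error on the training set for a degree-d polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    pred = np.polyval(coeffs, x)
    return np.mean((pred - y) ** 2)

# Training error for degrees 1 through 6: it never goes up as d grows,
# because each higher-degree model contains the lower-degree ones.
errors = [j_train(d) for d in range(1, 7)]
```

The key point is that the degree-d+1 model can always reproduce any degree-d fit, so the best achievable training error can only stay the same or shrink as d grows.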

Next, let's look at the cross-validation error (for that matter, if we were to plot the test set error, we'd get a pretty similar result to the cross-validation error). We know that if d equals 1, we're fitting a very simple function, so we may be underfitting the training set, and we're going to have a very high cross-validation error. If we fit an intermediate degree polynomial, say d equals 2 as in our example on the previous slide, we're going to have a much lower cross-validation error, because we're finding a much better fit to the data. And conversely, if d were too high, say d took on a value of 4, then we're overfitting, and we end up with a high value for the cross-validation error. So if you vary d smoothly and plot a curve, you might end up with a curve like that for J_cv(theta). And again, if you plot J_test(theta), you get something very similar.
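The whole sweep can be sketched in a few lines. This example uses made-up data (a noisy sine curve and a simple train/cross-validation split, both assumptions for illustration) to trace out the two curves and pick the degree with the lowest cross-validation error:

```python
import numpy as np

rng = np.random.default_rng(1)
# Illustrative synthetic data (an assumption): a noisy sine curve,
# split into a training set and a cross-validation set.
x = rng.uniform(0.0, 1.0, 60)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.2, size=x.shape)
x_train, y_train = x[:40], y[:40]
x_cv, y_cv = x[40:], y[40:]

def errors(degree):
    """Return (J_train, J_cv): average squared error on each set."""
    coeffs = np.polyfit(x_train, y_train, degree)
    j_train = np.mean((np.polyval(coeffs, x_train) - y_train) ** 2)
    j_cv = np.mean((np.polyval(coeffs, x_cv) - y_cv) ** 2)
    return j_train, j_cv

degrees = range(1, 11)
results = {d: errors(d) for d in degrees}
# Model selection: the degree with the lowest cross-validation error.
best_d = min(degrees, key=lambda d: results[d][1])
```

Plotting `results[d][0]` and `results[d][1]` against d reproduces the figure from the video: training error falls monotonically, while cross-validation error traces the U shape, high on the left (underfitting) and on the right (overfitting).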

And so this sort of plot also helps us to better understand the notions of bias and variance. Concretely, suppose you've applied a learning algorithm and it's not performing as well as you were hoping, so your cross-validation set error or your test set error is high. How can we figure out if the learning algorithm is suffering from high bias or from high variance? The setting of the cross-validation error being high corresponds to either the left regime or the right regime of this plot. The regime on the left corresponds to a high bias problem: you are fitting an overly low order polynomial, such as d equals 1, when we really needed a higher order polynomial to fit the data. In contrast, the regime on the right corresponds to a high variance problem: the degree of polynomial d was too large for the data set we have. And this figure gives us a clue for how to distinguish between these two cases.

Concretely, for the high bias case, that is, the case of underfitting, what we find is that both the cross-validation error and the training error are going to be high. So if your algorithm is suffering from a bias problem, the training set error will be high, and you might find that the cross-validation error is also high: close to, maybe just slightly higher than, the training error. If you see this combination, that's a sign that your algorithm may be suffering from high bias.

In contrast, if your algorithm is suffering from high variance, then we'll notice that J_train, that is, the training error, is going to be low; you're fitting the training set very well. Whereas your cross-validation error, assuming this is, say, the squared error, which you try to minimize, your error on the cross-validation set, your cost function evaluated on the cross-validation set, will be much bigger than your training set error. The double greater-than sign is the math symbol for "much greater than". So if you see this combination of values, that's a clue that your learning algorithm may be suffering from high variance and might be overfitting. And the key that distinguishes these two cases is this: if you have a high bias problem, your training set error will also be high; your hypothesis is just not fitting the training set well. And if you have a high variance problem, your training set error will usually be low, that is, much lower than your cross-validation error.
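These two rules can be summarized as a tiny diagnostic helper. A minimal sketch; the `high_error` and `gap_factor` thresholds are illustrative assumptions of mine, not values given in the lecture, and in practice you'd calibrate them against a baseline such as human-level error:

```python
def diagnose(j_train, j_cv, high_error=1.0, gap_factor=2.0):
    """Rough bias/variance diagnostic following the two rules above.

    high_error and gap_factor are illustrative thresholds (assumptions),
    not values from the lecture.
    """
    if j_train >= high_error and j_cv >= high_error:
        # Both errors high, and close together: underfitting.
        return "high bias (underfitting): J_train is high and J_cv is close to it"
    if j_train < high_error and j_cv >= gap_factor * j_train:
        # Training error low but CV error much greater: overfitting.
        return "high variance (overfitting): J_train is low but J_cv >> J_train"
    return "looks OK (or a mix of both)"
```

For example, `diagnose(2.0, 2.2)` flags high bias, while `diagnose(0.05, 1.5)` flags high variance.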

So hopefully that gives you a somewhat better understanding of the two problems of bias and variance. I still have a lot more to say about bias and variance in the next few videos. What we'll see later is that by diagnosing whether a learning algorithm may be suffering from high bias or high variance, or a combination of both (I'll show you even more details on how to do that in later videos), we get much better guidance on what might be promising things to try in order to improve the performance of a learning algorithm.
