0:00

The term human-level performance is sometimes used casually in research articles. But let me show you how we can define it a bit more precisely, and in particular, use the definition of the phrase human-level performance that is most useful for helping you drive progress in your machine learning project.

0:19

So remember from our last video that one of the uses of the phrase human-level error is that it gives us a way of estimating Bayes error: what is the best possible error any function could, either now or in the future, ever achieve? So bearing that in mind, let's look at a medical image classification example. Let's say that you want to look at a radiology image like this and make a diagnosis classification decision.

0:49

And suppose that a typical, untrained human achieves 3% error on this task. A typical doctor, maybe a typical radiologist, achieves 1% error. An experienced doctor does even better, at 0.7% error. And a team of experienced doctors, that is, if you get a team of experienced doctors, have them all look at the image, and discuss and debate it, together their consensus opinion achieves 0.5% error. So the question I want to pose to you is: how should you define human-level error? Is human-level error 3%, 1%, 0.7%, or 0.5%? Feel free to pause this video to think about it if you wish.

To answer that question, I would urge you to bear in mind that one of the most useful ways to think of human-level error is as a proxy or an estimate for Bayes error. And here's how I would define human-level error: if you want a proxy or an estimate for Bayes error, then given that a team of experienced doctors discussing and debating can achieve 0.5% error, we know that Bayes error is less than or equal to 0.5%. Because some system, this team of doctors, can achieve 0.5% error, by definition the optimal error has got to be 0.5% or lower. We don't know how much better it is; maybe there's an even larger team of even more experienced doctors who could do even better, so maybe it's even a little bit better than 0.5%. But we know the optimal error cannot be higher than 0.5%. So what I would do in this setting is use 0.5% as our estimate for Bayes error, and define human-level performance as 0.5%, at least if you're hoping to use human-level error in the analysis of bias and variance as we saw in the last video.
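This "take the best observed human error" rule is easy to express in code. Here's a minimal sketch using the numbers from this example (the dictionary and variable names are just illustrative):

```python
# Measured error rates for different humans on the diagnosis task
# (values from this example).
human_errors = {
    "typical human": 0.03,
    "typical doctor": 0.01,
    "experienced doctor": 0.007,
    "team of experienced doctors": 0.005,
}

# For bias/variance analysis, take the BEST human performance as the
# proxy for Bayes error: if any system achieves a given error, the
# optimal (Bayes) error can be no higher than that.
bayes_proxy = min(human_errors.values())
print(f"Bayes error proxy: {bayes_proxy:.2%}")  # prints: Bayes error proxy: 0.50%
```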

2:56

Now, for the purpose of publishing a research paper, or for the purpose of deploying a system, maybe there's a different definition of human-level error that you can use, which is: so long as you surpass the performance of a typical doctor. That seems like a very useful result if accomplished, and surpassing a single radiologist's, a single doctor's, performance might mean the system is good enough to deploy in some context.

3:22

So maybe the takeaway from this is to be clear about what your purpose is in defining the term human-level error. If it is to show that you can surpass a single human, and therefore argue for deploying your system in some context, then maybe the single doctor's error is the appropriate definition. But if your goal is a proxy for Bayes error, then the team of doctors' 0.5% error is the appropriate definition. To see why this matters, let's look at an error analysis example.

3:51

Let's say, for a medical imaging diagnosis example, that your training error is 5% and your dev error is 6%. And from the example on the previous slide, we have our human-level performance, which I'm going to think of as a proxy for Bayes error.

4:12

Depending on whether you defined it as a typical doctor's performance, an experienced doctor's performance, or a team of doctors' performance, that would be either 1%, 0.7%, or 0.5%. And remember also our definitions from the previous video: the gap between Bayes error, or our estimate of Bayes error, and the training error is a measure of the avoidable bias, and the gap between training error and dev error is a measure, or an estimate, of how much of a variance problem you have in your learning algorithm.
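These two gaps are simple subtractions; here is a minimal sketch, with the function name and structure being my own illustration rather than anything specific from the course:

```python
def diagnose(train_err, dev_err, bayes_proxy):
    """Split the error into avoidable bias and variance.

    bayes_proxy is an estimate of Bayes error, e.g. the best
    measured human-level error on the task.
    """
    avoidable_bias = train_err - bayes_proxy  # gap to best achievable error
    variance = dev_err - train_err            # train -> dev generalization gap
    return avoidable_bias, variance

# First example: train 5%, dev 6%, Bayes proxy anywhere from 0.5% to 1%.
bias, var = diagnose(0.05, 0.06, 0.005)
print(f"avoidable bias {bias:.1%}, variance {var:.1%}")
```

With any of the three proxies, the avoidable bias (roughly 4% to 4.5%) dwarfs the 1% variance, so bias reduction is the right focus here.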

4:44

So in this first example, whichever of these choices you make, the measure of avoidable bias will be something like 4%: somewhere between 4%, if you use the 1% estimate, and 4.5%, if you use the 0.5% estimate, whereas the variance is 1%.

5:06

So in this example, I would say it doesn't really matter which of the definitions of human-level error you use, whether it's the typical doctor's error, the single experienced doctor's error, or the team of experienced doctors' error. Whether the avoidable bias is 4% or 4.5%, it is clearly bigger than the variance problem. And so in this case, you should focus on bias reduction techniques, such as training a bigger network.

Now let's look at a second example. Let's say your training error is 1% and your dev error is 5%. Then again it seems almost academic whether the human-level performance is 1%, 0.7%, or 0.5%, because whichever of these definitions you use, your measure of avoidable bias will be somewhere between 0% and 0.5%. That's the gap between the human-level performance and your training error, whereas the gap between training and dev error is 4%. So this 4% is going to be much bigger than the avoidable bias either way, and it suggests you should focus on variance reduction techniques, such as regularization or getting a bigger training set.

But where it really matters is if your training error is 0.7%, so you're doing really well now, and your dev error is 0.8%. In this case, it really matters that you use 0.5% as your estimate for Bayes error.

6:48

This suggests that maybe bias and variance are both problems, but the avoidable bias is a bit bigger of a problem. And in this example, 0.5%, as we discussed on the previous slide, was the best measure of Bayes error, because a team of human doctors could achieve that performance. If you used 0.7% as your proxy for Bayes error instead, you would have estimated the avoidable bias as pretty much 0%, and you might have missed the fact that you actually should try to do better on your training set.
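To make the sensitivity concrete, here's a quick sketch that runs this third example (training error 0.7%, dev error 0.8%) against each of the three candidate proxies; it's plain arithmetic, nothing beyond what's in the video:

```python
# Third example: training error 0.7%, dev error 0.8%.
train_err, dev_err = 0.007, 0.008
variance = dev_err - train_err

# Candidate Bayes proxies: typical doctor, experienced doctor, team.
for proxy in (0.01, 0.007, 0.005):
    avoidable_bias = max(train_err - proxy, 0.0)
    print(f"proxy {proxy:.1%}: avoidable bias {avoidable_bias:.2%}, "
          f"variance {variance:.2%}")
```

Only the 0.5% proxy reveals the roughly 0.2% of avoidable bias; with the 0.7% proxy it looks like there is nothing left to gain on the training set.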

7:18

So I hope this also gives a sense of why making progress on a machine learning problem gets harder as you approach human-level performance. In this example, once you've approached 0.7% error, unless you're very careful about estimating Bayes error, you might not know how far away you are from it, and therefore how much you should be trying to reduce avoidable bias. In fact, if all you knew was that a single typical doctor achieves 1% error, it might be very difficult to know whether you should be trying to fit your training set even better.

8:04

Whereas in the two examples on the left, when you were further away from human-level performance, it was easier to target your focus on bias or variance. So this is maybe an illustration of why, as you approach human-level performance, it's actually harder to tease out the bias and variance effects, and therefore why progress on your machine learning project gets harder as you're doing really well.

8:25

So just to summarize what we've talked about: if you're trying to understand bias and variance, and you have an estimate of human-level error for a task that humans can do quite well, you can use human-level error as a proxy or an approximation for Bayes error.

8:47

And so the difference between your training error and your estimate of Bayes error tells you how much of a problem avoidable bias is, how much avoidable bias there is. And the difference between training error and dev error tells you how much of a problem variance is, that is, whether your algorithm is able to generalize from the training set to the dev set.

9:18

The big difference between our discussion here and what we saw in an earlier course is that instead of comparing training error to 0% and just calling that the estimate of the bias, in this video we have a more nuanced analysis in which there is no particular expectation that you should get 0% error. Because sometimes Bayes error is nonzero, and sometimes it's just not possible for anything to do better than a certain threshold of error.

9:41

And so in the earlier course, we were measuring training error and seeing how much bigger it was than zero, and just using that to try to understand how big our bias is. That turns out to work just fine for problems where Bayes error is nearly 0%, such as recognizing cats: humans are near perfect at that, so Bayes error is also nearly 0%. But for problems where the data is noisy, like speech recognition on very noisy audio, where it's sometimes just impossible to hear what was said and get the correct transcription, having a better estimate for Bayes error can help you better estimate avoidable bias and variance, and therefore make better decisions about whether to focus on bias reduction tactics or variance reduction tactics.

10:30

So to recap: having an estimate of human-level performance gives you an estimate of Bayes error. And this allows you to more quickly make decisions about whether you should focus on trying to reduce the bias or trying to reduce the variance of your algorithm.
