0:14

This number x can be any positive number, and it follows a certain distribution. So we'll soon be talking about the expectation of the error relative to this distribution of x. But everybody has some error term, which could depend on the actual weight x and which also differs from one person to another. We call y_i the estimate of the i-th participant in the experiment, as a function of x. So this is an additive error model.

0:51

That's the model of the estimate provided by each person, and let's say there are n of these people. So i goes from 1, 2, and so on, up to n; say n = 787.
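A minimal sketch of the additive error model in Python. The true weight and the error spread here are hypothetical values; n = 787 matches the number of participants mentioned above.

```python
import random

random.seed(0)

x = 1200  # true weight of the ox (hypothetical value, in pounds)
n = 787   # number of participants, as in Galton's experiment

# Additive error model: y_i(x) = x + eps_i(x), where each participant's
# error term eps_i is zero-mean (unbiased) and independent of the others.
estimates = [x + random.gauss(0, 50) for _ in range(n)]

# The crowd's averaged estimate tends to land much closer to x
# than a typical individual guess does.
crowd_average = sum(estimates) / n
print(crowd_average)
```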

Now we're going to assume that these error terms are both unbiased and independent. Unbiased means that the expectation of the error term of each user i, as a function of x, taken over the distribution of x, is zero: E_x[epsilon_i(x)] = 0. So sometimes you may overestimate and sometimes you may underestimate, but you would not systematically over- or underestimate. This goes for all users i.

1:35

That's unbiased. Independent means that this error term depends only on user i; it does not depend on any other user j. In reality, neither the unbiased nor the independent assumption is true in general. We'll often see systematic bias: certain people tend to overestimate while others tend to underestimate. And sometimes these errors are also dependent. In fact, in Lecture 7 we'll look at an example of an information cascade, where the dependence of estimates destroys the wisdom of crowds. Now, what about Amazon reviews, actually?

Are they unbiased or independent? Essentially not unbiased. [inaudible]. Although you may be able to use a user's ID and the history of her ratings to do some normalization. And they are sort of independent: most people enter a review based on their own opinion. So even if a reviewer can see the existing ratings and reviews, she may not be reacting in response to them. But sometimes they do. Sometimes a review says that the previous reviews are certainly biased: "I'm here to correct that." So there is also some element of dependence on Amazon.

Alright, so now we are going to look at the so-called wisdom of crowds, exemplified by Galton's experiment, by comparing two terms. One is the average of the errors made by each individual participant; the other is the error of the averaged estimate. And the hope is that the error of the averaged estimate is much smaller than the average of the errors.

Let's first look at the average of the errors. This is easy: the error term here is epsilon_i, for participant i, and we're going to use the two-norm, or mean squared error, just like last lecture for Netflix, except that over there we used the root mean square. Sometimes we take the square root, but the idea is the same. I'm going to use the squared error as the metric to quantify the error terms. So we'll look at this term squared, then take the expectation over the distribution of x, then sum over all the participants from 1 to n and take the average by dividing by n. This is what we call the error term for the average error.

4:19

Okay, or the averaged squared error, to be more precise. So just remember this expression. This is the averaged squared error.
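The expression just described can be transcribed into symbols (E_x denotes expectation over the distribution of x):

```latex
\text{averaged squared error} \;=\; \frac{1}{n}\sum_{i=1}^{n}\mathbb{E}_x\!\left[\varepsilon_i(x)^2\right]
```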

On the other hand, we also want to look at the error of the average. First of all, what is the average? Well, the average is the sum of the y_i's divided by n, and the error of the average is that minus x. Okay? This could be positive or negative, but it is obviously the same as 1/n times (the sum of the y_i's minus n times x). So we can also write it as 1/n times the summation of (y_i - x), by bringing x inside the summation, since there are effectively n copies of it there. And this is just 1/n times the summation of the errors, because each term y_i - x inside the summation is just epsilon_i by definition. So now, this error of the average is what we want to look at. And we're going to look at it as the expectation, over the distribution of x, of 1/n times the summation of epsilon_i, this whole thing squared. That equals 1/n squared, which we can take out of the expectation, times the expectation of the summation of epsilon_i(x) over i, this whole term squared.
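The chain of equalities just walked through, in symbols (E_x denotes expectation over the distribution of x):

```latex
\frac{1}{n}\sum_{i=1}^{n} y_i - x
  \;=\; \frac{1}{n}\sum_{i=1}^{n}\bigl(y_i - x\bigr)
  \;=\; \frac{1}{n}\sum_{i=1}^{n}\varepsilon_i(x),
\qquad
\mathbb{E}_x\!\left[\Bigl(\tfrac{1}{n}\sum_{i=1}^{n}\varepsilon_i(x)\Bigr)^{2}\right]
  \;=\; \frac{1}{n^{2}}\,\mathbb{E}_x\!\left[\Bigl(\sum_{i=1}^{n}\varepsilon_i(x)\Bigr)^{2}\right]
```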

6:05

Now you might say: look, isn't this expression the same as the last expression? It looks like both are a summation of expectations of epsilon squared, but not quite. The earlier one takes the sum of the expectation of each squared epsilon_i, whereas this one is the expectation of the square of the sum. So this one is about the square of a summation, not the summation of squares, and the two are clearly different. For example, take the two-user case: (epsilon_1 + epsilon_2) squared, which is what this is about, is epsilon_1 squared plus epsilon_2 squared plus two times epsilon_1 epsilon_2. The summation of squares is what the averaged error is about; the square of the summation is what the error of the average is about. The difference is the collection of these cross terms. But if the errors are independent, then the expectation of epsilon_1 times epsilon_2, taken over the distribution of x, is going to be zero, because these two error terms are not correlated with each other. So these cross terms are cancelled. And in that case, the sum of squares and the square of the sum are the same.
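The two-user case in symbols: under the unbiased and independent assumptions, the cross term has zero expectation, so the expectation of the square of the sum equals the sum of the expected squares:

```latex
(\varepsilon_1+\varepsilon_2)^2 = \varepsilon_1^2 + \varepsilon_2^2 + 2\,\varepsilon_1\varepsilon_2,
\qquad
\mathbb{E}_x[\varepsilon_1\varepsilon_2] = 0
\;\Longrightarrow\;
\mathbb{E}_x\!\left[(\varepsilon_1+\varepsilon_2)^2\right]
  = \mathbb{E}_x\!\left[\varepsilon_1^2\right] + \mathbb{E}_x\!\left[\varepsilon_2^2\right]
```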

7:30

And therefore, this expression is the same as that expression. So the only difference is the multiplicative factor in front: in one case it's 1/n, and in the other it's 1/n squared. And therefore we have the desired relationship: the error term of the average (average the estimates first, then look at the error) is actually only 1/n of the averaged error term (compute the error of each individual estimate first, then take the average). These two differ by a multiplicative factor of n. And this multiplicative factor of n is codified as an example of the wisdom of crowds in Galton's experiment. It says that simple averaging can work if we have unbiased and independent estimates: there are no bias terms, and the cross terms cancel each other under the expectation over the distribution of x. And we have a factor-of-n enhancement: if there are five people, it is five times better; if there are 1,000 people, it is 1,000 times better in terms of the mean squared error.
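The factor-of-n relationship can be checked with a quick simulation. The error distribution below (zero-mean Gaussian) is purely hypothetical; any unbiased, independent choice would do.

```python
import random

random.seed(1)

n = 1000       # number of participants
trials = 1000  # number of draws used to approximate the expectations

avg_of_err = 0.0  # approximates (1/n) * sum_i E[eps_i^2]
err_of_avg = 0.0  # approximates E[ ((1/n) * sum_i eps_i)^2 ]

for _ in range(trials):
    # One unbiased, independent error term per participant.
    eps = [random.gauss(0, 1) for _ in range(n)]
    avg_of_err += sum(e * e for e in eps) / n
    mean_eps = sum(eps) / n
    err_of_avg += mean_eps * mean_eps

avg_of_err /= trials
err_of_avg /= trials

# Averaging first shrinks the mean squared error by roughly a factor of n.
print(avg_of_err, err_of_avg, avg_of_err / err_of_avg)
```

The printed ratio should come out close to n = 1000, illustrating the multiplexing gain from independent, unbiased estimates.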

9:21

If they're completely dependent, there is no enhancement at all. If they're somewhat dependent, then you would have to look at the expectation of these error terms' correlations in more detail, but you'll land somewhere between a factor of one and a factor of n (or 1/n, depending on how you look at it).

This sounds very promising and encouraging: a simple derivation that seems to identify the root cause of the wisdom of crowds. Well, let's look at the positive side before we turn to some cautionary remarks. The positive side says that as long as the estimates are independent of each other, you have the wisdom of crowds. This particular type of wisdom has nothing to do with identifying who the expert is. It just says that you might be wrong in all kinds of directions, but as long as you are wrong in different directions, you will have the power of this factor-of-n enhancement. In hindsight, it's hardly a surprise: this is really the law of large numbers, plus the convexity of a convex quadratic function, at play here. Now, you may object. First: what if I actually know who the experts are? Maybe she's a farmer.

10:45

Well, in the advanced material, we'll look at how we can use boosting methods to extract the more important opinions of the experts. That will be a very different philosophy. The philosophy in what we just talked about depends only on the fact that you can all be wrong, as long as you're wrong in different ways, independently of each other.

Now, the second objection is: what about the scale? This factor-of-n effect holds whether n is two or n is 1,000,000. To most people, the wisdom of crowds means there should be some threshold value n-star, above which you see the effect and below which you don't. But this one applies just as well to the two-person case, so that's some mystery that remains to be resolved.

The third objection, or rather addition, to this discussion is that this factor of n is only one view of the wisdom of crowds, which we call the multiplexing gain. There's another view, a kind of diversity thinking, that says: if there's some bad event that you don't want to happen, then by putting n of these entities together you get 1 - (1 - p) to the power n, where n is the size of the crowd. This is called the diversity gain, while the factor of n is called the multiplexing gain. We will encounter this diversity gain in later chapters. In fact, we encounter both kinds of gains in [INAUDIBLE] in both technology and social networks, from wireless networks, such as WiFi and LTE, all the way to Galton's example. But this is our first encounter with one side of this coin.
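As a sketch of the diversity-gain formula (interpreting p, which the lecture leaves implicit, as each entity's individual success probability, an assumption on my part): if each of n independent entities succeeds with probability p, the chance that at least one succeeds is 1 - (1 - p)^n.

```python
def diversity_gain(p: float, n: int) -> float:
    """Probability that at least one of n independent entities succeeds,
    when each succeeds with probability p."""
    return 1.0 - (1.0 - p) ** n

# With p = 0.5, adding entities rapidly drives the success probability toward 1.
for n in (1, 2, 5, 10):
    print(n, diversity_gain(0.5, n))
```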
