0:01

So we recall our diamond data set from before. Here y was the price and x was the carat.

Consider now trying to fit a line through this data where we have both an intercept and a slope. So a two-parameter regression setting.

0:14

And of course, we wouldn't want to fit a line through the origin in this case, because the intercept should be somewhere around 200 on the y axis, so we definitely don't want a line through the origin. We simply want to find the best fitting line through the data.

So we want to minimize ||y - beta_0 times Jn - beta_1 times x||^2, where Jn is the vector of ones, so just a vector of n ones.
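As a quick numerical sketch of this criterion (with made-up numbers standing in for the diamond data; nothing here is from the lecture itself):

```python
import numpy as np

# Hypothetical stand-ins for the diamond data: x = carat, y = price (not the real data).
x = np.array([0.20, 0.25, 0.30, 0.35, 0.40])
y = np.array([300.0, 340.0, 420.0, 450.0, 500.0])
n = len(y)
jn = np.ones(n)  # Jn, the vector of n ones

def loss(b0, b1):
    """The least squares criterion ||y - b0*Jn - b1*x||^2."""
    r = y - b0 * jn - b1 * x
    return float(r @ r)
```

Minimizing `loss` over both `b0` and `b1` together is the two-parameter problem the rest of the lecture solves.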

1:11

But first, before we discuss the two-parameter problem, let's talk about what we're doing relative to projections. So imagine, in this case y isn't three dimensional, y is however many points we have, 20 or something like that, but imagine if y were three dimensional, so we only had three data points. What would we be doing? Well, our outcome vector y would be a point in three-dimensional space, right?

1:39

And so our surface, all the points that we want to project onto, would be all the points of this form as beta_0 and beta_1 vary. So that's a plane in three dimensions, right? Because it's inherently two dimensional: only two parameters are varying linearly in that space. The space Gamma, which is the collection of beta_0 times Jn plus beta_1 times x for (beta_0, beta_1) in R^2, is of course two dimensional and it's linear, so it looks like a plane in three dimensions.

Â So what we're trying to do is,

Â given our outcome y which is right here, we want to project it.

Â Onto this two dimensional plane and find the point, let's call that y-hat.

Â The point that lives in that two dimensional plane

Â that is closest to the observed data y.

Â The specific values of beta that multiply times Jn and x to give us that point

Â y hat, are going to be, we'll call those beta 0 hat and beta 1 hat.

Â So in this lecture, we're gonna talk about how you find beta 0 hat and beta 1 hat.
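The projection picture can be checked numerically. A minimal sketch, with illustrative data: stack Jn and x as the columns of a matrix X; then y-hat is the orthogonal projection of y onto the plane spanned by those columns, so the residual y minus y-hat is perpendicular to both columns.

```python
import numpy as np

# Illustrative data (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 2.5, 4.1, 4.9])
X = np.column_stack([np.ones_like(x), x])  # columns: Jn and x

# Least squares coefficients (beta_0 hat, beta_1 hat) and the projection y-hat.
beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
y_hat = X @ beta_hat

# The residual y - y_hat is orthogonal to both Jn and x (numerically ~0).
resid = y - y_hat
```

The orthogonality of `resid` to the columns of `X` is exactly what makes y-hat the closest point in the plane.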

Â Another way to think about this problem, is to think in terms of the scatter plot,

Â where we have y and x.

Â Here we have a line of the form y=B0+B1x and our least

Â 3:17

and finding the line that minimizes the sum of the squared vertical distances.

Â So that's another way to think about it.

Â So there's two ways to think about it.

Â One is to think about it as minimizing the vertical square distances

Â from the scatterplot and

Â another is to think about it as a projection in n-dimensional space.

Â Of course we can't visualize the n-dimensional projection,

Â we have to think of it,

Â pretend like it's a three-dimensional projection just for the illustration.

3:48

So, consider ||y - beta_1 x - beta_0 Jn||^2. Imagine that beta_1 was fixed, so let's just think of y - beta_1 x as a single vector. Okay? If beta_1 is fixed, then the minimizer of this equation over beta_0 is just going to be the average of that vector, because this is just regression with a constant. Okay, so the average of that vector is just 1/n times Jn transpose times (y - beta_1 x). Okay, but by linearity, this works out to be 1/n times Jn transpose times y, which is y bar,
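The claim in this step, that for a fixed slope the best intercept is the average of the vector y - beta_1 x, can be checked numerically on made-up data:

```python
import numpy as np

# Illustrative data (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 4.5, 7.1, 8.0, 11.2])
b1 = 2.0  # hold the slope fixed at an arbitrary value

# The lecture's candidate intercept: the mean of the vector y - b1*x,
# i.e. (1/n) * Jn' (y - b1*x).
b0_star = np.mean(y - b1 * x)

def ss(b0):
    """Sum of squares for this fixed slope as the intercept varies."""
    r = y - b0 - b1 * x
    return float(r @ r)
```

Scanning nearby intercepts confirms no other choice of `b0` beats `b0_star` for this fixed slope.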

Â 4:38

And minus beta_1, which is a scalar, times 1/n times Jn transpose times x, which is x bar. So we know that the minimizing intercept has to be y bar minus beta_1 x bar; so that's our intercept, beta_0 hat, as it depends on beta_1. So let's plug that back into our equation. So y minus beta_1 x, minus the plugged-in intercept, y bar minus beta_1 x bar, times Jn, all squared. So we know that the original quantity, call it asterisk, has to be greater than or equal to this plugged-in expression, no matter what beta_0 was.

5:26

Okay, so let's just do some reorganization. So this works out to be the quantity y minus y bar times Jn, that takes care of that term, minus beta_1 times the quantity x minus

5:57

x bar times Jn, all squared, okay?

Â Now before, whenever we centered our random variables,

Â we were calling them y tilde, or x tilde.

Â So notice this is just the centered version of y, and

Â this is just the centered version of x.

Â So this is now = y tilde- beta 1 X tilda squared

Â where y tilda is simply y minus y bar times Jn and

Â x tilda is equal to x minus x bar times Jn.
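This identity, that plugging in the best intercept leaves exactly ||y tilde - beta_1 x tilde||^2 on the centered variables, is easy to verify numerically on made-up data:

```python
import numpy as np

# Illustrative data (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 4.5, 7.1, 8.0, 11.2])

y_t = y - y.mean()  # y tilde: centered y
x_t = x - x.mean()  # x tilde: centered x

b1 = 1.7                       # any fixed slope
b0 = y.mean() - b1 * x.mean()  # the best intercept for that slope

lhs = np.sum((y - b0 - b1 * x) ** 2)  # original criterion with b0 plugged in
rhs = np.sum((y_t - b1 * x_t) ** 2)   # centered, through-the-origin form
```

The two sums of squares agree exactly, since y - b0 - b1*x = (y - y bar) - b1*(x - x bar) once b0 is substituted.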

Â Now, this equation is exactly regression through the origin

Â with the centered variables, which we talked about in a couple lectures ago.

Â So we know that beta one hat has to be equal to the inner product of y tilde and

Â x tilde over x tilde inner product of x tilde by itself.

Â And we also argued that that worked out to be the correlation between y and x,

Â the estimated correlation between y and x times the estimated standard deviation of

Â the y divided by the estimated standard deviation of the x.

Â So the regression, the best regression slope works

Â out to be exactly the same slope as if we centered the variables first and

Â then did regression to the origin.
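Both slope formulas from this passage can be compared directly; a sketch with illustrative data:

```python
import numpy as np

# Illustrative data (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 4.5, 7.1, 8.0, 11.2])
y_t, x_t = y - y.mean(), x - x.mean()

# Inner-product form: <y tilde, x tilde> / <x tilde, x tilde>.
b1_hat = (y_t @ x_t) / (x_t @ x_t)

# Equivalent form: correlation times sd(y) / sd(x).
r = np.corrcoef(x, y)[0, 1]
b1_alt = r * y.std(ddof=1) / x.std(ddof=1)
```

The (n-1) normalizations in the sample standard deviations cancel, which is why the two forms agree.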

7:23

So that's beta_1 hat, and if we need an intercept, beta_0 hat is just y bar minus beta_1 hat times x bar. And so we've proven, by plugging these in, that we've gotten as small as we possibly can in terms of this set of inequalities, so these must be the minimizers of the least squares equation.
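Putting the whole derivation together, the closed-form estimates agree with a general-purpose least squares solver; a sketch with illustrative data:

```python
import numpy as np

# Illustrative data (not from the lecture).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([3.0, 4.5, 7.1, 8.0, 11.2])

# Closed-form estimates from the derivation.
x_t, y_t = x - x.mean(), y - y.mean()
b1_hat = (y_t @ x_t) / (x_t @ x_t)
b0_hat = y.mean() - b1_hat * x.mean()

# Compare against numpy's least squares on the design matrix [Jn, x].
X = np.column_stack([np.ones_like(x), x])
b0_ls, b1_ls = np.linalg.lstsq(X, y, rcond=None)[0]
```

Either route, the derived formulas or the generic solver, lands on the same fitted line.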