0:00

Let's cover some basic examples of multivariable linear regression models.

We've already covered a bunch, and perhaps the simplest one is where we just want to minimize ||y - Jn beta0||^2, that is, y minus a constant vector times an intercept. We saw that the minimizer of this led to beta0 hat = y bar. Of course, we can get that from the multivariable regression formula by looking at the solution beta0 hat = (Jn transpose Jn) inverse Jn transpose y.

Okay, so the term Jn transpose Jn is just the inner product of a vector of 1s with itself. Since 1 times 1 is just 1, we're adding up a bunch of 1s, and taking the inverse of that gives 1/n. And Jn transpose y is just a bunch of 1s times y, so that's the elements of y added up. So beta0 hat is 1/n times the summation of the elements of y, which is y bar.
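As a quick numerical sketch (with made-up numbers, not from the lecture), the general least squares formula with a lone column of 1s reproduces the sample mean:

```python
import numpy as np

# A hypothetical sample; any numbers would do.
y = np.array([3.0, 5.0, 7.0, 9.0])
n = len(y)

# The design "matrix" is just a column of 1s: Jn.
Jn = np.ones((n, 1))

# Multivariable least-squares solution: (Jn' Jn)^{-1} Jn' y.
beta0_hat = np.linalg.solve(Jn.T @ Jn, Jn.T @ y)[0]

# beta0_hat agrees with the sample mean, y bar (6.0 here).
```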

1:06

The second case that we looked at is the instance where we want to minimize the least squares criterion ||y - x beta||^2, where x is also a vector. This is so-called regression through the origin. Of course, regression to the mean is a special case of that, where x is just a vector of 1s. And we saw already that the result, beta hat, was the inner product of x and y over the inner product of x with itself.

Well, let's just show that that agrees as a special case of multivariable regression. We have our beta hat = (x transpose x) inverse x transpose y. Well, x transpose x is just the inner product of x with itself in the event that x is a vector, and x transpose y is just the definition of the inner product of x and y when x is a vector.
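Here's a small sketch (hypothetical data) showing the inner-product form and the general least squares form agree for regression through the origin:

```python
import numpy as np

# Hypothetical data for regression through the origin.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = np.array([2.0, 4.1, 5.9, 8.2])

# Inner-product form: beta hat = <x, y> / <x, x>.
beta_hat_inner = (x @ y) / (x @ x)

# General least-squares form, with x as an n-by-1 design matrix.
X = x.reshape(-1, 1)
beta_hat_ls = np.linalg.solve(X.T @ X, X.T @ y)[0]
```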

Okay, so two of our special cases fall right out of multivariable linear regression.

The next one I'm going to ask you to do on your own, because we've worked on it a lot and shown it in different ways. In the event that we want to do linear regression, the model is beta0 times Jn plus beta1 times the vector x. So let me just write the problem as trying to minimize ||y - W beta||^2, where W, our design matrix in this case, has the vector of 1s and the vector x as its columns. I should probably just write it out like this: W = [Jn, x].

2:48

So what I'd like you to try and show is that beta hat, where beta here I'm again defining as (beta0, beta1), is (W transpose W) inverse W transpose y, and that this works out to be the standard definitions of the linear regression intercept and slope. W transpose W winds up being a 2 by 2 matrix, so you can actually calculate the inverse; just look up the formula for the inverse of a 2 by 2. And W transpose y is fairly straightforward to compute as well; that winds up being a 2 by 1 vector. So it's a little bit tedious in terms of bookkeeping, but you can show that direct use of the multivariable least squares solution winds up with the same results in the three cases that we've spent a lot of time talking about so far.
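If you want to check your algebra numerically, here's a sketch with hypothetical data; the standard formulas are slope = sum((x - x bar)(y - y bar)) / sum((x - x bar)^2) and intercept = y bar - slope times x bar:

```python
import numpy as np

# Hypothetical data for simple linear regression.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.2, 1.9, 3.2, 3.8, 5.1])
n = len(x)

# Design matrix W = [Jn, x].
W = np.column_stack([np.ones(n), x])

# Multivariable least-squares solution (W'W)^{-1} W'y.
beta0_hat, beta1_hat = np.linalg.solve(W.T @ W, W.T @ y)

# Standard simple-linear-regression formulas.
slope = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
intercept = y.mean() - slope * x.mean()
```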

3:57

Let's consider another important special case, one where this kind of model gets applied in a pretty general setting. So imagine our y looks something like this: y11 up to y1,n/2, stacked on y21 up to y2,n/2. In other words, our y is really two stacked vectors, say y1 and y2, where the first comes from one group and the second comes from another group. So we might think of a setting where we're plotting y and we have group 1 and group 2, something like a box plot; some instance like that, where you're interested in modeling the fact that there are two groups using least squares. We can do this by minimizing the least squares criterion ||y - x beta||^2, where x has one column that's a bunch of 1s followed by a bunch of 0s, and a second column that's a bunch of 0s followed by a bunch of 1s.

5:31

The solution is of course exactly (X transpose X) inverse X transpose y. But look at X transpose X: that's just the matrix with rows (1s, 0s) and (0s, 1s) times the matrix with columns (1s, 0s) and (0s, 1s), and we want that inverse.

6:24

Okay, so looking at this matrix: the first diagonal entry is going to be n/2, because when I multiply that row of 1s times the matching column of 1s, it just counts the number in the first group. When I multiply the row for one group times the column for the other group, I get 0, so the off-diagonal entries are 0, and the other diagonal entry is also n/2. And there's nothing in particular about having equal numbers in the two groups; they could have been n1 and n2. I just used n/2 for the balanced case with equal numbers in both groups.

Now let's look at X transpose y. The first entry is going to be the sum of the first group, so let's just call that J_{n/2} transpose y1, and the second entry is J_{n/2} transpose y2. And the inverse of X transpose X is pretty easy, because it's a diagonal matrix, so it's just 1 over each of the diagonal entries. And so what we get is that y1 bar and y2 bar are the coefficient estimates in beta hat. Which is what we imagined should happen: if we have one effect for group 1 and a second effect for group 2, the least squares estimates would have to turn out to be the average for group 1 and the average for group 2.

So the fitted values in this case are just going to be J_{n/2} times y1 bar if you're in group 1, and J_{n/2} times y2 bar if you're in group 2.
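A quick numerical sketch of the two-group case (made-up data): because X transpose X is diagonal, the solution comes out to the group means:

```python
import numpy as np

# Two hypothetical groups of observations.
y1 = np.array([2.0, 3.0, 4.0])
y2 = np.array([10.0, 12.0, 14.0])
y = np.concatenate([y1, y2])
n1, n2 = len(y1), len(y2)

# Design matrix: one indicator column per group.
X = np.zeros((n1 + n2, 2))
X[:n1, 0] = 1.0
X[n1:, 1] = 1.0

# X'X is diagonal with entries (n1, n2), so the estimates are the group means.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```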

8:50

So to recap: our design matrix, let's call it x1, has a first column that's Jn1 followed by a bunch of 0s, and a second column that's a bunch of 0s followed by Jn2, where J is, again, a vector of 1s. We have two groups of data, and we found that our estimate, beta hat, works out to be (y1 bar, y2 bar), where beta in this case is (beta1, beta2).

9:20

Now consider minimizing ||y - x2 gamma||^2, where x2 is the design matrix whose first column is the vector J_{n1+n2} and whose second column is Jn1 followed by a bunch of 0s. Let me just write that as 0n2, meaning a vector of 0s of length n2.

9:44

So the second column of x2 is Jn1 stacked on 0n2, and gamma is equal to (gamma1, gamma2). Now notice that if I add the two columns of x1 together, I get the first column of x2, J_{n1+n2}, and the second column of x2 is just the first column of x1. So the two design matrices have the same column space.

10:43

For any observation in group 1, the fitted value is going to be y1 bar, and for any observation in group 2, it's going to be y2 bar. So we know that beta1 hat equals y1 bar and beta2 hat equals y2 bar, because we worked it out in the last example.

10:57

Okay, now look at x2 times gamma hat. Well, the fitted value for anyone in group 1, if I multiply x2 times gamma hat, is going to be gamma1 hat plus gamma2 hat. And then, for anyone in group 2, it's just going to be gamma1 hat by itself.

11:36

And the two sets of fitted values have to agree, because the column spaces of the two design matrices are the same. So what we know, then, is that beta1 hat, which is y1 bar, has to equal gamma1 hat plus gamma2 hat, and beta2 hat, which is equal to y2 bar, has to equal gamma1 hat. We can use that to solve for gamma1 hat and gamma2 hat without actually having to go to the trouble of inverting this matrix. Now, it's a 2 by 2 matrix, so it shouldn't be that hard to invert. But suppose you had a somewhat harder setting, say ten columns; then it would be more work to invert.

This is a common trick in these ANOVA-type examples. You can reparameterize to the easy case, where you get a bunch of block-diagonal 1 vectors like in the case of x1, in which case x transpose x works out to be a diagonal matrix that's very easy to invert. And then if you want any different reparameterization, one that would result in an x transpose x that's hard to invert, you can use the fact that the fitted values have to be identical to convert between the parameters after the fact.

So in this case, you know that gamma1 hat has to be equal to beta2 hat, and then, just plugging in with those two equations, gamma2 hat has to be equal to beta1 hat minus beta2 hat. Okay, and so that gives you a very quick way to go between the parameters of equivalent linear models with different specifications, just different organizations of the same column space. So it's a useful trick when you're working with these ANOVA-type models.
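Here's a sketch of the reparameterization trick with hypothetical numbers: fit the easy block-diagonal parameterization, convert using gamma1 hat = beta2 hat and gamma2 hat = beta1 hat - beta2 hat, and confirm the conversion matches a direct fit:

```python
import numpy as np

# Hypothetical two-group data.
y1 = np.array([2.0, 3.0, 4.0])
y2 = np.array([10.0, 12.0, 14.0])
y = np.concatenate([y1, y2])
n1, n2 = len(y1), len(y2)

# x1: one indicator column per group; x1'x1 is diagonal, so this fit is easy.
X1 = np.zeros((n1 + n2, 2))
X1[:n1, 0] = 1.0
X1[n1:, 1] = 1.0
beta_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)  # (y1 bar, y2 bar)

# x2: overall column of 1s plus a group-1 indicator (same column space).
X2 = np.ones((n1 + n2, 2))
X2[n1:, 1] = 0.0
gamma_direct = np.linalg.solve(X2.T @ X2, X2.T @ y)

# Conversion derived from matching fitted values, no inversion of X2'X2 needed.
gamma1_hat = beta_hat[1]
gamma2_hat = beta_hat[0] - beta_hat[1]
```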

14:08

Now my design matrix, which I'm going to call W, for reasons that will become clear later, is equal to a matrix called z and a vector called x, where x is n by 1 and z is n by 2. And z looks like this: its first column is Jn1 followed by an n2 vector of 0s, and its second column is an n1 vector of 0s followed by Jn2. So the z matrix looks like the two-group ANOVA matrix from previously, but we've appended an x vector onto it as well.

15:15

So in this example, if we do least squares with this W, we're interested in fitting models that have a common slope on x and a separate intercept for each of the groups.

Okay, so we want to minimize ||y - W gamma||^2; I'm not going to call the coefficient vector beta this time, let me call it gamma, where gamma is equal to (mu1, mu2, beta): mu1 the intercept for group 1, mu2 the intercept for group 2, and beta the common slope across the two groups. So we can write this criterion as ||y - x beta - z mu||^2, where mu is the vector (mu1, mu2). Okay, so we can write it out like that, and then let's figure out what this works out to be.

So let's use our standard trick, where we hold beta fixed and come up with the estimate for mu as it depends on beta. Well, if beta is held fixed, x beta is just a known vector, and this is just the two-group ANOVA problem that we discussed previously. Remember that the solution of the two-group ANOVA problem worked out to be the mean in group 1 and the mean in group 2. So the estimate for mu1, as it depends on beta, has to be the mean of y1 - x1 beta, which is y1 bar - x1 bar beta, the group 1 mean of the y's minus the group 1 mean of the x's times beta. And mu2, the mean for group 2 as it depends on beta, has to be y2 bar - x2 bar beta.

Now if I plug those back in for mu1 and mu2 and subtract them off from y, what I get is nothing other than the group-centered version of y, the vector (y1 - y1 bar Jn1, y2 - y2 bar Jn2), minus the group-centered version of x,

17:25

(x1 - x1 bar Jn1, x2 - x2 bar Jn2), times beta. I didn't define x1 and x2, but those are just the group components of x: x1 is the first n1 measurements of x, and x2 is the latter n2 measurements of x. And, oops, I shouldn't write this as an equality; the criterion with these plug-in values is less than or equal to the original one, because we've plugged in the optimal estimates for mu1 and mu2 for a fixed beta. Well, minimizing this over beta is now nothing other than regression through the origin with the group-centered version of y and the group-centered version of x.

18:26

So what does beta hat work out to be? Probably the easiest way to write it out first is: beta hat is the inner product of y tilde and x tilde over the inner product of x tilde with itself, where y tilde is the group-centered version of y and x tilde is the group-centered version of x. In other words, by group-centered I mean each observation has its group mean subtracted off.

18:55

And you can show, and I have this in the notes, well, let's just do it really quickly here: what does this work out to be? It works out to be the double sum, over i and j, of (yij - y bar i)(xij - x bar i), with i = 1 to 2 and j = 1 to n sub i, all over the double sum of (xij - x bar i) squared. And let me just explain my notation here: yij is the jth component of the group i vector, so y11 is the first component of the vector y1, y12 is the second component of the vector y1, y21 is the first component of the vector y2, and so on.

19:58

So we can write this out, and I think this is probably the nicest way to write it out: beta hat equals p times beta1 hat plus (1 - p) times beta2 hat, where beta1 hat is the regression estimate you'd get with only the group 1 data, the centered x1 data and the centered y1 data, and beta2 hat is the regression estimate you'd get with only the centered y2 data and the centered x2 data.

20:42

Okay, so it is interesting to note that the slope in ANCOVA works out to be a weighted average of the individual group-specific slopes, where in this case p works out to be the summation of (x1j - x1 bar) squared

21:13

over the double sum of (xij - x bar i) squared. So p works out to be the proportion of the total variation in the x's that comes from group 1. So if most of the variation in your x's is in group 1, then the group 1 slope contributes more to the overall ANCOVA slope. If group 2 is more variable, then group 2 contributes more, and if they're equally variable, then both of them contribute equally.

Okay, so let's go back now. Once we have our beta hat, we can figure out what our mu1 hat and mu2 hat are. So mu1 hat is equal to y1 bar - x1 bar beta hat, and mu2 hat is equal to y2 bar - x2 bar beta hat.
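Here's a numerical sketch (hypothetical data) confirming that the common slope from the ANCOVA fit equals the group-centered regression-through-the-origin slope, which in turn is the weighted average p beta1 hat + (1 - p) beta2 hat, and that the intercepts come out to yi bar - xi bar beta hat:

```python
import numpy as np

# Hypothetical two-group data with a shared covariate x.
x1 = np.array([1.0, 2.0, 3.0, 4.0]); y1 = np.array([1.0, 3.0, 4.0, 7.0])
x2 = np.array([2.0, 4.0, 6.0]);      y2 = np.array([11.0, 14.0, 18.0])
x = np.concatenate([x1, x2]); y = np.concatenate([y1, y2])
n1, n2 = len(x1), len(x2)

# Design matrix W = [z, x]: group indicators plus the common covariate.
W = np.zeros((n1 + n2, 3))
W[:n1, 0] = 1.0
W[n1:, 1] = 1.0
W[:, 2] = x
mu1_hat, mu2_hat, beta_hat = np.linalg.solve(W.T @ W, W.T @ y)

# Group-centered regression-through-the-origin slope...
xt = np.concatenate([x1 - x1.mean(), x2 - x2.mean()])
yt = np.concatenate([y1 - y1.mean(), y2 - y2.mean()])
slope_centered = (xt @ yt) / (xt @ xt)

# ...which is also the weighted average of the group-specific slopes.
b1 = ((x1 - x1.mean()) @ (y1 - y1.mean())) / ((x1 - x1.mean()) @ (x1 - x1.mean()))
b2 = ((x2 - x2.mean()) @ (y2 - y2.mean())) / ((x2 - x2.mean()) @ (x2 - x2.mean()))
p = ((x1 - x1.mean()) @ (x1 - x1.mean())) / (xt @ xt)
```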

22:25

Okay, so the difference in the intercepts, mu1 hat minus mu2 hat, works out to be (y1 bar - y2 bar) - (x1 bar - x2 bar) beta hat. Now, one way to think about this, the most common way to think about ANCOVA, is the instance where you want to compare treatments, treatment 1 versus treatment 2, but you have some confounding factor that you need to adjust for. Say, for example, you're looking at a weight loss treatment, and your confounding factor is the initial weight of the person.
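As a sketch with made-up weight-loss numbers (purely hypothetical, just to illustrate the identity above), the adjusted difference in intercepts equals the raw difference in means minus the baseline imbalance times the common slope:

```python
import numpy as np

# Hypothetical weight-loss data: y is weight change, x is baseline weight.
x1 = np.array([80.0, 85.0, 90.0, 95.0]); y1 = np.array([-2.0, -3.0, -4.0, -5.0])
x2 = np.array([95.0, 100.0, 105.0]);     y2 = np.array([-6.0, -7.0, -8.0])
n1, n2 = len(x1), len(x2)
x = np.concatenate([x1, x2]); y = np.concatenate([y1, y2])

# ANCOVA fit: group intercepts mu1, mu2 plus a common slope beta on baseline.
W = np.zeros((n1 + n2, 3))
W[:n1, 0] = 1.0
W[n1:, 1] = 1.0
W[:, 2] = x
mu1_hat, mu2_hat, beta_hat = np.linalg.solve(W.T @ W, W.T @ y)

unadjusted = y1.mean() - y2.mean()
adjusted = mu1_hat - mu2_hat
# Identity: adjusted = (y1 bar - y2 bar) - (x1 bar - x2 bar) * beta hat.
```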

23:07

Okay, so if the initial starting weight of the people receiving the one weight loss treatment is different from the initial weight of the people receiving the other, then you'd be worried about just directly comparing the two means. Well, this shows what, in addition to the two means, you need to subtract off if you model the data as an ANCOVA model.

Most interestingly, if you randomize, and your randomization is successful in the sense of balancing this observed covariate, the baseline weight, then the group 1 average of the covariate should be pretty close to the group 2 average. So that difference in means should be quite small, so that whether you adjust for baseline weight in the model or just do a straight two-group ANOVA, the estimates should be very similar. On the other hand, if you happened not to have randomization and wound up with imbalance, so that the covariate average for group 1 is very different from the average for group 2, then the difference between the unadjusted estimate and the adjusted estimate can be quite large.

Okay, so that's ANCOVA; that's an important example. I have some more written about it in the notes, but I think you can actually learn a lot about regression and adjustment just by thinking about this one example.