I'm not going to write that because I'm just going to pull it out.

because the identity matrix I doesn't change anything, and sigma squared is a scalar.

So then we get (XᵀX)⁻¹ Xᵀ (σ²I) X (XᵀX)⁻¹, and then we can pull the σ² out front, okay?

So we have (XᵀX)⁻¹ (XᵀX) (XᵀX)⁻¹.

So this works out to be σ²(XᵀX)⁻¹.
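As a quick numerical check, here's a minimal NumPy sketch (the design matrix and noise variance are arbitrary numbers I've made up) showing that the full sandwich form collapses to σ²(XᵀX)⁻¹:

```python
import numpy as np

# Verify that (X'X)^{-1} X' (sigma^2 I) X (X'X)^{-1}
# collapses to sigma^2 (X'X)^{-1}.
rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.normal(size=(n, p))   # arbitrary full-rank design matrix
sigma2 = 2.5                  # arbitrary noise variance

XtX_inv = np.linalg.inv(X.T @ X)
sandwich = XtX_inv @ X.T @ (sigma2 * np.eye(n)) @ X @ XtX_inv
direct = sigma2 * XtX_inv

print(np.allclose(sandwich, direct))  # True
```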

So it's a lot like our simple linear regression estimate.

The variability of the regressors winds up being, in a matrix sense, in the denominator.

And this tells us that, just like in linear regression, in order for the variance of our coefficient estimate to be smaller, we want the variances of our x's to be larger.

And that makes a lot of sense.

If you think back to linear regression, if you want to estimate a line really well,

if you collect x's in a tight little ball,

you're not going to be able to estimate that line very well.

But if you collect x's all along the line, in other words the variance of x is very

large, then you're going to be able to estimate that line with greater precision.
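To make that concrete, here's a small sketch (with made-up numbers) using the simple linear regression fact that Var(slope) = σ²/Σ(xᵢ − x̄)²: with the same noise level, x's in a tight ball give a much larger slope variance than x's spread along the line.

```python
import numpy as np

sigma = 1.0
n = 20
x_tight = np.linspace(4.9, 5.1, n)    # x's in a "tight little ball"
x_spread = np.linspace(0.0, 10.0, n)  # x's all along the line

def slope_var(x, sigma):
    # Var(slope) = sigma^2 / sum((x - xbar)^2) in simple linear regression
    return sigma**2 / np.sum((x - x.mean())**2)

print(slope_var(x_tight, sigma) > slope_var(x_spread, sigma))  # True
```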

So it's interesting that we don't want variability in our y's: we want sigma squared to be small.

But we do want variability in our x's; we want the variability of our x's to be large.

And in fact, in linear regression, the most variable you can make things is if

you get half of your x observations at the lowest possible value,

and half of your x observations at the highest possible value.

That'll give you the maximum variance for the denominator.

Of course, you're banking a lot on the relationship not doing anything funky in the big gap where you didn't collect any data,

but if you're really quite certain about the linearity, then that design would minimize the variance of the estimated coefficient.
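The endpoint design can be sketched numerically (again with made-up numbers on a hypothetical range [0, 10]): putting half the x's at each endpoint maximizes Σ(xᵢ − x̄)², and hence minimizes the slope variance, compared to other designs on the same range.

```python
import numpy as np

n = 20
designs = {
    "half at each end": np.r_[np.zeros(n // 2), np.full(n // 2, 10.0)],
    "evenly spaced":    np.linspace(0.0, 10.0, n),
    "tight ball":       np.linspace(4.9, 5.1, n),
}
for name, x in designs.items():
    # sum of squared deviations: the "denominator" of the slope variance
    print(name, np.sum((x - x.mean())**2))
# The endpoint design gives the largest sum, so the smallest slope variance.
```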