0:39

My residuals are my response y minus the predicted values, beta nought plus beta1 x; here I've just carried the subtraction through. There are my residuals. My estimate of the standard deviation around the regression line, the variability around the regression, is the average of the squared residuals, except remember, we're dividing by n minus 2 instead of n, and to get it to be a standard deviation rather than a variance, I'm going to take the square root. There's my sigma. That's my estimate of sigma from the previous several pages.
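The calculation described above can be sketched in R with simulated data (the variable names `x`, `y`, `n`, `e` and the simulated values are illustrative; the lecture uses the Singapore diamond dataset instead):

```r
# Simulated data standing in for the lecture's diamond data (illustrative)
set.seed(1)
n <- 50
x <- runif(n, 0.1, 0.4)
y <- 100 + 3700 * x + rnorm(n, sd = 30)

# Least-squares estimates of the intercept and slope
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)

# Residuals: response minus predicted values, subtraction carried through
e <- y - beta0 - beta1 * x

# Estimate of sigma: average squared residual, dividing by n - 2,
# then square-rooted to get a standard deviation rather than a variance
sigma <- sqrt(sum(e^2) / (n - 2))
```

Running `summary(lm(y ~ x))$sigma` on the same data should agree with this `sigma` up to floating-point error.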

My sum of squares of the x's is just the numerator of the variance calculation, but let's just do it directly: it's x minus its mean, squared, and then adding those up. So let me get that, and then my standard error for my beta nought, my intercept. If I just simply plug into the formula, there, I've written it out.

And my standard error for my beta1, which I think is the more useful and mentally important formula to commit to memory, is sigma, the standard deviation around the regression line, divided by the square root of the sum of the squared deviations of the x's around their mean.
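The two standard-error formulas just described can be sketched as follows, again on simulated stand-in data (names `ssx`, `seBeta0`, `seBeta1` are illustrative, not from the lecture's code):

```r
# Simulated data standing in for the lecture's diamond data (illustrative)
set.seed(1)
n <- 50
x <- runif(n, 0.1, 0.4)
y <- 100 + 3700 * x + rnorm(n, sd = 30)
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
sigma <- sqrt(sum((y - beta0 - beta1 * x)^2) / (n - 2))

# Sum of squared deviations of the x's around their mean:
# the numerator of the variance calculation, done directly
ssx <- sum((x - mean(x))^2)

# Standard error of the intercept, plugging into the formula
seBeta0 <- sigma * sqrt(1 / n + mean(x)^2 / ssx)

# Standard error of the slope: sigma over the square root of ssx
seBeta1 <- sigma / sqrt(ssx)
```

Both should match the "Std. Error" column of `summary(lm(y ~ x))$coefficients`.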

Okay, there we go. There's the standard error for beta1. Now I'm going to create the two t statistics, which, if you're testing the hypothesis that beta1 is zero or that beta nought is zero, are just going to be the estimate divided by its standard error.

And we don't have to subtract off the true value, because the true value is assumed to be zero under this hypothesis. So for my beta1 hypothesis, it's going to be beta1 divided by its standard error, and then I'm going to calculate my two p values.

Again, if you've taken the inference class, then going from a t statistic to a p value should not be a great leap. In this case, it's going to be twice the t probability: in both these cases the estimates are larger than zero, so I'm going to look at the probability of being this statistic or larger, and I'm going to double those two t probabilities. Then I'm going to create my coefficient table, created manually by me, without having used lm or any built-in higher-level R function.

And I'm going to give it its row names and its column names, so that it looks like R's output. And now I'm going to show you the easy way to do it. Okay, so here, let me print it out.

There it is printed out, and now I'm just going to do fit, lm: my response y is a linear relationship with my predictor x, and then I'm going to get the summary of the coefficients. You'll see everything is exactly the same. So if you didn't follow all of this, it's not too big of a deal.
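The whole manual construction and the comparison against `lm` can be sketched end to end like this (simulated data; variable names are illustrative, though the lecture's code follows the same shape):

```r
# Simulated data standing in for the lecture's diamond data (illustrative)
set.seed(1)
n <- 50
x <- runif(n, 0.1, 0.4)
y <- 100 + 3700 * x + rnorm(n, sd = 30)

# Estimates, sigma, and standard errors as derived earlier
beta1 <- cor(y, x) * sd(y) / sd(x)
beta0 <- mean(y) - beta1 * mean(x)
e <- y - beta0 - beta1 * x
sigma <- sqrt(sum(e^2) / (n - 2))
ssx <- sum((x - mean(x))^2)
seBeta0 <- sigma * sqrt(1 / n + mean(x)^2 / ssx)
seBeta1 <- sigma / sqrt(ssx)

# t statistics: estimate over standard error (true value zero under H0)
tBeta0 <- beta0 / seBeta0
tBeta1 <- beta1 / seBeta1

# p values: twice the upper-tail t probability on n - 2 degrees of freedom
pBeta0 <- 2 * pt(abs(tBeta0), df = n - 2, lower.tail = FALSE)
pBeta1 <- 2 * pt(abs(tBeta1), df = n - 2, lower.tail = FALSE)

# Coefficient table built manually, named to look like R's output
coefTable <- rbind(c(beta0, seBeta0, tBeta0, pBeta0),
                   c(beta1, seBeta1, tBeta1, pBeta1))
colnames(coefTable) <- c("Estimate", "Std. Error", "t value", "Pr(>|t|)")
rownames(coefTable) <- c("(Intercept)", "x")

# The easy way: everything should be exactly the same
fit <- lm(y ~ x)
coefTable
summary(fit)$coefficients
```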

I just wanted to show you for once that the formulas I'm giving you are exactly the right formulas, and we verified them computationally by checking them out on a dataset. Maybe check them out on another dataset just to verify for yourself.

Let's go through getting a confidence interval. So my fit, the variable fit, was the output of lm. Let's see what happens if I type summary(fit). You get the full output from lm.

So it gives you facts about the residuals; it gives you the coefficients, the estimated coefficients, the standard errors, the t values, and the associated p values. These are all for tests of whether or not the intercept is 0 or the slope is 0. Now of course a 0 intercept may not be of interest, but almost always a test of a 0 slope is of interest.

But it also gives you the residual standard deviation, the degrees of freedom, the r-squared, and the adjusted r-squared, which we'll talk about later on in the class.

Let's go ahead and generate our confidence intervals for our intercept and for our slope. So now I've assigned that summary, the summary from the lm statement, to an object in R, but I just wanted the table part, so I'm grabbing just the coefficients. So, it's right here; see, I'm just grabbing the coefficient table.

So let me just print out that variable, just to show you what it looks like. It's just that table part: just the estimates, the standard errors, the t values, and the p values. Okay.

So, our confidence interval is just going to be the estimate. Here's the one for the intercept: the estimate, plus or minus the 97.5th t quantile with the right degrees of freedom. Just to make sure I get the degrees of freedom right, so I don't even have to think at all, I'm grabbing fit$df.residual, and then times the standard error. There, I'm grabbing the standard error.

Â 5:40

And then I'm doing it for the slope, whose estimate is in the second row, first column. So the slope estimate plus or minus the relevant t quantile times its standard error, which is in the two-two cell of this table here.

And then the slope is going to be interpreted as the change in Singapore-dollar price per one-carat increase in mass, the carat being the unit of the x, the predictor variable. But I might want it for a 0.1-carat increase, because, as we mentioned earlier when looking at the data, a one-carat increase is kind of a big increase. So why don't I divide my whole interval by ten.
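The interval construction and the divide-by-ten rescaling can be sketched as follows (simulated data again; in the lecture, x is carat and y is price in Singapore dollars, and the name `sumCoef` is illustrative):

```r
# Simulated data standing in for the lecture's diamond data (illustrative)
set.seed(1)
n <- 50
x <- runif(n, 0.1, 0.4)
y <- 100 + 3700 * x + rnorm(n, sd = 30)

fit <- lm(y ~ x)
sumCoef <- summary(fit)$coefficients  # just the coefficient table part

# Slope estimate is the [2, 1] cell; its standard error is the [2, 2] cell.
# Estimate plus or minus the 97.5th t quantile times the standard error,
# grabbing the degrees of freedom from the fit so there's nothing to think about.
ci <- sumCoef[2, 1] +
  c(-1, 1) * qt(.975, df = fit$df.residual) * sumCoef[2, 2]

ci       # change in y per one-unit (one-carat) increase in x
ci / 10  # change per 0.1-unit increase, since one unit is a big jump
```

The result should agree with the slope row of `confint(fit)`.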

So let's run both of them, and in particular, focus in on this latter one, the confidence interval for the slope. It's saying: with 95% confidence, we estimate that a 0.1-carat increase in diamond size is going to result in a 356 to 389 increase in price, in Singapore dollars.
