0:01

So let's go through an example of calculating a prediction interval.

In this case, we're going to use the mtcars dataset, okay?

And so, I'm going to fit a model that has miles per gallon as the outcome, and horsepower, weight, and an intercept as predictors. There's my fit.

And if I do summary(fit), you can see, there it is, okay.

So if I want to predict for a new car, I'm going to create a new data frame. And I want to predict at a horsepower of 90 and a weight of 2.2, okay?

So then if I do predict(fit, newdata = newcar), okay, it gives me my prediction, 25.8 miles per gallon.

Note, if I do predict on my linear model fit without any arguments, okay, it predicts at all of the existing x values.

Basically, it gives you x beta hat, where x is the observed design matrix, okay? So it gives you the yhat values for all the observed x values.
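A quick sketch of that point: with no newdata, predict returns the fitted values computed from the observed design matrix.

```r
fit <- lm(mpg ~ hp + wt, data = mtcars)

# No newdata: predict() gives x %*% beta-hat for the observed x,
# i.e. the same thing as fitted(fit)
all.equal(unname(predict(fit)), unname(fitted(fit)))  # TRUE
```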

Okay, now, suppose I want to predict at this new data point, but I want a confidence interval for the prediction surface, the two-dimensional prediction surface. Then I put interval = "confidence", and it gives you the fit, which is 25.8, along with the lower and upper limits of a 95% confidence interval.

Okay, if I do interval = "prediction", then it does the same thing; however, it's giving you a prediction interval.

So notice that the lower limit is lower and the upper limit is higher, reflecting the extra "1 +" term that exists in the prediction-interval formula.

Okay, so let's now do this manually, because in this class we like to know exactly what's going on under the hood.

Okay, so I'm going to load dplyr.

1:58

Okay, and then my y is my miles per gallon. And my x matrix is the intercept, which is a column of 1s, and then I'm just going to grab the horsepower and weight from the mtcars dataset. And I like to do that with select, which is, I think, in the dplyr package.

Okay, my n is the number of observations I have, and my p is the number of columns of x, which in this case should be 3: 1 for the intercept, 1 for horsepower, and 1 for weight.

Okay, then x transpose x inverse is just that, x transpose x, inverted. And then my beta is going to be x transpose x inverse times x transpose y.
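Those setup steps, sketched in base R (cbind in place of dplyr's select, so the chunk stands alone):

```r
y <- mtcars$mpg
x <- cbind(1, mtcars$hp, mtcars$wt)  # intercept column, horsepower, weight
n <- length(y)
p <- ncol(x)                         # 3: intercept, hp, wt

xtxinv <- solve(t(x) %*% x)          # (x'x) inverse
beta <- xtxinv %*% t(x) %*% y        # least-squares coefficients
```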

2:39

And then the new value I'd like to predict at, x0, is the intercept, okay, 90 and 2.2. Okay, 90 for the horsepower and 2.2 for the weight, okay, there we go.

And then my yhat at that new x value is going to be x0 times beta, but let me get to that in a minute.

My yhat at my observed x values is x beta, okay? And then my residuals are going to be y - yhat. And then my residual variance is the sum of my squared residuals, divided by n - p rather than n, okay?

Now, my prediction, my yhat0 at this new value of x, is just going to be x0 times beta. And I just used sum, just to avoid having to type out the matrix multiplication operator.

Okay, so what's my confidence interval? It's yhat0 plus or minus the 0.975th t quantile times the standard error. So instead of plus or minus, I just add both the 0.025th and the 0.975th quantiles, because notice, if you do that, it returns the negative and the positive version, okay? And then times s, and then times the square root of x0 transpose times x transpose x inverse times x0.

So there it is, about 24.5 to 27.2. So if we go up to our confidence interval from predict, it's the same, about 24.5 to 27.2. Okay, so it's the same thing.
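The manual confidence interval, in one self-contained chunk:

```r
y <- mtcars$mpg
x <- cbind(1, mtcars$hp, mtcars$wt)
n <- length(y); p <- ncol(x)
xtxinv <- solve(t(x) %*% x)
beta <- xtxinv %*% t(x) %*% y
x0 <- c(1, 90, 2.2)
yhat0 <- sum(x0 * beta)
s <- sqrt(sum((y - x %*% beta)^2) / (n - p))

# Asking qt for both quantiles at once gives the minus and plus versions
conf_int <- yhat0 + qt(c(0.025, 0.975), df = n - p) *
  s * sqrt(drop(t(x0) %*% xtxinv %*% x0))
conf_int
```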

Now, let's do our prediction interval. It's the same thing, only now there's this 1 + right here, okay?

So let's do that again, and we get 20.356, 31.31. Okay, so 20.356 to 31.31, matching what predict gave us.
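And the prediction interval, identical except for the 1 + inside the square root:

```r
y <- mtcars$mpg
x <- cbind(1, mtcars$hp, mtcars$wt)
n <- length(y); p <- ncol(x)
xtxinv <- solve(t(x) %*% x)
beta <- xtxinv %*% t(x) %*% y
x0 <- c(1, 90, 2.2)
yhat0 <- sum(x0 * beta)
s <- sqrt(sum((y - x %*% beta)^2) / (n - p))

# The extra 1 adds the variability of a new observation around the surface
pred_int <- yhat0 + qt(c(0.025, 0.975), df = n - p) *
  s * sqrt(1 + drop(t(x0) %*% xtxinv %*% x0))
pred_int  # roughly 20.36 and 31.31
```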

Okay, so that's what's going on under the hood. It's pretty straightforward logic, how it's doing this.

And just take into account: if you want to estimate the prediction surface at a particular point, you want a confidence interval. And if you want to evaluate the prediction surface plus the natural variability that exists around that prediction surface, make sure you do a prediction interval rather than a confidence interval. And it's all pretty easy with the predict function, okay?

You should only do these kinds of calculations as part of something like this class, where you're just verifying that you understand how it works. And then, from then on, you would use the more natural predict function to do this.
