
Also, I'd like to remind everyone that we can get a prediction for Y by taking beta nought hat plus beta one hat multiplied by the X that we want to predict at.

Now if we plug in the observed X's, then we get the fitted values, the ones we were trying to make as close as possible to the observed data. But that doesn't mean that we can only predict at the fitted values. We can predict at any value of X we'd like to plug into the equation.

However, we're going to have more reasonable predictions if the value of X that we plug in is in the cloud of data that we used to build the model.

Later on, we'll also talk about how to account for that kind of uncertainty with prediction intervals. But for the time being, let's just talk about how we get a prediction.

And in the next couple slides, we'll go through an example of interpreting the intercept, interpreting the slope, and generating predictions from a specific regression model and setting.

All right, let's go through some code now to interpret our regression coefficients and show you the fitting of the regression sort of in real time.

The dataset is the diamond dataset from the UsingR package. The data is diamond prices in Singapore dollars and diamond weight in carats, which is a standard measure of diamond mass.

To get the data, you need to start out by calling library(UsingR) to load the UsingR package and data(diamond), and then we also want library(ggplot2), because I'm going to do a ggplot first.
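A minimal sketch of that setup, assuming the UsingR and ggplot2 packages are installed:

```r
# Load the packages and the diamond dataset
library(UsingR)   # provides the diamond data frame
library(ggplot2)  # for plotting
data(diamond)
str(diamond)      # two columns: carat (mass) and price (Singapore dollars)
```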

So here, let me go through my ggplot commands. I would like to assign my ggplot to the variable g; the dataset is diamond. My aesthetic has the horizontal-axis variable as carat and the y-axis variable as price. I'd like to get in the practice of labeling my axes, so to my plot I add a layer where the xlab is Mass (carats) and the ylab is Price (SIN $).

So let me run those lines, and then I'd like to add the points: points with a black background and then a light, alpha-blended color on top. And then it's quite easy to add the regression line.

So I'm going to add a geom_smooth layer, and method = "lm" will add the regression line. If you omit the other arguments, it's just going to assume the regression with y as the outcome and x as the predictor, and I want my regression line color to be black. So let me run that line and then call my plot.
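Putting those commands together, a sketch along these lines; the exact point sizes and overlay colors are my guesses at the lecture's styling:

```r
library(UsingR)   # diamond dataset
library(ggplot2)
data(diamond)

g <- ggplot(diamond, aes(x = carat, y = price)) +
  xlab("Mass (carats)") +
  ylab("Price (SIN $)") +
  # larger black points underneath, lighter alpha-blended points on top
  geom_point(size = 6, colour = "black", alpha = 0.2) +
  geom_point(size = 5, colour = "blue", alpha = 0.2) +
  # least-squares line of price on carat
  geom_smooth(method = "lm", colour = "black")
g
```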

There's my plot; you can see on my x-axis I have mass, and on my y-axis I have price. And now what I'm plotting is the fitted line, the line that minimizes the sum of the squared vertical distances between these points and the line.

Now let's actually go through and get our fitted line. Just to remind us, the function lm is R's linear model procedure, so it includes regression as a special case.

Price, on the left-hand side of our tilde, is the outcome; think of the tilde as sort of the equals sign in the model. Our x variable is carat. By default, lm includes an intercept, so if you don't want an intercept, you have to explicitly remove it from the model.

Then, after the comma, we want the dataset that we're looking at to be the diamond dataset. So, in other words, we have to give it the data frame; otherwise, it looks in the regular R environment for the variables in the model. And we're going to assign that to the variable named fit.

Â So let's see what that output looks like after we run it.
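A sketch of that call and the printouts it produces:

```r
library(UsingR)
data(diamond)

# Regress price (outcome) on carat (predictor);
# lm includes the intercept by default
fit <- lm(price ~ carat, data = diamond)

fit           # prints just the two coefficients
summary(fit)  # the more elaborate printout
coef(fit)     # the coefficients as a named vector
```

Per the lecture, the intercept comes out to roughly -259 and the carat slope to roughly 3,721 Singapore dollars.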

So there I've run the model, and now I'm going to type fit just to see what it prints out. It basically just prints out the coefficients, beta nought and beta one.

I would note that you can get a much more detailed printout by calling summary on the variable output by the lm fit, and you get this more elaborate printout.

And we're going to go through and detail all of the numbers on this printout; you'll be able to interpret everything on it by the end of the class. But for the time being, let's just talk about the coefficients.

If you just want to grab the coefficients as a vector, let's do coef(fit), and then we get the intercept, which it labels as (Intercept), and the slope for the carat regression variable.

So let's look at this 3,721 coefficient and try to interpret it. It's saying that we have an expected 3,721 Singapore dollar increase in price for every one carat increase in the mass of the diamond. The intercept, negative 259, is the expected price of a 0 carat diamond. So not very interesting, because we're not interested in zero carat diamonds.

Now let's mean center our x variable so that the intercept is on a more interpretable scale.

The first thing I'd like to do is assign it to a different variable, fit2 instead of fit, because I don't want to overwrite the original fit. lm is, again, my linear model procedure. My outcome stays the same, and now I want to mean center my predictor variable, carat: so carat minus mean(carat). If you want to do arithmetic operations inside formula statements in lm, you actually have to surround them with this I function. And then we still want our dataset to be the diamond dataset.

Â So let's run that code.
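A sketch of the mean-centered fit:

```r
library(UsingR)
data(diamond)

# Mean-center carat; arithmetic inside a formula must be wrapped in I()
fit2 <- lm(price ~ I(carat - mean(carat)), data = diamond)
coef(fit2)
# The slope is unchanged (about 3,721); the intercept is now about 500,
# the expected price at the average carat size
```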

So I've run fit2, and there are my new coefficients. Notice, of course, the slope stays the same, 3,721, but my intercept has changed to 500. So 500 Singapore dollars is the expected price of the average-sized diamond. In this case, the average diamond is about 0.2 carats. A one carat increase is actually kind of big.

What about changing the units to one-tenth of a carat? We can do this just by dividing the coefficient by ten. So we would expect to see a $372 increase in price for every one-tenth of a carat increase in the mass of a diamond.

But let's actually show in R how this works as well. Here I am, now assigning to the variable fit3 the linear model fit where, instead of putting in carat, I'm putting in carat times 10, so the unit of this new variable is one-tenth of a carat. The data is, of course, still the diamond dataset.

Â So let me run that and then let me find the coefficient.
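A sketch of the rescaled fit:

```r
library(UsingR)
data(diamond)

# Rescale so one unit of the predictor is a tenth of a carat;
# again, arithmetic inside a formula needs I()
fit3 <- lm(price ~ I(carat * 10), data = diamond)
coef(fit3)
# The slope is now about 372 Singapore dollars per tenth of a carat
```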

And you get, of course, that it is now 372 rather than 3,721.

So imagine if someone came to you with three new diamonds that they had: 0.16 carats, 0.27 carats and 0.34 carats. So here they are right here, and let me assign those. And they wanted to know what you would estimate the price would be.

Well, you could do it manually by grabbing the two coefficients and adding the intercept plus the slope times these new values. Let's do that.
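A sketch of that manual calculation:

```r
library(UsingR)
data(diamond)

fit <- lm(price ~ carat, data = diamond)
newx <- c(0.16, 0.27, 0.34)

# Intercept plus slope times each new carat value
coef(fit)[1] + coef(fit)[2] * newx
# Roughly 336, 745 and 1006 Singapore dollars, respectively
```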

And so you would predict 336, 745 and 1,006 Singapore dollars for these three diamonds, respectively, based on your fitted linear regression model. Which, by the way, from the scatter plot seems to fit pretty well.

Often, you don't want to do even that much coding; you want a more general method, especially when you have lots of regression variables. So there's this general method called predict that will take the output from several different kinds of model fits. Linear models are one example, but predict is a generic function, and it applies to several different prediction models.

So we predict from the output of our lm fit, and then you need to give it some new data to predict at. So newdata is a data.frame that has the new values of x for the carat variable.
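A sketch of the same predictions via predict:

```r
library(UsingR)
data(diamond)

fit <- lm(price ~ carat, data = diamond)
newx <- c(0.16, 0.27, 0.34)

# newdata must be a data frame whose column name matches the predictor
predict(fit, newdata = data.frame(carat = newx))

# With newdata omitted, predict returns the fitted values (the y hats)
# at the observed carat values
predict(fit)
```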

Then, when we do that, what you'll see is that, of course, it gives you the same answer, now in a way that scales up when we have lots of regressors in much more complicated settings.

So you generally want to predict using the predict function. If you omit this newdata argument, if you just do predict(fit), I'll show it to you: it predicts at the observed x values, so it gives you the y hat values. If you want it at new x values, you have to give it this newdata argument.

I just wanted to briefly illustrate what the prediction is accomplishing. Here are our observed data points in blue. The fitted values, when we do the predict command, are in red: all of the observed x values and their associated fitted points on the line. If we were to drop vertical lines from the observed data points onto the fitted line, they would land on these red points.

When we predict at a new value of x, what we're doing is finding a point along the horizontal axis, here at the three values that we want, 0.16, 0.27 and 0.34, drawing a line up to the fitted regression line, and then over to dollars, and those are our predicted dollar amounts.
