0:00

In this lecture we're going to talk about a particular skill, which is reading the output from a regression. Now remember, early on in this course, I said that one reason to model is to be an intelligent citizen of the world: there's just a certain amount of modeling understanding that you have to have to get by and really contribute. Well, you may be doing some project, and what's going to happen is somebody's going to give you some sort of regression output that looks like this. And what you've got to be able to do is read it, understand it, and know what it's saying. So we see some terms we already know, like R squared, and we see things here like coefficient and intercept. What we'd like to do is fully understand what this output is saying.

Okay, so the first thing to notice: we've got to think about what's really going on. We see regression output, but what it really is is a linear model, a linear model based on multiple variables. So remember, before we had Y equals mX plus b, right? That was our model, the linear model where Y depended on X. What we're going to do now, and what we typically see in regression output, is that Y, which is your dependent variable, depends on more than one variable, so it's like m1 X1 plus m2 X2 plus b. So for example, let's go back to the health example and suppose Y is health outcomes. Well, X1 might be how many hours of exercise you get, X2 might be how many hours of sleep you get, and so on, right? So what you've got is one dependent variable that depends on a lot of stuff. So when you look at regression output, when you look at stuff that looks like this, you see there's more than one X: there's an X1 and there's an X2.
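
The multi-variable linear model described here can be written out as a short sketch. The function name `predict` and the numbers in the health example (coefficients 2.0 and 1.5, intercept 50.0) are made up for illustration and do not come from the lecture.

```python
def predict(coefficients, intercept, xs):
    """Evaluate y = m1*x1 + m2*x2 + ... + b for one observation."""
    if len(coefficients) != len(xs):
        raise ValueError("need exactly one x value per coefficient")
    return sum(m * x for m, x in zip(coefficients, xs)) + intercept


# Hypothetical health model: x1 = hours of exercise, x2 = hours of sleep.
# These numbers are illustrative assumptions, not values from the lecture.
health_score = predict(coefficients=[2.0, 1.5], intercept=50.0, xs=[3.0, 8.0])
print(health_score)  # 2.0*3.0 + 1.5*8.0 + 50.0 = 68.0
```

Adding a third variable only means adding one more coefficient and one more x value; the shape of the model stays the same.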

1:35

Let's do an example, you know, to get a sense of it. Suppose we're looking at student test scores, right? So that's y, that's our dependent variable. Well, it could depend on a couple of things. It could depend on t, which is teacher quality, and it could depend on z, which is class size. So we just write down a simple linear model that says y equals c times t, plus d times z, plus b. b is again our intercept, right? That's because I called my intercept b before, just as in y equals mx plus b.

Now when we think about this model, what should we expect? What we should expect is that as teacher quality gets better, scores get better. So we should expect c to be bigger than zero. But we should expect that as class size gets bigger, class performance should fall. So we should expect d to be less than zero. So when you see a model like this, one of the first things you want to do is come at it with some expectations, some preconceived ideas about what you think is going to be true. That way, when you look at your output, you can decide: is it surprising, or is it not surprising?

So let's go back and take just a generic model, where we have Y equals aX1 plus bX2 plus c. So c here is going to be our intercept, right? And a and b, these are the coefficients of our independent variables. What we want to do is look at the output, which is going to tell us something about those coefficients.

So here we go, here's some regression output. It looks a little scary, but let's just relax a second. Let's look first and see what we see. First we see this thing that says R squared is 0.72. What is that telling us? Well, we already know: that's saying there's a whole bunch of variation in the data, and 72% of it was explained by our model. That means a linear model, in this case, is explaining 72% of the variation. That's totally great. Standard error, 24.21, is telling us, on average, what the standard deviation was: how far from the mean things were. So this is telling us on average about 24. And then this observations thing, 50, says we had 50 data points. So we had 50 data points, on average they were about 24 away from the mean, and we could explain 72% of that variation. So, you know, not a bad model.
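
To make "72% of the variation was explained" concrete, here is a minimal sketch of how an R squared can be computed from data and a model's predictions. The function name and the toy numbers are made up for illustration; they are not the data behind the lecture's output.

```python
def r_squared(observed, predicted):
    """Share of the variation in `observed` that the predictions account for."""
    mean_y = sum(observed) / len(observed)
    # Total variation: squared distances of the data from their mean.
    total = sum((y - mean_y) ** 2 for y in observed)
    # Unexplained variation: squared distances of the data from the model.
    residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 1 - residual / total


# Made-up toy numbers, only to show the calculation.
observed = [10.0, 20.0, 30.0, 40.0]
predicted = [12.0, 18.0, 33.0, 37.0]
print(round(r_squared(observed, predicted), 3))  # 0.948
```

Here the total variation is 500 and the unexplained part is 26, so the model accounts for about 95% of it. An R squared of 0.72 would be read the same way.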

All right, now we look at this part down here. This whole part is telling us what our linear regression model is saying about the coefficients and the intercept. So the first thing we notice is that the 25 here is the intercept, and so that's saying our final regression equation is going to look like Y equals something times X1, plus something times X2, plus 25.

4:18

All right, now this next term here, this 20, is telling us that the coefficient of X1 is 20, so it's going to be 20 X1. And then this 10 corresponds to X2, right? Plus 10 X2. So basically what this is telling us is that our equation is Y equals 20 X1 plus 10 X2 plus 25.

Now let's go back to the previous example I was talking about. Let's suppose that X1 was teacher quality. So this would say Y, which was student test scores, is increasing in teacher quality, so that's what we'd expect. But suppose X2 is class size, and here we have a 10; then we get that test scores are actually increasing in class size. Well, then we say, hmm, this is sort of surprising to me, because I expected class size to have a negative coefficient, and it actually has a positive coefficient. So we might think maybe our data's wrong, maybe my intuition's wrong, so let's look a little bit deeper.
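
The habit of writing down expected signs first and then comparing them with the output can be sketched in a few lines. The variable labels and the helper name `surprising_variables` are assumptions made for this illustration; the two estimates (20 and 10) are the ones read off the lecture's output.

```python
# Coefficient estimates read off the lecture's regression output.
estimates = {"teacher_quality": 20.0, "class_size": 10.0}

# Signs we expected before looking: +1 means "y rises with x", -1 "y falls".
expected_sign = {"teacher_quality": +1, "class_size": -1}


def surprising_variables(estimates, expected_sign):
    """Return the variables whose estimated sign disagrees with expectation."""
    surprises = []
    for name, value in estimates.items():
        observed_sign = 1 if value > 0 else -1 if value < 0 else 0
        if observed_sign != expected_sign[name]:
            surprises.append(name)
    return surprises


print(surprising_variables(estimates, expected_sign))  # ['class_size']
```

A flagged variable is not proof that anything is wrong. It is a prompt to look more closely at the data and at the uncertainty around that estimate.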

So the first thing we should note is that we only have 50 observations, and 50 observations isn't very many. So it could be that maybe these coefficients aren't right. Well, how do we know that? Well, this is where you want to look at this column right here, labeled SE.

5:28

SE stands for standard error. And what it tells us is, sort of, what's the error in our coefficients? So, for example, here we've got a coefficient of 25, the intercept, and it says the standard error is 2. So what that means is, let's go back and think of our bell curve. It's sort of saying our model is guessing that the coefficient of this thing is 25, right? But we've got a standard error of 2, so that means if we said it was between 23 and 27, we'd be right 68% of the time. So what it says is this 25 is a guess based on the data, and what this standard error of 2 is telling us is that 68% of the time the coefficient will actually be between 23 and 27. So it's sort of saying, you know, maybe it's 25, but it's almost for sure between 21 and 29.

Well, let's look at our X1. We've got a coefficient of 20, but the standard error is only 1. So what that's saying is we can be really sure that the coefficient on X1 is between, let's say, 17 and 23, and we can be incredibly sure it's between 16 and 24.

But now let's look at this last one, X2. The coefficient was 10, and the standard error was 4. So if I go over here and draw a big bell curve, we can make sense of this for a second, right? What it's saying is that from the data I'm estimating a coefficient of 10, but I've got a standard error of 4. So that means there's a 68% chance that it lies between 6 and 14, and there's a 95% chance it lies between 2 and 18. And there is actually at least a 2% chance it is, you know, below 2, right? So there's some chance that this coefficient, instead of being 10, could actually be negative.
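
The bell-curve reading of a standard error can be checked numerically. This is a sketch that assumes, as the lecture does, a normal curve centered on the estimate with the standard error as its spread; the helper names `interval` and `chance_below` are made up here.

```python
from statistics import NormalDist


def interval(estimate, std_error, k):
    """Range from k standard errors below the estimate to k above it."""
    return (estimate - k * std_error, estimate + k * std_error)


def chance_below(threshold, estimate, std_error):
    """Bell-curve probability of falling below `threshold`."""
    return NormalDist(mu=estimate, sigma=std_error).cdf(threshold)


# X2 row from the lecture's output: coefficient 10, standard error 4.
print(interval(10, 4, 1))  # (6, 14), roughly 68% of the bell curve
print(interval(10, 4, 2))  # (2, 18), roughly 95% of the bell curve
print(round(chance_below(2, 10, 4), 3))  # 0.023
print(round(chance_below(0, 10, 4), 3))  # 0.006
```

The same two helpers reproduce the other rows: `interval(25, 2, 1)` gives 23 to 27 for the intercept, and `interval(20, 1, 3)` gives 17 to 23 for X1.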

7:17

So you think, well, why don't they tell us that? And the answer is, they do. So this column right here, this is the P-value, and that tells you the probability that the sign is wrong. So what this is saying is there's no way those first two signs are wrong. If we think about drawing this bell curve where we get an estimate of 25, there's no way the real coefficient is negative. But when we get down to X2, it's saying, look, there's about a 1.5% chance that this coefficient of 10, instead of being positive, actually should be negative.
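
As a rough sketch of where a number like this comes from: regression software usually computes a P-value from the ratio of the coefficient to its standard error, as a two-sided tail probability for the hypothesis that the true coefficient is zero. The version below uses a normal approximation, which is an assumption of this sketch; packages typically use a t distribution that accounts for the number of observations, so the figures need not match the lecture's output exactly. The function name is made up.

```python
from statistics import NormalDist


def two_sided_p_value(estimate, std_error):
    """Two-sided tail probability for a coefficient, normal approximation."""
    z = abs(estimate / std_error)  # how many standard errors from zero
    return 2 * (1 - NormalDist().cdf(z))


# Rows from the lecture's output as (coefficient, standard error).
rows = {"Intercept": (25, 2), "X1": (20, 1), "X2": (10, 4)}
for name, (estimate, std_error) in rows.items():
    print(name, round(two_sided_p_value(estimate, std_error), 4))
```

For the intercept and X1 the result is indistinguishable from zero, while X2 comes out a little above 1%, in the same neighborhood as the 1.5% quoted in the lecture.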

8:02

All right, take a step back for a second. What have we got? We see this regression output, and it tells us a bunch of stuff. First, it tells us what the R squared is: how much of the variation in the data did we explain? Second, it tells us how many observations we have: a lot of observations, or not many? Third, it tells us how much variation there was in the data to begin with, and the answer is, on average, 24.21. So, you know, quite a bit of variation. Then it tells us the values of the intercept and the coefficients, right? These are estimates: 25, 20, and 10.

8:35

And so it tells us, you know, this is probably positive and this is probably positive, and it gives us a sense of magnitude. So it tells us sign and magnitude. But it also tells us, in this P-value thing, how sure we are that those coefficients are correct. Now, keep this in mind: we can't be sure it's 25, but it can tell us how sure we are that the coefficient is actually positive. So here it's saying we're really sure it's positive, right? Because there's almost no chance we're making a mistake. But for X2, well, there's a one and a half percent chance that maybe there is a mistake there.

So if this were, again, a regression of test scores on teacher quality and class size, what we can say is that teacher quality definitely improves performance. And there's a lot of evidence that that's true. And we could say with class size, well, even though this study goes the wrong way, it's possible, since we only have 50 data points, that if we did another study it could go in the opposite direction. And that's true as well: there are a lot of studies on class size that do in fact show that as the class size gets bigger the students do better, even though that is sort of counter-intuitive. There are more studies that show the opposite, right? That as the class size gets smaller, the students do better.

9:41

So, the big things to take away when you look at regression output. The first thing is to look at the sign: look at every coefficient and ask, does Y increase or decrease in X? Now, before you look at it, though, when somebody says, oh, I've run a regression model, you should say, well, what are your variables? And then you should form expectations about what you think the signs of the coefficients on those variables are. Then when you look at the output you can say, hmm, does the variable have the effect that I thought it would have? So if you're looking at sales for your firm, and the coefficient on advertising is negative, you might say, well, geez, the more we advertise, the less we sell. That would be totally counter-intuitive, right? But if the coefficient on advertising were positive, so that the more you advertise the more you sell, then it would make sense.

Then we want to look at magnitude, and we want to ask, okay, how big of an effect on Y does a one-unit increase in X have? If it's got a big coefficient, that means, wow, this is something I should pay attention to. And if it's got a small coefficient, it's something maybe you shouldn't pay attention to.
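
The "one-unit increase" reading of a coefficient can be verified directly on the fitted equation from the lecture's output, Y equals 20 X1 plus 10 X2 plus 25. The helper names and the baseline values below are arbitrary choices made for this sketch.

```python
# Fitted equation from the lecture's output: y = 20*x1 + 10*x2 + 25.
COEFFICIENTS = {"x1": 20.0, "x2": 10.0}
INTERCEPT = 25.0


def fitted_y(values):
    """Predicted y for a dict of x values, e.g. {"x1": 3.0, "x2": 2.0}."""
    return INTERCEPT + sum(COEFFICIENTS[name] * values[name] for name in COEFFICIENTS)


def one_unit_effect(name, baseline):
    """Change in predicted y when `name` rises by one unit, others held fixed."""
    bumped = dict(baseline, **{name: baseline[name] + 1})
    return fitted_y(bumped) - fitted_y(baseline)


baseline = {"x1": 3.0, "x2": 2.0}  # arbitrary starting point, chosen for illustration
print(one_unit_effect("x1", baseline))  # 20.0
print(one_unit_effect("x2", baseline))  # 10.0
```

In a linear model the effect is the same from any starting point, which is why the coefficient alone summarizes it. Keep in mind that the size of a coefficient depends on the units X is measured in, so "big" and "small" only mean something once you know what one unit is.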

10:59

[So the output tells us what the sign and magnitude] of those coefficients are, right? So is the sign positive, and what's the magnitude of that coefficient? And we also get that P-value thing, which tells us what's the probability that the coefficient's sign is actually wrong. [LAUGH] That maybe, you know, the data's so noisy we can't say for sure.

So what's great is, you know, if you've got data out there, and you've got an idea of what variables you want to include, you can throw that data into a linear regression model, right? And you can get some output, and then from that output you can get an understanding of, you know, how good is the model? What's the R squared? What are the sign and magnitude of the coefficients, and how confident can we be that those coefficients are right? So it's actually a really useful way of making sense of the world, and as we've shown in the previous lecture, right, it's usually better than we are at figuring out how the world's going to work.
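
As a minimal sketch of what "throwing data into a linear regression model" involves, the code below estimates the coefficients of Y equals aX1 plus bX2 plus c by ordinary least squares, using only plain Python. The function names and the data are made up for illustration. In practice you would likely use a statistics package, which would also report the standard errors and P-values that this sketch leaves out.

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gaussian elimination with pivoting."""
    n = len(rhs)
    rows = [list(row) + [b] for row, b in zip(matrix, rhs)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(rows[r][col]))
        rows[col], rows[pivot] = rows[pivot], rows[col]
        for r in range(n):
            if r != col:
                factor = rows[r][col] / rows[col][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [rows[i][n] / rows[i][i] for i in range(n)]


def fit_linear(x1, x2, y):
    """Least-squares fit of y = a*x1 + b*x2 + c; returns (a, b, c)."""
    design = [[u, v, 1.0] for u, v in zip(x1, x2)]  # one row per observation
    # Normal equations: (X^T X) beta = X^T y.
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * target for row, target in zip(design, y)) for i in range(3)]
    return tuple(solve(xtx, xty))


# Made-up, noise-free data generated from y = 20*x1 + 10*x2 + 25,
# so the fit should recover those numbers (up to rounding).
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [20 * u + 10 * v + 25 for u, v in zip(x1, x2)]
a, b, c = fit_linear(x1, x2, y)
print(round(a, 6), round(b, 6), round(c, 6))
```

With real, noisy data the recovered numbers would only be estimates, which is exactly why the standard error and P-value columns matter when you read the output.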

Thank you.