Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

From the course by University of Houston System

Math behind Moneyball

From the lesson

Module 9

You will learn how to rate NASCAR drivers and get an introduction to sports betting concepts such as the money line, prop bets, and the evaluation of betting systems.

Professor Wayne Winston, Visiting Professor

Bauer College of Business

So how do we test whether a gambling rule or system performs significantly better than expected? I don't want to get too deep into statistics, but basically, statistics is about this: you have some hypothesis about what would happen in a normal situation, for example, that you beat the point spread half the time if you don't have a good system. Then you see how many standard deviations above or below average your system's performance is by computing a z-score. And we know a z-score bigger than plus 2 or less than minus 2 will occur only about 5% of the time, given our hypothesis, let's say, that our system has no predictive value.
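
A minimal Python sketch of the z-score idea, assuming performance under the null hypothesis is approximately normal. The helper names z_score and prob_abs_z_exceeds are illustrative, not from the course; the tail probability uses the identity P(|Z| > z) = erfc(z / sqrt(2)) with the standard library's math.erfc.

```python
import math


def z_score(actual: float, expected: float, std_dev: float) -> float:
    """Number of standard deviations the actual result lies above (+) or below (-) expectation."""
    return (actual - expected) / std_dev


def prob_abs_z_exceeds(z: float) -> float:
    """Two-sided tail probability P(|Z| > z) for a standard normal Z."""
    # For a standard normal, P(|Z| > z) = erfc(z / sqrt(2)).
    return math.erfc(z / math.sqrt(2))


print(f"P(|Z| > 2) = {prob_abs_z_exceeds(2.0):.4f}")  # P(|Z| > 2) = 0.0455
```

The printed tail of about 0.0455 is the "about 5% of the time" figure for a z-score beyond plus or minus 2.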

Okay, so let's suppose we go back to the NFL point spreads, where we said that if you bet against a visiting team that's favored by at least eight points, that is, if you bet on the home team in that situation, that's a good betting system.

Well, you have a null hypothesis, or view of the world, that basically you will beat the spread 50% of the time. Now this rule wins 70 out of 121. Is beating the spread 70 times in 121 significantly better than 50%? Okay, that's one question we want answered.

The other question concerns the finding that those visiting teams performed 1.46 points below the point spread. So another null hypothesis we have is that betting on home dogs of at least eight points will not, on average, outperform the point spread. In other words, betting on the home dogs should not do significantly better than average.

And what we found was that betting on the home dogs did 1.46 points better than average, or rather, better than the spread. Is doing 1.46 points better than the spread in 121 games statistically significant?

See, if we did ten points better than the spread over 121 games, I think you'd have no doubt it was statistically significant. But what you have to know here is how to compute the mean and standard deviation under the null hypothesis.

Okay, so the number of times you beat the point spread is what we call a binomial random variable. Each trial is a success or failure; you have independent trials, in this case betting against the spread in each game; and there's the same chance of success on each trial. A binomial random variable has a known mean and standard deviation. You have n trials with probability of success p, in this case one-half. So the mean number of successes you'd expect is just the number of trials times the probability of success, n*p (we could prove this), and there is a similar formula for the standard deviation.

You can use this approach if there are at least 30 trials. These formulas are exact, but basically we're going to assume that the number of successes in, let's say, 121 trials follows a normal distribution, which is approximately true by something called the central limit theorem: if you add up 30 or more independent random variables, you will get approximately a normal random variable. The standard deviation is the square root of n*p*(1-p).
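
To see the binomial mean and standard deviation and the normal approximation at work, here is a rough Python sketch that computes n*p and the square root of n*p*(1-p), then simulates a hypothetical coin-flip picker over 121 games many times. The function names, the seed, and the number of simulated seasons are arbitrary choices for illustration.

```python
import math
import random
import statistics


def binomial_mean_sd(n: int, p: float) -> tuple[float, float]:
    """Mean n*p and standard deviation sqrt(n*p*(1-p)) of a binomial count of successes."""
    return n * p, math.sqrt(n * p * (1 - p))


def simulate_coin_flip_picker(n_games: int, seasons: int, seed: int = 0) -> list[int]:
    """Wins against the spread, per simulated season, for a picker who is right half the time."""
    rng = random.Random(seed)
    # Each comparison is True with probability 0.5; summing the booleans counts the wins.
    return [sum(rng.random() < 0.5 for _ in range(n_games)) for _ in range(seasons)]


mean, sd = binomial_mean_sd(121, 0.5)
print(mean, sd)  # 60.5 5.5

wins = simulate_coin_flip_picker(121, 10_000)
sim_mean = statistics.fmean(wins)
sim_sd = statistics.stdev(wins)
print(f"simulated mean {sim_mean:.2f}, simulated sd {sim_sd:.2f}")
```

The simulated average and spread of wins should land close to 60.5 and 5.5, matching the formulas.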

So we can compute the mean and standard deviation here of the number of games you would pick right against the point spread if you were a 50-50, coin-flip picker. And we can see whether 70 out of 121 is significantly better than average by computing a z-score. Remember, the z-score is the actual value minus the expected value, divided by the standard deviation. So we can do that for each of our two questions.

Okay, so we have n = 121 and p = 0.5. So we'd expect on average 121 times 0.5, which equals 60.5 successes. And the standard deviation would be the square root of 121 times 0.5 times (1 minus 0.5), which is 5.5.

Okay, so what we saw was 70 successes. So the z-score would equal the number of successes you saw minus the expected number, divided by 5.5. That's 9.5 divided by 5.5, or about 1.73, which is getting close to plus 2.

So betting the home dogs, when they're getting eight points or more, performs about 1.73 standard deviations better than expected. That's pretty close to being significant, since two standard deviations corresponds to the 5% level. I should get more data, and maybe we could go above the plus 2 level.
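
The calculation for the 70-of-121 rule can be reproduced in a few lines of Python. This is a sketch assuming the normal approximation to the binomial; normal_upper_tail is a hypothetical helper built on math.erfc.

```python
import math


def normal_upper_tail(z: float) -> float:
    """One-sided tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


n, p = 121, 0.5                     # games bet, win probability under the null hypothesis
wins = 70                           # observed wins against the spread
expected = n * p                    # 60.5
sd = math.sqrt(n * p * (1 - p))     # 5.5
z = (wins - expected) / sd          # 9.5 / 5.5

print(f"z = {z:.2f}")  # z = 1.73
print(f"one-sided p = {normal_upper_tail(z):.3f}")
print(f"two-sided p = {2 * normal_upper_tail(z):.3f}")
```

The one-sided tail comes out near 0.04 and the two-sided tail near 0.08. The plus-or-minus 2 rule of thumb corresponds to the two-sided 5% level.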

But I'm firmly convinced that betting home dogs, when they're big home dogs, works pretty well. And the reason for that, I believe, is that people who bet overvalue favorites. So when the bookies say you're an eight-point favorite, you're probably really only a seven-point favorite, because they can bump the line up and still get half the money on each side.

And there's a lot of debate about whether the bookies, on big point spreads, will not set a fair spread, because they can make more money with that approach than by setting a fair spread. So I'm not sure that's ever been discussed, but it's pretty well known that people overbet favorites, so you can push the spread up and maybe get more of the money on the wrong side, that is, on average, more bets on the wrong side. But betting big home dogs does pretty well.

Now what about the 1.46 points? Okay, so we have 121 games, and basically we're looking at the mean number of points by which our system outperforms the spread. What we need to know is the standard deviation of a sample mean. The standard deviation of a sample mean of size n is known to be the population standard deviation divided by the square root of the sample size.
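
A quick way to convince yourself of the sigma/sqrt(n) rule is simulation. This Python sketch assumes, purely for illustration, that each game's margin relative to the spread is normally distributed with mean 0 and standard deviation 14; it averages 121 such games many times and compares the spread of those averages with 14/sqrt(121).

```python
import math
import random
import statistics


def standard_error(population_sd: float, n: int) -> float:
    """Standard deviation of the mean of n independent draws: sigma / sqrt(n)."""
    return population_sd / math.sqrt(n)


# Hypothetical model (an assumption, not from the lecture): each game's
# margin minus spread is normal with mean 0 and standard deviation 14.
rng = random.Random(1)
sample_means = [
    statistics.fmean(rng.gauss(0.0, 14.0) for _ in range(121))
    for _ in range(5_000)
]

print(f"formula:   {standard_error(14.0, 121):.2f}")  # formula:   1.27
print(f"simulated: {statistics.stdev(sample_means):.2f}")
```

The simulated figure should sit very close to the formula's 1.27.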

Okay, so what's the standard deviation of the amount by which we outperform or underperform the spread? We know that's 14 points in the NFL.

So the standard deviation of the mean deviation from the spread in 121 games is 14 divided by the square root of 121. The square root of 121 happens to be 11, so that's easy to do: 1.27. So for the z-score, what we saw was that the home dogs did 1.46 points better than the spread on average. We'd predict them to do zero points better if the system has no predictive value, and we would divide the difference by 1.27 to get a z-score. That z-score would be 1.15.
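
The same z-score arithmetic for the 1.46-point edge, as a Python sketch under the normal approximation. The variable names are illustrative; the 14-point standard deviation and the 121 games are the figures quoted in the lecture.

```python
import math


def normal_upper_tail(z: float) -> float:
    """One-sided tail probability P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


n_games = 121
sd_vs_spread = 14.0    # NFL standard deviation of (actual margin - point spread), per the lecture
observed_mean = 1.46   # home dogs' average points better than the spread
expected_mean = 0.0    # null hypothesis: no edge against the spread

se = sd_vs_spread / math.sqrt(n_games)  # 14 / 11
z = (observed_mean - expected_mean) / se

print(f"standard error = {se:.2f}, z = {z:.2f}")  # standard error = 1.27, z = 1.15
print(f"one-sided p = {normal_upper_tail(z):.2f}")
```

The one-sided tail probability here is roughly 0.13, well short of the 5% level, which is why this result is weaker than the 70-of-121 record.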

So on the home dogs' point deviation from the spread, the result isn't quite as significant. They did better than expected, by 1.15 standard deviations.

So we wouldn't really reject the hypothesis that teams that are eight-point underdogs or bigger perform against the point spread as expected. I mean, we have evidence that they do better, but it's not significantly better.

But on beating the point spread, we're pretty close to significance. I mean, any system that performs about two standard deviations above average seems pretty good. So betting on home dogs when they're big home dogs, let's say eight points or more, seems to do pretty well.

Okay, so that gives you some idea of how you can evaluate a gambling system, and we'll give you some homework problems and a test question that test your understanding of this.
