Learn how probability, math, and statistics can be used to help baseball, football and basketball teams improve, player and lineup selection as well as in game strategy.

Loading...

From the course by University of Houston System

Math behind Moneyball

33 ratings

Learn how probability, math, and statistics can be used to help baseball, football and basketball teams improve, player and lineup selection as well as in game strategy.

From the lesson

Module 3

You will learn how Monte Carlo simulation works and how it can be used to evaluate a baseball team’s offense and the famous DEFLATEGATE controversy.

- Professor Wayne WinstonVisiting Professor

Bauer College of Business

Okay let's change sports and let's do a quick football simulation,

like what's the chance of getting a first down.

Okay, let's suppose that you could run a play and basically the number of

yards you gain and the probability you gain it, we'll just make something up.

So you might lose two yards, probably .1,

gain zero yards with probably .3.

Gain 4 yards for probably .35.

And 65, 75 and gain 5 yards probably .25.

So those probably add to one.

So you're going to go for it on fourth down more on that later in the class.

But let's suppose in four downs.

So you will go for it on fourth down if you need to.

What is the chance you get a first down?

Okay, so we're going to play this out once and

then use the data table to play it out let's say 5,000 times.

Now, the trick is, how do I use the RAND function to generate the minus two

yards or zero yards or four yards or five yards with the given probability?

Well, it's easy.

See, if I want 10% chance of minus 2 yards,

I'd say RAND less than or equal to 0.1.

Gives me that, that'll happen 10% of the time.

And then the next 30% if I say .1 okay, less than RAND,

let's say RAND less than .1 rather than less than or equal to,

doesn't really matter because Excel does I think 8 or 16 decimal points.

But if I would say .1 is less than or equal to RAND,

is less than .4, 30% of the time I gain zero.

And now I need 35%, so I take 0.4 less than or equal to RAND less than 0.75.

And then I'd say, if the RAND is greater or equal to 0.75, I get this.

So 35% of the time, I get a RAND between 0.4 and

0.75, because that's the difference between those.

It's 35% at the interval between 0 to 1.

So I can use a vertical lookup.

We did a vertical lookup also I believe earlier in the course, but

I don't think we talked about v lookup truth.

So if you used the last drawer to commit true, v lookup with a true,

you get what's called an approximate match.

You could also leave out.

And so what you need to do is write down the random numbers.

Start with zero.

And then you put the next cutoff, which would be 0.1, and then the next cutoff,

which is 0.4 from where you change the yards gained [INAUDIBLE] 0.75.

And then you put the yards over here.

Which would go minus two, zero, four and five.

And then, to simulate how many yards you gain on the play,

this is called the discreet random variable.

You could list to that.

We'll learn about continuous random variables later, when we try and

simulate the NCA tournament.

And then we'll learn how to simulate a normal random variable to figure out

the odds of teams winning the NCA tournament.

[NOISE] Now, if we get less than or

equal to .1, what will Excel do?

It will stop at the approximate match will be 0 but

I get a number between .1 and .4 the approximate match stops at .1.

Okay, if I get a number between .4 and .75, the approximate stops at the .4 row.

And then we go to the second column and pick up the right number.

Get .75 or higher, the approximate match makes it 0.75 when we get 5 yards.

And that'll give the right probabilities.

See anything between 0.1 and 0.4, for example, occurs 30% of the time, and

then I get 0 yards gained.

Okay, so now I can figure out downs one through four.

Pick a random number.

And I can get yards gained.

And then how do I know to get a first down?

I want to estimate the probability of getting a first down.

How to estimate a probability, very important by simulation.

Give yourself a one if the event happens.

Zero otherwise.

The mean of that cell is the probability of the event.

See supposed you run a simulation 100 times, you get 60 1's and 40 0's.

Well, that means the event happened 60% of the time and

the average of that cell would be 60 * 1

+ 40 * 0 / 100 which would be 0.6.

Okay, so just give yourself a one if something happens.

Otherwise, take the mean of the cell and

you will get estimate of the chance of the event happening.

Okay, so let's generate the yards gained.

Then how do we know we got a first down?

We got a first down.

We compute total yards through each down.

Then we got a first down.

If the maximum of the total yards column, it's greater than or equal to 10.

You don't just look at fourth down, because I might lose yards on fourth down,

so this is a tricky problem.

Okay.

So I have a column for total yards.

So use a RAND function right here.

And then the yards gained, I'll use a v lookup like we talked about.

So I do v lookup on a random number.

Use these two columns.

I'll dollar sign that by hitting F4 in the second column.

And I don't need word true but let's put it anyways.

Let your true yardage come shining through.

So, now 35% is between .1, well let's just copy it down.

Okay, so the 0.006, I don't get to the 0.1, I stop here.

I get minus two.

The 0.27, I get to the 0.1 because 0.27 squared would go to 0.1,

and less than 0.4, I'd get 0.

The 0.91 is greater than the 0.75; I'd get 5 yards.

The 0.42 puts me on 4 yards.

So, total yards gained, it would just be what I gained on first down, and

then I would take the total yards gained so far plus what I gain on the next play.

And if anywhere I see a number greater than or equal to ten,

I made the first down.

See there's a case I gain nine yards.

Okay, there's a case I made on fourth down I made the first down and got 14 yards.

So did I get the first down?

I would say if the max of this column is greater and

equal to ten, give yourself a one otherwise a zero.

That's how this works.

If this is true, then I get the one otherwise a zero.

And if you want to see how a formula works,

you can go up to Formulas > Evaluate Formula.

And just click it.

So it says it's the max in 11-M14.

So it's three greater than or equal to ten, that's false and

then it's going to give you a zero.

Now maybe if I hit F9, let's see what happens.

Okay, there I got a first down.

So I would do evaluate formula.

Is 11 greater or equal to 10?

The answer is that's true, and so it defaults to the first option, which is 1.

Okay, so I want to run this, let's say 5,000 times.

So again I put a one, and then I do home field series,

columns 1 though 5000.

That's the output cell.

We do this over and over.

So again I'll do a one-way gated table.

The column of blank cells.

So I go to here I go data, what if analysis,

data table, column input cell is a blank cell.

No row input cell.

Now, I gotta hit F9.

Well, I don't have to hit F9 to get, because the rolls are different.

Okay, so if I hit F9.

So probability of first down.

I would average this column.

And I'll get a different answer each time because simulation only gives

an approximate result.

So that's like a 53, it's almost 50/50.

Okay, and that's a hard problem to solve.

I mean, analytically, I probably could do it, but it's not easy.

I mean, it's pretty much stabilizing at 51 to 53%.

So it looks like maybe 51% to

52% chance of a first down.

Of course, the more iterations you run, each row is an iteration.

But more exact will be your signal estimate.

But it looks like 51-52%.

So that's I think a really good example of my parallel simulation.

Coursera provides universal access to the world’s best education,
partnering with top universities and organizations to offer courses online.