Learn how probability, math, and statistics can be used to help baseball, football, and basketball teams improve player and lineup selection as well as in-game strategy.

From the course by University of Houston System

Math behind Moneyball

25 ratings

From the lesson

Module 8

You will learn how to use game results to rate sports teams and set point spreads. Simulation of the NCAA basketball tournament will aid you in filling out your 2016 bracket. Final 4 is in Houston!

- Professor Wayne Winston, Visiting Professor

Bauer College of Business

In this video we want to introduce you to the concept of regularization,

which has become very important recently in forecasting and statistics.

So the idea is, if you fit past data and you try to

predict the future, basically you don't do as well if you use for

prediction the perfect fit to the past data.

And that's just because things regress to the mean.

We'll hopefully have a video on that kind of thing later.

And so I want to show you how regularization might work in the context of our NFL

forecasting model.

The first one, where we predict the score of the game.

So, here we've got the 2013 NFL season.

Okay, and so let's fit our Solver model; let's use the first five weeks.

We'll minimize the sum of the squared errors of the first five weeks and

see how well we do for the rest of the season.

So this'll be the sum of the squared errors.

First five weeks.

And so first five weeks is roughly.

Well, let's just suppose we used the first 70 games.

The first five weeks would be probably about,

I don't have exact dates here and I'm not going to fuss about it.

But let's go through the first 77 games.

So we use the first 77 games.

And then what that would be, our sum of squared errors,

would go down through the 77th game.

Where is that?

[NOISE] Okay.

And then we can get the sum of squared errors for

the rest of the season,

again using the predictions that we make

when we're only fitting the model to the first 77 games.

So that would be from here on.
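
The fit-then-check procedure described here can be sketched in Python. This is a minimal sketch under stated assumptions, not the spreadsheet from the video: the league and scores are synthetic, the fixed `HOME_EDGE` term and every name below are hypothetical, and plain gradient descent stands in for Excel Solver. It fits ratings to the early games only and then reports the sum of squared errors on the games that follow.

```python
import random

HOME_EDGE = 2.5   # hypothetical fixed home-field edge, in points (an assumption)
N_TEAMS = 8       # small synthetic league, not the real NFL
N_FIT = 20        # games used for fitting (the video uses the first 77)

def predict(ratings, home, away):
    """Predicted home margin: home edge plus the difference in ratings."""
    return HOME_EDGE + ratings[home] - ratings[away]

def sse(games, ratings):
    """Sum of squared errors between actual and predicted home margins."""
    return sum((margin - predict(ratings, home, away)) ** 2
               for home, away, margin in games)

def fit_ratings(games, n_teams, steps=5000):
    """Least-squares ratings by gradient descent (a stand-in for Solver)."""
    played = [0] * n_teams
    for home, away, _ in games:
        played[home] += 1
        played[away] += 1
    step = 1.0 / (4 * max(played))   # conservative step size keeps descent stable
    ratings = [0.0] * n_teams        # starting at zero keeps the ratings mean at zero
    for _ in range(steps):
        grad = [0.0] * n_teams
        for home, away, margin in games:
            err = margin - predict(ratings, home, away)
            grad[home] -= 2 * err
            grad[away] += 2 * err
        ratings = [r - step * g for r, g in zip(ratings, grad)]
    return ratings

# Synthetic season: random matchups and noisy margins (illustration only).
rng = random.Random(2013)
true_strength = [rng.gauss(0, 5) for _ in range(N_TEAMS)]
games = []
for _ in range(60):
    home, away = rng.sample(range(N_TEAMS), 2)
    margin = round(HOME_EDGE + true_strength[home] - true_strength[away]
                   + rng.gauss(0, 13))
    games.append((home, away, margin))

fit_games, later_games = games[:N_FIT], games[N_FIT:]
ratings = fit_ratings(fit_games, N_TEAMS)
print("SSE on fitted games:", round(sse(fit_games, ratings), 1))
print("SSE on later games: ", round(sse(later_games, ratings), 1))
```

Only the second number measures forecasting ability, because the ratings never saw those games.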

Okay.

So now the target cell is the sum of squared errors for the first 77 games,

so if I run the model it's fitting basically the first five weeks.

Okay, so let's see what happens.

I don't have to change anything in this algorithm model.

But you'll see you get some weird numbers.

So for the first five weeks, okay,

you get Jacksonville 18 points worse than average; I mean, that's just horrid.

The Steelers are 11 points worse than average,

the Rams 11 points worse than average.

Okay, and so now if you predict the rest of the season, okay so

let's remember what we got here.

The sum of squared errors for the rest of the season was about 43,000.

Okay.

Now what regularization says, let's make a copy of this sheet.

We'll call it regularization.

It says basically you should push your ratings closer to zero.

Okay, now if I do the sum of the squared ratings.

And I could just do SUMPRODUCT, the ratings column with

the ratings column.

The question is how you should push things in and that requires a lot more research.

I've got a great student Eric Webb who has been working with me in that regard.

I hope, when the results get published.

[INAUDIBLE] See I get 1,700 if I take the ratings squared.

Basically, the rating of the Cardinals squared plus the ratings of

the Falcons squared.

So, that's a measure of sort of how spread out these numbers are,

the squared deviation from the mean of zero.
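
As a small illustration of that spread measure, the SUMPRODUCT of the ratings column with itself is just the sum of the squared ratings. The team names and numbers below are made up for the example and are not the values from the video's spreadsheet.

```python
# Hypothetical ratings (points better than average); they average to zero.
ratings = {"Team A": 3.0, "Team B": -4.0, "Team C": 1.0}

values = list(ratings.values())
# Equivalent of SUMPRODUCT(ratings column, ratings column) in the spreadsheet.
sum_sq = sum(x * y for x, y in zip(values, values))
print(sum_sq)  # 9 + 16 + 1 = 26.0
```

Because the ratings average to zero, this total is the squared deviation from the mean, so a smaller value means the ratings sit closer together.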

Okay so let's suppose the question is how should I push this in.

So let's use a thousand.

Okay, and it's very simple. We just say,

if I take the sum of the squared ratings.

And the question is, in each week of the season how far

should you push it in to best predict the future and

this is really the key to successful forecasting in any setting.

[INAUDIBLE] And so let's say we do less than or equal to a thousand.

Okay, so remember the sum of squared errors was 42,622

with no regularization.

Okay.

So now if I want to regularize, again I have the same target cell, the first 77 games.

But I simply add a constraint that

the sum of squared ratings is less than or equal to this number.
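
Written out, the constrained fit looks roughly like the following. The notation is an assumption introduced here for illustration: $m_g$ is the actual home margin in game $g$, $h(g)$ and $a(g)$ are the home and away teams, $r_i$ is team $i$'s rating, $T$ is the number of teams, $H$ is a home-edge term that this video does not discuss, and $C$ is the cap on the sum of squared ratings (1,000 in this example).

```latex
\min_{r_1,\dots,r_T}\ \sum_{g=1}^{77}\bigl(m_g - (H + r_{h(g)} - r_{a(g)})\bigr)^2
\quad\text{subject to}\quad \sum_{i=1}^{T} r_i^2 \le C
```

Without the constraint this is the ordinary least-squares fit; a smaller $C$ pushes the ratings closer to zero.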

Okay, and I hope I do better for the rest of the season.

It should make Jacksonville not a minus 18.

Okay, so Jacksonville got pushed to a minus 14.

Now look at this.

With regularization,

the sum of the squared errors for the rest of the season was 36,522.

I used 1,000.

Now if I use 500, all I'd have to do is change this to 500.

And see you could go through every season after each week and

see what the optimal regularization constant is.
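
The search for a good regularization constant can be sketched as a loop over candidate caps. This is a hedged sketch, not the Solver setup from the video: the schedule and scores are synthetic, `HOME_EDGE`, `CAPS`, and the function names are hypothetical, and projected gradient descent is swapped in for Solver's constrained optimizer. For each cap it fits the first 77 games subject to the cap and scores the fit on the remaining games.

```python
import math
import random

HOME_EDGE = 2.5                 # hypothetical fixed home-field edge (an assumption)
N_TEAMS, N_GAMES, N_FIT = 32, 256, 77
CAPS = [2000, 1000, 500, 250]   # candidate caps on the sum of squared ratings

def sse(games, ratings):
    """Sum of squared errors between actual and predicted home margins."""
    return sum((m - (HOME_EDGE + ratings[h] - ratings[a])) ** 2
               for h, a, m in games)

def fit_capped(games, n_teams, cap, steps=2000):
    """Minimize SSE subject to sum(r^2) <= cap via projected gradient descent."""
    played = [0] * n_teams
    for h, a, _ in games:
        played[h] += 1
        played[a] += 1
    step = 1.0 / (4 * max(played))   # conservative step size keeps descent stable
    ratings = [0.0] * n_teams
    for _ in range(steps):
        grad = [0.0] * n_teams
        for h, a, m in games:
            err = m - (HOME_EDGE + ratings[h] - ratings[a])
            grad[h] -= 2 * err
            grad[a] += 2 * err
        ratings = [r - step * g for r, g in zip(ratings, grad)]
        sum_sq = sum(r * r for r in ratings)
        if sum_sq > cap:             # shrink back so the constraint holds
            scale = math.sqrt(cap / sum_sq)
            ratings = [r * scale for r in ratings]
    return ratings

# Synthetic season: random matchups and noisy margins (illustration only).
rng = random.Random(7)
strength = [rng.gauss(0, 6) for _ in range(N_TEAMS)]
games = []
for _ in range(N_GAMES):
    h, a = rng.sample(range(N_TEAMS), 2)
    games.append((h, a, round(HOME_EDGE + strength[h] - strength[a]
                              + rng.gauss(0, 13))))

fit_games, later_games = games[:N_FIT], games[N_FIT:]
fitted = {cap: fit_capped(fit_games, N_TEAMS, cap) for cap in CAPS}
later_sse = {cap: sse(later_games, fitted[cap]) for cap in CAPS}
best_cap = min(later_sse, key=later_sse.get)
for cap in CAPS:
    print(cap, round(later_sse[cap], 1))
print("best cap on this synthetic season:", best_cap)
```

A single season, real or synthetic, gives a noisy answer, so the same loop would need to be repeated across many seasons and cutoff weeks before trusting any particular cap.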

Okay, and that's sort of what give.

Okay [INAUDIBLE].

But okay, if I use 500.

See, the SSE was even better, but this was only one season of data.

We'd have to go through and do this for many seasons of data and

basically, where do you find scores of games?

Let me just show you.

Okay, so

if I go to pro-football-reference.com, seasons.

So if I say 2013 schedules and results there I've got the scores of the games.

Okay, right there.

Week-by-week scores of the games there that we can export and stuff like that.

Okay, so I mean, I can get the data and do analysis on the optimal way

to [INAUDIBLE], which I think would be interesting.

Okay, well, that's it for this video.

We have one more rating system to talk about, just rating teams based on wins and

losses.

That we'll get to in a couple of videos.

But I think we'll take a break and

talk about how you can simulate the NCAA tournament.

And the NFL playoffs now that we know sort of how to rank teams based on the past

history of the teams.
