0:05

In this session, what we're going to be looking at is how to put together a basic forecasting model. When we're looking at forecasting, it might be from a finance perspective, trying to forecast what a stock price is going to look like. From a marketing perspective, perhaps we're trying to forecast consumer demand, and what our revenue is going to be for our products. Maybe we're trying to do that at a national level, maybe at a store level. And, as we'll get into in the next session, maybe we're even interested in doing that down to the level of the individual consumer.

0:35

So what I've put together here is just a plot of weekly demand for a product over the course of a couple of years. We see that there are going to be some ups and downs, some fluctuations on a week-to-week basis. But we also detect what might be a positive upward trend in our data set. So, for example, if we were to look at just trying to put together

1:01

the growth of this product over time, perhaps what we see is that incline. So it seems like we're growing over time; from week to week, we're seeing more baseline sales.

Now, even throwing in that trend line, it's not enough to capture all of the fluctuations, all the spikes and valleys that we see in the data. So what we want to look at today is: what are the different components that we need to put into a forecasting model? How do we assess if those components are necessary? And how do we use our forecasting models to project out into the future?

1:35

So I'm going to cover briefly a couple of different methods, smoothing methods and autoregressive methods, before we end up focusing on regression-based forecasting models. Those regression-based models are going to be the workhorse that we keep coming back to, because they provide us with much more flexibility than smoothing and autoregressive methods alone are going to provide.

1:59

So if we look at smoothing models, the underlying belief here is that the past is the best predictor of our future, and more specifically, that the recent past is going to be the best predictor of the future. So, for example, if I'm trying to predict what sales are going to be next week, or next month, or next quarter, let me look to the most recent week or month or quarter. That's going to serve as a baseline for me. But I'm not necessarily going to look back just one week, one month, one quarter at a time. I'm going to look back multiple periods over time, and that's going to incorporate some of the fluctuations that we see from one time period to the next. So the idea with smoothing models is going to be: let's take a bunch of recent observations and essentially average them out.

2:49

Right, so we know that if we're focusing on weekly demand, there are going to be those natural fluctuations from week to week. And we're not interested in picking up those random, short-term fluctuations; what we want to capture is the underlying level of demand. So, as I said before, we're going to take an average, and the hope is that that's going to smooth out some of those short-term fluctuations. For example, in one week demand is going to be above average; in another week demand is going to be below average. If both of those observations are included in our smoothing model, they're going to cancel each other out.

3:28

The simplest model that we can use is what's referred to as a simple moving average. I'm going to take a period of time of length L, and let's say L is going to be four weeks. If I want to predict what next week's demand is going to look like, I'm going to look back at the most recent four weeks and take an average. That's all that this equation is formalizing for us. In the numerator, we're just adding up the most recent four observations that we have, the most recent four levels of demand. And then in our denominator, we're dividing by how many periods we've accumulated. So we're taking that simple average.
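As a quick sketch of that calculation, here is the simple moving average forecast in a few lines of Python (the weekly demand numbers are made up for illustration):

```python
def simple_moving_average(demand, L=4):
    """Forecast the next period as the plain average of the most recent L observations."""
    recent = demand[-L:]                 # the L most recent levels of demand
    return sum(recent) / len(recent)

# Hypothetical weekly demand figures
weekly_demand = [102, 98, 110, 105, 97, 108]
forecast = simple_moving_average(weekly_demand, L=4)  # (110 + 105 + 97 + 108) / 4 = 105.0
```

Each week, the window slides forward one period: the oldest observation drops out and the newest one enters.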

All right, well, there may be some problems associated with this. For example, when we're doing this, we're saying all four of those weeks are equally informative. What if you believe that the most recent observation is more informative than the observation that's furthest in the past? This model wouldn't be enough to account for that. What if we have something like seasonality in the data, and I'm on the border between different seasons, or between different quarters? The simple moving average model is not going to be able to account for that on its own.

4:43

So to alleviate one of those issues, we're going to move toward what's referred to as a weighted moving average, rather than putting an equal amount of weight on each of the observations. In our last example, where we said L equals 4 and looked back at the four most recent weeks of observations, we were essentially putting one quarter, or 25%, of the weight on each of those observations. What we might want to do instead is put more weight on the most recent observation. So perhaps we decide to put 50% of the weight on the most recent observation. Now I need to allocate the remaining 50% of the weight among the remaining three observations. This is going to give us the additional flexibility to say that perhaps some weeks, or some observations, are more informative than others.

5:45

So in this equation we're weighting each of our observations Y by a weight based on how far into the past they are, and that set of weights is given by the Ws, all right? And then, in the denominator, we're just adding up what those weights are. Now, these weights, think of them as probabilities; think of them as being between 0 and 1. So imagine, for example, that we were to say each weight is going to be equal to 1. Well, if each of these weights is equal to 1, the numerator is just adding up my Y values, taking the sum over those Y values, and the denominator just becomes the number of observations that we have. So if we plug in weights equal to 1, we're back to using our simple moving average. If we think of these weights as probabilities, then in the denominator the sum of the weights is going to add up to 1, so what we're left with is just a weighted sum where we get to determine what those weights W are. So if you believe that the most recent observation is more valuable, perhaps we say the most recent observation gets a weight of 50%, the next observation gets a weight of 30%, the next observation gets a weight of 15%, and the final observation gets whatever weight is remaining. Now, these weights don't actually have to be between 0 and 1; you can put in any non-negative numbers that you like. But if we think about these weights as proportions, that's going to make it a little bit easier for us to understand how this weighting is happening. A simple moving average, all that it's doing is saying all of our weights are equal, so the weight is really just 1 over the number of observations that I have. With the weighted moving average we get much more flexibility: we get to decide ultimately what those weights are, and hopefully that's going to be informed by what we've learned in the past, a set of weights that has seemed to be particularly predictive.
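A sketch of that weighted version, again with hypothetical demand numbers and the 50/30/15/5 weighting used as the running example:

```python
def weighted_moving_average(demand, weights):
    """Weighted moving average forecast; weights are listed most-recent-first.

    Dividing by the sum of the weights means any non-negative weights work:
    if the weights are proportions summing to 1, the denominator is just 1.
    """
    recent = demand[-len(weights):][::-1]        # most recent observation first
    numerator = sum(w * y for w, y in zip(weights, recent))
    return numerator / sum(weights)

weekly_demand = [102, 98, 110, 105, 97, 108]
forecast = weighted_moving_average(weekly_demand, [0.50, 0.30, 0.15, 0.05])

# Setting every weight to 1 recovers the simple moving average
same_as_simple = weighted_moving_average(weekly_demand, [1, 1, 1, 1])  # 105.0
```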

7:51

So, where do these models potentially break down, and why do we need anything different? Well, let's suppose that there's a trend in the data. We'd seen in that first graph that I showed that there seemed to be a positive trend, that demand was growing over time. Let's think for a second about what's going to happen if you're using a smoothing model that says: look back at the most recent set of observations to try to predict the future. If there's a positive trend, then as we go further out into the future, we're expecting more and more; we're expecting to have more growth. But if I'm relying on observations from the past to predict the future, that growth isn't going to be taken into account. So any time there's a trend, the simple moving average, or the smoothing model more generally, is not going to be able to capture that for us. Now, if we're focused primarily on the short term, that's probably going to be where we have the best application of these models, particularly if we're staying within a season or within a quarter and trying to understand what's going on from week to week. We're going to see this idea come up again when we look at regression models. When we think about a regression model, what's going to help us predict the future? Well, one of our predictors might be past sales, past levels of demand. We can think of this as incorporating lagged variables into our regression analysis. So it's going to take the idea from smoothing models that there is information contained in past Y values, but bring those in as predictors for the future.
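As a rough sketch of that idea, here's a regression that uses both a time trend and last week's demand as predictors. The demand numbers and the use of plain least squares are my own illustration, not the course's data or method:

```python
import numpy as np

# Hypothetical weekly demand with an upward trend
y = np.array([100, 103, 101, 106, 108, 107, 112, 114, 113, 118], dtype=float)

week = np.arange(2, len(y) + 1)      # week index for the target observations
lag1 = y[:-1]                        # last week's demand, as a lagged predictor
target = y[1:]                       # this week's demand, the variable to explain

# Design matrix: intercept, time trend, one-week lag of demand
X = np.column_stack([np.ones(len(target)), week, lag1])
(intercept, trend_coef, lag_coef), *_ = np.linalg.lstsq(X, target, rcond=None)

# One-step-ahead forecast for the next, not-yet-observed week
next_week_forecast = intercept + trend_coef * (len(y) + 1) + lag_coef * y[-1]
```

Unlike the smoothing models, the trend term lets the forecast keep growing as we project out, which is exactly what the moving averages above could not do.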

9:30

All right, so I said that smoothing models, maybe that's not enough; maybe we need something that's a little bit more flexible. That's where an autoregressive model can come into play. What I've put up here is just a simple structure using the most recent two periods to predict what the next period's demand is going to look like. If we look at this, it looks pretty similar to a regression model. And in a sense it is, but we're putting a very specific structure on the predictor variables. Typically, what's on the right-hand side of the equation are X variables. Well, the only X variables that we're including in this analysis

10:09

are going to be the most recent two observations that we have. So those are our predictors, all right? Well, if those are the predictors, what do the coefficients mean? Just like in regression analysis, our intercept value alpha is going to give us our baseline level. And then the other two coefficients that we're interested in estimating, in this case beta 1 and beta 2, are going to be the weights that we put on our predictors. So it's very similar to the smoothing models, but we're going to be estimating beta 1 and beta 2 from the data, rather than making assumptions about the values W for those particular weights. Now, I've included here the equation corresponding to using the two most recent observations. We might look further back in time and say, perhaps we need to look back three periods, four periods. Generalizing this model is going to be as easy as adding a few more terms: adding a beta 3 multiplied by the Y value at time t-2, and, if we wanted, a beta 4 going back to the Y value at time t-3.
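A sketch of estimating this two-lag autoregressive structure, with hypothetical demand numbers; the transcript doesn't specify an estimation routine, so plain least squares here is my assumption:

```python
import numpy as np

def fit_ar2(y):
    """Estimate alpha, beta1, beta2 in: y_next = alpha + beta1*y_t + beta2*y_(t-1)."""
    y = np.asarray(y, dtype=float)
    ones = np.ones(len(y) - 2)
    X = np.column_stack([ones, y[1:-1], y[:-2]])   # intercept, one-period lag, two-period lag
    coef, *_ = np.linalg.lstsq(X, y[2:], rcond=None)
    return coef                                    # alpha, beta1, beta2

def forecast_ar2(y, coef):
    """One-step-ahead forecast from the two most recent observations."""
    alpha, beta1, beta2 = coef
    return alpha + beta1 * y[-1] + beta2 * y[-2]

demand = [100, 103, 101, 106, 108, 107, 112, 114, 113, 118]
coef = fit_ar2(demand)
next_period = forecast_ar2(demand, coef)
```

Looking further back is just more columns in X: a beta 3 column holding the Y value at t-2, a beta 4 column holding the Y value at t-3, and so on.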

So that's another decision that we have to make: how many lag variables do we need to include in this particular model? That's something that model fit criteria can reveal to us, just how far back into the past we need to go. Now, one of the things to keep in mind is that the more past terms I put in, the more observations I need when I'm making my predictions of the future. So, for example, if I'm looking back and I require four data points to predict the future, I can't make a prediction until I have at least those four data points.

So there's a trade-off: we're going to have to wait longer for those observations to come in if we want to include them in the analysis.