0:01

Welcome back to Practical Time Series Analysis.

We've talked in our previous lectures about the fundamental driving mechanisms that give rise to the stochastic processes behind the time series you might be interested in analyzing. We've looked at moving average processes and autoregressive processes, and we've included trend and seasonality.

We're going to take a bit of a different approach in this lecture and begin to talk about forecasting. There are many methods here. We'll start with a very basic method, Simple Exponential Smoothing, which enjoys widespread application in business and industry. It's something that people really do use. We're going to try to make predictions about the future, or forecasts, based upon data that we already have available to us.

So you might be interested in predicting sales figures for the upcoming holiday season based upon what you've seen over the last several seasons. You might be interested in ridership on a train or a railway system. There are all sorts of reasons people have for looking at what's happening, or making good guesses about what's likely to happen in the future, based upon data that you already have available.

In this particular lecture, we'll use Simple Exponential Smoothing. And you'll be able to do this with time series data that you find interesting to make a simple forecast. As is often the case in these lectures, an explicit goal is that you should be able to explain Simple Exponential Smoothing to a friend or a colleague. What is it? How do you do it? What does it do for you?

Â 1:47

The data set that we'll be examining here is on London rainfall, primarily in the 19th century, getting into the 20th century a little bit. There's a nice discussion in A Little Book of R for Time Series, and you can also access the original data.

Â 2:10

Rather than just using a built-in data set from R, let's expand a little bit and grab these data right off the Internet. R has a nice facility, the scan command, that will allow you to go to a website, grab some data, and store it in an array. Once we have our numbers available to us, we'll create a time series object to get a little bit of structure and make some calls.
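As a sketch, the grab-and-wrap step might look like this in R. The URL and the 1813 start year are the ones given in A Little Book of R for Time Series, and the download does require an Internet connection:

```r
# Pull the annual London rainfall figures off the web.
# skip = 1 skips the header line at the top of the file.
rain <- scan("http://robjhyndman.com/tsdldata/hurst/precip1.dat", skip = 1)

# Wrap the raw numbers in a time series object so that plotting and
# modeling calls know the series is annual, starting in 1813.
rain.ts <- ts(rain, start = c(1813))
```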

Now, if you've never thought about rainfall in London before, then it's good to do even the most vanilla things. Let's get a histogram of our rainfall data. We'll take a look at the distribution, and we'll almost reflexively check whether it's normally distributed or not.

Â 3:09

Looking at the time series as a sequence, we look for the sorts of patterns we like to observe. If you feel that that's very difficult to do just by looking at the sequence, pull up the autocorrelation function and take a look.
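These first-look diagnostics are one-liners in R. A minimal sketch, with a made-up stand-in series in place of the downloaded rainfall data:

```r
# Stand-in for the rainfall series; substitute the real rain.ts here.
set.seed(1)
rain.ts <- ts(rnorm(100, mean = 24.8, sd = 4), start = 1813)

hist(rain.ts, main = "Annual rainfall", xlab = "inches")  # the distribution
plot(rain.ts)  # the series as a sequence, scanned by eye for patterns
acf(rain.ts)   # the autocorrelation function, lag by lag
```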

Â 3:37

But maybe it's there, and I'm just not seeing it. We'll call auto.arima to see if we can get a nice fitted model. And even auto.arima says, no, sorry. The coefficients, the autoregressive and moving average coefficients: nothing. But we do get an average, so there is a model. It's just a very simple model: the model is just 24.8. So in light of this, we'll try to do a little bit of forecasting.
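The call itself is a sketch like the following; auto.arima lives in the forecast package (the lecture doesn't name the package, but that is where the function comes from), and on a series with no AR or MA structure it settles on a mean-only model:

```r
library(forecast)  # provides auto.arima

# Stand-in white-noise series; substitute the real rain.ts here.
set.seed(1)
rain.ts <- ts(rnorm(100, mean = 24.8, sd = 4), start = 1813)

# Searches over ARIMA orders; finding no AR or MA structure, it
# reports ARIMA(0,0,0) with a non-zero mean -- just an average.
fit <- auto.arima(rain.ts)
fit
```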

Â 4:12

There are different notations that people use here. We'll look at one of the common ones in this lecture, and we'll see another common one in the next lecture. We'll let the subscript tell us where we'd like the forecast. So h is how many periods into the future you'd like to look. Maybe this is Tuesday and you'd like to look to next Tuesday; then h would be 7, if you have daily data. The superscript tells you what data you're using when you're making your forecast: data up through time step n. The most naive forecasting method I can think of is to say that our forecast for tomorrow is just what was happening today. That's considered a naive method.

Â 5:00

In the notation that we've developed, we would say x subscript n+1, there's the next period, based upon data available at n, is just your observed value at time period n. Now, some data have a pretty obvious seasonality to them. And we would say something like: the forecast that we'll make for the next time period, n+1, based upon data available up through and including time n, is what was happening one season ago. So if we're dealing with weeks, capital S there would be 7. Another way of thinking about your forecasting is to say that our prediction for the next period is just an average of what's happened previously.
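The three baseline forecasts just described, naive, seasonal naive, and the overall average, are each one line of R. A sketch on made-up daily data:

```r
x <- c(23, 27, 25, 24, 26, 22, 28, 25)  # made-up daily observations
n <- length(x)
S <- 7                                  # season length: one week

naive.fc    <- x[n]          # naive: next value = today's value
seasonal.fc <- x[n + 1 - S]  # seasonal naive: next value = one season ago
mean.fc     <- mean(x)       # average of everything seen so far
```

Simple Exponential Smoothing will sit between the last two ideas: an average, but one that favors recent observations.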

Simple Exponential Smoothing tries to do a little bit better than just a plain old average. It's going to develop a weighting of previous values. So in our current data set, we're going to try to predict the London rainfall in a future period based upon the data, and we'll try to be aware of updating on our data set. So instead of just taking an average and including all of the data points equally, what we'll try to do is weight the data points that are closer to us a little bit more, and those that are further away a little bit less. We're more formal in the readings, where we'll deal with geometric series and see that we can weight our averages through them. We'll also show, and this is not very deep, that rather than include an infinite number of data points, you can treat this as a weighted average, as you can see right here. We'll start with some data value, x sub 1. And we'll make a forecast for x sub 2 based just upon x sub 1; we have a pretty meager amount of information as we just get started.

Â 7:08

Then we will say, okay, if you would like to make a forecast for time period three based upon data available in time period two, let's take our previous smoothed level value, our previous averaged value, and give that a weighting of 1-alpha. But we'll update it by looking at the freshly available data point x sub 2. This is the common pattern: we'll take alpha times your freshest data point plus 1-alpha times your previous forecast. So if you'd like to make a forecast about time period 4, you'll take alpha times your new value at time 3 and add to it 1-alpha times your previous level, or your previous forecast. Some of us learn by writing code, and that's what we'll try to do right now. It's rather simple; we just need a for loop.

Â 8:06

So in our little DIY, do-it-yourself code, we'll let alpha = 0.2. And that's totally unmotivated at this point; we'll see how to choose a good alpha in just a moment. But we'll let alpha = 0.2. We'll create a vector, forecast.values, and we'll set it to NULL. We're just trying to establish an array that we can use in our loop. n, of course, is the length of the data that you have available to you.

Â 8:33

Your first forecast is just going to be your first data point. And now we'll loop to get more forecasts. So we'll compute each forecast value as alpha times your freshly available data point, plus (1-alpha) times your previous forecast, your previous level. A little bit of formatting: we'll use the paste command so it looks nice on the screen when we actually give our forecast.
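Putting those pieces together, the DIY loop might look like the sketch below. The rainfall numbers live online, so a short made-up series stands in for them here; swap in the real data to reproduce the 1913 forecast.

```r
alpha <- 0.2
x <- c(23.6, 26.1, 24.7, 25.9, 23.2, 27.4)  # made-up stand-in data
n <- length(x)

forecast.values <- NULL
forecast.values[1] <- x[1]  # the first forecast is just the first data point

# Each new forecast blends the freshest observation with the previous level.
for (i in 1:n) {
  forecast.values[i + 1] <- alpha * x[i] + (1 - alpha) * forecast.values[i]
}

# A little formatting with paste for the on-screen announcement.
paste("Forecast for next period:", round(forecast.values[n + 1], 2))
```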

Â 9:03

So for the year 1913, based upon data available up through and including the year 1912, the forecast value using the unmotivated, almost random alpha of 0.2 would be 25.3 inches of rain. But let's drill down on this a little. How could you choose alpha intelligently? How much weighting do you want to give to values that are close at hand, and how much to values that are further away?

In this particular data set, it looks like the best alpha, best in terms of making our sum of squared errors, or SSE, as small as possible, seems to be really rather small. It's hard to read off of this picture, so I've blown it up here: back around 0.024 or so. We use the SSE approach, so we'll make a forecast for time period three and then compare that to the actual data point that we have available at time period three. Make a forecast at time period four, compare it to the actual data point at time period four. Each time, we'll, of course, take the difference, square it, and then add the squares up to get an aggregate error.
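The alpha search the plot summarizes can be sketched as a grid search: for each candidate alpha, run the smoother, collect the one-step-ahead errors, square them, and sum. (The stand-in data and the grid spacing here are illustrative choices, not the lecture's.)

```r
x <- c(23.6, 26.1, 24.7, 25.9, 23.2, 27.4)  # made-up stand-in data

# Sum of squared one-step-ahead forecast errors for a given alpha.
sse <- function(alpha, x) {
  level <- x[1]  # the forecast for time period 2
  total <- 0
  for (i in 2:length(x)) {
    total <- total + (x[i] - level)^2            # compare forecast to actual
    level <- alpha * x[i] + (1 - alpha) * level  # update the level
  }
  total
}

alphas <- seq(0.001, 0.999, by = 0.001)
sse.values <- sapply(alphas, sse, x = x)
best.alpha <- alphas[which.min(sse.values)]  # alpha with the smallest SSE
```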

Â 10:24

Now, for such a common approach, of course, people have written routines for you. Holt and Winters, these are names that will become familiar even to us as we look into the next lectures. HoltWinters is a routine available in R, and it implements the work of these two mathematicians. This is from the years 1957, 58, 1960, thereabouts. We'll grab the time series for rain. There are going to be three parameters that we'll be keeping track of in the next couple of lectures: level, trend, and seasonality. So this is a little unmotivated at this point, but we'll turn the trend and the seasonality flags to FALSE, and we'll just come up with quick forecasts.
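A sketch of that call: HoltWinters is in base R's stats package, and setting beta and gamma to FALSE turns off trend and seasonality, leaving Simple Exponential Smoothing of the level alone. A made-up series stands in for the rainfall data here; on the real series the fitted alpha comes out near 0.024.

```r
# Stand-in series; replace with the real rain.ts to match the lecture.
set.seed(42)
rain.ts <- ts(rnorm(100, mean = 24.8, sd = 4), start = 1813)

# beta = FALSE: no trend term.  gamma = FALSE: no seasonal term.
hw <- HoltWinters(rain.ts, beta = FALSE, gamma = FALSE)
hw$alpha                  # the SSE-minimizing smoothing weight
predict(hw, n.ahead = 1)  # one-step-ahead forecast
plot(hw)                  # fitted (smoothed) values in red over the data
```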

Â 11:18

We predicted, or we established, a decent alpha value of 0.024. And you can see here, a more sophisticated routine is coming up with an alpha value really very close to that.

Â 11:34

You can make a prediction, and you should do this. Take the code that we developed a few screens ago, and instead of alpha = 0.2, substitute this alpha value, 0.02412151, and you should come up with the same prediction that the routine does. So you should be able to come up with the same forecast for 1913 as the HoltWinters routine does. What we see in this picture is the smoothed average in red; these are all of your forecasts right here. And I've superimposed that over the general time series plot.

Â 12:16

At this point, you should be able to use Simple Exponential Smoothing to make a simple forecast. And you should be able to, in broad strokes, explain Simple Exponential Smoothing to a friend or to a colleague.
