Hello everyone, this is Tural Sadigov.
In this lecture, we will continue fitting
SARIMA models to different real-world datasets.
In this lecture specifically,
we're going to look at time series from agriculture.
So the objective is to fit a SARIMA model, or several different SARIMA models, to
milk production data from
the Time Series Data Library and to forecast future values of the examined time series.
So just like before,
we're going to go through a few steps of modeling.
In other words, we're going to look at the time plot,
we're going to transform if we need it,
we're going to look at differencing,
and we're going to use the ACF and PACF to determine the orders of our models,
and then we're going to try a few different models,
keeping in mind the parsimony principle which we highlighted last lecture,
and we're going to choose the model with the minimum AIC.
At the end, we're going to look at our residuals,
we're going to look at their time plot, ACF, and PACF,
and we're going to look at the Ljung-Box test to
see if there is any autocorrelation left in the residuals.
And the parsimony principle that we agreed on is
that if we add up the orders of a SARIMA model,
we want the sum to be less than or equal to six.
Now, the Time Series Data Library was created by Rob Hyndman.
He is a professor of statistics at Monash University in Australia,
and the web link to the Time Series Data Library is provided.
We're going to look at monthly milk production from agriculture.
It is basically monthly milk production in pounds per cow.
The data run from 1962 to 1975, and the source is Cryer.
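If you want to follow along in R, a minimal sketch for getting the series into a monthly ts object might look like this; the file name monthly-milk.csv and the column name pounds_per_cow are hypothetical stand-ins for however you saved your download from the Time Series Data Library.

# Hypothetical file exported from the Time Series Data Library;
# adjust the path and the column name to match your own copy.
raw <- read.csv("monthly-milk.csv")

# Monthly series starting in January 1962.
milk <- ts(raw$pounds_per_cow, start = c(1962, 1), frequency = 12)

str(milk)  # quick sanity check that we have a monthly ts object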
So, if we plot the data,
we can see that there is definitely an upward trend
and there is definitely periodicity, a seasonality, in the data.
If we zoom into the first two years, let's say,
we can definitely see that there is seasonality in our data,
and the span of the season is twelve months.
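In R, the time plot and the two-year zoom can be sketched roughly like this, assuming the milk object created above.

# Full time plot: both the upward trend and the repeating yearly
# pattern should be visible.
plot(milk, ylab = "Pounds per cow", main = "Monthly milk production per cow")

# Zoom into the first two years to see the 12-month seasonal span.
plot(window(milk, start = c(1962, 1), end = c(1963, 12)),
     ylab = "Pounds per cow", main = "First two years")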
If I look at the ACF and PACF,
the ACF already suggests that there is some cyclic behavior going on,
so we will have to do some differencing.
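A quick sketch of those correlograms, again assuming the milk object; the slowly decaying, cyclic ACF is the visual sign that differencing is needed.

# ACF and PACF of the raw series, out to four years of lags.
acf(milk, lag.max = 48)
pacf(milk, lag.max = 48)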
So what we do is apply a non-seasonal difference, because we have a trend,
and a seasonal difference.
We don't need to transform the data,
but we still have to do the differencing,
and if we do one seasonal and one non-seasonal differencing,
we obtain the following data.
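The one non-seasonal plus one seasonal difference can be computed with diff(); a minimal sketch:

# One non-seasonal difference (lag 1) to remove the trend,
# then one seasonal difference (lag 12) to remove the yearly pattern.
milk_diff <- diff(diff(milk), lag = 12)

# Time plot of the differenced series; apart from a couple of spikes
# that we treat as outliers, it should look stationary.
plot(milk_diff, ylab = "Differenced milk production")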
Now, this data should be stationary.
Stationary data means there shouldn't be any trend and the variance shouldn't change over time,
but in this case,
we see a little spike around here – there are two spikes.
Treating those as outliers,
we're going to assume that this is a stationary time series,
and we're going to try to fit a SARIMA model.
Now, the ACF of the differenced data and the PACF of the differenced data are given right here.
As we can see, the ACF has
no spikes at the early lags, but there are spikes at the seasonal lags,
which means we do not expect any non-seasonal moving average terms, but
we do expect a seasonal moving average term;
the seasonal order could be up to three because we have significant spikes at lag 12,
at lag 24, and at lag 36.
Now, if you look at the PACF,
again there is no autocorrelation at the beginning lags – in
these lags here – which would suggest that we do not expect any non-seasonal autoregressive terms,
but there are spikes at lag 12 and lag 24,
and that might suggest seasonal autoregressive terms of order up to two.
So, the ACF tells us that q is zero,
but the capital Q can be up to three.
The PACF tells us that p is zero,
but the capital P can be up to two.
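To read those orders off yourself, you can plot the ACF and PACF of the differenced series; a sketch with base R (the acf2() function from the astsa package would show both in one panel if you have it installed):

# Correlograms of the differenced data, out to lag 48 so the
# seasonal lags 12, 24, and 36 are visible.
acf(milk_diff, lag.max = 48)   # spikes only at seasonal lags -> q = 0, Q up to 3
pacf(milk_diff, lag.max = 48)  # spikes at lags 12 and 24     -> p = 0, P up to 2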
And we run our command,
keeping in mind the parsimony principle, which means that
the sum of the parameters has to be less than or equal to six.
As you can see, even for the last model,
if you add up these parameters,
you get exactly six.
And we are trying to find the smallest AIC value – the smallest AIC value is 923.3288.
The smallest SSE value is 4618.498.
If we choose the smallest AIC,
and we can actually see that there is
no significant autocorrelation left in the residuals,
then our model is going to be SARIMA(0,1,0)x(0,1,1)
with a span of seasonality of 12.
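One way to run through the candidate models in R is a small loop over the seasonal orders, keeping only the combinations whose orders sum to at most six; this is a sketch of the idea rather than the exact command from the slide.

# Try SARIMA(0,1,0)x(P,1,Q)_12 for P = 0..2 and Q = 0..3,
# respecting the parsimony constraint p+d+q+P+D+Q <= 6.
results <- data.frame()
for (P in 0:2) {
  for (Q in 0:3) {
    if (0 + 1 + 0 + P + 1 + Q > 6) next
    fit_pq <- arima(milk, order = c(0, 1, 0),
                    seasonal = list(order = c(P, 1, Q), period = 12))
    results <- rbind(results,
                     data.frame(P = P, Q = Q,
                                AIC = AIC(fit_pq),
                                SSE = sum(residuals(fit_pq)^2)))
  }
}
results[order(results$AIC), ]  # smallest AIC should point to (P, Q) = (0, 1)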
And if we fit this using the sarima or arima command or routine in R,
we obtain the coefficient for the seasonal moving average term,
which is right here,
and the p-value is so small that this term is very, very significant.
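The final fit itself can be done either with arima() as in the loop above or, if you have the astsa package installed, with its sarima() routine, which also prints the coefficient table with p-values and the residual diagnostics we look at next; a sketch:

library(astsa)

# Fit SARIMA(0,1,0)x(0,1,1) with seasonal span 12; sarima() prints the
# estimated seasonal MA coefficient (about -0.675), its p-value, and
# diagnostic plots for the residuals.
final_fit <- sarima(milk, p = 0, d = 1, q = 0, P = 0, D = 1, Q = 1, S = 12)

final_fit$ttable  # estimates, standard errors, and p-values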
If you look at the residuals,
we see that they are almost white except for maybe one spike here,
which we ignored earlier when we decided it was probably an outlier,
and there is no significant autocorrelation.
There is no systematic departure from normality,
and we don't see any small p-values in the Ljung-Box statistics,
which tells us that there is no significant autocorrelation left in the residuals.
So we can assume that the residuals are,
at this point, actually white noise.
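If you want to run the Ljung-Box test on its own, base R's Box.test() will do it; a minimal sketch, refitting the chosen model with arima() so we can pull out its residuals:

# Chosen model: SARIMA(0,1,0)x(0,1,1)_12.
fit <- arima(milk, order = c(0, 1, 0),
             seasonal = list(order = c(0, 1, 1), period = 12))

# Ljung-Box test on the residuals; fitdf = 1 accounts for the one
# estimated MA coefficient. A large p-value means no evidence of
# remaining autocorrelation, i.e. the residuals look like white noise.
Box.test(residuals(fit), lag = 24, type = "Ljung-Box", fitdf = 1)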
Now, the SARIMA(0,1,0)x(0,1,1)_12 model is basically the following.
One non-seasonal difference, that's (1 - B).
One seasonal difference, that's (1 - B^12).
12 is the span of the seasonality.
There are no autoregressive terms.
And there is one seasonal moving average term, which gives (1 + theta B^12) Z_t on the right-hand side.
If we expand it,
this becomes our model.
Now, according to our estimation,
theta_hat, the estimate of theta, is -0.6750,
which leads us to the following model:
X_t depends on the previous lag,
on the same month of the previous year,
and on X_{t-13}, and then there is the noise at the current time point
plus the noise from the same month last year.
And for Z_t, we use normal noise with variance 34.47.
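Written out, the model and its expansion are just the following algebra, with the estimated values plugged in at the end:
\[
(1 - B)(1 - B^{12}) X_t = (1 + \theta B^{12}) Z_t
\]
\[
X_t - X_{t-1} - X_{t-12} + X_{t-13} = Z_t + \theta Z_{t-12}
\]
\[
X_t = X_{t-1} + X_{t-12} - X_{t-13} + Z_t - 0.6750\, Z_{t-12}, \qquad Z_t \sim N(0,\ 34.47).
\]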
Now, if we forecast using the forecast routine from the forecast package,
we get our forecast right here.
This inner band is the 80% confidence interval,
and the outer one is the 95% confidence interval for our forecast.
In fact, if we write forecast(model),
we can actually see the point estimates as well as the
80% and 95% confidence intervals for the next two years; here
I show only the next year but,
by default, it gives you the next two years.
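In R, the forecasting step might be sketched as follows, using the arima() fit from before together with the forecast package.

library(forecast)

# Forecast the next 12 months; by default forecast() would give two
# years ahead for a monthly series.
fc <- forecast(fit, h = 12)

fc        # point forecasts with 80% and 95% intervals
plot(fc)  # inner band is the 80% interval, outer band the 95% interval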
So what have we learned?
We have learned how to fit SARIMA models to
milk production data from the Time Series Data Library,
and we have learned how to forecast future values of the examined time series.