1. Preliminary Estimation
-
Preliminary estimation is useful for providing initial
parameter estimates for likelihood optimization
-
We'll discuss Yule-Walker estimation for fitting AR
models and innovations estimation for fitting MA models
-
Both methods can be extended to fitting ARMA
models with \(q > 0\)
-
Burg's algorithm and the Hannan-Rissanen algorithm are
also introduced in the textbook.
Yule-Walker Estimation; AR model
-
Consider fitting an AR(\(p\)) model:
\(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t\),
\(\{Z_t\} \sim WN(0, \sigma^2)\) to \(x_1, \dotsm, x_n\)
-
Yule-Walker estimation of \(\boldsymbol{\phi}\) and \(\sigma^2\):
-
First \(p+1\) Yule-Walker equations:
\(\gamma(0) - \phi_1 \gamma(1) - \dotsm - \phi_p \gamma(p) = \sigma^2\)
\(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = 0 \)
for \(k = 1, \dotsm, p\)
-
In matrix form,
\(
\begin{cases}
\sigma^2 = \gamma(0) - \boldsymbol{\phi}' \boldsymbol{\gamma}_p \\
\boldsymbol{\Gamma}_p \boldsymbol{\phi} = \boldsymbol{\gamma}_p
\end{cases}
\)
where \(\boldsymbol{\gamma}_p = (\gamma(1), \dotsm, \gamma(p))'\) &
\(\boldsymbol{\Gamma}_p = [\gamma(i - j)]_{i, j = 1}^p\)
-
Replacing \(\gamma(\cdot)\) by \(\hat{\gamma}(\cdot)\) and solving for
\(\hat{\boldsymbol{\phi}}\) and \(\hat{\sigma}^2\):
\(
\begin{cases}
\hat{\boldsymbol{\phi}} = \hat{\boldsymbol{\Gamma}}_p^{-1} \hat{\boldsymbol{\gamma}}_p \\
\hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\boldsymbol{\phi}}' \hat{\boldsymbol{\gamma}}_p
\end{cases}
\)
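As an illustration, here is a minimal NumPy sketch of this estimator (the helper names `sample_acvf` and `yule_walker_ar` are our own, not from any particular library; the data are mean-corrected inside `sample_acvf`):
```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([xc[h:] @ xc[:n - h] / n for h in range(max_lag + 1)])

def yule_walker_ar(x, p):
    """Yule-Walker estimates (phi_hat, sigma2_hat) for an AR(p) model."""
    g = sample_acvf(x, p)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = g[1:]                          # (gamma_hat(1), ..., gamma_hat(p))'
    phi = np.linalg.solve(Gamma, gamma_p)    # phi_hat = Gamma_hat_p^{-1} gamma_hat_p
    sigma2 = g[0] - phi @ gamma_p            # sigma2_hat = gamma_hat(0) - phi_hat' gamma_hat_p
    return phi, sigma2
```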
Remarks
-
Yule-Walker estimation is a method of moments.
-
The fitted model is causal.
-
The sample ACVF and the ACVF of the fitted model agree at lags up to \(p\).
-
Sometimes it is more convenient to work with sample ACF instead of ACVF.
That is, \(\hat{\boldsymbol{\phi}} = \hat{\boldsymbol{R}}_p^{-1} \hat{\boldsymbol{\rho}}_p \)
and \(\hat{\sigma}^2 = \hat{\gamma}(0)[1 - \hat{\boldsymbol{\phi}}' \hat{\boldsymbol{\rho}}_p]\)
where \(\hat{\boldsymbol{\rho}}_p = (\hat{\rho}(1), \dotsm, \hat{\rho}(p))'\)
& \(\hat{\boldsymbol{R}}_p = [\hat{\rho}(i-j)]_{i,j=1}^p\).
-
Asymptotic behavior of the Yule-Walker estimators:
-
\(\hat{\boldsymbol{\phi}} \sim AN(\boldsymbol{\phi}, n^{-1} \sigma^2 \boldsymbol{\Gamma}_p^{-1})\)
-
\(\hat{\boldsymbol{\phi}}\) is asymptotically efficient
(i.e., it has the same asymptotic variance as the MLE).
-
\(\hat{\sigma}^2 \overset{P}{\rightarrow} \sigma^2\).
Order Selection for AR Models
-
To fit an AR model, we need to select \(p\).
-
Two techniques can be used:
-
Select \(p\) as the smallest value \(m\) such that the sample PACF satisfies
\[
\mid \hat{\alpha}(k) \mid \leq \frac{1.96}{\sqrt{n}} \text{ for } k > m
\]
(see the code sketch after this list)
-
Select \(p\) by minimizing an information criterion, e.g., AIC
\(\text{AIC} \overset{\text{def}}{=} -2 \log{(\text{MGL})} + 2 (p + 1)\)
where MGL is maximum Gaussian Likelihood.
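A sketch of the first rule, assuming statsmodels is available for the sample PACF (the function name `select_p_by_pacf` is our own):
```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def select_p_by_pacf(x, max_lag=20):
    """Smallest m with |alpha_hat(k)| <= 1.96/sqrt(n) for all k > m."""
    n = len(x)
    alpha_hat = pacf(x, nlags=max_lag)     # alpha_hat[0] = 1 by convention
    bound = 1.96 / np.sqrt(n)
    exceed = np.flatnonzero(np.abs(alpha_hat[1:]) > bound) + 1  # lags over the bound
    return int(exceed.max()) if exceed.size else 0
```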
Examples
-
Sunspot numbers for years 1770-1869: \(S_1, \dotsm, S_{100}\)
Consider fitting an AR(2) model \(X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} = Z_t\),
\(\{Z_t\} \sim WN(0, \sigma^2)\) to the mean-corrected data \(X_t = S_t - 46.93, t = 1, \dotsm, 100\).
Given \(\hat{\gamma}(0) = 1382.2, \hat{\gamma}(1) =1114.4, \hat{\gamma}(2) = 591.73\) and
\(\hat{\gamma}(3) = 96.216\), we can find the Yule-Walker estimates of \(\phi_1, \phi_2\) and \(\sigma^2\).
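Carrying out the solve with these numbers (only \(\hat{\gamma}(0), \hat{\gamma}(1), \hat{\gamma}(2)\) are needed for the AR(2) fit) gives, approximately,
\[
\hat{\boldsymbol{\phi}} = \begin{bmatrix} 1382.2 & 1114.4 \\ 1114.4 & 1382.2 \end{bmatrix}^{-1} \begin{bmatrix} 1114.4 \\ 591.73 \end{bmatrix} \approx \begin{bmatrix} 1.318 \\ -0.634 \end{bmatrix},
\qquad
\hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\boldsymbol{\phi}}' \hat{\boldsymbol{\gamma}}_2 \approx 289.2.
\]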
-
Dow-Jones Utilities Index, Aug. 28 - Dec. 18, 1972.
Solving Yule-Walker Equations Using Durbin-Levinson Algorithm
-
Selection of order \(p\) based on AIC requires fitting AR models of
successively increasing orders \(1, 2, \dotsm\).
-
In this regard, the Durbin-Levinson algorithm can be used.
-
Suppose \(X_1, \dotsm, X_n\) are observations of a zero-mean stationary time series.
Durbin-Levinson Algorithm for Fitting AR Models
-
Provided \(\hat{\gamma}(0) > 0\), the fitted AR(\(m\)) process (\(m < n\))
by the Yule-Walker method is
\(X_t - \hat{\phi}_{m1}X_{t-1} - \dotsm - \hat{\phi}_{mm} X_{t-m} = Z_t\),
\(\{Z_t\} \sim WN(0, \hat{v}_m)\),
where \(\hat{\boldsymbol{\phi}}_m = (\hat{\phi}_{m1}, \dotsm, \hat{\phi}_{mm})' = \hat{\boldsymbol{R}}_m^{-1} \hat{\boldsymbol{\rho}}_m\)
and \( \hat{v}_m = \hat{\gamma}(0)[1 - \hat{\boldsymbol{\phi}}_m' \hat{\boldsymbol{\rho}}_m] \).
-
Recall that \(\hat{\boldsymbol{\phi}}_m = \hat{\boldsymbol{R}}_m^{-1} \hat{\boldsymbol{\rho}}_m\)
is the coefficient vector of the best linear predictor (BLP) \(P_m X_{m+1} = \boldsymbol{\phi}_m' \boldsymbol{X}_m\) of \(X_{m+1}\)
based on \(\boldsymbol{X}_m = (X_m, \dotsm, X_1)'\), and the MSE of the BLP is
\(v_m = \gamma(0)[1 - \boldsymbol{\phi}_m' \boldsymbol{\rho}_m] \).
-
So, we can apply the Durbin-Levinson algorithm to fitting AR models of orders \(1, 2, \dotsm\) to the data.
-
\(\hat{\boldsymbol{\phi}}_1, \hat{\boldsymbol{\phi}}_2, \dotsm\), and
\( \hat{v}_1, \hat{v}_2, \dotsm \), are computed recursively from the sample ACVF
in the same way \(\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \dotsm \), and
\(v_1, v_2, \dotsm\), are computed from the ACVF.
-
If \(\hat{\gamma}(0) > 0\), then for \(m = 1, 2, \dotsm, n-1\):
\[
\hat{v}_0 = \hat{\gamma}(0), \qquad
\hat{\phi}_{mm} = \frac{1}{\hat{v}_{m-1}} \Big[ \hat{\gamma}(m) - \sum_{j=1}^{m-1} \hat{\phi}_{m-1, j}\, \hat{\gamma}(m-j) \Big],
\]
\[
\begin{bmatrix} \hat{\phi}_{m1} \\ \vdots \\ \hat{\phi}_{m, m-1} \end{bmatrix}
= \begin{bmatrix} \hat{\phi}_{m-1, 1} \\ \vdots \\ \hat{\phi}_{m-1, m-1} \end{bmatrix}
- \hat{\phi}_{mm} \begin{bmatrix} \hat{\phi}_{m-1, m-1} \\ \vdots \\ \hat{\phi}_{m-1, 1} \end{bmatrix},
\qquad
\hat{v}_{m} = \hat{v}_{m-1} \big( 1 - \hat{\phi}_{mm}^2 \big).
\]
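A minimal implementation of this recursion (a sketch; autocovariances can be supplied, e.g., by the hypothetical `sample_acvf` helper sketched earlier):
```python
import numpy as np

def durbin_levinson(gamma, max_order):
    """Recursively fit AR(1), ..., AR(max_order) from autocovariances
    gamma = [gamma_hat(0), ..., gamma_hat(max_order)].
    Returns phis (phis[m] = AR(m) coefficient vector) and vs
    (vs[m] = white-noise variance estimate v_hat_m)."""
    phis = {0: np.array([])}
    vs = {0: gamma[0]}
    for m in range(1, max_order + 1):
        prev = phis[m - 1]
        # phi_hat_{mm}: also the sample PACF at lag m
        num = gamma[m] - sum(prev[j] * gamma[m - 1 - j] for j in range(m - 1))
        phi_mm = num / vs[m - 1]
        phis[m] = np.concatenate([prev - phi_mm * prev[::-1], [phi_mm]])
        vs[m] = vs[m - 1] * (1.0 - phi_mm ** 2)
    return phis, vs
```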
Remarks
-
Use of the Durbin-Levinson algorithm bypasses the matrix inversion required to
compute the Yule-Walker estimators directly.
-
It also gives the sample PACF at lags \(1, 2, \dotsm\);
namely,
\(\hat{\alpha}(1) = \hat{\phi}_{11}, \hat{\alpha}(2) = \hat{\phi}_{22}, \dots\).
Confidence Interval for \(\phi_{pj}\)
-
The Yule-Walker estimator \(\hat{\boldsymbol{\phi}}_p \sim AN(\boldsymbol{\phi}_p, n^{-1}\sigma^2 \boldsymbol{\Gamma}_p^{-1})\),
where \(\hat{\boldsymbol{\phi}}_p = (\hat{\phi}_{p1}, \dotsm, \hat{\phi}_{pp})'\)
and \(\boldsymbol{\phi}_p = (\phi_{p1}, \dotsm, \phi_{pp})'\).
-
Let \(\hat{v}_p\) be the estimator of \(\sigma^2\).
-
Let \(\Phi_{1 - \alpha / 2}\) denote the \((1 - \alpha / 2)\)-quantile of \(\mathcal{N}(0,1)\).
-
For large \(n\), an approximate \(100(1-\alpha)\%\) CI for \(\phi_{pj}\) is given by
\[
\hat{\phi}_{pj} \pm \Phi_{1- \alpha / 2} \sqrt{\hat{v}_{jj} / n},
\]
where \(\hat{v}_{jj}\) is the \(j^{\text{th}}\) diagonal element of
\( \hat{v}_p \hat{\boldsymbol{\Gamma}}_p^{-1} \).
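In code, the interval takes a few lines given the fitted quantities (a sketch; scipy's `norm.ppf` supplies the normal quantile, and the function and argument names are our own):
```python
import numpy as np
from scipy.stats import norm

def yw_confidence_intervals(phi_hat, Gamma_hat, v_p, n, alpha=0.05):
    """Approximate 100(1-alpha)% CIs for phi_{p1}, ..., phi_{pp}."""
    z = norm.ppf(1 - alpha / 2)                       # Phi_{1 - alpha/2}
    v_diag = v_p * np.diag(np.linalg.inv(Gamma_hat))  # diag of v_hat_p * Gamma_hat_p^{-1}
    half = z * np.sqrt(v_diag / n)
    return np.column_stack([phi_hat - half, phi_hat + half])
```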
Confidence Region for \(\boldsymbol{\phi}_{p}\)
-
Moreover, a confidence region for \(\boldsymbol{\phi}_p\) can be obtained.
-
Let \(\chi_{1-\alpha}^2 (p)\) denote the \((1-\alpha)\)-quantile of \(\chi^2(p)\).
-
For large \(n\), the region
\( \{ \boldsymbol{\phi} \in \mathbb{R}^p: (\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}_p)' \hat{\boldsymbol{\Gamma}}_p (\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}_p) \leq n^{-1}\hat{v}_p \chi_{1-\alpha}^2 (p)\}\)
contains \(\boldsymbol{\phi}_p\) with probability close to \(1 - \alpha\).
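Testing whether a candidate \(\boldsymbol{\phi}\) falls in the region is a single quadratic-form comparison (again a sketch, with scipy's `chi2.ppf` for the quantile):
```python
import numpy as np
from scipy.stats import chi2

def in_confidence_region(phi, phi_hat, Gamma_hat, v_p, n, alpha=0.05):
    """True if phi lies in the approximate 100(1-alpha)% region for phi_p."""
    d = phi - phi_hat
    return d @ Gamma_hat @ d <= v_p * chi2.ppf(1 - alpha, len(phi_hat)) / n
```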
Example
El Niño and fish population (Shumway and Stoffer)
\(X_t =\) monthly number of new fish furnished by the Pacific Environmental
Fisheries Group, for a period of 453 months ranging over the years 1950-1987.
The central Pacific warms every three to seven years due to the El Niño effect,
which has been blamed, in particular, for the 1997 floods in the mid-western portions
of the United States.
Yule-Walker Method; ARMA Model
-
ARMA(\(p, q\)) model:
\(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\),
\(\{Z_t\} \sim WN(0, \sigma^2)\)
-
Yule-Walker equations:
-
For \(0 \leq k < m\), where \(m = \max(p, q+1)\) and with the convention \(\theta_0 := 1\):
\(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = \sigma^2 \sum_{j=0}^{q-k} \theta_{k+j} \psi_j\)
-
For \(m \leq k \leq p + q\):
\(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = 0\)
-
Replacing \(\gamma(\cdot)\) with \(\hat{\gamma}(\cdot)\), we solve for Yule-Walker estimates
\(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}}\), and \(\hat{\sigma}^2\).
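To make the structure of these equations concrete, consider the ARMA(1, 1) case (our own illustration): here \(m = \max(1, 2) = 2\), and \(\phi(z)\psi(z) = \theta(z)\) gives \(\psi_0 = 1\) and \(\psi_1 = \phi_1 + \theta_1\), so the equations read
\[
\begin{aligned}
k = 0: &\quad \gamma(0) - \phi_1 \gamma(1) = \sigma^2 [1 + \theta_1 (\phi_1 + \theta_1)], \\
k = 1: &\quad \gamma(1) - \phi_1 \gamma(0) = \sigma^2 \theta_1, \\
k = 2: &\quad \gamma(2) - \phi_1 \gamma(1) = 0.
\end{aligned}
\]
The \(k = 0, 1\) equations are the nonlinear set referred to in the remarks below.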
Remarks
-
\(\psi_j\) must first be expressed in terms of \(\boldsymbol{\phi}\) and \(\boldsymbol{\theta}\),
using \(\phi(z) \psi(z) = \theta(z)\).
-
The \(1^{\text{st}}\) set of equations is nonlinear in \(\boldsymbol{\phi}\) and \(\boldsymbol{\theta}\),
leading to possible nonexistence and/or non-uniqueness of solutions.
-
For MA and ARMA processes, the Yule-Walker estimates \(\hat{\boldsymbol{\phi}}\) and \(\hat{\boldsymbol{\theta}}\)
are less efficient than MLE.
Example
...
Innovations Algorithm
-
Yule-Walker estimates for AR models can be found by applying the Durbin-Levinson
algorithm with ACVF replaced by sample ACVF.
-
Innovations algorithm estimates for MA models are obtained in the same fashion.
Fitted Innovations MA(\(m\)) Model
-
Given the mean-corrected data \(X_1, \dots, X_n\),
the fitted innovations MA(\(m\)) model,
\(m = 1, 2, \dotsm, n-1\) is
\(X_t = Z_t + \hat{\theta}_{m1}Z_{t-1} + \dotsm + \hat{\theta}_{mm} Z_{t-m}\),
\(\{Z_t\} \sim WN(0, \hat{v}_m)\).
-
The innovations estimates \(\hat{\boldsymbol{\theta}}_m := (\hat{\theta}_{m1}, \dotsm, \hat{\theta}_{mm})'\)
and \(\hat{v}_m\) are obtained from the innovations algorithm with ACVF replaced by sample ACVF.
-
To be specific,
\[
\hat{v}_0 = \hat{\gamma}(0), \qquad
\hat{\theta}_{m, m-k} = \frac{1}{\hat{v}_k} \Big[ \hat{\gamma}(m-k) - \sum_{j=0}^{k-1} \hat{\theta}_{k, k-j}\, \hat{\theta}_{m, m-j}\, \hat{v}_j \Big]
\quad \text{for } k = 0, 1, \dotsm, m-1,
\]
\[
\hat{v}_m = \hat{\gamma}(0) - \sum_{j=0}^{m-1} \hat{\theta}_{m, m-j}^2\, \hat{v}_j.
\]
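A direct translation of this recursion (a sketch, with autocovariances supplied as before; the storage convention is our own):
```python
import numpy as np

def innovations(gamma, max_order):
    """Innovations estimates for MA(1), ..., MA(max_order) from
    autocovariances gamma = [gamma_hat(0), ..., gamma_hat(max_order)].
    thetas[m][j - 1] holds theta_hat_{mj}; vs[m] holds v_hat_m."""
    vs = [gamma[0]]
    thetas = {m: np.zeros(m) for m in range(1, max_order + 1)}
    for m in range(1, max_order + 1):
        for k in range(m):
            # theta_hat_{m, m-k}
            s = sum(thetas[k][k - 1 - j] * thetas[m][m - 1 - j] * vs[j]
                    for j in range(k))
            thetas[m][m - 1 - k] = (gamma[m - k] - s) / vs[k]
        vs.append(gamma[0] - sum(thetas[m][m - 1 - j] ** 2 * vs[j]
                                 for j in range(m)))
    return thetas, vs
```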
Order Selection for MA Models
We provide three techniques for selecting \(q\) when fitting an MA model:
-
Select \(q\) as the smallest value \(m\) such that \(\hat{\rho}(k)\) is
not significantly different from zero for all \(k > m\)
(see the sketch after this list).
-
We may examine the innovations estimators \(\hat{\boldsymbol{\theta}}_m, m = 1, 2, \dotsm\),
to assess the appropriateness of an MA model and estimate \(q\);
see the textbook for details.
-
Select \(q\) by minimizing an information criterion, e.g., AIC:
\(\text{AIC} = -2 \log(\text{MGL}) + 2(q+1)\)
where MGL is maximum Gaussian likelihood.
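A sketch of the first rule; the slide does not pin down the significance bound, so as an assumption we use Bartlett's approximation, under which, for an MA(\(m\)) process, \(\hat{\rho}(k)\) for \(k > m\) has standard error about \(\sqrt{(1 + 2\sum_{j=1}^{m}\hat{\rho}(j)^2)/n}\):
```python
import numpy as np

def select_q_by_acf(rho, n):
    """Smallest m such that rho_hat(k) is within the approximate 95%
    bounds for all k > m; rho = [rho_hat(1), ..., rho_hat(K)]."""
    K = len(rho)
    for m in range(K):
        # Bartlett bound for the sample ACF of an MA(m) process
        bound = 1.96 * np.sqrt((1 + 2 * np.sum(np.square(rho[:m]))) / n)
        if all(abs(rho[k]) <= bound for k in range(m, K)):
            return m
    return K
```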
Example
...
Asymptotic Behavior of \(\hat{\boldsymbol{\theta}}_m\)
Confidence Regions for \(\boldsymbol{\theta}_q\)
Innovations Algorithm Estimators When \(p > 0\) and \(q > 0\)
See textbook
2. Maximum Likelihood Estimation
3. Relative Efficiency of Estimators