Wangsheng's World
Modeling and Forecasting with ARMA Processes
November 12, 2024
Fitting an ARMA Model
  • Fitting an ARMA(\(p, q\)) model, \[ \phi(B)(X_t - \mu) = \theta(B)Z_t \] to a realization of stationary time series involves 3 interrelated steps:
    • order selection
    • parameter estimation
    • model diagnostics
  • These 3 steps are iterated until a suitable model is identified.
Parameter Estimation
  • Given (mean-corrected) data \(x_1, \dotsm, x_n\)
  • Suppose it is appropriate to fit a (zero-mean) ARMA model to the data
  • Assuming that orders \(p\) and \(q\) are known, we'll discuss estimation of model parameters
    • \(\boldsymbol{\phi} = (\phi_1, \dotsm, \phi_p)'\)
    • \(\boldsymbol{\theta} = (\theta_1, \dotsm, \theta_q)'\)
    • \(\sigma^2\)
1. Preliminary Estimation
  • Preliminary estimation is useful for providing initial parameter estimates for likelihood optimization
  • We'll discuss Yule-Walker estimation for fitting AR models and innovations estimation for fitting MA models
    • Both methods can be extended to fitting ARMA models with \(q > 0\)
  • Burg's algorithm and the Hannan-Rissanen algorithm are also introduced in the textbook.
Yule-Walker Estimation; AR model
  • Consider fitting an AR(\(p\)) model:
    \(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t\), \(\{Z_t\} \sim WN(0, \sigma^2)\) to \(x_1, \dotsm, x_n\)
  • Yule-Walker estimation of \(\boldsymbol{\phi}\) and \(\sigma^2\):
    • First \(p+1\) Yule-Walker equations:
      \(\gamma(0) - \phi_1 \gamma(1) - \dotsm - \phi_p \gamma(p) = \sigma^2\)
      \(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = 0 \) for \(k = 1, \dotsm, p\)
    • In matrix form,
      \( \begin{cases} \sigma^2 = \gamma(0) - \boldsymbol{\phi}' \boldsymbol{\gamma}_p \\ \boldsymbol{\Gamma}_p \boldsymbol{\phi} = \boldsymbol{\gamma}_p \end{cases} \)
      where \(\boldsymbol{\gamma}_p = (\gamma(1), \dotsm, \gamma(p))'\) & \(\boldsymbol{\Gamma}_p = [\gamma(i - j)]_{i, j = 1}^p\)
    • Replacing \(\gamma(\cdot)\) by \(\hat{\gamma}(\cdot)\) and solving for \(\hat{\boldsymbol{\phi}}\) and \(\hat{\sigma}^2\):
      \( \begin{cases} \hat{\boldsymbol{\phi}} = \hat{\boldsymbol{\Gamma}}_p^{-1} \hat{\boldsymbol{\gamma}}_p \\ \hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\boldsymbol{\phi}}' \hat{\boldsymbol{\gamma}}_p \end{cases} \)
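As a concrete check of the two formulas above, here is a minimal Python sketch (the helper names `sample_acvf` and `yule_walker` are ours, not the textbook's):

```python
# Minimal sketch of Yule-Walker estimation for an AR(p) model.
# Assumes x is a 1-D NumPy array holding the (mean-corrected) data.
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag),
    using the usual divide-by-n estimator."""
    n = len(x)
    x = x - x.mean()  # harmless if the data are already mean-corrected
    return np.array([np.sum(x[:n - h] * x[h:]) / n for h in range(max_lag + 1)])

def yule_walker(x, p):
    """Solve the sample Yule-Walker equations for (phi_hat, sigma2_hat)."""
    g = sample_acvf(x, p)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = g[1:]                        # (gamma_hat(1), ..., gamma_hat(p))'
    phi = np.linalg.solve(Gamma, gamma_p)  # phi_hat = Gamma_hat_p^{-1} gamma_hat_p
    sigma2 = g[0] - phi @ gamma_p          # sigma2_hat = gamma_hat(0) - phi_hat' gamma_hat_p
    return phi, sigma2
```

Using `np.linalg.solve` avoids forming \(\hat{\boldsymbol{\Gamma}}_p^{-1}\) explicitly; the Durbin-Levinson algorithm discussed later avoids the matrix solve altogether.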
Remarks
  • Yule-Walker estimation is a method of moments.
  • The fitted model is always causal.
  • The sample ACVF and the ACVF of the fitted model agree at lags \(0, 1, \dotsm, p\).
  • Sometimes it is more convenient to work with the sample ACF instead of the sample ACVF. That is, \(\hat{\boldsymbol{\phi}} = \hat{\boldsymbol{R}}_p^{-1} \hat{\boldsymbol{\rho}}_p \) and \(\hat{\sigma}^2 = \hat{\gamma}(0)[1 - \hat{\boldsymbol{\phi}}' \hat{\boldsymbol{\rho}}_p]\), where \(\hat{\boldsymbol{\rho}}_p = (\hat{\rho}(1), \dotsm, \hat{\rho}(p))'\) & \(\hat{\boldsymbol{R}}_p = [\hat{\rho}(i-j)]_{i,j=1}^p\).
  • Asymptotic behavior of the Yule-Walker estimators:
    • \(\hat{\boldsymbol{\phi}} \sim AN(\boldsymbol{\phi}, n^{-1} \sigma^2 \boldsymbol{\Gamma}_p^{-1})\)
      • \(\hat{\boldsymbol{\phi}}\) is asymptotically efficient (i.e., it has the same asymptotic variance as MLE).
    • \(\hat{\sigma}^2 \overset{P}{\rightarrow} \sigma^2\).
Order Selection for AR Models
  • To fit an AR model, we need to select \(p\).
  • Two techniques can be used:
    • Select \(p\) as the smallest value \(m\) such that \[ |\hat{\alpha}(k)| \leq \frac{1.96}{\sqrt{n}} \text{ for } k > m, \] where \(\hat{\alpha}(\cdot)\) is the sample PACF (see the sketch after this list)
    • Select \(p\) by minimizing an information criterion, e.g., AIC
      \(\text{AIC} \overset{\text{def}}{=} -2 \log{(\text{MGL})} + 2 (p + 1)\)
      where MGL is the maximum Gaussian likelihood.
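A minimal sketch of the first rule, assuming the sample PACF is computed with statsmodels (the helper `select_p` is hypothetical):

```python
# Sketch: pick the smallest m such that |alpha_hat(k)| <= 1.96/sqrt(n) for all k > m.
import numpy as np
from statsmodels.tsa.stattools import pacf  # sample PACF at lags 0, 1, ..., nlags

def select_p(x, max_lag):
    n = len(x)
    alpha = pacf(x, nlags=max_lag, method="yw")[1:]  # drop the trivial lag-0 value
    for m in range(max_lag):
        if np.all(np.abs(alpha[m:]) <= 1.96 / np.sqrt(n)):
            return m
    return None  # no cutoff within max_lag; an AR model may not be appropriate
```

The Durbin-Levinson algorithm below produces the same sample PACF as a by-product, so no extra package is needed if that recursion is already implemented.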
Examples
  • Sunspot numbers for years 1770-1869: \(S_1, \dotsm, S_{100}\)
    Consider fitting an AR(2) model \(X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} = Z_t\), \(\{Z_t\} \sim WN(0, \sigma^2)\) to the mean-corrected data \(X_t = S_t - 46.93, t = 1, \dotsm, 100\).
    Given \(\hat{\gamma}(0) = 1382.2\), \(\hat{\gamma}(1) = 1114.4\), \(\hat{\gamma}(2) = 591.73\), and \(\hat{\gamma}(3) = 96.216\), we can find the Yule-Walker estimates of \(\phi_1, \phi_2\), and \(\sigma^2\) (a numeric check follows this list).
  • Dow-Jones Utilities Index, Aug. 28 - Dec. 18, 1972.
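For the sunspot example, the estimates can be checked by direct arithmetic; note that \(\hat{\gamma}(3)\) is not needed for the AR(2) fit. A sketch (the values in the comments are what these formulas give, to three decimals):

```python
import numpy as np

# Given sample autocovariances for the mean-corrected sunspot series
g0, g1, g2 = 1382.2, 1114.4, 591.73

Gamma = np.array([[g0, g1],
                  [g1, g0]])           # Gamma_hat_2
gam = np.array([g1, g2])               # gamma_hat_2
phi = np.linalg.solve(Gamma, gam)      # approx (1.318, -0.634)
sigma2 = g0 - phi @ gam                # approx 289.2
```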
Solving Yule-Walker Equations Using Durbin-Levinson Algorithm
  • Selection of order \(p\) based on AIC requires fitting AR models of successively increasing orders \(1, 2, \dotsm\).
  • In this regard, the Durbin-Levinson algorithm can be used.
  • Suppose \(X_1, \dotsm, X_n\) are observations of a zero-mean stationary time series.
Durbin-Levinson Algorithm for Fitting AR Models
  • Provided \(\hat{\gamma}(0) > 0\), the fitted AR(\(m\)) process (\(m < n\)) by the Yule-Walker method is
    \(X_t - \hat{\phi}_{m1}X_{t-1} - \dotsm - \hat{\phi}_{mm} X_{t-m} = Z_t\), \(\{Z_t\} \sim WN(0, \hat{v}_m)\),
    where \(\hat{\boldsymbol{\phi}}_m = (\hat{\phi}_{m1}, \dotsm, \hat{\phi}_{mm})' = \hat{\boldsymbol{R}}_m^{-1} \hat{\boldsymbol{\rho}}_m\) and \(\hat{v}_m = \hat{\gamma}(0)[1 - \hat{\boldsymbol{\phi}}_m' \hat{\boldsymbol{\rho}}_m]\).
  • Recall that \(\hat{\boldsymbol{\phi}}_m = \hat{\boldsymbol{R}}_m^{-1} \hat{\boldsymbol{\rho}}_m\) is the coefficient vector of the BLP \(P_m X_{m+1} = \boldsymbol{\phi}_m' \boldsymbol{X}_m\) of \(X_{m+1}\) based on \(\boldsymbol{X}_m = (X_m, \dotsm, X_1)'\), and the MSE of the BLP is \(v_m = \gamma(0)[1 - \boldsymbol{\phi}_m' \boldsymbol{\rho}_m] \).
  • So, we can apply the Durbin-Levinson algorithm to fitting AR models of orders \(1, 2, \dotsm\) to the data.
  • \(\hat{\boldsymbol{\phi}}_1, \hat{\boldsymbol{\phi}}_2, \dotsm\), and \(\hat{v}_1, \hat{v}_2, \dotsm\), are computed recursively from the sample ACVF in the same way \(\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \dotsm\), and \(v_1, v_2, \dotsm\), are computed from the ACVF.
  • If \(\hat{\gamma}(0) > 0\), then for \(m = 1, 2, \dotsm, n-1\),
    \(\hat{v}_0 = \hat{\gamma}(0)\),
    \(\hat{\phi}_{mm} = \frac{1}{\hat{v}_{m-1}} \big[\hat{\gamma}(m) - \sum_{j=1}^{m-1} \hat{\phi}_{m-1, j} \hat{\gamma}(m-j)\big]\),
    \(\begin{bmatrix} \hat{\phi}_{m1} \\ \vdots \\ \hat{\phi}_{m, m-1} \end{bmatrix} = \begin{bmatrix} \hat{\phi}_{m-1, 1} \\ \vdots \\ \hat{\phi}_{m-1, m-1} \end{bmatrix} - \hat{\phi}_{mm} \begin{bmatrix} \hat{\phi}_{m-1, m-1} \\ \vdots \\ \hat{\phi}_{m-1, 1} \end{bmatrix}\),
    \(\hat{v}_m = \hat{v}_{m-1}(1 - \hat{\phi}_{mm}^2)\).
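The recursion translates directly into code. A minimal sketch (our function name), taking as input the sample autocovariances \(\hat{\gamma}(0), \dotsm, \hat{\gamma}(p_{\max})\), e.g. from the `sample_acvf` helper sketched earlier:

```python
# Durbin-Levinson recursion on the sample ACVF: fits AR(1), ..., AR(max_p)
# in one pass. g[h] holds gamma_hat(h) for h = 0, ..., max_p.
import numpy as np

def durbin_levinson(g, max_p):
    """Return (phis, vs) with phis[m] = (phi_hat_{m1}, ..., phi_hat_{mm}), vs[m] = v_hat_m."""
    phis, vs = [np.array([])], [g[0]]          # v_hat_0 = gamma_hat(0)
    for m in range(1, max_p + 1):
        prev = phis[m - 1]                     # (phi_hat_{m-1,1}, ..., phi_hat_{m-1,m-1})
        # phi_hat_mm = [gamma_hat(m) - sum_j phi_hat_{m-1,j} gamma_hat(m-j)] / v_hat_{m-1}
        phi_mm = (g[m] - sum(prev[j] * g[m - 1 - j] for j in range(m - 1))) / vs[-1]
        head = prev - phi_mm * prev[::-1]      # (phi_hat_{m1}, ..., phi_hat_{m,m-1})
        phis.append(np.append(head, phi_mm))
        vs.append(vs[-1] * (1.0 - phi_mm ** 2))  # v_hat_m = v_hat_{m-1}(1 - phi_hat_mm^2)
    return phis, vs
```

The last entry of each `phis[m]` is \(\hat{\phi}_{mm}\), i.e., the sample PACF value \(\hat{\alpha}(m)\) noted in the remarks below.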
Remarks
  • Use of the Durbin-Levinson algorithm bypasses the matrix inversion required in computing the Yule-Walker estimators directly.
  • It also gives the sample PACF at lags \(1, 2, \dotsm\);
    namely, \(\hat{\alpha}(1) = \hat{\phi}_{11}, \hat{\alpha}(2) = \hat{\phi}_{22}, \dots\).
Confidence Interval for \(\phi_{pj}\)
  • The Yule-Walker estimator \(\hat{\boldsymbol{\phi}}_p \sim AN(\boldsymbol{\phi}_p, n^{-1}\sigma^2 \boldsymbol{\Gamma}_p^{-1})\), where \(\hat{\boldsymbol{\phi}}_p = (\hat{\phi}_{p1}, \dotsm, \hat{\phi}_{pp})'\) and \(\boldsymbol{\phi}_p = (\phi_{p1}, \dotsm, \phi_{pp})'\).
  • Let \(\hat{v}_p\) be the estimator of \(\sigma^2\).
  • Let \(\Phi_{1 - \alpha / 2}\) denote the \((1 - \alpha / 2)\)-quantile of \(\mathcal{N}(0,1)\).
  • For large \(n\), an approximate \(100(1-\alpha)\%\) CI for \(\phi_{pj}\) is given by \[ \hat{\phi}_{pj} \pm \Phi_{1- \alpha / 2} \sqrt{\hat{v}_{jj} / n}, \] where \(\hat{v}_{jj}\) is the \(j^{\text{th}}\) diagonal element of \(\hat{v}_p \hat{\boldsymbol{\Gamma}}_p^{-1}\).
Confidence Region for \(\boldsymbol{\phi}_p\)
  • Moreover, a confidence region for \(\boldsymbol{\phi}_p\) can be obtained.
  • Let \(\chi_{1-\alpha}^2 (p)\) denote the \((1-\alpha)\)-quantile of \(\chi^2(p)\).
  • For large \(n\), the region
    \( \{ \boldsymbol{\phi} \in \mathbb{R}^p: (\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}_p)' \hat{\boldsymbol{\Gamma}}_p (\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}_p) \leq n^{-1}\hat{v}_p \chi_{1-\alpha}^2 (p)\}\)
    contains \(\boldsymbol{\phi}_p\) with probability close to \(1 - \alpha\).
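Both the marginal intervals and the joint region are easy to compute once \(\hat{\boldsymbol{\phi}}_p\), \(\hat{\boldsymbol{\Gamma}}_p\), and \(\hat{v}_p\) are in hand. A minimal sketch, assuming SciPy for the quantiles (function names are ours):

```python
import numpy as np
from scipy.stats import norm, chi2

def phi_conf_intervals(phi_hat, Gamma_hat, v_hat, n, alpha=0.05):
    """Approximate 100(1-alpha)% CI for each phi_pj (columns: lower, upper)."""
    V = v_hat * np.linalg.inv(Gamma_hat)      # estimate of sigma^2 * Gamma_p^{-1}
    se = np.sqrt(np.diag(V) / n)
    z = norm.ppf(1 - alpha / 2)               # Phi_{1 - alpha/2}
    return np.column_stack([phi_hat - z * se, phi_hat + z * se])

def in_conf_region(phi, phi_hat, Gamma_hat, v_hat, n, alpha=0.05):
    """Is phi inside the asymptotic joint confidence region for phi_p?"""
    d = phi - phi_hat
    return d @ Gamma_hat @ d <= v_hat * chi2.ppf(1 - alpha, df=len(phi)) / n
```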
Example
El Niño and fish population (Shumway and Stoffer)
\(X_t =\) monthly number of new fish, in a series furnished by the Pacific Environmental Fisheries Group, for a period of 453 months spanning the years 1950-1987.
The central Pacific warms every three to seven years due to the El Niño effect, which has been blamed, in particular, for the 1997 floods in the mid-western portions of the United States.
Yule-Walker Method; ARMA Model
  • ARMA(\(p, q\)) model:
    \(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
  • Yule-Walker equations:
    • For \(0 \leq k < m\), where \(m = \max(p, q+1)\):
      \(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = \sigma^2 \sum_{j=0}^{q-k} \theta_{k+j} \psi_j\), where \(\theta_0 := 1\)
    • For \(m \leq k \leq p + q\):
      \(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = 0\)
  • Replacing \(\gamma(\cdot)\) with \(\hat{\gamma}(\cdot)\), we solve for Yule-Walker estimates \(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}}\), and \(\hat{\sigma}^2\).
Remarks
  • \(\psi_j\) must first be expressed in terms of \(\boldsymbol{\phi}\) and \(\boldsymbol{\theta}\), using \(\phi(z) \psi(z) = \theta(z)\).
  • The \(1^{\text{st}}\) set of equations is nonlinear in \(\boldsymbol{\phi}\) and \(\boldsymbol{\theta}\), leading to possible nonexistence and/or non-uniqueness of solutions (see the ARMA(1,1) sketch after this list).
  • For MA and ARMA processes, the Yule-Walker estimates \(\hat{\boldsymbol{\phi}}\) and \(\hat{\boldsymbol{\theta}}\) are less efficient than MLE.
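To see the nonlinearity concretely, take ARMA(1,1), where \(\psi_0 = 1\) and \(\psi_1 = \phi + \theta\). The lag-2 equation gives \(\hat{\phi} = \hat{\gamma}(2)/\hat{\gamma}(1)\) linearly, but eliminating \(\sigma^2\) from the first two equations leaves a quadratic in \(\theta\), which may have two admissible roots or none. A hypothetical sketch (function name ours):

```python
# Moment estimation for ARMA(1,1): X_t - phi*X_{t-1} = Z_t + theta*Z_{t-1}.
# g = (gamma_hat(0), gamma_hat(1), gamma_hat(2)). The first two Yule-Walker
# equations read A := g(0) - phi*g(1) = sigma2*(1 + theta*(phi + theta)) and
# B := g(1) - phi*g(0) = sigma2*theta; eliminating sigma2 gives
# B*theta^2 + (B*phi - A)*theta + B = 0.
import numpy as np

def yule_walker_arma11(g):
    phi = g[2] / g[1]                       # from gamma(2) - phi*gamma(1) = 0
    A = g[0] - phi * g[1]
    B = g[1] - phi * g[0]
    roots = np.roots([B, B * phi - A, B])   # quadratic: possibly 0 or 2 real roots
    # keep a real, invertible root, if any (0 < |theta| < 1)
    theta = next((r.real for r in roots
                  if abs(r.imag) < 1e-10 and 0 < abs(r) < 1), None)
    if theta is None:
        return None                         # estimates fail to exist / are inadmissible
    return phi, theta, B / theta            # (phi_hat, theta_hat, sigma2_hat)
```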
Example
...
Innovations Algorithm
  • Yule-Walker estimates for AR models can be found by applying the Durbin-Levinson algorithm with ACVF replaced by sample ACVF.
  • Innovations algorithm estimates for MA models are obtained in the same fashion.
Fitted Innovations MA(\(m\)) Model
  • Given the mean-corrected data \(X_1, \dotsm, X_n\), the fitted innovations MA(\(m\)) model, for \(m = 1, 2, \dotsm, n-1\), is
    \(X_t = Z_t + \hat{\theta}_{m1}Z_{t-1} + \dotsm + \hat{\theta}_{mm} Z_{t-m}\), \(\{Z_t\} \sim WN(0, \hat{v}_m)\).
  • The innovation estimates \(\hat{\boldsymbol{\theta}}_m := (\hat{\theta}_{m1}, \dotsm, \hat{\theta}_{mm})'\) and \(\hat{v}_m\) are obtained from the innovations algorithm with ACVF replaced by sample ACVF.
  • To be specific,
    \(\hat{v}_0 = \hat{\gamma}(0)\),
    \(\hat{\theta}_{m, m-k} = \frac{1}{\hat{v}_k}\big[\hat{\gamma}(m-k) - \sum_{j=0}^{k-1} \hat{\theta}_{k, k-j} \hat{\theta}_{m, m-j} \hat{v}_j\big]\) for \(k = 0, 1, \dotsm, m-1\),
    \(\hat{v}_m = \hat{\gamma}(0) - \sum_{j=0}^{m-1} \hat{\theta}_{m, m-j}^2 \hat{v}_j\).
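A minimal sketch of this recursion (our function name), again driven by the sample autocovariances:

```python
# Innovations recursion on the sample ACVF. g[h] holds gamma_hat(h) for
# h = 0, ..., max_m; thetas[m][i-1] stores theta_hat_{m,i}.
import numpy as np

def innovations(g, max_m):
    thetas = {m: np.zeros(m) for m in range(1, max_m + 1)}
    vs = np.zeros(max_m + 1)
    vs[0] = g[0]                                       # v_hat_0 = gamma_hat(0)
    for m in range(1, max_m + 1):
        for k in range(m):                             # k = 0, 1, ..., m-1
            s = sum(thetas[k][k - j - 1] * thetas[m][m - j - 1] * vs[j]
                    for j in range(k))
            thetas[m][m - k - 1] = (g[m - k] - s) / vs[k]
        vs[m] = g[0] - sum(thetas[m][m - j - 1] ** 2 * vs[j] for j in range(m))
    return thetas, vs
```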
Order Selection for MA Models
We provide 3 techniques for selecting \(q\) when fitting an MA model:
  • Select \(q\) as the smallest value \(m\) such that \(\hat{\rho}(k)\) is not significantly different from zero for all \(k > m\) (see the sketch after this list).
  • We may examine the innovations estimators \(\hat{\boldsymbol{\theta}}_m, m = 1, 2, \dotsm\), to assess the appropriateness of an MA model and estimate \(q\).
    See textbook
  • Select \(q\) by minimizing an information criterion, e.g., AIC:
    \(\text{AIC} = -2 \log(\text{MGL}) + 2(q+1)\)
    where MGL is maximum Gaussian likelihood.
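A sketch of the first rule, where "not significantly different from zero" is operationalized via Bartlett's approximation: under an MA(\(m\)) model, \(\text{Var}(\hat{\rho}(k)) \approx n^{-1}\big(1 + 2\sum_{j=1}^{m} \rho(j)^2\big)\) for \(k > m\) (the helper name is ours):

```python
# Pick the smallest m such that every rho_hat(k), k > m, lies inside the
# approximate 95% bounds implied by Bartlett's formula for an MA(m) process.
import numpy as np

def select_q(rho, n):
    """rho: array of sample ACF values rho_hat(1), ..., rho_hat(max_lag)."""
    for m in range(len(rho)):
        bound = 1.96 * np.sqrt((1 + 2 * np.sum(rho[:m] ** 2)) / n)
        if np.all(np.abs(rho[m:]) <= bound):
            return m
    return None  # no cutoff within max_lag; an MA model may not be appropriate
```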
Example
...
Asymptotic Behavior of \(\hat{\boldsymbol{\theta}}_m\)
Confidence Regions for \(\boldsymbol{\theta}_q\)
Innovations Algorithm Estimators When \(p > 0\) and \(q > 0\)
See textbook
2. Maximum Likelihood Estimation
3. Relative Efficiency of Estimators
4. Diagnostic Checking
5. Forecasting
6. Order Selection