1. Preliminary Estimation
-
Preliminary estimation is useful for providing initial
parameter estimates for likelihood optimization
-
We'll discuss Yule-Walker estimation for fitting AR
models and innovations estimation for fitting MA models
-
Both methods can be extended to fitting ARMA
models with \(q > 0\)
-
Burg's algorithm and the Hannan-Rissanen algorithm are
also introduced in the textbook.
Yule-Walker Estimation; AR model
-
Consider fitting an AR(\(p\)) model:
\(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t\),
\(\{Z_t\} \sim WN(0, \sigma^2)\) to \(x_1, \dotsm, x_n\)
-
Yule-Walker estimation of \(\boldsymbol{\phi}\) and \(\sigma^2\):
-
First \(p+1\) Yule-Walker equations:
\(\gamma(0) - \phi_1 \gamma(1) - \dotsm - \phi_p \gamma(p) = \sigma^2\)
\(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = 0 \)
for \(k = 1, \dotsm, p\)
-
In matrix form,
\(
\begin{cases}
\sigma^2 = \gamma(0) - \boldsymbol{\phi}' \boldsymbol{\gamma}_p \\
\boldsymbol{\Gamma}_p \boldsymbol{\phi} = \boldsymbol{\gamma}_p
\end{cases}
\)
where \(\boldsymbol{\gamma}_p = (\gamma(1), \dotsm, \gamma(p))'\) &
\(\boldsymbol{\Gamma}_p = [\gamma(i - j)]_{i, j = 1}^p\)
-
Replacing \(\gamma(\cdot)\) by \(\hat{\gamma}(\cdot)\) and solving for
\(\hat{\boldsymbol{\phi}}\) and \(\hat{\sigma}^2\):
\(
\begin{cases}
\hat{\boldsymbol{\phi}} = \hat{\boldsymbol{\Gamma}}_p^{-1} \hat{\boldsymbol{\gamma}}_p \\
\hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\boldsymbol{\phi}}' \hat{\boldsymbol{\gamma}}_p
\end{cases}
\)
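As an illustration, here is a minimal NumPy sketch of this estimator (the helper names `sample_acvf` and `yule_walker_ar` are our own, not from any particular library; the data are mean-corrected inside `sample_acvf`):
```python
import numpy as np

def sample_acvf(x, max_lag):
    """Sample autocovariances gamma_hat(0), ..., gamma_hat(max_lag)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    xc = x - x.mean()
    return np.array([xc[h:] @ xc[:n - h] / n for h in range(max_lag + 1)])

def yule_walker_ar(x, p):
    """Yule-Walker estimates (phi_hat, sigma2_hat) for an AR(p) model."""
    g = sample_acvf(x, p)
    Gamma = np.array([[g[abs(i - j)] for j in range(p)] for i in range(p)])
    gamma_p = g[1:]                          # (gamma_hat(1), ..., gamma_hat(p))'
    phi = np.linalg.solve(Gamma, gamma_p)    # phi_hat = Gamma_hat_p^{-1} gamma_hat_p
    sigma2 = g[0] - phi @ gamma_p            # sigma2_hat = gamma_hat(0) - phi_hat' gamma_hat_p
    return phi, sigma2
```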
Remarks
-
Yule-Walker estimation is a method of moments.
-
The fitted model is causal.
-
The sample ACVF and the ACVF of the fitted model agree at lags up to \(p\).
-
Sometimes it is more convenient to work with sample ACF instead of ACVF.
That is, \(\hat{\boldsymbol{\phi}} = \hat{\boldsymbol{R}}_p^{-1} \hat{\boldsymbol{\rho}}_p \)
and \(\hat{\sigma}^2 = \hat{\gamma}(0)[1 - \hat{\boldsymbol{\phi}}' \hat{\boldsymbol{\rho}}_p]\)
where \(\hat{\boldsymbol{\rho}}_p = (\hat{\rho}(1), \dotsm, \hat{\rho}(p))'\)
& \(\hat{\boldsymbol{R}}_p = [\hat{\rho}(i-j)]_{i,j=1}^p\).
-
Asymptotic behavior of the Yule-Walker estimators:
-
\(\hat{\boldsymbol{\phi}} \sim AN(\boldsymbol{\phi}, n^{-1} \sigma^2 \boldsymbol{\Gamma}_p^{-1})\)
-
\(\hat{\boldsymbol{\phi}}\) is asymptotically efficient
(i.e., it has the same asymptotic variance as the MLE).
-
\(\hat{\sigma}^2 \overset{P}{\rightarrow} \sigma^2\).
Order Selection for AR Models
-
To fit an AR model, we need to select \(p\).
-
Two techniques can be used:
-
Select \(p\) as the smallest value \(m\) such that the sample PACF satisfies
\[
\mid \hat{\alpha}(k) \mid \leq \frac{1.96}{\sqrt{n}} \text{ for } k > m
\]
(see the code sketch after this list)
-
Select \(p\) by minimizing an information criterion, e.g., AIC
\(\text{AIC} \overset{\text{def}}{=} -2 \log{(\text{MGL})} + 2 (p + 1)\)
where MGL is maximum Gaussian Likelihood.
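A sketch of the first rule, assuming statsmodels is available for the sample PACF (the function name `select_p_by_pacf` is our own):
```python
import numpy as np
from statsmodels.tsa.stattools import pacf

def select_p_by_pacf(x, max_lag=20):
    """Smallest m with |alpha_hat(k)| <= 1.96/sqrt(n) for all k > m."""
    n = len(x)
    alpha_hat = pacf(x, nlags=max_lag)     # alpha_hat[0] = 1 by convention
    bound = 1.96 / np.sqrt(n)
    exceed = np.flatnonzero(np.abs(alpha_hat[1:]) > bound) + 1  # lags over the bound
    return int(exceed.max()) if exceed.size else 0
```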
Examples
-
Sunspot numbers for years 1770-1869: \(S_1, \dotsm, S_{100}\)
Consider fitting an AR(2) model \(X_t - \phi_1 X_{t-1} - \phi_2 X_{t-2} = Z_t\),
\(\{Z_t\} \sim WN(0, \sigma^2)\) to the mean-corrected data \(X_t = S_t - 46.93, t = 1, \dotsm, 100\).
Given \(\hat{\gamma}(0) = 1382.2, \hat{\gamma}(1) =1114.4, \hat{\gamma}(2) = 591.73\) and
\(\hat{\gamma}(3) = 96.216\), we can find the Yule-Walker estimates of \(\phi_1, \phi_2\) and \(\sigma^2\).
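Carrying out the solve with these numbers (only \(\hat{\gamma}(0), \hat{\gamma}(1), \hat{\gamma}(2)\) are needed for the AR(2) fit) gives, approximately,
\[
\hat{\boldsymbol{\phi}} = \begin{bmatrix} 1382.2 & 1114.4 \\ 1114.4 & 1382.2 \end{bmatrix}^{-1} \begin{bmatrix} 1114.4 \\ 591.73 \end{bmatrix} \approx \begin{bmatrix} 1.318 \\ -0.634 \end{bmatrix},
\qquad
\hat{\sigma}^2 = \hat{\gamma}(0) - \hat{\boldsymbol{\phi}}' \hat{\boldsymbol{\gamma}}_2 \approx 289.2.
\]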
-
Dow-Jones Utilities Index, Aug. 28 - Dec. 18, 1972.
Solving Yule-Walker Equations Using Durbin-Levinson Algorithm
-
Selection of order \(p\) based on AIC requires fitting AR models of
successively increasing orders \(1, 2, \dotsm\).
-
In this regard, the Durbin-Levinson algorithm can be used.
-
Suppose \(X_1, \dotsm, X_n\) are observations of a zero-mean stationary time series.
Durbin-Levinson Algorithm for Fitting AR Models
-
Provided \(\hat{\gamma}(0) > 0\), the fitted AR(\(m\)) process (\(m < n\))
by the Yule-Walker method is
\(X_t - \hat{\phi}_{m1}X_{t-1} - \dotsm - \hat{\phi}_{mm} X_{t-m} = Z_t\),
\(\{Z_t\} \sim WN(0, \hat{v}_m)\),
where \(\hat{\boldsymbol{\phi}}_m = (\hat{\phi}_{m1}, \dotsm, \hat{\phi}_{mm})' = \hat{\boldsymbol{R}}_m^{-1} \hat{\boldsymbol{\rho}}_m\)
and \( \hat{v}_m = \hat{\gamma}(0)[1 - \hat{\boldsymbol{\phi}}_m' \hat{\boldsymbol{\rho}}_m] \).
-
Recall that \(\hat{\boldsymbol{\phi}}_m = \hat{\boldsymbol{R}}_m^{-1} \hat{\boldsymbol{\rho}}_m\)
is the coefficient vector of the best linear predictor (BLP) \(P_m X_{m+1} = \boldsymbol{\phi}_m' \boldsymbol{X}_m\) of \(X_{m+1}\)
based on \(\boldsymbol{X}_m = (X_m, \dotsm, X_1)'\), and the MSE of the BLP is
\(v_m = \gamma(0)[1 - \boldsymbol{\phi}_m' \boldsymbol{\rho}_m] \).
-
So, we can apply the Durbin-Levinson algorithm to fitting AR models of orders \(1, 2, \dotsm\) to the data.
-
\(\hat{\boldsymbol{\phi}}_1, \hat{\boldsymbol{\phi}}_2, \dotsm\), and
\( \hat{v}_1, \hat{v}_2, \dotsm \), are computed recursively from the sample ACVF
in the same way \(\boldsymbol{\phi}_1, \boldsymbol{\phi}_2, \dotsm \), and
\(v_1, v_2, \dotsm\), are computed from the ACVF.
-
If \(\hat{\gamma}(0) > 0\), then for \(m = 1, 2, \dotsm, n-1\):
\[
\hat{v}_0 = \hat{\gamma}(0), \qquad
\hat{\phi}_{mm} = \frac{1}{\hat{v}_{m-1}} \Big[ \hat{\gamma}(m) - \sum_{j=1}^{m-1} \hat{\phi}_{m-1, j}\, \hat{\gamma}(m-j) \Big],
\]
\[
\begin{bmatrix} \hat{\phi}_{m1} \\ \vdots \\ \hat{\phi}_{m, m-1} \end{bmatrix}
= \begin{bmatrix} \hat{\phi}_{m-1, 1} \\ \vdots \\ \hat{\phi}_{m-1, m-1} \end{bmatrix}
- \hat{\phi}_{mm} \begin{bmatrix} \hat{\phi}_{m-1, m-1} \\ \vdots \\ \hat{\phi}_{m-1, 1} \end{bmatrix},
\qquad
\hat{v}_{m} = \hat{v}_{m-1} \big( 1 - \hat{\phi}_{mm}^2 \big).
\]
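A minimal implementation of this recursion (a sketch; autocovariances can be supplied, e.g., by the hypothetical `sample_acvf` helper sketched earlier):
```python
import numpy as np

def durbin_levinson(gamma, max_order):
    """Recursively fit AR(1), ..., AR(max_order) from autocovariances
    gamma = [gamma_hat(0), ..., gamma_hat(max_order)].
    Returns phis (phis[m] = AR(m) coefficient vector) and vs
    (vs[m] = white-noise variance estimate v_hat_m)."""
    phis = {0: np.array([])}
    vs = {0: gamma[0]}
    for m in range(1, max_order + 1):
        prev = phis[m - 1]
        # phi_hat_{mm}: also the sample PACF at lag m
        num = gamma[m] - sum(prev[j] * gamma[m - 1 - j] for j in range(m - 1))
        phi_mm = num / vs[m - 1]
        phis[m] = np.concatenate([prev - phi_mm * prev[::-1], [phi_mm]])
        vs[m] = vs[m - 1] * (1.0 - phi_mm ** 2)
    return phis, vs
```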
Remarks
-
Use of the Durbin-Levinson algorithm bypasses the matrix inversion required to
compute the Yule-Walker estimators directly.
-
It also gives the sample PACF at lags \(1, 2, \dotsm\);
namely,
\(\hat{\alpha}(1) = \hat{\phi}_{11}, \hat{\alpha}(2) = \hat{\phi}_{22}, \dots\).
Confidence Interval for \(\phi_{pj}\)
-
The Yule-Walker estimator \(\hat{\boldsymbol{\phi}}_p \sim AN(\boldsymbol{\phi}_p, n^{-1}\sigma^2 \boldsymbol{\Gamma}_p^{-1})\),
where \(\hat{\boldsymbol{\phi}}_p = (\hat{\phi}_{p1}, \dotsm, \hat{\phi}_{pp})'\)
and \(\boldsymbol{\phi}_p = (\phi_{p1}, \dotsm, \phi_{pp})'\).
-
Let \(\hat{v}_p\) be the estimator of \(\sigma^2\).
-
Let \(\Phi_{1 - \alpha / 2}\) denote the \((1 - \alpha / 2)\)-quantile of \(\mathcal{N}(0,1)\).
-
For large \(n\), an approximate \(100(1-\alpha)\%\) CI for \(\phi_{pj}\) is given by
\[
\hat{\phi}_{pj} \pm \Phi_{1- \alpha / 2} \sqrt{\hat{v}_{jj} / n},
\]
where \(\hat{v}_{jj}\) is the \(j^{\text{th}}\) diagonal element of
\( \hat{v}_p \hat{\boldsymbol{\Gamma}}_p^{-1} \).
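In code, the interval takes a few lines given the fitted quantities (a sketch; scipy's `norm.ppf` supplies the normal quantile, and the function and argument names are our own):
```python
import numpy as np
from scipy.stats import norm

def yw_confidence_intervals(phi_hat, Gamma_hat, v_p, n, alpha=0.05):
    """Approximate 100(1-alpha)% CIs for phi_{p1}, ..., phi_{pp}."""
    z = norm.ppf(1 - alpha / 2)                       # Phi_{1 - alpha/2}
    v_diag = v_p * np.diag(np.linalg.inv(Gamma_hat))  # diag of v_hat_p * Gamma_hat_p^{-1}
    half = z * np.sqrt(v_diag / n)
    return np.column_stack([phi_hat - half, phi_hat + half])
```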
Confidence Region for \(\boldsymbol{\phi}_{p}\)
-
Moreover, a confidence region for \(\boldsymbol{\phi}_p\) can be obtained.
-
Let \(\chi_{1-\alpha}^2 (p)\) denote the \((1-\alpha)\)-quantile of \(\chi^2(p)\).
-
For large \(n\), the region
\( \{ \boldsymbol{\phi} \in \mathbb{R}^p: (\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}_p)' \hat{\boldsymbol{\Gamma}}_p (\boldsymbol{\phi} - \hat{\boldsymbol{\phi}}_p) \leq n^{-1}\hat{v}_p \chi_{1-\alpha}^2 (p)\}\)
contains \(\boldsymbol{\phi}_p\) with probability close to \(1 - \alpha\).
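Testing whether a candidate \(\boldsymbol{\phi}\) falls in the region is a single quadratic-form comparison (again a sketch, with scipy's `chi2.ppf` for the quantile):
```python
import numpy as np
from scipy.stats import chi2

def in_confidence_region(phi, phi_hat, Gamma_hat, v_p, n, alpha=0.05):
    """True if phi lies in the approximate 100(1-alpha)% region for phi_p."""
    d = phi - phi_hat
    return d @ Gamma_hat @ d <= v_p * chi2.ppf(1 - alpha, len(phi_hat)) / n
```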
Example
El Niño and fish population (Shumway and Stoffer)
\(X_t =\) monthly number of new fish furnished by the Pacific Environmental
Fisheries Group, for a period of 453 months ranging over the years 1950-1987.
The central Pacific warms every three to seven years due to the El Niño effect,
which has been blamed, in particular, for the 1997 floods in the mid-western portions
of the United States.
Yule-Walker Method; ARMA Model
-
ARMA(\(p, q\)) model:
\(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\),
\(\{Z_t\} \sim WN(0, \sigma^2)\)
-
Yule-Walker equations:
-
For \(0 \leq k < m\), where \(m = \max(p, q+1)\) and with the convention \(\theta_0 := 1\):
\(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = \sigma^2 \sum_{j=0}^{q-k} \theta_{k+j} \psi_j\)
-
For \(m \leq k \leq p + q\):
\(\gamma(k) - \phi_1 \gamma(k-1) - \dotsm - \phi_p \gamma(k-p) = 0\)
-
Replacing \(\gamma(\cdot)\) with \(\hat{\gamma}(\cdot)\), we solve for Yule-Walker estimates
\(\hat{\boldsymbol{\phi}}, \hat{\boldsymbol{\theta}}\), and \(\hat{\sigma}^2\).
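To make the structure of these equations concrete, consider the ARMA(1, 1) case (our own illustration): here \(m = \max(1, 2) = 2\), and \(\phi(z)\psi(z) = \theta(z)\) gives \(\psi_0 = 1\) and \(\psi_1 = \phi_1 + \theta_1\), so the equations read
\[
\begin{aligned}
k = 0: &\quad \gamma(0) - \phi_1 \gamma(1) = \sigma^2 [1 + \theta_1 (\phi_1 + \theta_1)], \\
k = 1: &\quad \gamma(1) - \phi_1 \gamma(0) = \sigma^2 \theta_1, \\
k = 2: &\quad \gamma(2) - \phi_1 \gamma(1) = 0.
\end{aligned}
\]
The \(k = 0, 1\) equations are the nonlinear set referred to in the remarks below.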
Remarks
-
\(\psi_j\) must first be expressed in terms of \(\boldsymbol{\phi}\) and \(\boldsymbol{\theta}\),
using \(\phi(z) \psi(z) = \theta(z)\).
-
The \(1^{\text{st}}\) set of equations is nonlinear in \(\boldsymbol{\phi}\) and \(\boldsymbol{\theta}\),
leading to possible nonexistence and/or non-uniqueness of solutions.
-
For MA and ARMA processes, the Yule-Walker estimates \(\hat{\boldsymbol{\phi}}\) and \(\hat{\boldsymbol{\theta}}\)
are less efficient than MLE.
Example
...
Innovations Algorithm
-
Yule-Walker estimates for AR models can be found by applying the Durbin-Levinson
algorithm with ACVF replaced by sample ACVF.
-
Innovations algorithm estimates for MA models are obtained in the same fashion.
Fitted Innovations MA(\(m\)) Model
-
Given the mean-corrected data \(X_1, \dots, X_n\),
the fitted innovations MA(\(m\)) model,
\(m = 1, 2, \dotsm, n-1\) is
\(X_t = Z_t + \hat{\theta}_{m1}Z_{t-1} + \dotsm + \hat{\theta}_{mm} Z_{t-m}\),
\(\{Z_t\} \sim WN(0, \hat{v}_m)\).
-
The innovations estimates \(\hat{\boldsymbol{\theta}}_m := (\hat{\theta}_{m1}, \dotsm, \hat{\theta}_{mm})'\)
and \(\hat{v}_m\) are obtained from the innovations algorithm with ACVF replaced by sample ACVF.
-
To be specific,
\[
\hat{v}_0 = \hat{\gamma}(0), \qquad
\hat{\theta}_{m, m-k} = \frac{1}{\hat{v}_k} \Big[ \hat{\gamma}(m-k) - \sum_{j=0}^{k-1} \hat{\theta}_{k, k-j}\, \hat{\theta}_{m, m-j}\, \hat{v}_j \Big]
\quad \text{for } k = 0, 1, \dotsm, m-1,
\]
\[
\hat{v}_m = \hat{\gamma}(0) - \sum_{j=0}^{m-1} \hat{\theta}_{m, m-j}^2\, \hat{v}_j.
\]
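A direct translation of this recursion (a sketch, with autocovariances supplied as before; the storage convention is our own):
```python
import numpy as np

def innovations(gamma, max_order):
    """Innovations estimates for MA(1), ..., MA(max_order) from
    autocovariances gamma = [gamma_hat(0), ..., gamma_hat(max_order)].
    thetas[m][j - 1] holds theta_hat_{mj}; vs[m] holds v_hat_m."""
    vs = [gamma[0]]
    thetas = {m: np.zeros(m) for m in range(1, max_order + 1)}
    for m in range(1, max_order + 1):
        for k in range(m):
            # theta_hat_{m, m-k}
            s = sum(thetas[k][k - 1 - j] * thetas[m][m - 1 - j] * vs[j]
                    for j in range(k))
            thetas[m][m - 1 - k] = (gamma[m - k] - s) / vs[k]
        vs.append(gamma[0] - sum(thetas[m][m - 1 - j] ** 2 * vs[j]
                                 for j in range(m)))
    return thetas, vs
```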
Order Selection for MA Models
We provide three techniques for selecting \(q\) when fitting an MA model:
-
Select \(q\) as the smallest value \(m\) such that \(\hat{\rho}(k)\) is
not significantly different from zero for all \(k > m\)
(see the sketch after this list).
-
We may examine the innovations estimators \(\hat{\boldsymbol{\theta}}_m, m = 1, 2, \dotsm\),
to assess the appropriateness of an MA model and estimate \(q\);
see the textbook for details.
-
Select \(q\) by minimizing an information criterion, e.g., AIC:
\(\text{AIC} = -2 \log(\text{MGL}) + 2(q+1)\)
where MGL is maximum Gaussian likelihood.
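A sketch of the first rule; the slide does not pin down the significance bound, so as an assumption we use Bartlett's approximation, under which, for an MA(\(m\)) process, \(\hat{\rho}(k)\) for \(k > m\) has standard error about \(\sqrt{(1 + 2\sum_{j=1}^{m}\hat{\rho}(j)^2)/n}\):
```python
import numpy as np

def select_q_by_acf(rho, n):
    """Smallest m such that rho_hat(k) is within the approximate 95%
    bounds for all k > m; rho = [rho_hat(1), ..., rho_hat(K)]."""
    K = len(rho)
    for m in range(K):
        # Bartlett bound for the sample ACF of an MA(m) process
        bound = 1.96 * np.sqrt((1 + 2 * np.sum(np.square(rho[:m]))) / n)
        if all(abs(rho[k]) <= bound for k in range(m, K)):
            return m
    return K
```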
Example
...
Asymptotic Behavior of \(\hat{\boldsymbol{\theta}}_m\)
Confidence Regions for \(\boldsymbol{\theta}_q\)
Innovations Algorithm Estimators When \(p > 0\) and \(q > 0\)
See textbook
2. Maximum Likelihood Estimation
3. Relative Efficiency of Estimators