Stationary Processes
Last Update: October 3, 2024
1. Basic Properties
Let \(\{X_t\}\) be a stationary time series with
- mean \(\mu\);
- ACVF \(\gamma(h)\) and ACF \(\rho(h)\), \(h = 0, \pm1, \pm2, \dotsm\).
Basic Properties of ACVF
- \( \gamma(0) \ge 0 \)
-
\( \mid \gamma(h) \mid \le \gamma(0) \)
\( 0 \le Var(X_{t+h} \pm X_t) = 2 \gamma(0) \pm 2 \gamma(h) \Rightarrow \mid \gamma(h) \mid \le \gamma(0) \)
- \( \gamma(h) = \gamma(-h) \)
Basic Properties of ACF
- \( \rho(0) = 1 \)
- \( \mid \rho(h) \mid \le 1 \)
- \( \rho(h) = \rho(-h) \)
Nonnegative Definite Function
A real-valued function \(\kappa\) defined on the integers is
nonnegative definite if
\[
\sum_{i=1}^n \sum_{j=1}^n a_i \kappa(i-j) a_j \ge 0
\]
for all positive integers \(n\) and vectors \( \mathbf{a} = (a_1, \dotsm, a_n)' \)
with real-valued components \(a_i\).
Theorem 2.1.1
-
\(\kappa(\cdot) \) is the ACVF of a stationary time series if & only if
- \(\kappa(\cdot) \) is an even function, and
- \(\kappa(\cdot) \) is nonnegative definite.
- Note that condition (2) is hard to verify directly, so the theorem is most often used to show that a given function is not an ACVF.
Problem 2.2 of HW 3 will use the necessity part of the theorem.
Remark
-
To show that \(\kappa(\cdot)\) is the ACVF of a stationary process,
it is often simpler to find a process that has \(\kappa(\cdot)\) as its
ACVF than to verify (2).
-
Example: which of the following functions are ACVFs? (A numerical check is sketched after this list.)
- \(\kappa(h) = (-1)^{\mid h \mid}\)
- \(\kappa(h) = 1 + \cos(\pi h / 2) + \cos (\pi h /4)\)
-
\(
\kappa(h) =
\begin{cases}
1 & \text{if } h = 0\\
0.4 & \text{if } h = \pm 1\\
0 & \text{otherwise}
\end{cases}
\)
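As a rough numerical check of nonnegative definiteness, one can form the matrix \([\kappa(i-j)]_{i,j=1}^n\) for a moderate \(n\) and inspect its smallest eigenvalue; a clearly negative value rules the candidate out as an ACVF. A minimal R sketch (the helper name `min_eig` and the choice \(n = 50\) are illustrative, and a nonnegative result for finite \(n\) is only suggestive, not a proof):
```r
# Smallest eigenvalue of the n x n matrix [kappa(i - j)]; a negative value
# shows that kappa is not nonnegative definite, hence not an ACVF.
min_eig <- function(kappa, n = 50) {
  K <- toeplitz(sapply(0:(n - 1), kappa))   # kappa at lags 0, 1, ..., n - 1
  min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
}

min_eig(function(h) (-1)^abs(h))                                      # candidate 1
min_eig(function(h) 1 + cos(pi * h / 2) + cos(pi * h / 4))            # candidate 2
min_eig(function(h) ifelse(h == 0, 1, ifelse(abs(h) == 1, 0.4, 0)))   # candidate 3
```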
Role of ACVF & ACF in Time Series Forecasting
- ACVF & ACF provide a useful measure of the dependence among time series data.
- Hence, they play an important role in time series forecasting.
Illustration Example
-
Suppose \(\{X_t\}\) is a stationary Gaussian time series with
mean \(\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\).
- Suppose we have observed \(X_n\).
- We want to forecast \( X_{n+h}, h \ge 1 \), based on \(X_n\).
Best MSE Predictor
-
Criterion of the best prediction
-
Find the predictor that minimizes
\( \mathbb{E}[( X_{n+h} - f(X_n) )^2] \)
over all possible functions \(f\).
-
Such a predictor, say \(m(X_n)\), is called the best MSE (mean squared error) predictor.
-
What is the best MSE predictor of \(X_{n+h}\) based on \(X_{n}\)?
Answer: \(\mathbb{E}(X_{n+h} \mid X_{n})\).
Example
-
As \(\{X_t\}\) is a Gaussian time series,
\(
\begin{pmatrix}
X_{n+h} \\
X_n
\end{pmatrix}
\sim \mathcal{N}
(
\begin{pmatrix}
\mu \\
\mu
\end{pmatrix},
\begin{pmatrix}
\gamma(0) & \gamma(h)\\
\gamma(h) & \gamma(0)
\end{pmatrix}
)
\)
-
So, we have
\(X_{n+h} \mid X_n \sim \mathcal{N} (\mu + \rho(h)(X_n - \mu), \gamma(0)(1-\rho(h)^2))\)
-
It follows that
\( m(X_n) = \mathbb{E}(X_{n+h} \mid X_n) = \mu + \rho(h)(X_n - \mu) \)
-
The corresponding MSE is
\( \mathbb{E}[X_{n+h} - m(X_n)]^2 = \gamma(0)(1-\rho(h)^2) \)
Remarks
- If \( \{X_t\} \) is a Gaussian time series, calculation of the best MSE predictor is straightforward.
- However, if \( \{X_t\} \) is not a Gaussian time series, then the calculation in general is complicated.
- So, instead of looking for the best MSE predictor, we can look for the best linear predictor.
Best Linear Predictor (BLP)
-
Criterion of the best prediction
-
Find the predictor that minimizes
\( \mathbb{E}[( X_{n+h} - f(X_n) )^2] \)
over all linear functions \(f\) of the form \(a X_n + b\).
-
Such a predictor, say \(l(X_n)\), is called the best linear predictor.
-
Finding BLP is equivalent to finding \(a\) & \(b\) to minimize
\( S(a, b) = \mathbb{E}[( X_{n+h} - a X_n - b)^2]\).
Example
\(f(X_n) = aX_n + b\), where \(\{X_t\}\) is stationary with mean \(\mathbb{E}(X_t)=\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\)
-
\(\frac{\partial S(a,b)}{\partial b} = \mathbb{E}(-2(X_{n+h} - aX_n -b)) \overset{set}= 0 \)
\(\Rightarrow b = \mu(1-a) \)
-
Rewrite \(S(a,b) = \dotsm = \mathbb{E}[(X_{n+h} - \mu) - a(X_n - \mu)]^2 \)
Then \(\frac{\partial S(a,b)}{\partial a} = \dotsm \overset{set}= 0 \)
\(\Rightarrow a = \rho(h)\)
So, the BLP is \(l(X_n) = \rho(h) X_n + \mu (1 - \rho(h)) = \mu + \rho(h)(X_n - \mu) \).
Moreover, the corresponding MSE is \(\mathbb{E}[X_{n+h} - l(X_n)]^2 = \gamma(0)(1-\rho(h)^2)\).
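As a quick empirical check of these formulas (an added sketch, not part of the original notes; the AR(1) model with \(\phi = 0.6\), \(\sigma^2 = 1\), \(\mu = 0\), and \(h = 1\) is an illustrative choice):
```r
set.seed(1)
phi <- 0.6; h <- 1
x <- arima.sim(model = list(ar = phi), n = 1e5)   # mean-zero AR(1) with sigma^2 = 1

gamma0 <- 1 / (1 - phi^2)   # gamma(0) for this AR(1)
rho_h  <- phi^h             # rho(h) for this AR(1)

pred <- rho_h * head(x, -h)         # BLP of X_{t+h} given X_t (mu = 0 here)
err  <- tail(x, -h) - pred
mean(err^2)                         # empirical MSE of the BLP
gamma0 * (1 - rho_h^2)              # theoretical MSE gamma(0)(1 - rho(h)^2)
```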
Remarks
- For Gaussian time series, the best MSE predictor coincides with the BLP.
- In general, the best MSE predictor has an MSE no larger than that of the BLP.
-
BLP only depends on the mean & ACVF of time series.
- So, it can be calculated without detailed knowledge of joint distributions.
-
For non-Gaussian time series,
the BLP avoids the possible difficulty of computing conditional expectations,
even when the joint distributions are known.
2. Linear Processes
\( \{X_t\} \) is a linear process if
\(X_t = \sum_{j = -\infty}^{\infty} \psi_j Z_{t-j}, \{Z_t\} \sim WN(0,\sigma^2) \)
where \( \sum_{j = -\infty}^{\infty} \mid \psi_j \mid < \infty \) (absolute summability).
Let \(\psi(z) = \sum_{j = -\infty}^{\infty} \psi_j z^j \); then, with \(B\) denoting the backward shift operator, we can write
\( X_t = \psi(B)Z_t \)
Remarks
-
The condition \(\sum_{j=-\infty}^{\infty} \mid \psi_j \mid < \infty\) ensures that,
for each fixed \(t\), the infinite sum in the definition converges with probability
one (i.e., "almost surely"; cf. Remark 1 of Section 2.2).
-
For each \(t\), the infinite sum converges absolutely
(i.e., \(\sum_{j=-\infty}^{\infty} \mid \psi_j Z_{t-j} \mid < \infty\) )
with probability one.
-
It also ensures that \(\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty\)
and hence (see Appendix C) that the infinite sum converges in mean square.
MA(\(\infty\)) Processes
A linear process with \(\psi_j = 0\) for all \(j < 0\), i.e.,
\[
X_t = \sum_{j=0}^\infty \psi_j Z_{t-j}, \{Z_t\} \sim WN(0, \sigma^2)
\]
is called an MA(\(\infty\)) process.
Properties of Linear Processes
- \( \mathbb{E}(X_t) = 0 \)
-
\( \gamma(h) = \sigma^2 \sum_{j = -\infty}^{\infty} \psi_j \psi_{j+h} \)
(proof in textbook)
- So, a linear process is weakly stationary.
- A linear process is strictly stationary if \(\{Z_t\} \sim WN(0, \sigma^2) \) is replaced by \(\{Z_t\} \sim iid(0, \sigma^2) \).
Examples
-
MA(1)
\(X_t = Z_t + \theta Z_{t-1} \) where \(\{Z_t\} \sim WN(0, \sigma^2)\)
\(\psi_0 = 1, \psi_1 = \theta \) & \( \psi_j = 0 \) for \(j \ne 0, 1\).
\(\gamma(h) = \sigma^2 \sum_j \psi_j \psi_{j+h} \)
\(=
\begin{cases}
\sigma^2 \sum_j \psi_j^2 = \sigma^2(1 + \theta^2) & h = 0\\
\sigma^2 \sum_j \psi_j \psi_{j+h} = \sigma^2 \theta & h = \pm 1\\
\sigma^2 \sum_j \psi_j \psi_{j+h} = 0 & \text{otherwise}
\end{cases}
\)
-
AR(1)
\(X_t = \phi X_{t-1} + Z_t \) where \(\mid \phi \mid < 1\) and \(\{Z_t\} \sim WN(0, \sigma^2)\)
Iterating gives \(X_t = \sum_{j=0}^\infty \phi^j Z_{t-j}\), i.e., \(\psi_j = \phi^j\) for \(j \ge 0\) and \(\psi_j = 0\) for \(j < 0\).
\(\gamma(h) = \sigma^2 \sum_{j=0}^\infty \phi^j \phi^{j + \mid h \mid} = \frac{\sigma^2 \phi^{\mid h \mid}}{1 - \phi^2}\)
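These closed-form ACVFs can be cross-checked in R with `ARMAacf`, which returns the theoretical ACF of an ARMA model (a sketch; the values \(\theta = 0.5\) and \(\phi = 0.6\) are illustrative):
```r
# MA(1) with theta = 0.5: rho(1) = theta / (1 + theta^2), rho(h) = 0 for |h| > 1
ARMAacf(ma = 0.5, lag.max = 3)
0.5 / (1 + 0.5^2)

# AR(1) with phi = 0.6: rho(h) = phi^|h|
ARMAacf(ar = 0.6, lag.max = 3)
0.6^(0:3)
```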
3. Introduction to ARMA Processes
-
\(\{X_t\}\) is an ARMA(\(p, q\)) process if \(\{X_t\}\) is a stationary solution to
\(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p}\)
\(= Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\)
for every \(t\), where \(\{Z_t\} \sim WN(0, \sigma^2)\), \(\phi_p \ne 0\), \(\theta_q \ne 0\),
and \(\phi(z) = 1 - \phi_1 z - \dotsm - \phi_p z^p\) and \(\theta(z) = 1 + \theta_1 z + \dotsm + \theta_q z^q\)
have no common roots.
- \(\phi(z)\) is called the AR polynomial.
- \(\theta(z)\) is called the MA polynomial.
-
We can write the ARMA(\(p, q\)) equations in short as
\[
\phi(B) X_t = \theta(B) Z_t
\]
-
MA(\(q\)) process:
\[
X_t = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}
\]
-
AR(\(p\)) process:
\[
X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t
\]
-
\(\{X_t\}\) is an ARMA(\(p,q\)) process with mean \(\mu\) if \(\{X_t - \mu\}\)
is an ARMA(\(p,q\)) process.
Existence and Uniqueness
ARMA equations have a stationary solution, which is also unique,
if and only if \(\phi(z) \ne 0\) for \(\mid z \mid = 1\).
- Note that, \(z\) can be a complex number and \(\mid z \mid\) is the modulus of \(z\).
Causality
-
ARMA(\(p,q\)) process \(\{X_t\}\) is causal
if we can write
\[
X_t = \sum_{j=0}^\infty \psi_j Z_{t-j},
\]
with \( \sum_{j=0}^\infty \mid \psi_j \mid < \infty \).
-
\(\{X_t\}\) is causal \(\iff \phi(z) \ne 0\) for \( \mid z \mid \le 1\)
- To check for causality, find the roots of \( \phi(z) = 0\)
-
If there exist any roots inside or on the unit circle,
\(\{X_t\}\) is noncausal; otherwise, \(\{X_t\}\) is causal.
-
Examples:
-
MA(2) process: \(X_t = Z_t - 0.4 Z_{t-1} + 0.04 Z_{t-2}\)
\(\{X_t\}\) is causal by definition.
-
AR(2) process: \(X_t - 0.7 X_{t-1} + 0.1 X_{t-2} = Z_t\)
\(\phi(z) = 1 - 0.7z + 0.1 z^2 = (1-0.5z)(1-0.2z) \overset{set}= 0\)
So, roots of \(\phi(z)\) are \(z_1 = 2\) and \(z_2 = 5\).
Both are outside the unit circle.
So, \(\{X_t\}\) is causal.
-
ARMA(1,1) process: \(X_t - 0.5 X_{t-1} = Z_t + 0.4 Z_{t-1}\)
\(\phi(z) = 1 - 0.5z \overset{set}= 0\)
\(\Rightarrow\) root of \(\phi(z)\) is \(z = 2\), outside the unit circle.
So, \(\{X_t\}\) is causal.
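In practice the roots can be found numerically, e.g. with R's `polyroot`, which takes the coefficients in increasing order of degree (a sketch for the AR(2) and ARMA(1,1) examples above):
```r
# phi(z) = 1 - 0.7 z + 0.1 z^2  (AR(2) example)
abs(polyroot(c(1, -0.7, 0.1)))   # moduli 2 and 5, both > 1  => causal

# phi(z) = 1 - 0.5 z            (ARMA(1,1) example)
abs(polyroot(c(1, -0.5)))        # modulus 2 > 1              => causal
```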
Invertibility
-
ARMA(\(p,q\)) process \(\{X_t\}\) is invertible
if we can write
\[
Z_t = \sum_{j=0}^\infty \pi_j X_{t-j},
\]
with \( \sum_{j=0}^\infty \mid \pi_j \mid < \infty \).
-
\(\{X_t\}\) is invertible \(\iff \theta(z) \ne 0\) for \( \mid z \mid \le 1\)
- To check for invertibility, find the roots of \( \theta(z) = 0\)
-
If there exist any roots inside or on the unit circle,
\(\{X_t\}\) is non-invertible; otherwise, \(\{X_t\}\) is invertible.
-
Examples:
-
MA(2) process: \(X_t = Z_t - 0.4 Z_{t-1} + 0.04 Z_{t-2}\)
\( \theta(z) = 1 - 0.4z + 0.04 z^2 = (1-0.2z)^2 \overset{set}= 0 \)
\( \Rightarrow\) roots of \(\theta(z)\) are \(z_{1,2}=5\), outside unit circle.
So, \(\{X_t\}\) is invertible.
-
AR(2) process: \(X_t - 0.7 X_{t-1} + 0.1 X_{t-2} = Z_t\)
\(\{X_t\}\) is invertible by definition.
-
ARMA(1,1) process: \(X_t - 0.5 X_{t-1} = Z_t + 0.4 Z_{t-1}\)
\(\theta(z) = 1 + 0.4z \overset{set}= 0\)
\(\Rightarrow\) root of \(\theta(z)\) is \(z = -2.5\), outside the unit circle.
So, \(\{X_t\}\) is invertible.
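The same numerical check applies to the MA polynomial (a sketch for the MA(2) and ARMA(1,1) examples above):
```r
# theta(z) = 1 - 0.4 z + 0.04 z^2  (MA(2) example)
abs(polyroot(c(1, -0.4, 0.04)))   # double root of modulus 5 > 1  => invertible

# theta(z) = 1 + 0.4 z             (ARMA(1,1) example)
abs(polyroot(c(1, 0.4)))          # modulus 2.5 > 1               => invertible
```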
Remark
We shall assume causality and invertibility in this course unless we state otherwise.
4. Properties of Sample Mean and ACF
-
Suppose that \(X_1, \dotsm, X_n\) are observed data from a
stationary process \(\{X_t\}\) with mean \(\mu\), ACVF \(\gamma(\cdot)\),
and ACF \(\rho(\cdot)\).
-
Sample mean: \(\bar{X} = \frac{1}{n} \sum_{t=1}^n X_t\)
-
\(\bar{X}\) is the moment estimator of \(\mu\).
Properties of \(\bar{X}\)
- \( \mathbb{E}(\bar{X}) = \mu \)
-
\(Var(\bar{X}) = \frac{1}{n} \sum_{\mid h \mid < n} (1 - \frac{\mid h \mid}{n}) \gamma(h) \)
\(
Var(\bar{X}) = Var(\frac{1}{n} \sum_{t=1}^n X_t) = \frac{1}{n^2} Cov(\sum_{t=1}^n X_t, \sum_{s=1}^n X_s)
= \frac{1}{n^2} \sum_{t=1}^n \sum_{s=1}^n Cov(X_t, X_s)
= \frac{1}{n^2} \sum_{t=1}^n \sum_{s=1}^n \gamma(t-s)
= \frac{1}{n^2} \sum_{\mid h \mid < n} (n - \mid h \mid) \gamma(h)
= \frac{1}{n} \sum_{\mid h \mid < n} (1 - \frac{\mid h \mid}{n}) \gamma(h)
\)
(see photo - 15:34 Sep 24)
-
\(Var(\bar{X}) \rightarrow 0\) if \(\gamma(h) \rightarrow 0\) as \(h \rightarrow \infty\)
-
\(n Var(\bar{X}) \rightarrow \sum_{h = - \infty}^\infty \gamma(h) \) if \( \sum_{h = - \infty}^\infty \mid \gamma(h) \mid < \infty \)
-
For a large class of time series models,
\(
\sqrt{n}(\bar{X} - \mu) \overset{approx.}\sim \mathcal{N}(0, \sum_{\mid h \mid < n} (1 - \frac{\mid h \mid}{n}) \gamma(h))
\)
-
Equivalently,
\(
\sqrt{n}(\bar{X} - \mu) \overset{approx.}\sim \mathcal{N}(0, \sum_{h = - \infty}^\infty \gamma(h))
\)
CI for \(\mu\)
-
An approximate 95% CI for \(\mu\) is given by
\[
\bar{X} \pm 1.96 \frac{\sqrt{\hat{v}}}{\sqrt{n}},
\]
where \(\hat{v}\) is an estimator of \(v = \sum_{h=-\infty}^\infty \gamma(h)\),
for example,
- \(\hat{v} = \sum_{\mid h \mid < \sqrt{n}} (1 - \frac{\mid h \mid}{n})\hat{\gamma}(h) \).
- \(\hat{v} = 2 \pi \hat{f}(0)\), where \( \hat{f}(0) \) estimates the spectral density at frequency 0 (see Chapter 4)
-
Example:
AR(1) with mean \(\mu\): \( X_t - \mu = \phi(X_{t-1} - \mu) + Z_t \),
where \(\mid \phi \mid < 1\) and \( \{Z_t\} \sim WN(0, \sigma^2) \).
Here \(\gamma(h) = \sigma^2 \phi^{\mid h \mid}/(1 - \phi^2)\), so \(v = \sum_{h=-\infty}^\infty \gamma(h) = \sigma^2/(1-\phi)^2\),
and an approximate 95% CI for \(\mu\) is \(\bar{X} \pm 1.96\, \sigma / ((1-\phi)\sqrt{n})\),
with \(\sigma\) and \(\phi\) replaced by estimates in practice.
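A minimal R sketch of this interval on simulated AR(1) data, using the first estimator of \(v\) above (the values \(\mu = 10\), \(\phi = 0.6\), \(\sigma^2 = 1\), and \(n = 500\) are illustrative):
```r
set.seed(1)
n <- 500
x <- 10 + arima.sim(model = list(ar = 0.6), n = n)   # AR(1) around mu = 10

g    <- acf(x, lag.max = floor(sqrt(n)), type = "covariance", plot = FALSE)$acf[, 1, 1]
lags <- seq_along(g) - 1                                   # lags 0, 1, ..., floor(sqrt(n))
vhat <- sum((1 - lags / n) * g * ifelse(lags == 0, 1, 2))  # sum over |h| < sqrt(n)

mean(x) + c(-1, 1) * 1.96 * sqrt(vhat / n)                 # approximate 95% CI for mu
```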
Sample ACVF & ACF
-
Sample ACVF:
\(
\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n - \mid h \mid}(X_{t+\mid h \mid} - \bar{X})(X_t - \bar{X}), -n < h < n
\)
-
Sample ACF:
\(
\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}
\)
-
\(\hat{\gamma}(h)\) and \(\hat{\rho}(h)\) estimate \(\gamma(h)\) and \(\rho(h)\), respectively.
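In R, `acf` computes exactly these quantities (with divisor \(n\), matching the definition above); a quick check of \(\hat{\gamma}(1)\) on an arbitrary data vector:
```r
x <- c(2, 5, 1, 4, 3, 6, 2, 5)
n <- length(x)

# hat gamma(1) directly from the definition (divisor n, overall mean subtracted)
sum((x[2:n] - mean(x)) * (x[1:(n - 1)] - mean(x))) / n

# same value from acf(); element [2, 1, 1] is the lag-1 entry
acf(x, lag.max = 1, type = "covariance", plot = FALSE)$acf[2, 1, 1]
```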
Sample Covariance Matrix
\[
\hat{\boldsymbol{\Gamma}}_k =
\begin{bmatrix}
\hat{\gamma}(0) & \hat{\gamma}(1) & \dots & \hat{\gamma}(k-1) \\
\hat{\gamma}(1) & \hat{\gamma}(0) & \dots & \hat{\gamma}(k-2) \\
\vdots & \vdots & \ddots & \vdots \\
\hat{\gamma}(k-1) & \hat{\gamma}(k-2) & \dots & \hat{\gamma}(0)
\end{bmatrix}
\]
-
\(\hat{\boldsymbol{\Gamma}}_k\) is nonnegative definite for all \(k \ge 1\).
-
Sample autocorrelation matrix:
\[
\hat{\boldsymbol{R}}_k = \hat{\boldsymbol{\Gamma}}_k / \hat{\gamma}(0)
\]
Sampling Distribution of \( \hat{\rho}(\cdot) \)
-
For linear time series models,
\(
\hat{\boldsymbol{\rho}} \overset{approx.}\sim \mathcal{N}(\boldsymbol{\rho}, \boldsymbol{W} / n),
\)
where \( \hat{\boldsymbol{\rho}} = (\hat{\rho}(1), \dotsm, \hat{\rho}(h))' \),
\(\boldsymbol{\rho} = (\rho(1), \dotsm, \rho(h))' \),
and \(\boldsymbol{W}\) is a matrix whose \((i, j)\) element is given by Bartlett's formula;
namely,
\[
\begin{align*}
& w_{ij} \\
= & \sum_{k=1}^\infty \{ \rho(k+i) + \rho(k-i) - 2\rho(k)\rho(i) \} \\
& \times \{ \rho(k+j) + \rho(k-j) - 2\rho(k)\rho(j) \}
\end{align*}
\]
-
Examples:
-
\(iid \) noise: \(\{X_t\} \sim iid(0, \sigma^2)\)
For iid noise, \(\rho(k) = 0\) for \(k \ne 0\), so Bartlett's formula gives \(w_{ii} = 1\) and \(w_{ij} = 0\) for \(i \ne j\);
hence \(\hat{\rho}(h) \overset{approx.}\sim \mathcal{N}(0, 1/n)\) for \(h \ge 1\), which is the basis of the
\(\pm 1.96/\sqrt{n}\) bounds drawn on sample ACF plots.
(see photo - 15:11 Sep 26)
-
MA(1): \(X_t = Z_t + \theta Z_{t-1}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
\(X_t = Z_t + \theta Z_{t-1} \Rightarrow \gamma(l) = (1 + \theta^2)\sigma^2 \mathbb{I}_{l=0} + \theta \sigma^2 \mathbb{I}_{l = \pm 1}\)
\( \hat{\rho}(i) \overset{approx.}\sim \mathcal{N}(\rho(i), \frac{w_{ii}}{n}) \), where
(1) if \(i = 1\),
\(w_{ii} = \dotsm = 1 - 3 \rho^2(1) + 4 \rho^4(1)\)
(2) if \(i > 1\),
\(w_{ii} = \dotsm = 1 + 2\rho^2(1)\)
(see photo - 15:22 Sep 26)
-
MA(\(q\)): \(X_t = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
\(\hat{\rho}(i) \overset{approx.}\sim \mathcal{N}(0, \frac{1 + 2 \rho^2(1) + \dotsm + 2 \rho^2(q)}{n}) \) for \(i > q\).
-
AR(1): \( X_t = \phi X_{t-1} + Z_t \), \(\mid \phi \mid < 1\) & \(\{Z_t\} \sim WN(0, \sigma^2)\)
-
R Examples:
- A simulated MA(1) series with \(\theta = -0.9\)
- Lake Huron residuals
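A sketch of these two R examples (the sample size \(n = 200\) and the use of a linear time trend for the Lake Huron levels are assumptions about what was shown in class):
```r
# Simulated MA(1) with theta = -0.9: the sample ACF should be close to
# rho(1) = -0.9 / (1 + 0.81) ~ -0.5 at lag 1 and near zero at higher lags
set.seed(1)
x <- arima.sim(model = list(ma = -0.9), n = 200)
acf(x, lag.max = 20)

# Lake Huron: remove a linear time trend, then inspect the residual ACF
fit <- lm(LakeHuron ~ time(LakeHuron))
acf(resid(fit), lag.max = 20)
```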
5. Forecasting Stationary Time Series
- Let \(\{X_t\}\) be a stationary time series with known mean \(\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\).
- Our goal is to find \(P_n X_{n+h}\), the BLP of \(X_{n+h}\) in terms of \(1, X_1, \dotsm, X_n\).
-
We write \( P_n X_{n+h} = a_0 + a_1 X_n + \dotsm + a_n X_1 \),
where \(a_0, a_1, \dotsm, a_n\) minimize
\(
S(a_0, a_1, \dotsm, a_n) = \mathbb{E}(X_{n+h} - a_0 - a_1 X_n - \dotsm - a_n X_1)^2
\).
Result
-
The BLP is given by
\(
P_n X_{n+h} = \mu + a_1 (X_n - \mu) + \dotsm + a_n (X_1 - \mu)
\)
where \(\boldsymbol{a}_n = (a_1, \dotsm, a_n)'\) is determined by
\( \boldsymbol{\Gamma}_n \boldsymbol{a}_n = \boldsymbol{\gamma}_n(h) \)
-
\(\boldsymbol{\Gamma}_n = [\gamma(i - j)]_{i,j=1}^n\)
-
\(\boldsymbol{\gamma}_n(h) = (\gamma(h), \gamma(h+1), \dotsm, \gamma(h + n -1))' \)
-
Moreover, the corresponding MSE is given by
\(\mathbb{E}(X_{n+h} - P_n X_{n+h})^2 = \gamma(0) - \boldsymbol{a}_n' \boldsymbol{\gamma}_n(h)\)
Example
One-step prediction of AR(1): \(X_t = \phi X_{t-1} + Z_t\)
\(
P_n X_{n+1} = a_1 X_n + a_2 X_{n-1} + \dotsm + a_n X_1
\)
, where \((a_1, a_2, \dotsm, a_n)'\) is determined by
\(\boldsymbol{\Gamma}_n \boldsymbol{a}_n = \boldsymbol{\gamma}_n(1) = (\gamma(1), \dotsm, \gamma(n))'\).
Since \(\gamma(h) = \phi \gamma(h-1)\) for \(h \ge 1\), the solution is \(\boldsymbol{a}_n = (\phi, 0, \dotsm, 0)'\),
so \(P_n X_{n+1} = \phi X_n\) with MSE \(= \gamma(0) - \phi \gamma(1) = \sigma^2\).
(see photo - 15:53 Sep 26)
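The same system can also be solved numerically; a sketch for an AR(1) with \(\phi = 0.6\), \(\sigma^2 = 1\), and \(n = 5\) (illustrative values), which recovers \(\boldsymbol{a}_n = (\phi, 0, \dotsm, 0)'\) and MSE \(\sigma^2\):
```r
phi <- 0.6; sigma2 <- 1; n <- 5
acvf <- function(h) sigma2 * phi^abs(h) / (1 - phi^2)   # AR(1) ACVF

Gamma_n  <- toeplitz(acvf(0:(n - 1)))   # [gamma(i - j)]_{i,j = 1..n}
gamma_n1 <- acvf(1:n)                   # (gamma(1), ..., gamma(n))'

a <- solve(Gamma_n, gamma_n1)           # BLP coefficients
a                                       # (0.6, 0, 0, 0, 0)
acvf(0) - sum(a * gamma_n1)             # MSE = sigma^2 = 1
```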
Remark
\(P_n X_{n+h}\) is chosen such that
- \(\mathbb{E}(X_{n+h} - P_n X_{n+h}) = 0\)
- \(\mathbb{E}[(X_{n+h} - P_n X_{n+h})X_j] = 0, j = 1, \dotsm, n\)
Properties of \(P_n\)
-
We shall refer to \(P_n\) as the prediction operator based
on the finite past, \( \{ X_1, \dotsm, X_n \} \).
-
Let \(U\) and \(V\) be random variables with finite variance
and let \(a\), \(b\), and \(c\) be constants. Then,
-
\( \mathbb{E}(U - P_n U) = 0 \)
-
\( \mathbb{E}[(U - P_n U)X_j] = 0\), \(j = 1, \dotsm, n \)
-
\( P_n (aU + bV + c) = aP_n U + b P_n V + c \)
-
\( P_n U = U \) if \(U\) is a linear combination of \(1, X_1, \dotsm, X_n\)
-
\( P_n U = \mathbb{E}(U) \) if \(Cov(U, X_j) = 0\) for all \(j = 1, \dotsm, n \)
-
Examples:
-
AR(1): \(X_t = \phi X_{t-1} + Z_t\)
where \(\{Z_t\} \sim WN(0, \sigma^2)\).
\(P_n X_{n+1} = \phi X_n\) with MSE \( = \sigma^2\)
Method using the properties of \(P_n\):
\(X_{n+1} = \phi X_n + Z_{n+1}\)
\(
P_n X_{n+1}
= P_n (\phi X_n + Z_{n+1})
= \phi P_n X_n + P_n Z_{n+1}
= \phi X_n + \mathbb{E}(Z_{n+1})
= \phi X_n
\)
-
One-step prediction of AR(\(p\)):
\( X_t = \phi_1 X_{t-1} + \dotsm + \phi_p X_{t-p} + Z_t \)
Goal: Find \(P_n X_{n+1}\)
If \(n \ge p\),
\(
P_n X_{n+1}
= P_n(\phi_1 X_{n} + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p} + Z_{n+1})
= \phi_1 P_n X_n + \phi_2 P_n X_{n-1} + \dotsm + \phi_p P_n X_{n+1-p} + P_n Z_{n+1}
= \phi_1 X_n + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p} + \mathbb{E} (Z_{n+1})
= \phi_1 X_n + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p}
\)
\(MSE = \mathbb{E}[(X_{n+1} - P_n X_{n+1})^2] = \mathbb{E}(Z_{n+1}^2) = Var(Z_{n+1}) = \sigma^2\)
If \(n < p\), use the general result above (solve \( \boldsymbol{\Gamma}_n \boldsymbol{a}_n = \boldsymbol{\gamma}_n(1) \)).
-
\(h\)-step prediction of AR(1) with nonzero mean \(\mu\):
\( X_t - \mu = \phi (X_{t-1} - \mu) + Z_t \)
Goal: Find \(P_n X_{n+h}\), \(h = 1, 2, \dotsm\).
\(
X_{n+h} - \mu = \phi (X_{n+h-1} - \mu) + Z_{n+h}
\)
\(
P_n(X_{n+h} - \mu) = P_n(\phi (X_{n+h-1} - \mu) + Z_{n+h})
\)
\(
P_nX_{n+h} - \mu = \phi (P_n X_{n+h-1} - \mu) + P_n Z_{n+h}
= \dotsm
\)
Iterating gives \(P_n X_{n+h} = \mu + \phi^h (X_n - \mu)\),
with MSE \(= \mathbb{E}[(X_{n+h} - P_n X_{n+h})^2] = \sigma^2 \frac{1 - \phi^{2h}}{1 - \phi^2}\).
(see photo - 15:26 Oct 1)
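A small R sketch of these \(h\)-step forecasts (the values \(\mu = 10\), \(\phi = 0.6\), \(\sigma^2 = 1\), last observation \(X_n = 12\), and horizons \(h = 1, \dotsm, 5\) are illustrative):
```r
mu <- 10; phi <- 0.6; sigma2 <- 1; x_n <- 12; H <- 1:5

pred <- mu + phi^H * (x_n - mu)                    # P_n X_{n+h} = mu + phi^h (X_n - mu)
mse  <- sigma2 * (1 - phi^(2 * H)) / (1 - phi^2)   # h-step MSE

cbind(h = H, pred, mse)   # forecasts decay toward mu; MSE grows toward gamma(0)
```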
Prediction of Second-Order Random Variables
(self-study: Example 2.5.1 and Example 2.5.2)
Recursive Prediction Algorithms
-
To determine \(P_n X_{n+h}\), the direct approach requires solving a system of \(n\) linear equations.
-
For large \(n\) this may be difficult and time-consuming.
-
It would be helpful if \(P_n X_{n+1}\) could be used to simplify the calculation of \(P_{n+1} X_{n+2}\).
-
Prediction algorithms that utilize this idea are said to be recursive.
-
We'll introduce two recursive algorithms, the Durbin-Levinson algorithm and the
Innovations algorithm, for determining the one-step predictors \(P_n X_{n+1}\).
-
The algorithms can be extended to compute the \(h\)-step predictors \(P_n X_{n+h}\), \(h \ge 1\).
Durbin-Levinson Algorithm
-
Without loss of generality, we consider a stationary process \(\{X_t\}\) with mean 0 and ACVF \(\gamma(\cdot)\).
-
Write
\(
\begin{cases}
P_n X_{n+1} = \phi_{n1}X_n + \dotsm + \phi_{nn}X_1\\
v_n = \mathbb{E}(X_{n+1} - P_n X_{n+1})^2
\end{cases}
\)
-
The algorithm recursively computes \(\phi_{n1}, \dotsm, \phi_{nn}\) and
\(v_n\) from \(\phi_{n-1,1}, \dotsm, \phi_{n-1,n-1}\) and \(v_{n-1}\).
-
We start with \(v_0 = \gamma(0)\).
-
For \(n = 1, 2, \dotsm, \phi_{n1}, \dotsm, \phi_{nn}\) and \(v_n\) satisfy
\(
\phi_{nn} = \frac{1}{v_{n-1}}[\gamma(n) - \sum_{j=1}^{n-1} \phi_{n-1, j} \gamma(n-j)]
\)
and
\(
\begin{bmatrix}
\phi_{n1}\\
\vdots\\
\phi_{n, n-1}
\end{bmatrix}
=
\begin{bmatrix}
\phi_{n-1,1}\\
\vdots\\
\phi_{n-1, n-1}
\end{bmatrix}
- \phi_{nn}
\begin{bmatrix}
\phi_{n-1,n-1}\\
\vdots\\
\phi_{n-1, 1}
\end{bmatrix}
\),
\(
v_n = v_{n-1}(1 - \phi_{nn}^2)
\).
-
Example:
Prediction of an AR(1) process:
\( X_t = \phi X_{t-1} + Z_t \), \(\{Z_t\} \sim WN(0, \sigma^2)\)
Here \(v_0 = \gamma(0)\) and \(\phi_{11} = \gamma(1)/\gamma(0) = \phi\), so \(v_1 = \gamma(0)(1 - \phi^2) = \sigma^2\);
the recursion then gives \(\phi_{nn} = 0\) for \(n \ge 2\), hence \(\phi_{n1} = \phi\), \(\phi_{nj} = 0\) for \(j \ge 2\), and \(v_n = \sigma^2\),
so \(P_n X_{n+1} = \phi X_n\), as before.
(see photo - 15:51 Oct 1)
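A direct R transcription of the recursion above (an added sketch; the function name `dl` is illustrative). Running it on the AR(1) ACVF \(\gamma(h) = \sigma^2\phi^{\mid h \mid}/(1-\phi^2)\) reproduces \(\phi_{n1} = \phi\), \(\phi_{nj} = 0\) for \(j \ge 2\), and \(v_n = \sigma^2\):
```r
# Durbin-Levinson: returns (phi_{n1}, ..., phi_{nn}) and v_n from the ACVF,
# supplied as a vector g with g[h + 1] = gamma(h), h = 0, ..., n
dl <- function(g, n) {
  v   <- g[1]                                          # v_0 = gamma(0)
  phi <- numeric(0)                                    # phi_{k-1,1}, ..., phi_{k-1,k-1}
  for (k in 1:n) {
    s     <- if (k > 1) sum(phi * g[k:2]) else 0       # sum_j phi_{k-1,j} gamma(k - j)
    phikk <- (g[k + 1] - s) / v
    phi   <- c(phi - phikk * rev(phi), phikk)          # new phi_{k1}, ..., phi_{kk}
    v     <- v * (1 - phikk^2)                         # v_k = v_{k-1}(1 - phi_{kk}^2)
  }
  list(phi = phi, v = v)
}

phi0 <- 0.6
dl(g = phi0^(0:10) / (1 - phi0^2), n = 10)   # AR(1) with phi = 0.6, sigma^2 = 1
```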
Innovations Algorithm
- The Innovations algorithm is applicable even if the process is not stationary.
-
Suppose \(\{X_t\}\) is a process with mean zero and ACVF \(\kappa(i, j) = \mathbb{E}(X_i X_j)\)
such that the matrix \( [\kappa(i,j)]_{i,j=1}^n \) is nonsingular for each \(n = 1,2,\dotsm\).
-
Write \(\hat{X}_1 = 0\) and \(\hat{X}_{n+1} = P_n X_{n+1}\), \(n = 1, 2, \dotsm\).
-
The innovations (one-step prediction errors),
\(X_1 - \hat{X}_1, \dotsm, X_n - \hat{X}_n\), are orthogonal (problem 2.20) in the sense
that \(\mathbb{E}(X_i - \hat{X}_i)(X_j - \hat{X}_j) = 0\) for \(i \neq j\).
- Write
\[
\begin{cases}
\hat{X}_{n+1} = \sum_{i=1}^n \theta_{ni}(X_{n-i+1} - \hat{X}_{n-i+1})\\
v_n = \mathbb{E}(X_{n+1} - \hat{X}_{n+1})^2
\end{cases}
\]
-
The algorithm recursively computes \(\theta_{n1}, \dotsm, \theta_{nn}\)
and \(v_n\) from the quantities \(\theta_{kj}\) and \(v_k\) computed at earlier steps \(k < n\).
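The notes do not reproduce the recursion itself; the sketch below uses the standard form of the Innovations algorithm (as given in Brockwell & Davis): with \(v_0 = \kappa(1,1)\), for \(n = 1, 2, \dotsm\) and \(k = 0, \dotsm, n-1\),
\[
\theta_{n,n-k} = \frac{1}{v_k}\Big(\kappa(n+1, k+1) - \sum_{j=0}^{k-1}\theta_{k,k-j}\,\theta_{n,n-j}\,v_j\Big),
\quad
v_n = \kappa(n+1,n+1) - \sum_{j=0}^{n-1}\theta_{n,n-j}^2 v_j.
\]
The R function below is an illustrative transcription; the MA(1) check uses \(\theta = 0.5\), \(\sigma^2 = 1\), for which only \(\theta_{n1}\) should be nonzero.
```r
# Innovations algorithm: theta[m, j] stores theta_{m,j}; v[m + 1] stores v_m
innovations <- function(kappa, n) {
  theta <- matrix(0, n, n)
  v     <- numeric(n + 1)
  v[1]  <- kappa(1, 1)                                  # v_0
  for (m in 1:n) {
    for (k in 0:(m - 1)) {
      s <- if (k > 0) sum(theta[k, k:1] * theta[m, m:(m - k + 1)] * v[1:k]) else 0
      theta[m, m - k] <- (kappa(m + 1, k + 1) - s) / v[k + 1]
    }
    v[m + 1] <- kappa(m + 1, m + 1) - sum(theta[m, m:1]^2 * v[1:m])
  }
  list(theta = theta, v = v)
}

# MA(1) with theta = 0.5, sigma^2 = 1: kappa(i, j) = gamma(i - j)
theta0 <- 0.5
kappa_ma1 <- function(i, j) {
  d <- abs(i - j)
  if (d == 0) 1 + theta0^2 else if (d == 1) theta0 else 0
}
innovations(kappa_ma1, n = 5)
```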
Prediction of Stationary Time Series in Terms of Infinitely Many Past Values
-
Let \(\tilde{P}_n X_{n+h}\) denote the BLP of \(X_{n+h}\) in terms of 1 and \(\{X_s, -\infty < s \leq n\}\).
-
We refer to \(\tilde{P}_n\) as the
prediction operator based on the infinite past, \(\{X_s, -\infty < s \leq n\}\).
-
When \(n\) is large, \(\tilde{P}_n X_{n+h}\) provides a convenient approximation to \(P_n X_{n+h}\),
which simplifies the calculation for MA and ARMA series.
Computation of \(\tilde{P}_n X_{n+h}\)
- Suppose \(\{X_t\}\) is a zero-mean stationary time series with ACVF \(\gamma(\cdot)\)
-
We write \(\tilde{P}_n X_{n+h} = \sum_{j=1}^\infty a_j X_{n+1-j}\)
-
Then, the problem is equivalent to finding \(a_1, a_2, \dotsm\) to minimize
\( \mathbb{E}(X_{n+h} - \sum_{j=1}^\infty a_j X_{n+1-j})^2 \)
- However, it involves an infinite set of linear equations.
-
To get around it, the properties of \(\tilde{P}_n\) can be used for the calculation of
\(\tilde{P}_n X_{n+h}\), especially when \(\{X_t\}\) is an MA or ARMA process.
Properties of \(\tilde{P}_n\)
Let \(U\) and \(V\) be random variables with finite variance
and let \(a\), \(b\), and \(c\) be constants. Then,
- \( \mathbb{E}(U - \tilde{P}_n U) = 0 \)
- \( \mathbb{E}[(U - \tilde{P}_n U)X_j] = 0 \), \(j \leq n\)
- \( \tilde{P}_n(aU + bV + c) = a \tilde{P}_n U + b \tilde{P}_n V + c\)
- \( \tilde{P}_n U = U \) if \(U\) is a linear combination of \(X_j, j \leq n\)
- \( \tilde{P}_n U = \mathbb{E}(U) \) if \(Cov(U, X_j) = 0\) for all \(j \leq n\)
Example
One-step prediction of MA(1): \(X_t = Z_t + \theta Z_{t-1}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
(see photo - 15:13 Oct 3)
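For reference, the standard calculation (an added sketch, assuming \(\mid \theta \mid < 1\) so that the process is invertible and \(Z_n = \sum_{j=0}^\infty (-\theta)^j X_{n-j}\) is a linear combination of \(X_s\), \(s \le n\)): since \(Z_{n+1}\) is uncorrelated with \(X_s\) for \(s \le n\),
\[
\tilde{P}_n X_{n+1} = \tilde{P}_n (Z_{n+1} + \theta Z_n) = \theta Z_n = \theta \sum_{j=0}^\infty (-\theta)^j X_{n-j},
\]
with MSE \(\mathbb{E}(X_{n+1} - \tilde{P}_n X_{n+1})^2 = \mathbb{E}(Z_{n+1}^2) = \sigma^2\).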