Stationary Processes
Last Update: October 3, 2024
1. Basic Properties
Let \(\{X_t\}\) be a stationary time series with
- mean \(\mu\);
- ACVF \(\gamma(h)\) and ACF \(\rho(h)\), \(h = 0, \pm1, \pm2, \dotsm\).
Basic Properties of ACVF
- \( \gamma(0) \ge 0 \)
-
\( \mid \gamma(h) \mid \le \gamma(0) \)
\( 0 \le Var(X_{t+h} \pm X_t) = 2 \gamma(0) \pm 2 \gamma(h) \Rightarrow \mid \gamma(h) \mid \le \gamma(0) \)
- \( \gamma(h) = \gamma(-h) \)
Basic Properties of ACF
- \( \rho(0) = 1 \)
- \( \mid \rho(h) \mid \le 1 \)
- \( \rho(h) = \rho(-h) \)
Nonnegative Definite Function
A real-valued function \(\kappa\) defined on the integers is
nonnegative definite if
\[
\sum_{i=1}^n \sum_{j=1}^n a_i \kappa(i-j) a_j \ge 0
\]
for all positive integers \(n\) and vectors \( \mathbf{a} = (a_1, \dotsm, a_n)' \)
with real-valued components \(a_i\).
Theorem 2.1.1
-
\(\kappa(\cdot) \) is the ACVF of a stationary time series if & only if
- \(\kappa(\cdot) \) is an even function, and
- \(\kappa(\cdot) \) is nonnegative definite.
- Note that condition (2) is hard to verify directly, so the theorem is most often used to show that a given function is not an ACVF.
Problem 2.2 of HW 3 will use the necessity part of the theorem.
Remark
-
To show that \(\kappa(\cdot)\) is the ACVF of a stationary process,
it is often simpler to find a process that has \(\kappa(\cdot)\) as its
ACVF than to verify (2).
-
Example: which of the following functions are ACVFs? (A numerical check is sketched after this list.)
- \(\kappa(h) = (-1)^{\mid h \mid}\)
- \(\kappa(h) = 1 + \cos(\pi h / 2) + \cos (\pi h /4)\)
-
\(
\kappa(h) =
\begin{cases}
1 & \text{if } h = 0\\
0.4 & \text{if } h = \pm 1\\
0 & \text{otherwise}
\end{cases}
\)
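As a rough numerical check of nonnegative definiteness, one can form the matrix \([\kappa(i-j)]_{i,j=1}^n\) for a moderate \(n\) and inspect its smallest eigenvalue; a clearly negative value rules the candidate out as an ACVF. A minimal R sketch (the helper name `min_eig` and the choice \(n = 50\) are illustrative, and a nonnegative result for finite \(n\) is only suggestive, not a proof):
```r
# Smallest eigenvalue of the n x n matrix [kappa(i - j)]; a negative value
# shows that kappa is not nonnegative definite, hence not an ACVF.
min_eig <- function(kappa, n = 50) {
  K <- toeplitz(sapply(0:(n - 1), kappa))   # kappa at lags 0, 1, ..., n - 1
  min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
}

min_eig(function(h) (-1)^abs(h))                                      # candidate 1
min_eig(function(h) 1 + cos(pi * h / 2) + cos(pi * h / 4))            # candidate 2
min_eig(function(h) ifelse(h == 0, 1, ifelse(abs(h) == 1, 0.4, 0)))   # candidate 3
```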
Role of ACVF & ACF in Time Series Forecasting
- ACVF & ACF provide a useful measure of the dependence among time series data.
- Hence, they play an important role in time series forecasting.
Illustration Example
-
Suppose \(\{X_t\}\) is a stationary Gaussian time series with
mean \(\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\).
- Suppose we have observed \(X_n\).
- We want to forecast \( X_{n+h}, h \ge 1 \), based on \(X_n\).
Best MSE Predictor
-
Criterion of the best prediction
-
Find the predictor that minimizes
\( \mathbb{E}[( X_{n+h} - f(X_n) )^2] \)
over all possible functions \(f\).
-
Such a predictor, say \(m(X_n)\), is called the best MSE (mean squared error) predictor.
-
What is the best MSE predictor of \(X_{n+h}\) based on \(X_{n}\)?
Answer: \(\mathbb{E}(X_{n+h} \mid X_{n})\).
Example
-
As \(\{X_t\}\) is a Gaussian time series,
\(
\begin{pmatrix}
X_{n+h} \\
X_n
\end{pmatrix}
\sim \mathcal{N}
(
\begin{pmatrix}
\mu \\
\mu
\end{pmatrix},
\begin{pmatrix}
\gamma(0) & \gamma(h)\\
\gamma(h) & \gamma(0)
\end{pmatrix}
)
\)
-
So, we have
\(X_{n+h} \mid X_n \sim \mathcal{N} (\mu + \rho(h)(X_n - \mu), \gamma(0)(1-\rho(h)^2))\)
-
It follows that
\( m(X_n) = \mathbb{E}(X_{n+h} \mid X_n) = \mu + \rho(h)(X_n - \mu) \)
-
The corresponding MSE is
\( \mathbb{E}[X_{n+h} - m(X_n)]^2 = \gamma(0)(1-\rho(h)^2) \)
Remarks
- If \( \{X_t\} \) is a Gaussian time series, calculation of the best MSE predictor is straightforward.
- However, if \( \{X_t\} \) is not a Gaussian time series, then the calculation in general is complicated.
- So, instead of looking for the best MSE predictor, we can look for the best linear predictor.
Best Linear Predictor (BLP)
-
Criterion of the best prediction
-
Find the predictor that minimizes
\( \mathbb{E}[( X_{n+h} - f(X_n) )^2] \)
over all linear functions \(f\) of the form \(a X_n + b\).
-
Such a predictor, say \(l(X_n)\), is called the best linear predictor.
-
Finding BLP is equivalent to finding \(a\) & \(b\) to minimize
\( S(a, b) = \mathbb{E}[( X_{n+h} - a X_n - b)^2]\).
Example
\(f(X_n) = aX_n + b\), where \(\{X_t\}\) is stationary with mean \(\mathbb{E}(X_t)=\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\)
-
\(\frac{\partial S(a,b)}{\partial b} = \mathbb{E}(-2(X_{n+h} - aX_n -b)) \overset{set}= 0 \)
\(\Rightarrow b = \mu(1-a) \)
-
Rewrite \(S(a,b) = \dotsm = \mathbb{E}[(X_{n+h} - \mu) - a(X_n - \mu)]^2 \)
Then \(\frac{\partial S(a,b)}{\partial a} = \dotsm \overset{set}= 0 \)
\(\Rightarrow a = \rho(h)\)
So, the BLP is \(l(X_n) = \rho(h) X_n + \mu (1 - \rho(h)) = \mu + \rho(h)(X_n - \mu) \).
Moreover, the corresponding MSE is \(\mathbb{E}[X_{n+h} - l(X_n)]^2 = \gamma(0)(1-\rho(h)^2)\).
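As a quick empirical check of these formulas (an added sketch, not part of the original notes; the AR(1) model with \(\phi = 0.6\), \(\sigma^2 = 1\), \(\mu = 0\), and \(h = 1\) is an illustrative choice):
```r
set.seed(1)
phi <- 0.6; h <- 1
x <- arima.sim(model = list(ar = phi), n = 1e5)   # mean-zero AR(1) with sigma^2 = 1

gamma0 <- 1 / (1 - phi^2)   # gamma(0) for this AR(1)
rho_h  <- phi^h             # rho(h) for this AR(1)

pred <- rho_h * head(x, -h)         # BLP of X_{t+h} given X_t (mu = 0 here)
err  <- tail(x, -h) - pred
mean(err^2)                         # empirical MSE of the BLP
gamma0 * (1 - rho_h^2)              # theoretical MSE gamma(0)(1 - rho(h)^2)
```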
Remarks
- For Gaussian time series, the best MSE predictor coincides with the BLP.
- In general, the best MSE predictor has an MSE no larger than that of the BLP.
-
BLP only depends on the mean & ACVF of time series.
- So, it can be calculated without detailed knowledge of joint distributions.
-
For non-Gaussian time series,
the BLP avoids the possible difficulty of computing conditional expectations,
even when the joint distributions are known.
2. Linear Processes
\( \{X_t\} \) is a linear process if
\(X_t = \sum_{j = -\infty}^{\infty} \psi_j Z_{t-j}, \{Z_t\} \sim WN(0,\sigma^2) \)
where \( \sum_{j = -\infty}^{\infty} \mid \psi_j \mid < \infty \) (absolute summability).
Let \(\psi(z) = \sum_{j = -\infty}^{\infty} \psi_j z^j \); then, with \(B\) denoting the backward shift operator, we can write
\( X_t = \psi(B)Z_t \)
Remarks
-
The condition \(\sum_{j=-\infty}^{\infty} \mid \psi_j \mid < \infty\) ensures that,
for each fixed \(t\), the infinite sum in the definition converges with probability
one (i.e., "almost surely"; cf. Remark 1 of Section 2.2).
-
For each \(t\), the infinite sum converges absolutely
(i.e., \(\sum_{j=-\infty}^{\infty} \mid \psi_j Z_{t-j} \mid < \infty\) )
with probability one.
-
It also ensures that \(\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty\)
and hence (see Appendix C) that the infinite sum converges in mean square.
MA(\(\infty\)) Processes
A linear process with \(\psi_j = 0\) for all \(j < 0\), i.e.,
\[
X_t = \sum_{j=0}^\infty \psi_j Z_{t-j}, \{Z_t\} \sim WN(0, \sigma^2)
\]
is called an MA(\(\infty\)) process.
Properties of Linear Processes
- \( \mathbb{E}(X_t) = 0 \)
-
\( \gamma(h) = \sigma^2 \sum_{j = -\infty}^{\infty} \psi_j \psi_{j+h} \)
(proof in textbook)
- So, a linear process is weakly stationary.
- A linear process is strictly stationary if \(\{Z_t\} \sim WN(0, \sigma^2) \) is replaced by \(\{Z_t\} \sim iid(0, \sigma^2) \).
Examples
-
MA(1)
\(X_t = Z_t + \theta Z_{t-1} \) where \(\{Z_t\} \sim WN(0, \sigma^2)\)
\(\psi_0 = 1, \psi_1 = \theta \) & \( \psi_j = 0 \) for \(j \ne 0, 1\).
\(\gamma(h) = \sigma^2 \sum_j \psi_j \psi_{j+h} \)
\(=
\begin{cases}
\sigma^2 \sum_j \psi_j^2 = \sigma^2(1 + \theta^2) & h = 0\\
\sigma^2 \sum_j \psi_j \psi_{j+h} = \sigma^2 \theta & h = \pm 1\\
\sigma^2 \sum_j \psi_j \psi_{j+h} = 0 & \text{otherwise}
\end{cases}
\)
-
AR(1)
\(X_t = \phi X_{t-1} + Z_t \) where \(\mid \phi \mid < 1\) and \(\{Z_t\} \sim WN(0, \sigma^2)\)
Iterating gives \(X_t = \sum_{j=0}^\infty \phi^j Z_{t-j}\), i.e., \(\psi_j = \phi^j\) for \(j \ge 0\) and \(\psi_j = 0\) for \(j < 0\).
\(\gamma(h) = \sigma^2 \sum_{j=0}^\infty \phi^j \phi^{j + \mid h \mid} = \frac{\sigma^2 \phi^{\mid h \mid}}{1 - \phi^2}\)
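These closed-form ACVFs can be cross-checked in R with `ARMAacf`, which returns the theoretical ACF of an ARMA model (a sketch; the values \(\theta = 0.5\) and \(\phi = 0.6\) are illustrative):
```r
# MA(1) with theta = 0.5: rho(1) = theta / (1 + theta^2), rho(h) = 0 for |h| > 1
ARMAacf(ma = 0.5, lag.max = 3)
0.5 / (1 + 0.5^2)

# AR(1) with phi = 0.6: rho(h) = phi^|h|
ARMAacf(ar = 0.6, lag.max = 3)
0.6^(0:3)
```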
3. Introduction to ARMA Processes
-
\(\{X_t\}\) is an ARMA(\(p, q\)) process if \(\{X_t\}\) is a stationary solution to
\(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p}\)
\(= Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\)
for every \(t\), where \(\{Z_t\} \sim WN(0, \sigma^2)\), \(\phi_p \ne 0\), \(\theta_q \ne 0\),
and \(\phi(z) = 1 - \phi_1 z - \dotsm - \phi_p z^p\) and \(\theta(z) = 1 + \theta_1 z + \dotsm + \theta_q z^q\)
have no common roots.
- \(\phi(z)\) is called the AR polynomial.
- \(\theta(z)\) is called the MA polynomial.
-
We can write the ARMA(\(p, q\)) equations in short as
\[
\phi(B) X_t = \theta(B) Z_t
\]
-
MA(\(q\)) process:
\[
X_t = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}
\]
-
AR(\(p\)) process:
\[
X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t
\]
-
\(\{X_t\}\) is an ARMA(\(p,q\)) process with mean \(\mu\) if \(\{X_t - \mu\}\)
is an ARMA(\(p,q\)) process.
Existence and Uniqueness
ARMA equations have a stationary solution, which is also unique,
if and only if \(\phi(z) \ne 0\) for \(\mid z \mid = 1\).
- Note that, \(z\) can be a complex number and \(\mid z \mid\) is the modulus of \(z\).
Causality
-
ARMA(\(p,q\)) process \(\{X_t\}\) is causal
if we can write
\[
X_t = \sum_{j=0}^\infty \psi_j Z_{t-j},
\]
with \( \sum_{j=0}^\infty \mid \psi_j \mid < \infty \).
-
\(\{X_t\}\) is causal \(\iff \phi(z) \ne 0\) for \( \mid z \mid \le 1\)
- To check for causality, find the roots of \( \phi(z) = 0\)
-
If there exist any roots inside or on the unit circle,
\(\{X_t\}\) is noncausal; otherwise, \(\{X_t\}\) is causal.
-
Examples:
-
MA(2) process: \(X_t = Z_t - 0.4 Z_{t-1} + 0.04 Z_{t-2}\)
\(\{X_t\}\) is causal by definition.
-
AR(2) process: \(X_t - 0.7 X_{t-1} + 0.1 X_{t-2} = Z_t\)
\(\phi(z) = 1 - 0.7z + 0.1 z^2 = (1-0.5z)(1-0.2z) \overset{set}= 0\)
So, roots of \(\phi(z)\) are \(z_1 = 2\) and \(z_2 = 5\).
Both are outside the unit circle.
So, \(\{X_t\}\) is causal.
-
ARMA(1,1) process: \(X_t - 0.5 X_{t-1} = Z_t + 0.4 Z_{t-1}\)
\(\phi(z) = 1 - 0.5z \overset{set}= 0\)
\(\Rightarrow\) root of \(\phi(z)\) is \(z = 2\), outside the unit circle.
So, \(\{X_t\}\) is causal.
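In practice the roots can be found numerically, e.g. with R's `polyroot`, which takes the coefficients in increasing order of degree (a sketch for the AR(2) and ARMA(1,1) examples above):
```r
# phi(z) = 1 - 0.7 z + 0.1 z^2  (AR(2) example)
abs(polyroot(c(1, -0.7, 0.1)))   # moduli 2 and 5, both > 1  => causal

# phi(z) = 1 - 0.5 z            (ARMA(1,1) example)
abs(polyroot(c(1, -0.5)))        # modulus 2 > 1              => causal
```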
Invertibility
-
ARMA(\(p,q\)) process \(\{X_t\}\) is invertible
if we can write
\[
Z_t = \sum_{j=0}^\infty \pi_j X_{t-j},
\]
with \( \sum_{j=0}^\infty \mid \pi_j \mid < \infty \).
-
\(\{X_t\}\) is invertible \(\iff \theta(z) \ne 0\) for \( \mid z \mid \le 1\)
- To check for invertibility, find the roots of \( \theta(z) = 0\)
-
If there exist any roots inside or on the unit circle,
\(\{X_t\}\) is non-invertible; otherwise, \(\{X_t\}\) is invertible.
-
Examples:
-
MA(2) process: \(X_t = Z_t - 0.4 Z_{t-1} + 0.04 Z_{t-2}\)
\( \theta(z) = 1 - 0.4z + 0.04 z^2 = (1-0.2z)^2 \overset{set}= 0 \)
\( \Rightarrow\) roots of \(\theta(z)\) are \(z_{1,2}=5\), outside unit circle.
So, \(\{X_t\}\) is invertible.
-
AR(2) process: \(X_t - 0.7 X_{t-1} + 0.1 X_{t-2} = Z_t\)
\(\{X_t\}\) is invertible by definition.
-
ARMA(1,1) process: \(X_t - 0.5 X_{t-1} = Z_t + 0.4 Z_{t-1}\)
\(\theta(z) = 1 + 0.4z \overset{set}= 0\)
\(\Rightarrow\) root of \(\theta(z)\) is \(z = -2.5\), outside the unit circle.
So, \(\{X_t\}\) is invertible.
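The same numerical check applies to the MA polynomial (a sketch for the MA(2) and ARMA(1,1) examples above):
```r
# theta(z) = 1 - 0.4 z + 0.04 z^2  (MA(2) example)
abs(polyroot(c(1, -0.4, 0.04)))   # double root of modulus 5 > 1  => invertible

# theta(z) = 1 + 0.4 z             (ARMA(1,1) example)
abs(polyroot(c(1, 0.4)))          # modulus 2.5 > 1               => invertible
```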
Remark
We shall assume causality and invertibility in this course unless we state otherwise.
4. Properties of Sample Mean and ACF
-
Suppose that \(X_1, \dotsm, X_n\) are observed data from a
stationary process \(\{X_t\}\) with mean \(\mu\), ACVF \(\gamma(\cdot)\),
and ACF \(\rho(\cdot)\).
-
Sample mean: \(\bar{X} = \frac{1}{n} \sum_{t=1}^n X_t\)
-
\(\bar{X}\) is the moment estimator of \(\mu\).
Properties of \(\bar{X}\)
- \( \mathbb{E}(\bar{X}) = \mu \)
-
\(Var(\bar{X}) = \frac{1}{n} \sum_{\mid h \mid < n} (1 - \frac{\mid h \mid}{n}) \gamma(h) \)
\(
Var(\bar{X}) = Var(\frac{1}{n} \sum_{t=1}^n X_t) = \frac{1}{n^2} Cov(\sum_{t=1}^n X_t, \sum_{s=1}^n X_s)
= \frac{1}{n^2} \sum_{t=1}^n \sum_{s=1}^n Cov(X_t, X_s)
= \frac{1}{n^2} \sum_{t=1}^n \sum_{s=1}^n \gamma(t-s)
= \frac{1}{n^2} \sum_{\mid h \mid < n} (n - \mid h \mid) \gamma(h)
= \frac{1}{n} \sum_{\mid h \mid < n} (1 - \frac{\mid h \mid}{n}) \gamma(h)
\)
(see photo - 15:34 Sep 24)
-
\(Var(\bar{X}) \rightarrow 0\) if \(\gamma(h) \rightarrow 0\) as \(h \rightarrow \infty\)
-
\(n Var(\bar{X}) \rightarrow \sum_{h = - \infty}^\infty \gamma(h) \) if \( \sum_{h = - \infty}^\infty \mid \gamma(h) \mid < \infty \)
-
For a large class of time series models,
\(
\sqrt{n}(\bar{X} - \mu) \overset{approx.}\sim \mathcal{N}(0, \sum_{\mid h \mid < n} (1 - \frac{\mid h \mid}{n}) \gamma(h))
\)
-
Equivalently,
\(
\sqrt{n}(\bar{X} - \mu) \overset{approx.}\sim \mathcal{N}(0, \sum_{h = - \infty}^\infty \gamma(h))
\)
CI for \(\mu\)
-
An approximate 95% CI for \(\mu\) is given by
\[
\bar{X} \pm 1.96 \frac{\sqrt{\hat{v}}}{\sqrt{n}},
\]
where \(\hat{v}\) is an estimator of \(v = \sum_{h=-\infty}^\infty \gamma(h)\),
for example,
- \(\hat{v} = \sum_{\mid h \mid < \sqrt{n}} (1 - \frac{\mid h \mid}{n})\hat{\gamma}(h) \).
- \(\hat{v} = 2 \pi \hat{f}(0)\), where \( \hat{f}(0) \) estimates the spectral density at frequency 0 (see Chapter 4)
-
Example:
AR(1) with mean \(\mu\): \( X_t - \mu = \phi(X_{t-1} - \mu) + Z_t \),
where \(\mid \phi \mid < 1\) and \( \{Z_t\} \sim WN(0, \sigma^2) \).
Here \(\gamma(h) = \sigma^2 \phi^{\mid h \mid}/(1 - \phi^2)\), so \(v = \sum_{h=-\infty}^\infty \gamma(h) = \sigma^2/(1-\phi)^2\),
and an approximate 95% CI for \(\mu\) is \(\bar{X} \pm 1.96\, \sigma / ((1-\phi)\sqrt{n})\),
with \(\sigma\) and \(\phi\) replaced by estimates in practice.
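A minimal R sketch of this interval on simulated AR(1) data, using the first estimator of \(v\) above (the values \(\mu = 10\), \(\phi = 0.6\), \(\sigma^2 = 1\), and \(n = 500\) are illustrative):
```r
set.seed(1)
n <- 500
x <- 10 + arima.sim(model = list(ar = 0.6), n = n)   # AR(1) around mu = 10

g    <- acf(x, lag.max = floor(sqrt(n)), type = "covariance", plot = FALSE)$acf[, 1, 1]
lags <- seq_along(g) - 1                                   # lags 0, 1, ..., floor(sqrt(n))
vhat <- sum((1 - lags / n) * g * ifelse(lags == 0, 1, 2))  # sum over |h| < sqrt(n)

mean(x) + c(-1, 1) * 1.96 * sqrt(vhat / n)                 # approximate 95% CI for mu
```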
Sample ACVF & ACF
-
Sample ACVF:
\(
\hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n - \mid h \mid}(X_{t+\mid h \mid} - \bar{X})(X_t - \bar{X}), -n < h < n
\)
-
Sample ACF:
\(
\hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)}
\)
-
\(\hat{\gamma}(h)\) and \(\hat{\rho}(h)\) estimate \(\gamma(h)\) and \(\rho(h)\), respectively.
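In R, `acf` computes exactly these quantities (with divisor \(n\), matching the definition above); a quick check of \(\hat{\gamma}(1)\) on an arbitrary data vector:
```r
x <- c(2, 5, 1, 4, 3, 6, 2, 5)
n <- length(x)

# hat gamma(1) directly from the definition (divisor n, overall mean subtracted)
sum((x[2:n] - mean(x)) * (x[1:(n - 1)] - mean(x))) / n

# same value from acf(); element [2, 1, 1] is the lag-1 entry
acf(x, lag.max = 1, type = "covariance", plot = FALSE)$acf[2, 1, 1]
```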
Sample Covariance Matrix
\[
\hat{\boldsymbol{\Gamma}}_k =
\begin{bmatrix}
\hat{\gamma}(0) & \hat{\gamma}(1) & \dots & \hat{\gamma}(k-1) \\
\hat{\gamma}(1) & \hat{\gamma}(0) & \dots & \hat{\gamma}(k-2) \\
\vdots & \vdots & \ddots & \vdots \\
\hat{\gamma}(k-1) & \hat{\gamma}(k-2) & \dots & \hat{\gamma}(0)
\end{bmatrix}
\]
-
\(\hat{\boldsymbol{\Gamma}}_k\) is nonnegative definite for all \(k \ge 1\).
-
Sample autocorrelation matrix:
\[
\hat{\boldsymbol{R}}_k = \hat{\boldsymbol{\Gamma}}_k / \hat{\gamma}(0)
\]
Sampling Distribution of \( \hat{\rho}(\cdot) \)
-
For linear time series models,
\(
\hat{\boldsymbol{\rho}} \overset{approx.}\sim \mathcal{N}(\boldsymbol{\rho}, \boldsymbol{W} / n),
\)
where \( \hat{\boldsymbol{\rho}} = (\hat{\rho}(1), \dotsm, \hat{\rho}(h))' \),
\(\boldsymbol{\rho} = (\rho(1), \dotsm, \rho(h))' \),
and \(\boldsymbol{W}\) is a matrix whose \((i, j)\) element is given by Bartlett's formula;
namely,
\[
\begin{align*}
& w_{ij} \\
= & \sum_{k=1}^\infty \{ \rho(k+i) + \rho(k-i) - 2\rho(k)\rho(i) \} \\
& \times \{ \rho(k+j) + \rho(k-j) - 2\rho(k)\rho(j) \}
\end{align*}
\]
-
Examples:
-
\(iid \) noise: \(\{X_t\} \sim iid(0, \sigma^2)\)
For iid noise, \(\rho(k) = 0\) for \(k \ne 0\), so Bartlett's formula gives \(w_{ii} = 1\) and \(w_{ij} = 0\) for \(i \ne j\);
hence \(\hat{\rho}(h) \overset{approx.}\sim \mathcal{N}(0, 1/n)\) for \(h \ge 1\), which is the basis of the
\(\pm 1.96/\sqrt{n}\) bounds drawn on sample ACF plots.
(see photo - 15:11 Sep 26)
-
MA(1): \(X_t = Z_t + \theta Z_{t-1}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
\(X_t = Z_t + \theta Z_{t-1} \Rightarrow \gamma(l) = (1 + \theta^2)\sigma^2 \mathbb{I}_{l=0} + \theta \sigma^2 \mathbb{I}_{l = \pm 1}\)
\( \hat{\rho}(i) \overset{approx.}\sim \mathcal{N}(\rho(i), \frac{w_{ii}}{n}) \), where
(1) if \(i = 1\),
\(w_{ii} = \dotsm = 1 - 3 \rho^2(1) + 4 \rho^4(1)\)
(2) if \(i > 1\),
\(w_{ii} = \dotsm = 1 + 2\rho^2(1)\)
(see photo - 15:22 Sep 26)
-
MA(\(q\)): \(X_t = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
\(\hat{\rho}(i) \overset{approx.}\sim \mathcal{N}(0, \frac{1 + 2 \rho^2(1) + \dotsm + 2 \rho^2(q)}{n}) \) for \(i > q\).
-
AR(1): \( X_t = \phi X_{t-1} + Z_t \), \(\mid \phi \mid < 1\) & \(\{Z_t\} \sim WN(0, \sigma^2)\)
-
R Examples:
- A simulated MA(1) series with \(\theta = -0.9\)
- Lake Huron residuals
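A sketch of these two R examples (the sample size \(n = 200\) and the use of a linear time trend for the Lake Huron levels are assumptions about what was shown in class):
```r
# Simulated MA(1) with theta = -0.9: the sample ACF should be close to
# rho(1) = -0.9 / (1 + 0.81) ~ -0.5 at lag 1 and near zero at higher lags
set.seed(1)
x <- arima.sim(model = list(ma = -0.9), n = 200)
acf(x, lag.max = 20)

# Lake Huron: remove a linear time trend, then inspect the residual ACF
fit <- lm(LakeHuron ~ time(LakeHuron))
acf(resid(fit), lag.max = 20)
```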
5. Forecasting Stationary Time Series
- Let \(\{X_t\}\) be a stationary time series with known mean \(\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\).
- Our goal is to find \(P_n X_{n+h}\), the BLP of \(X_{n+h}\) in terms of \(1, X_1, \dotsm, X_n\).
-
We write \( P_n X_{n+h} = a_0 + a_1 X_n + \dotsm + a_n X_1 \),
where \(a_0, a_1, \dotsm, a_n\) minimize
\(
S(a_0, a_1, \dotsm, a_n) = \mathbb{E}(X_{n+h} - a_0 - a_1 X_n - \dotsm - a_n X_1)^2
\).
Result
-
The BLP is given by
\(
P_n X_{n+h} = \mu + a_1 (X_n - \mu) + \dotsm + a_n (X_1 - \mu)
\)
where \(\boldsymbol{a}_n = (a_1, \dotsm, a_n)'\) is determined by
\( \boldsymbol{\Gamma}_n \boldsymbol{a}_n = \boldsymbol{\gamma}_n(h) \)
-
\(\boldsymbol{\Gamma}_n = [\gamma(i - j)]_{i,j=1}^n\)
-
\(\boldsymbol{\gamma}_n(h) = (\gamma(h), \gamma(h+1), \dotsm, \gamma(h + n -1))' \)
-
Moreover, the corresponding MSE is given by
\(\mathbb{E}(X_{n+h} - P_n X_{n+h})^2 = \gamma(0) - \boldsymbol{a}_n' \boldsymbol{\gamma}_n(h)\)
Example
One-step prediction of AR(1): \(X_t = \phi X_{t-1} + Z_t\)
\(
P_n X_{n+1} = a_1 X_n + a_2 X_{n-1} + \dotsm + a_n X_1
\)
, where \((a_1, a_2, \dotsm, a_n)'\) is determined by
\(\boldsymbol{\Gamma}_n \boldsymbol{a}_n = \boldsymbol{\gamma}_n(1) = (\gamma(1), \dotsm, \gamma(n))'\).
Since \(\gamma(h) = \phi \gamma(h-1)\) for \(h \ge 1\), the solution is \(\boldsymbol{a}_n = (\phi, 0, \dotsm, 0)'\),
so \(P_n X_{n+1} = \phi X_n\) with MSE \(= \gamma(0) - \phi \gamma(1) = \sigma^2\).
(see photo - 15:53 Sep 26)
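The same system can also be solved numerically; a sketch for an AR(1) with \(\phi = 0.6\), \(\sigma^2 = 1\), and \(n = 5\) (illustrative values), which recovers \(\boldsymbol{a}_n = (\phi, 0, \dotsm, 0)'\) and MSE \(\sigma^2\):
```r
phi <- 0.6; sigma2 <- 1; n <- 5
acvf <- function(h) sigma2 * phi^abs(h) / (1 - phi^2)   # AR(1) ACVF

Gamma_n  <- toeplitz(acvf(0:(n - 1)))   # [gamma(i - j)]_{i,j = 1..n}
gamma_n1 <- acvf(1:n)                   # (gamma(1), ..., gamma(n))'

a <- solve(Gamma_n, gamma_n1)           # BLP coefficients
a                                       # (0.6, 0, 0, 0, 0)
acvf(0) - sum(a * gamma_n1)             # MSE = sigma^2 = 1
```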
Remark
\(P_n X_{n+h}\) is chosen such that
- \(\mathbb{E}(X_{n+h} - P_n X_{n+h}) = 0\)
- \(\mathbb{E}[(X_{n+h} - P_n X_{n+h})X_j] = 0, j = 1, \dotsm, n\)
Properties of \(P_n\)
-
We shall refer to \(P_n\) as the prediction operator based
on the finite past, \( \{ X_1, \dotsm, X_n \} \).
-
Let \(U\) and \(V\) be random variables with finite variance
and let \(a\), \(b\), and \(c\) be constants. Then,
-
\( \mathbb{E}(U - P_n U) = 0 \)
-
\( \mathbb{E}[(U - P_n U)X_j] = 0\), \(j = 1, \dotsm, n \)
-
\( P_n (aU + bV + c) = aP_n U + b P_n V + c \)
-
\( P_n U = U \) if \(U\) is a linear combination of \(1, X_1, \dotsm, X_n\)
-
\( P_n U = \mathbb{E}(U) \) if \(Cov(U, X_j) = 0\) for all \(j = 1, \dotsm, n \)
-
Examples:
-
AR(1): \(X_t = \phi X_{t-1} + Z_t\)
where \(\{Z_t\} \sim WN(0, \sigma^2)\).
\(P_n X_{n+1} = \phi X_n\) with MSE \( = \sigma^2\)
Method using the properties of \(P_n\):
\(X_{n+1} = \phi X_n + Z_{n+1}\)
\(
P_n X_{n+1}
= P_n (\phi X_n + Z_{n+1})
= \phi P_n X_n + P_n Z_{n+1}
= \phi X_n + \mathbb{E}(Z_{n+1})
= \phi X_n
\)
-
One-step prediction of AR(\(p\)):
\( X_t = \phi_1 X_{t-1} + \dotsm + \phi_p X_{t-p} + Z_t \)
Goal: Find \(P_n X_{n+1}\)
If \(n \ge p\),
\(
P_n X_{n+1}
= P_n(\phi_1 X_{n} + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p} + Z_{n+1})
= \phi_1 P_n X_n + \phi_2 P_n X_{n-1} + \dotsm + \phi_p P_n X_{n+1-p} + P_n Z_{n+1}
= \phi_1 X_n + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p} + \mathbb{E} (Z_{n+1})
= \phi_1 X_n + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p}
\)
\(MSE = \mathbb{E}[(X_{n+1} - P_n X_{n+1})^2] = \mathbb{E}(Z_{n+1}^2) = Var(Z_{n+1}) = \sigma^2\)
If \(n < p\), use the general result above (solve \( \boldsymbol{\Gamma}_n \boldsymbol{a}_n = \boldsymbol{\gamma}_n(1) \)).
-
\(h\)-step prediction of AR(1) with nonzero mean \(\mu\):
\( X_t - \mu = \phi (X_{t-1} - \mu) + Z_t \)
Goal: Find \(P_n X_{n+h}\), \(h = 1, 2, \dotsm\).
\(
X_{n+h} - \mu = \phi (X_{n+h-1} - \mu) + Z_{n+h}
\)
\(
P_n(X_{n+h} - \mu) = P_n(\phi (X_{n+h-1} - \mu) + Z_{n+h})
\)
\(
P_nX_{n+h} - \mu = \phi (P_n X_{n+h-1} - \mu) + P_n Z_{n+h}
= \dotsm
\)
Iterating gives \(P_n X_{n+h} = \mu + \phi^h (X_n - \mu)\),
with MSE \(= \mathbb{E}[(X_{n+h} - P_n X_{n+h})^2] = \sigma^2 \frac{1 - \phi^{2h}}{1 - \phi^2}\).
(see photo - 15:26 Oct 1)
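A small R sketch of these \(h\)-step forecasts (the values \(\mu = 10\), \(\phi = 0.6\), \(\sigma^2 = 1\), last observation \(X_n = 12\), and horizons \(h = 1, \dotsm, 5\) are illustrative):
```r
mu <- 10; phi <- 0.6; sigma2 <- 1; x_n <- 12; H <- 1:5

pred <- mu + phi^H * (x_n - mu)                    # P_n X_{n+h} = mu + phi^h (X_n - mu)
mse  <- sigma2 * (1 - phi^(2 * H)) / (1 - phi^2)   # h-step MSE

cbind(h = H, pred, mse)   # forecasts decay toward mu; MSE grows toward gamma(0)
```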
Prediction of Second-Order Random Variables
(self-study: Example 2.5.1 and Example 2.5.2)
Recursive Prediction Algorithms
-
To determine \(P_n X_{n+h}\), the direct approach requires solving a system of \(n\) linear equations.
-
For large \(n\) this may be difficult and time-consuming.
-
It would be helpful if \(P_n X_{n+1}\) could be used to simplify the calculation of \(P_{n+1} X_{n+2}\).
-
Prediction algorithms that utilize this idea are said to be recursive.
-
We'll introduce two recursive algorithms, the Durbin-Levinson algorithm and the
Innovations algorithm, for determining the one-step predictors \(P_n X_{n+1}\).
-
The algorithms can be extended to compute the \(h\)-step predictors \(P_n X_{n+h}\), \(h \ge 1\).
Durbin-Levinson Algorithm
-
Without loss of generality, we consider a stationary process \(\{X_t\}\) with mean 0 and ACVF \(\gamma(\cdot)\).
-
Write
\(
\begin{cases}
P_n X_{n+1} = \phi_{n1}X_n + \dotsm + \phi_{nn}X_1\\
v_n = \mathbb{E}(X_{n+1} - P_n X_{n+1})^2
\end{cases}
\)
-
The algorithm recursively computes \(\phi_{n1}, \dotsm, \phi_{nn}\) and
\(v_n\) from \(\phi_{n-1,1}, \dotsm, \phi_{n-1,n-1}\) and \(v_{n-1}\).
-
We start with \(v_0 = \gamma(0)\).
-
For \(n = 1, 2, \dotsm, \phi_{n1}, \dotsm, \phi_{nn}\) and \(v_n\) satisfy
\(
\phi_{nn} = \frac{1}{v_{n-1}}[\gamma(n) - \sum_{j=1}^{n-1} \phi_{n-1, j} \gamma(n-j)]
\)
and
\(
\begin{bmatrix}
\phi_{n1}\\
\vdots\\
\phi_{n, n-1}
\end{bmatrix}
=
\begin{bmatrix}
\phi_{n-1,1}\\
\vdots\\
\phi_{n-1, n-1}
\end{bmatrix}
- \phi_{nn}
\begin{bmatrix}
\phi_{n-1,n-1}\\
\vdots\\
\phi_{n-1, 1}
\end{bmatrix}
\),
\(
v_n = v_{n-1}(1 - \phi_{nn}^2)
\).
-
Example:
Prediction of an AR(1) process:
\( X_t = \phi X_{t-1} + Z_t \), \(\{Z_t\} \sim WN(0, \sigma^2)\)
Here \(v_0 = \gamma(0)\) and \(\phi_{11} = \gamma(1)/\gamma(0) = \phi\), so \(v_1 = \gamma(0)(1 - \phi^2) = \sigma^2\);
the recursion then gives \(\phi_{nn} = 0\) for \(n \ge 2\), hence \(\phi_{n1} = \phi\), \(\phi_{nj} = 0\) for \(j \ge 2\), and \(v_n = \sigma^2\),
so \(P_n X_{n+1} = \phi X_n\), as before.
(see photo - 15:51 Oct 1)
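A direct R transcription of the recursion above (an added sketch; the function name `dl` is illustrative). Running it on the AR(1) ACVF \(\gamma(h) = \sigma^2\phi^{\mid h \mid}/(1-\phi^2)\) reproduces \(\phi_{n1} = \phi\), \(\phi_{nj} = 0\) for \(j \ge 2\), and \(v_n = \sigma^2\):
```r
# Durbin-Levinson: returns (phi_{n1}, ..., phi_{nn}) and v_n from the ACVF,
# supplied as a vector g with g[h + 1] = gamma(h), h = 0, ..., n
dl <- function(g, n) {
  v   <- g[1]                                          # v_0 = gamma(0)
  phi <- numeric(0)                                    # phi_{k-1,1}, ..., phi_{k-1,k-1}
  for (k in 1:n) {
    s     <- if (k > 1) sum(phi * g[k:2]) else 0       # sum_j phi_{k-1,j} gamma(k - j)
    phikk <- (g[k + 1] - s) / v
    phi   <- c(phi - phikk * rev(phi), phikk)          # new phi_{k1}, ..., phi_{kk}
    v     <- v * (1 - phikk^2)                         # v_k = v_{k-1}(1 - phi_{kk}^2)
  }
  list(phi = phi, v = v)
}

phi0 <- 0.6
dl(g = phi0^(0:10) / (1 - phi0^2), n = 10)   # AR(1) with phi = 0.6, sigma^2 = 1
```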
Innovations Algorithm
- The Innovations algorithm is applicable even if the process is not stationary.
-
Suppose \(\{X_t\}\) is a process with mean zero and ACVF \(\kappa(i, j) = \mathbb{E}(X_i X_j)\)
such that the matrix \( [\kappa(i,j)]_{i,j=1}^n \) is nonsingular for each \(n = 1,2,\dotsm\).
-
Write \(\hat{X}_1 = 0\) and \(\hat{X}_{n+1} = P_n X_{n+1}\), \(n = 1, 2, \dotsm\).
-
The innovations (one-step prediction errors),
\(X_1 - \hat{X}_1, \dotsm, X_n - \hat{X}_n\), are orthogonal (problem 2.20) in the sense
that \(\mathbb{E}(X_i - \hat{X}_i)(X_j - \hat{X}_j) = 0\) for \(i \neq j\).
- Write
\[
\begin{cases}
\hat{X}_{n+1} = \sum_{i=1}^n \theta_{ni}(X_{n-i+1} - \hat{X}_{n-i+1})\\
v_n = \mathbb{E}(X_{n+1} - \hat{X}_{n+1})^2
\end{cases}
\]
-
The algorithm recursively computes \(\theta_{n1}, \dotsm, \theta_{nn}\)
and \(v_n\) from the quantities \(\theta_{kj}\) and \(v_k\) computed at earlier steps \(k < n\).
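The notes do not reproduce the recursion itself; the sketch below uses the standard form of the Innovations algorithm (as given in Brockwell & Davis): with \(v_0 = \kappa(1,1)\), for \(n = 1, 2, \dotsm\) and \(k = 0, \dotsm, n-1\),
\[
\theta_{n,n-k} = \frac{1}{v_k}\Big(\kappa(n+1, k+1) - \sum_{j=0}^{k-1}\theta_{k,k-j}\,\theta_{n,n-j}\,v_j\Big),
\quad
v_n = \kappa(n+1,n+1) - \sum_{j=0}^{n-1}\theta_{n,n-j}^2 v_j.
\]
The R function below is an illustrative transcription; the MA(1) check uses \(\theta = 0.5\), \(\sigma^2 = 1\), for which only \(\theta_{n1}\) should be nonzero.
```r
# Innovations algorithm: theta[m, j] stores theta_{m,j}; v[m + 1] stores v_m
innovations <- function(kappa, n) {
  theta <- matrix(0, n, n)
  v     <- numeric(n + 1)
  v[1]  <- kappa(1, 1)                                  # v_0
  for (m in 1:n) {
    for (k in 0:(m - 1)) {
      s <- if (k > 0) sum(theta[k, k:1] * theta[m, m:(m - k + 1)] * v[1:k]) else 0
      theta[m, m - k] <- (kappa(m + 1, k + 1) - s) / v[k + 1]
    }
    v[m + 1] <- kappa(m + 1, m + 1) - sum(theta[m, m:1]^2 * v[1:m])
  }
  list(theta = theta, v = v)
}

# MA(1) with theta = 0.5, sigma^2 = 1: kappa(i, j) = gamma(i - j)
theta0 <- 0.5
kappa_ma1 <- function(i, j) {
  d <- abs(i - j)
  if (d == 0) 1 + theta0^2 else if (d == 1) theta0 else 0
}
innovations(kappa_ma1, n = 5)
```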
Prediction of Stationary Time Series in Terms of Infinitely Many Past Values
-
Let \(\tilde{P}_n X_{n+h}\) denote the BLP of \(X_{n+h}\) in terms of 1 and \(\{X_s, -\infty < s \leq n\}\).
-
We refer to \(\tilde{P}_n\) as the
prediction operator based on the infinite past, \(\{X_s, -\infty < s \leq n\}\).
-
When \(n\) is large, \(\tilde{P}_n X_{n+h}\) provides a convenient approximation to \(P_n X_{n+h}\),
which simplifies the calculation for MA and ARMA series.
Computation of \(\tilde{P}_n X_{n+h}\)
- Suppose \(\{X_t\}\) is a zero-mean stationary time series with ACVF \(\gamma(\cdot)\)
-
We write \(\tilde{P}_n X_{n+h} = \sum_{j=1}^\infty a_j X_{n+1-j}\)
-
Then, the problem is equivalent to finding \(a_1, a_2, \dotsm\) to minimize
\( \mathbb{E}(X_{n+h} - \sum_{j=1}^\infty a_j X_{n+1-j})^2 \)
- However, it involves an infinite set of linear equations.
-
To get around it, the properties of \(\tilde{P}_n\) can be used for the calculation of
\(\tilde{P}_n X_{n+h}\), especially when \(\{X_t\}\) is an MA or ARMA process.
Properties of \(\tilde{P}_n\)
Let \(U\) and \(V\) be random variables with finite variance
and let \(a\), \(b\), and \(c\) be constants. Then,
- \( \mathbb{E}(U - \tilde{P}_n U) = 0 \)
- \( \mathbb{E}[(U - \tilde{P}_n U)X_j] = 0 \), \(j \leq n\)
- \( \tilde{P}_n(aU + bV + c) = a \tilde{P}_n U + b \tilde{P}_n V + c\)
- \( \tilde{P}_n U = U \) if \(U\) is a linear combination of \(X_j, j \leq n\)
- \( \tilde{P}_n U = \mathbb{E}(U) \) if \(Cov(U, X_j) = 0\) for all \(j \leq n\)
Example
One-step prediction of MA(1): \(X_t = Z_t + \theta Z_{t-1}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
(see photo - 15:13 Oct 3)
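For reference, the standard calculation (an added sketch, assuming \(\mid \theta \mid < 1\) so that the process is invertible and \(Z_n = \sum_{j=0}^\infty (-\theta)^j X_{n-j}\) is a linear combination of \(X_s\), \(s \le n\)): since \(Z_{n+1}\) is uncorrelated with \(X_s\) for \(s \le n\),
\[
\tilde{P}_n X_{n+1} = \tilde{P}_n (Z_{n+1} + \theta Z_n) = \theta Z_n = \theta \sum_{j=0}^\infty (-\theta)^j X_{n-j},
\]
with MSE \(\mathbb{E}(X_{n+1} - \tilde{P}_n X_{n+1})^2 = \mathbb{E}(Z_{n+1}^2) = \sigma^2\).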