Wangsheng's World
Stationary Processes
Last Update: October 3, 2024
1. Basic Properties
Let \(\{X_t\}\) be a stationary time series with
  • mean \(\mu\);
  • ACVF \(\gamma(h)\) and ACF \(\rho(h)\), \(h = 0, \pm1, \pm2, \dotsm\).
Basic Properties of ACVF
  • \( \gamma(0) \ge 0 \)
  • \( \mid \gamma(h) \mid \le \gamma(0) \)
    \( 0 \le Var(X_{t+h} \pm X_t) = 2 \gamma(0) \pm 2 \gamma(h) \Rightarrow \mid \gamma(h) \mid \le \gamma(0) \)
  • \( \gamma(h) = \gamma(-h) \)
Basic Properties of ACF
  • \( \rho(0) = 1 \)
  • \( \mid \rho(h) \mid \le 1 \)
  • \( \rho(h) = \rho(-h) \)
Nonnegative Definite Function
A real-valued function \(\kappa\) defined on the integers is nonnegative definite if \[ \sum_{i=1}^n \sum_{j=1}^n a_i \kappa(i-j) a_j \ge 0 \] for all positive integers \(n\) and vectors \( \mathbf{a} = (a_1, \dotsm, a_n)' \) with real-valued components \(a_i\).
Theorem 2.1.1
  • \(\kappa(\cdot) \) is the ACVF of a stationary time series if & only if
    1. \(\kappa(\cdot) \) is an even function, and
    2. \(\kappa(\cdot) \) is nonnegative definite.
  • Note: condition (2) is hard to verify directly, so the theorem is most often used to show that a given function is not an ACVF.
Problem 2.2 of HW 3 will use the necessity part of the theorem.
Remark
  • To show that \(\kappa(\cdot)\) is the ACVF of a stationary process, it is often simpler to exhibit a process that has \(\kappa(\cdot)\) as its ACVF than to verify (2.).
  • Example: which of the following functions are ACVFs? (A numerical check follows this list.)
    • \(\kappa(h) = (-1)^{\mid h \mid}\)
    • \(\kappa(h) = 1 + \cos(\pi h / 2) + \cos (\pi h /4)\)
    • \( \kappa(h) = \begin{cases} 1 & \text{if } h = 0\\ 0.4 & \text{if } h = \pm 1\\ 0 & \text{otherwise} \end{cases} \)
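A quick numerical probe of condition (2) (an illustration only, not a proof, and not part of the lecture): build the matrix \([\kappa(i-j)]_{i,j=1}^n\) for a moderate \(n\) and inspect its smallest eigenvalue; a clearly negative value rules the candidate out.

```r
# Numerical probe (not a proof): a genuine ACVF must make K = [kappa(i - j)]
# nonnegative definite for every n, so its eigenvalues must all be >= 0.
check_nnd <- function(kappa, n = 20) {
  K <- outer(1:n, 1:n, function(i, j) kappa(i - j))
  min(eigen(K, symmetric = TRUE, only.values = TRUE)$values)
}

kappa1 <- function(h) (-1)^abs(h)
kappa2 <- function(h) 1 + cos(pi * h / 2) + cos(pi * h / 4)
kappa3 <- function(h) ifelse(h == 0, 1, ifelse(abs(h) == 1, 0.4, 0))

sapply(list(kappa1, kappa2, kappa3), check_nnd)
# Smallest eigenvalues that are >= 0 (up to rounding error) are consistent with
# the candidate being an ACVF; a clearly negative value disproves it.
```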
Role of ACVF & ACF in Time Series Forecasting
  • ACVF & ACF provide a useful measure of dependence among time series data.
  • Hence, they play an important role in time series forecasting.
Illustration Example
  • Suppose \(\{X_t\}\) is a stationary Gaussian time series with mean \(\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\).
  • Suppose we have observed \(X_n\).
  • We want to forecast \( X_{n+h}, h \ge 1 \), based on \(X_n\).
Best MSE Predictor
  • Criterion of the best prediction
    • Find the predictor that minimizes \( \mathbb{E}[( X_{n+h} - f(X_n) )^2] \) over all possible functions \(f\).
  • Such a predictor, say \(m(X_n)\), is called the best MSE (mean squared error) predictor.
  • What is the best MSE predictor of \(X_{n+h}\) based on \(X_{n}\)?
    Answer: \(\mathbb{E}(X_{n+h} \mid X_{n})\).
Example
  • As \(\{X_t\}\) is a Gaussian time series,
    \( \begin{pmatrix} X_{n+h} \\ X_n \end{pmatrix} \sim \mathcal{N} ( \begin{pmatrix} \mu \\ \mu \end{pmatrix}, \begin{pmatrix} \gamma(0) & \gamma(h)\\ \gamma(h) & \gamma(0) \end{pmatrix} ) \)
  • So, we have
    \(X_{n+h} \mid X_n \sim \mathcal{N} (\mu + \rho(h)(X_n - \mu), \gamma(0)(1-\rho(h)^2))\)
  • It follows that
    \( m(X_n) = \mathbb{E}(X_{n+h} \mid X_n) = \mu + \rho(h)(X_n - \mu) \)
  • The corresponding MSE is
    \( \mathbb{E}[X_{n+h} - m(X_n)]^2 = \gamma(0)(1-\rho(h)^2) \)
Remarks
  • If \( \{X_t\} \) is a Gaussian time series, the best MSE predictor is straightforward to calculate.
  • However, if \( \{X_t\} \) is not a Gaussian time series, then the calculation in general is complicated.
  • So, instead of looking for the best MSE predictor, we can look for the best linear predictor.
Best Linear Predictor (BLP)
  • Criterion of the best prediction
    • Find the predictor that minimizes \( \mathbb{E}[( X_{n+h} - f(X_n) )^2] \) over all linear functions \(f\) of the form \(a X_n + b\).
  • Such a predictor, say \(l(X_n)\), is called the best linear predictor.
  • Finding the BLP is equivalent to finding \(a\) & \(b\) to minimize \( S(a, b) = \mathbb{E}[( X_{n+h} - a X_n - b)^2]\).
Example
\(f(X_n) = aX_n + b\), where \(\{X_t\}\) is stationary with \(\mathbb{E}(X_t)=\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\)
  1. \(\frac{\partial S(a,b)}{\partial b} = \mathbb{E}(-2(X_{n+h} - aX_n -b)) \overset{set}= 0 \)
    \(\Rightarrow b = \mu(1-a) \)
  2. Rewrite \(S(a,b) = \dotsm = \mathbb{E}[(X_{n+h} - \mu) - a(X_n - \mu)]^2 \)
    Then \(\frac{\partial S(a,b)}{\partial a} = \dotsm \overset{set}= 0 \)
    \(\Rightarrow a = \rho(h)\)
So, the BLP is \(l(X_n) = \rho(h) X_n + \mu (1 - \rho(h)) = \mu + \rho(h)(X_n - \mu) \)
Moreover, the corresponding MSE is \(\mathbb{E}[X_{n+h} - l(X_n)]^2 = \gamma(0)(1-\rho(h)^2)\)
Remarks
  • For Gaussian time series, best MSE predictor = BLP.
  • In general, the best MSE predictor gives an MSE no larger than that of the BLP.
  • BLP only depends on the mean & ACVF of time series.
    • So, it can be calculated without detailed knowledge of joint distributions.
    • For non-Gaussian time series, it avoids the possible difficulty of computing conditional expectations even when joint distributions are known.
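A small simulation sketch of the Gaussian AR(1) case (parameter values are illustrative, not from the notes): the predictor \(\mu + \rho(h)(X_n - \mu)\) should attain MSE \(\gamma(0)(1 - \rho(h)^2)\).

```r
# Monte Carlo check: for a Gaussian AR(1), compare the empirical MSE of the predictor
# mu + rho(h) * (X_n - mu) with the theoretical value gamma(0) * (1 - rho(h)^2).
set.seed(1)
phi <- 0.6; sigma2 <- 1; mu <- 2; n <- 200; h <- 3; nrep <- 5000

gamma0 <- sigma2 / (1 - phi^2)   # AR(1): gamma(0) = sigma^2 / (1 - phi^2)
rho_h  <- phi^h                  # AR(1): rho(h) = phi^|h|

err <- replicate(nrep, {
  x <- mu + arima.sim(list(ar = phi), n = n + h, sd = sqrt(sigma2))
  x[n + h] - (mu + rho_h * (x[n] - mu))   # prediction error based on X_n only
})
c(empirical_MSE = mean(err^2), theoretical_MSE = gamma0 * (1 - rho_h^2))
```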
2. Linear Processes
\( \{X_t\} \) is a linear process if
\(X_t = \sum_{j = -\infty}^{\infty} \psi_j Z_{t-j}, \{Z_t\} \sim WN(0,\sigma^2) \)
where \( \sum_{j = -\infty}^{\infty} \mid \psi_j \mid < \infty \). (absolutely summable)
Let \(\psi(z) = \sum_{j = -\infty}^{\infty} \psi_j z^j \), so we can write
\( X_t = \psi(B)Z_t \)
Remarks
The condition \(\sum_{j=-\infty}^{\infty} \mid \psi_j \mid < \infty\) ensures that, for each fixed \(t\), the infinite sum in the definition converges with probability one (a.k.a. "almost surely"; cf. Remark 1 of Section 2.2).
  • For each \(t\), the infinite sum converges absolutely (i.e., \(\sum_{j=-\infty}^{\infty} \mid \psi_j Z_{t-j} \mid < \infty\) ) with probability one.
  • It also ensures that \(\sum_{j=-\infty}^{\infty} \psi_j^2 < \infty\) and hence (see Appendix C) that the infinite sum converges in mean square.
MA(\(\infty\)) Processes
A linear process with \(\psi_j = 0\) for all \(j < 0\), i.e., \[ X_t = \sum_{j=0}^\infty \psi_j Z_{t-j}, \{Z_t\} \sim WN(0, \sigma^2) \] is called an MA(\(\infty\)) process.
Properties of Linear Processes
  • \( \mathbb{E}(X_t) = 0 \)
  • \( \gamma(h) = \sigma^2 \sum_{j = -\infty}^{\infty} \psi_j \psi_{j+h} \) (proof in textbook)
  • So, a linear process is weakly stationary.
  • A linear process is strictly stationary if \(\{Z_t\} \sim WN(0, \sigma^2) \) is replaced by \(\{Z_t\} \sim iid(0, \sigma^2) \).
Examples
  • MA(1)
    \(X_t = Z_t + \theta Z_{t-1} \) where \(\{Z_t\} \sim WN(0, \sigma^2)\)
    \(\psi_0 = 1\), \(\psi_1 = \theta\), and \( \psi_j = 0 \) otherwise.
    \(\gamma(h) = \sigma^2 \sum_j \psi_j \psi_{j+h} \)
    \(= \begin{cases} \sigma^2 \sum_j \psi_j^2 = \sigma^2(1 + \theta^2) & h = 0\\ \sigma^2 \sum_j \psi_j \psi_{j+h} = \sigma^2 \theta & h = \pm 1\\ \sigma^2 \sum_j \psi_j \psi_{j+h} = 0 & \text{otherwise} \end{cases} \)
  • AR(1)
    \(X_t = \phi X_{t-1} + Z_t \) where \(\mid \phi \mid < 1\) and \(\{Z_t\} \sim WN(0, \sigma^2)\)
    \(X_t = \sum_{j=0}^\infty \phi^j Z_{t-j}\), i.e., \(\psi_j = \phi^j\) for \(j \ge 0\) and \(\psi_j = 0\) for \(j < 0\).
    \(\gamma(h) = \sigma^2 \sum_{j=0}^\infty \phi^j \phi^{j + \mid h \mid} = \frac{\sigma^2 \phi^{\mid h \mid}}{1 - \phi^2}\)
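The closed-form ACVFs in these examples can be checked against ARMAacf in base R, which returns the theoretical ACF of an ARMA model (a small sketch; the parameter values are arbitrary).

```r
# Compare the closed-form ACFs above with ARMAacf (note sigma^2 cancels in the ACF).
theta <- 0.5; phi <- 0.6

# MA(1): rho(1) = theta / (1 + theta^2), rho(h) = 0 for |h| > 1
rbind(theory = c(1, theta / (1 + theta^2), 0, 0),
      R      = ARMAacf(ma = theta, lag.max = 3))

# AR(1): gamma(h) = sigma^2 phi^|h| / (1 - phi^2), so rho(h) = phi^|h|
rbind(theory = phi^(0:3),
      R      = ARMAacf(ar = phi, lag.max = 3))
```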
3. Introduction to ARMA Processes
  • \(\{X_t\}\) is an ARMA(\(p, q\)) process if \(\{X_t\}\) is a stationary solution to
    \(X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p}\)
    \(= Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\)
    for every \(t\), where \(\{Z_t\} \sim WN(0, \sigma^2)\), \(\phi_p \ne 0\), \(\theta_q \ne 0\), and \(\phi(z) = 1 - \phi_1 z - \dotsm - \phi_p z^p\) and \(\theta(z) = 1 + \theta_1 z + \dotsm + \theta_q z^q\) have no common roots.
    • \(\phi(z)\) is called the AR polynomial.
    • \(\theta(z)\) is called the MA polynomial.
  • We can write the ARMA(\(p, q\)) equations in short as \[ \phi(B) X_t = \theta(B) Z_t \]
  • MA(\(q\)) process: \[ X_t = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q} \]
  • AR(\(p\)) process: \[ X_t - \phi_1 X_{t-1} - \dotsm - \phi_p X_{t-p} = Z_t \]
  • \(\{X_t\}\) is an ARMA(\(p,q\)) process with mean \(\mu\) if \(\{X_t - \mu\}\) is an ARMA(\(p,q\)) process.
Existence and Uniqueness
The ARMA equations have a stationary solution, and that solution is unique, if and only if \(\phi(z) \ne 0\) for all \(\mid z \mid = 1\).
  • Note that \(z\) can be a complex number and \(\mid z \mid\) is the modulus of \(z\).
Causality
  • ARMA(\(p,q\)) process \(\{X_t\}\) is causal if we can write \[ X_t = \sum_{j=0}^\infty \psi_j Z_{t-j}, \] with \( \sum_{j=0}^\infty \mid \psi_j \mid < \infty \).
  • \(\{X_t\}\) is causal \(\iff \phi(z) \ne 0\) for \( \mid z \mid \le 1\)
    • To check for causality, find the roots of \( \phi(z) = 0\)
    • If there exist any roots inside or on the unit circle, \(\{X_t\}\) is noncausal; otherwise, \(\{X_t\}\) is causal.
  • Examples:
    • MA(2) process: \(X_t = Z_t - 0.4 Z_{t-1} + 0.04 Z_{t-2}\)
      \(\{X_t\}\) is causal by definition.
    • AR(2) process: \(X_t - 0.7 X_{t-1} + 0.1 X_{t-2} = Z_t\)
      \(\phi(z) = 1 - 0.7z + 0.1 z^2 = (1-0.5z)(1-0.2z) \overset{set}= 0\)
      So, roots of \(\phi(z)\) are \(z_1 = 2\) and \(z_2 = 5\). Both are outside the unit circle.
      So, \(\{X_t\}\) is causal.
    • ARMA(1,1) process: \(X_t - 0.5 X_{t-1} = Z_t + 0.4 Z_{t-1}\)
      \(\phi(z) = 1 - 0.5z \overset{set}= 0\)
      \(\Rightarrow\) root of \(\phi(z)\) is \(z = 2\), outside the unit circle.
      So, \(\{X_t\}\) is causal.
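The root computations above can be reproduced with R's polyroot (a minimal sketch, not part of the lecture; polyroot takes the coefficients in increasing order of powers).

```r
# Causality check: all roots of phi(z) must lie strictly outside the unit circle.
Mod(polyroot(c(1, -0.7, 0.1)))  # AR(2): phi(z) = 1 - 0.7 z + 0.1 z^2 -> roots 2 and 5
Mod(polyroot(c(1, -0.5)))       # ARMA(1,1): phi(z) = 1 - 0.5 z       -> root 2
# All moduli exceed 1, so both processes are causal.
```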
Invertibility
  • ARMA(\(p,q\)) process \(\{X_t\}\) is invertible if we can write \[ Z_t = \sum_{j=0}^\infty \pi_j X_{t-j}, \] with \( \sum_{j=0}^\infty \mid \pi_j \mid < \infty \).
  • \(\{X_t\}\) is invertible \(\iff \theta(z) \ne 0\) for \( \mid z \mid \le 1\)
    • To check for invertibility, find the roots of \( \theta(z) = 0\)
    • If there exist any roots inside or on the unit circle, \(\{X_t\}\) is non-invertible; otherwise, \(\{X_t\}\) is invertible.
  • Examples:
    • MA(2) process: \(X_t = Z_t - 0.4 Z_{t-1} + 0.04 Z_{t-2}\)
      \( \theta(z) = 1 - 0.4z + 0.04 z^2 = (1-0.2z)^2 \overset{set}= 0 \)
      \( \Rightarrow\) roots of \(\theta(z)\) are \(z_{1,2}=5\), outside unit circle.
      So, \(\{X_t\}\) is invertible.
    • AR(2) process: \(X_t - 0.7 X_{t-1} + 0.1 X_{t-2} = Z_t\)
      \(\{X_t\}\) is invertible by definition.
    • ARMA(1,1) process: \(X_t - 0.5 X_{t-1} = Z_t + 0.4 Z_{t-1}\)
      ... \(\{X_t\}\) is invertible.
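The same root check applies to the MA polynomial \(\theta(z)\); a minimal sketch for the examples above:

```r
# Invertibility check: all roots of theta(z) must lie strictly outside the unit circle.
Mod(polyroot(c(1, -0.4, 0.04)))  # MA(2): theta(z) = 1 - 0.4 z + 0.04 z^2 -> double root 5
Mod(polyroot(c(1, 0.4)))         # ARMA(1,1): theta(z) = 1 + 0.4 z        -> root -2.5
# All moduli exceed 1, so both processes are invertible.
```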
Remark
We shall assume causality and invertibility in this course unless we state otherwise.
4. Properties of Sample Mean and ACF
  • Suppose that \(X_1, \dotsm, X_n\) are observed data from a stationary process \(\{X_t\}\) with mean \(\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\).
  • Sample mean: \(\bar{X} = \frac{1}{n} \sum_{t=1}^n X_t\)
  • \(\bar{X}\) is the moment estimator of \(\mu\).
Properties of \(\bar{X}\)
  • \( \mathbb{E}(\bar{X}) = \mu \)
  • \(Var(\bar{X}) = \frac{1}{n} \sum_{\mid h \mid < n} (1 - \frac{\mid h \mid}{n}) \gamma(h) \)
    \( Var(\bar{X}) = Var(\frac{1}{n} \sum_{t=1}^n X_t) = \frac{1}{n^2} Cov(\sum_{t=1}^n X_t, \sum_{s=1}^n X_s) = \frac{1}{n^2} \sum_{t=1}^n \sum_{s=1}^n Cov(X_t, X_s) = ... \)
    (see photo - 15:34 Sep 24)
    • \(Var(\bar{X}) \rightarrow 0\) if \(\gamma(h) \rightarrow 0\) as \(h \rightarrow \infty\)
    • \(n Var(\bar{X}) \rightarrow \sum_{h = - \infty}^\infty \gamma(h) \) if \( \sum_{h = - \infty}^\infty \mid \gamma(h) \mid < \infty \)
  • For a large class of time series models,
    \( \sqrt{n}(\bar{X} - \mu) \overset{approx.}\sim \mathcal{N}(0, \sum_{\mid h \mid < n} (1 - \frac{\mid h \mid}{n}) \gamma(h)) \)
  • Equivalently,
    \( \sqrt{n}(\bar{X} - \mu) \overset{approx.}\sim \mathcal{N}(0, \sum_{h = - \infty}^\infty \gamma(h)) \)
CI for \(\mu\)
  • An approximate 95% CI for \(\mu\) is given by \[ \bar{X} \pm 1.96 \frac{\sqrt{\hat{v}}}{\sqrt{n}}, \] where \(\hat{v}\) is an estimator of \(v = \sum_{h=-\infty}^\infty \gamma(h)\), for example,
    • \(\hat{v} = \sum_{\mid h \mid < \sqrt{n}} (1 - \frac{\mid h \mid}{n})\hat{\gamma}(h) \).
    • \(\hat{v} = 2 \pi \hat{f}(0)\), where \( \hat{f}(0) \) estimates the spectral density evaluated at frequency 0 (see Chapter 4)
  • Example:
    AR(1) with mean \(\mu\): \( X_t - \mu = \phi(X_{t-1} - \mu) + Z_t \), where \(\mid \phi \mid < 1\) and \( \{Z_t\} \sim WN(0, \sigma^2) \)
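A minimal sketch of the CI computation for a simulated AR(1) (parameter values are illustrative; the truncation follows the first estimator of \(v\) above). For this AR(1), \(v = \sum_{h=-\infty}^\infty \gamma(h) = \sigma^2/(1-\phi)^2\), which gives a benchmark.

```r
# Approximate 95% CI for mu from a simulated AR(1), using
# v_hat = sum over |h| < sqrt(n) of (1 - |h|/n) * gamma_hat(h).
set.seed(42)
n <- 400; phi <- 0.6; mu <- 5; sigma2 <- 1
x <- mu + arima.sim(list(ar = phi), n = n, sd = sqrt(sigma2))

H <- ceiling(sqrt(n)) - 1                 # largest |h| with |h| < sqrt(n)
gamma_hat <- as.numeric(acf(x, lag.max = H, type = "covariance", plot = FALSE)$acf)
v_hat <- gamma_hat[1] + 2 * sum((1 - (1:H) / n) * gamma_hat[1 + (1:H)])

mean(x) + c(-1, 1) * 1.96 * sqrt(v_hat / n)                # CI based on v_hat
mean(x) + c(-1, 1) * 1.96 * sqrt(sigma2 / (1 - phi)^2 / n)  # benchmark: v = sigma^2 / (1 - phi)^2
```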
Sample ACVF & ACF
  • Sample ACVF:
    \( \hat{\gamma}(h) = \frac{1}{n}\sum_{t=1}^{n - \mid h \mid}(X_{t+\mid h \mid} - \bar{X})(X_t - \bar{X}), -n < h < n \)
  • Sample ACF:
    \( \hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)} \)
  • \(\hat{\gamma}(h)\) and \(\hat{\rho}(h)\) estimate \(\gamma(h)\) and \(\rho(h)\), respectively.
Sample Covariance Matrix
\[ \hat{\boldsymbol{\Gamma}}_k = \begin{bmatrix} \hat{\gamma}(0) & \hat{\gamma}(1) & \dots & \hat{\gamma}(k-1) \\ \hat{\gamma}(1) & \hat{\gamma}(0) & \dots & \hat{\gamma}(k-2) \\ \vdots & \vdots & \ddots & \vdots \\ \hat{\gamma}(k-1) & \hat{\gamma}(k-2) & \dots & \hat{\gamma}(0) \end{bmatrix} \]
  • \(\hat{\boldsymbol{\Gamma}}_k\) is nonnegative definite for all \(k \ge 1\).
  • Sample autocorrelation matrix:
    \[ \hat{\boldsymbol{R}}_k = \hat{\boldsymbol{\Gamma}}_k / \hat{\gamma}(0) \]
Sampling Distribution of \( \hat{\rho}(\cdot) \)
  • For linear time series models,
    \( \hat{\boldsymbol{\rho}} \overset{approx.}\sim \mathcal{N}(\boldsymbol{\rho}, \boldsymbol{W} / n), \)
    where \( \hat{\boldsymbol{\rho}} = (\hat{\rho}(1), \dotsm, \hat{\rho}(h))' \), \(\boldsymbol{\rho} = (\rho(1), \dotsm, \rho(h))' \), and \(\boldsymbol{W}\) is a matrix whose \((i, j)\) element is given by Bartlett's formula; namely,
    \[ \begin{align*} & w_{ij} \\ = & \sum_{k=1}^\infty \{ \rho(k+i) + \rho(k-i) - 2\rho(k)\rho(i) \} \\ & \times \{ \rho(k+j) + \rho(k-j) - 2\rho(k)\rho(j) \} \end{align*} \]
  • Examples:
    • \(iid \) noise: \(\{X_t\} \sim iid(0, \sigma^2)\)
      (see photo - 15:11 Sep 26)
    • MA(1): \(X_t = Z_t + \theta Z_{t-1}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
      \(X_t = Z_t + \theta Z_{t-1} \Rightarrow \gamma(l) = (1 + \theta^2)\sigma^2 \mathbb{I}_{l=0} + \theta \sigma^2 \mathbb{I}_{l = \pm 1}\)
      \( \hat{\rho}(i) \overset{approx.}\sim \mathcal{N}(\rho(i), \frac{w_{ii}}{n}) \), where
      (1) if \(i = 1\),
      \(w_{ii} = \dotsm = 1 - 3 \rho^2(1) + 4 \rho^4(1)\)
      (2) if \(i > 1\),
      \(w_{ii} = \dotsm = 1 + 2\rho^2(1)\)
      (see photo - 15:22 Sep 26)
    • MA(\(q\)): \(X_t = Z_t + \theta_1 Z_{t-1} + \dotsm + \theta_q Z_{t-q}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
      \(\hat{\rho}(i) \overset{approx.}\sim \mathcal{N}(0, \frac{1 + 2 \rho^2(1) + \dotsm + 2 \rho^2(q)}{n}) \) for \(i > q\).
    • AR(1): \( X_t = \phi X_{t-1} + Z_t \), \(\mid \phi \mid < 1\) & \(\{Z_t\} \sim WN(0, \sigma^2)\)
  • R Examples:
    • A simulated MA(1) series with \(\theta = -0.9\)
    • Lake Huron residuals
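A minimal sketch along the lines of the first R example (a reconstruction, not the code used in class; the seed, sample size, and added bounds are illustrative).

```r
# Sample ACF of a simulated MA(1) with theta = -0.9, with two sets of bounds:
# the default iid bounds +/- 1.96/sqrt(n) drawn by acf(), and the wider MA(1) bounds
# +/- 1.96 * sqrt((1 + 2 rho(1)^2) / n) suggested by Bartlett's formula for lags i > 1.
set.seed(7)
n <- 200; theta <- -0.9
x <- arima.sim(list(ma = theta), n = n)

rho1 <- theta / (1 + theta^2)            # true lag-1 ACF of the MA(1)
acf(x, lag.max = 40, main = "Simulated MA(1), theta = -0.9")
abline(h = c(-1, 1) * 1.96 * sqrt((1 + 2 * rho1^2) / n), lty = 3, col = "red")
# The built-in LakeHuron series can be examined similarly, e.g.
# acf(resid(lm(LakeHuron ~ time(LakeHuron))))
```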
5. Forecasting Stationary Time Series
  • Let \(\{X_t\}\) be a stationary time series with known mean \(\mu\), ACVF \(\gamma(\cdot)\), and ACF \(\rho(\cdot)\).
  • Our goal is to find \(P_n X_{n+h}\), the BLP of \(X_{n+h}\) in terms of \(1, X_1, \dotsm, X_n\).
  • We write \( P_n X_{n+h} = a_0 + a_1 X_n + \dotsm + a_n X_1 \), where \(a_0, a_1, \dotsm, a_n\) minimize
    \( S(a_0, a_1, \dotsm, a_n) = \mathbb{E}(X_{n+h} - a_0 - a_1 X_n - \dotsm - a_n X_1)^2 \).
Result
  • The BLP is given by
    \( P_n X_{n+h} = \mu + a_1 (X_n - \mu) + \dotsm + a_n (X_1 - \mu) \)
    where \(\boldsymbol{a}_n = (a_1, \dotsm, a_n)'\) is determined by
    \( \boldsymbol{\Gamma}_n \boldsymbol{a}_n = \boldsymbol{\gamma}_n(h) \)
    • \(\boldsymbol{\Gamma}_n = [\gamma(i - j)]_{i,j=1}^n\)
    • \(\boldsymbol{\gamma}_n(h) = (\gamma(h), \gamma(h+1), \dotsm, \gamma(h + n -1))' \)
  • Moreover, the corresponding MSE is given by
    \(\mathbb{E}(X_{n+h} - P_n X_{n+h})^2 = \gamma(0) - \boldsymbol{a}_n' \boldsymbol{\gamma}_n(h)\)
Example
One-step prediction of AR(1): \(X_t = \phi X_{t-1} + Z_t\)
\( P_n X_{n+1} = a_1 X_n + a_2 X_{n-1} + \dotsm + a_n X_1 \) , where \((a_1, a_2, \dotsm, a_n)'\) is determined by
(see photo - 15:53 Sep 26)
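Numerically, the prediction equations can be solved directly with solve(); a minimal sketch for the AR(1) example (illustrative values, not the worked solution in the photo).

```r
# Solve Gamma_n a_n = gamma_n(h) for one-step (h = 1) prediction of an AR(1).
# For an AR(1), gamma(h) = sigma^2 phi^|h| / (1 - phi^2), and the solution should be
# a_n = (phi, 0, ..., 0)' with MSE = gamma(0) - a_n' gamma_n(h) = sigma^2.
phi <- 0.6; sigma2 <- 1; n <- 5; h <- 1
acvf <- function(lag) sigma2 * phi^abs(lag) / (1 - phi^2)

Gamma_n  <- outer(1:n, 1:n, function(i, j) acvf(i - j))  # [gamma(i - j)], i, j = 1, ..., n
gamma_nh <- acvf(h + 0:(n - 1))                          # (gamma(h), ..., gamma(h + n - 1))'
a_n <- solve(Gamma_n, gamma_nh)
round(a_n, 10)                                           # (0.6, 0, 0, 0, 0)
acvf(0) - sum(a_n * gamma_nh)                            # MSE = 1 = sigma^2
```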
Remark
\(P_n X_{n+h}\) is chosen such that
  • \(\mathbb{E}(X_{n+h} - P_n X_{n+h}) = 0\)
  • \(\mathbb{E}[(X_{n+h} - P_n X_{n+h})X_j] = 0, j = 1, \dotsm, n\)
Properties of \(P_n\)
  • We shall refer to \(P_n\) as the prediction operator based on the finite past, \( \{ X_1, \dotsm, X_n \} \).
  • Let \(U\) and \(V\) be random variables with finite variance and let \(a\), \(b\), and \(c\) be constants. Then,
    • \( \mathbb{E}(U - P_n U) = 0 \)
    • \( \mathbb{E}[(U - P_n U)X_j] = 0\), \(j = 1, \dotsm, n \)
    • \( P_n (aU + bV + c) = aP_n U + b P_n V + c \)
    • \( P_n U = U \) if \(U\) is a linear combination of \(1, X_1, \dotsm, X_n\)
    • \( P_n U = \mathbb{E}(U) \) if \(Cov(U, X_j) = 0\) for all \(j = 1, \dotsm, n \)
  • Examples:
    • AR(1): \(X_t = \phi X_{t-1} + Z_t\)
      where \(\{Z_t\} \sim WN(0, \sigma^2)\). \(P_n X_{n+1} = \phi X_n\) with MSE \( = \sigma^2\)
      Method (using the properties of \(P_n\)):
      \(X_{n+1} = \phi X_n + Z_{n+1}\)
      \( P_n X_{n+1} = P_n (\phi X_n + Z_{n+1}) = \phi P_n X_n + P_n Z_{n+1} = \phi X_n + \mathbb{E}(Z_{n+1}) = \phi X_n \)
    • One-step prediction of AR(\(p\)):
      \( X_t = \phi_1 X_{t-1} + \dotsm + \phi_p X_{t-p} + Z_t \)
      Goal: Find \(P_n X_{n+1}\)
      If \(n \ge p\),
      \( P_n X_{n+1} = P_n(\phi_1 X_{n} + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p} + Z_{n+1}) = \phi_1 P_n X_n + \phi_2 P_n X_{n-1} + \dotsm + \phi_p P_n X_{n+1-p} + P_n Z_{n+1} = \phi_1 X_n + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p} + \mathbb{E} (Z_{n+1}) = \phi_1 X_n + \phi_2 X_{n-1} + \dotsm + \phi_p X_{n+1-p} \)
      \(MSE = \mathbb{E}[(X_{n+1} - P_n X_{n+1})^2] = \mathbb{E}(Z_{n+1}^2) = Var(Z_{n+1}) = \sigma^2\)
      If \(n < p\), use the general result above (solve \(\boldsymbol{\Gamma}_n \boldsymbol{a}_n = \boldsymbol{\gamma}_n(1)\)).
    • \(h\)-step prediction of AR(1) with nonzero mean \(\mu\):
      \( X_t - \mu = \phi (X_{t-1} - \mu) + Z_t \)
      Goal: Find \(P_n X_{n+h}\), \(h = 1, 2, \dotsm\).
      \( X_{n+h} - \mu = \phi (X_{n+h-1} - \mu) + Z_{n+h} \)
      \( P_n(X_{n+h} - \mu) = P_n(\phi (X_{n+h-1} - \mu) + Z_{n+h}) \)
      \( P_nX_{n+h} - \mu = \phi (P_n X_{n+h-1} - \mu) + P_n Z_{n+h} = \phi (P_n X_{n+h-1} - \mu) \), since \(P_n Z_{n+h} = \mathbb{E}(Z_{n+h}) = 0\).
      Iterating, \( P_n X_{n+h} = \mu + \phi^h (X_n - \mu) \).
      (see photo - 15:26 Oct 1)
Prediction of Second-Order Random Variables
(self-study: Example 2.5.1 and Example 2.5.2)
Recursive Prediction Algorithms
  • To determine \(P_n X_{n+h}\), the direct approach requires solving a system of \(n\) linear equations.
  • For large \(n\) this may be difficult and time-consuming.
  • It would be helpful if \(P_n X_{n+1}\) could be used to simplify the calculation of \(P_{n+1} X_{n+2}\).
  • Prediction algorithms that utilize this idea are said to be recursive.
  • We'll introduce two recursive algorithms, the Durbin-Levinson algorithm and the Innovations algorithm, for determining the one-step predictors \(P_n X_{n+1}\).
  • The algorithms can be extended to compute the \(h\)-step predictors \(P_n X_{n+h}\), \(h \ge 1\).
Durbin-Levinson Algorithm
  • Without loss of generality, we consider a stationary process \(\{X_t\}\) with mean 0 and ACVF \(\gamma(\cdot)\).
  • Write
    \( \begin{cases} P_n X_{n+1} = \phi_{n1}X_n + \dotsm + \phi_{nn}X_1\\ v_n = \mathbb{E}(X_{n+1} - P_n X_{n+1})^2 \end{cases} \)
  • The algorithm recursively computes \(\phi_{n1}, \dotsm, \phi_{nn}\) and \(v_n\) from \(\phi_{n-1,1}, \dotsm, \phi_{n-1,n-1}\) and \(v_{n-1}\).
  • We start with \(v_0 = \gamma(0)\).
  • For \(n = 1, 2, \dotsm, \phi_{n1}, \dotsm, \phi_{nn}\) and \(v_n\) satisfy
    \( \phi_{nn} = \frac{1}{v_{n-1}}[\gamma(n) - \sum_{j=1}^{n-1} \phi_{n-1, j} \gamma(n-j)] \) and \( \begin{bmatrix} \phi_{n1}\\ \vdots\\ \phi_{n, n-1} \end{bmatrix} = \begin{bmatrix} \phi_{n-1,1}\\ \vdots\\ \phi_{n-1, n-1} \end{bmatrix} - \phi_{nn} \begin{bmatrix} \phi_{n-1,n-1}\\ \vdots\\ \phi_{n-1, 1} \end{bmatrix} \), \( v_n = v_{n-1}(1 - \phi_{nn}^2) \).
  • Example:
    Prediction of an AR(1) process:
    \( X_t = \phi X_{t-1} + Z_t \), \(\{Z_t\} \sim WN(0, \sigma^2)\)
    (see photo - 15:51 Oct 1)
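A direct transcription of the recursions above into R (a sketch; function and variable names are illustrative), checked on the AR(1) example:

```r
# Durbin-Levinson recursions for the zero-mean case.
# acvf is a (vectorized) function returning gamma(h); the output gives
# P_n X_{n+1} = phi[1] X_n + ... + phi[n] X_1 and its MSE v_n.
durbin_levinson <- function(acvf, n) {
  v   <- acvf(0)                              # v_0 = gamma(0)
  phi <- numeric(0)                           # phi_{k,1}, ..., phi_{k,k} at step k
  for (k in 1:n) {
    if (k == 1) {
      phi_kk <- acvf(1) / v
      phi    <- phi_kk
    } else {
      phi_kk <- (acvf(k) - sum(phi * acvf(k - 1:(k - 1)))) / v
      phi    <- c(phi - phi_kk * rev(phi), phi_kk)
    }
    v <- v * (1 - phi_kk^2)                   # v_k = v_{k-1} (1 - phi_{kk}^2)
  }
  list(phi = phi, v = v)
}

# Check on the AR(1) example: expect phi = (0.6, 0, ..., 0) and v = sigma^2 = 1.
phi0 <- 0.6; sigma2 <- 1
durbin_levinson(function(h) sigma2 * phi0^abs(h) / (1 - phi0^2), n = 5)
```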
Innovations Algorithm
  • The Innovations algorithm is applicable even if the process is nonstationary.
  • Suppose \(\{X_t\}\) is a process with mean zero and ACVF \(\kappa(i, j) = \mathbb{E}(X_i X_j)\) such that the matrix \( [\kappa(i,j)]_{i,j=1}^n \) is nonsingular for each \(n = 1,2,\dotsm\).
  • Write \(\hat{X}_1 = 0\) and \(\hat{X}_{n+1} = P_n X_{n+1}\), \(n = 1, 2, \dotsm\).
  • The innovations (one-step prediction errors), \(X_1 - \hat{X}_1, \dotsm, X_n - \hat{X}_n\), are orthogonal (problem 2.20) in the sense that \(\mathbb{E}(X_i - \hat{X}_i)(X_j - \hat{X}_j) = 0\) for \(i \neq j\).
  • Write
\[ \begin{cases} \hat{X}_{n+1} = \sum_{i=1}^n \theta_{ni}(X_{n-i+1} - \hat{X}_{n-i+1})\\ v_n = \mathbb{E}(X_{n+1} - \hat{X}_{n+1})^2 \end{cases} \]
  • The algorithm recursively computes \(\theta_{n1}, \dotsm, \theta_{nn}\) and \(v_n\) from \(\theta_{n-1,1}, \dotsm, \theta_{n-1,n-1}\) and \(v_{n-1}\).
Prediction of Stationary Time Series in Terms of Infinitely Many Past Values
  • Let \(\tilde{P}_n X_{n+h}\) denote the BLP of \(X_{n+h}\) in terms of 1 and \(\{X_s, -\infty < s \leq n\}\).
  • We refer to \(\tilde{P}_n\) as the prediction operator based on the infinite past, \(\{X_s, -\infty < s \leq n\}\).
  • When \(n\) is large, we may approximate \(P_n X_{n+h}\) by \(\tilde{P}_n X_{n+h}\), which simplifies the calculation for MA and ARMA series.
Computation of \(\tilde{P}_n X_{n+h}\)
  • Suppose \(\{X_t\}\) is a zero-mean stationary time series with ACVF \(\gamma(\cdot)\)
  • We write \(\tilde{P}_n X_{n+h} = \sum_{j=1}^\infty a_j X_{n+1-j}\)
  • Then, the problem is equivalent to finding \(a_1, a_2, \dotsm\) to minimize
    \( \mathbb{E}(X_{n+h} - \sum_{j=1}^\infty a_j X_{n+1-j})^2 \)
  • However, it involves an infinite set of linear equations.
  • To get around it, the properties of \(\tilde{P}_n\) can be used for the calculation of \(\tilde{P}_n X_{n+h}\), especially when \(\{X_t\}\) is an MA or ARMA process.
Properties of \(\tilde{P}_n\)
Let \(U\) and \(V\) be random variables with finite variance and let \(a\), \(b\), and \(c\) be constants. Then,
  • \( \mathbb{E}(U - \tilde{P}_n U) = 0 \)
  • \( \mathbb{E}[(U - \tilde{P}_n U)X_j] = 0 \), \(j \leq n\)
  • \( \tilde{P}_n(aU + bV + c) = a \tilde{P}_n U + b \tilde{P}_n V + c\)
  • \( \tilde{P}_n U = U \) if \(U\) is a linear combination of \(X_j, j \leq n\)
  • \( \tilde{P}_n U = \mathbb{E}(U) \) if \(Cov(U, X_j) = 0\) for all \(j \leq n\)
Example
One-step prediction of MA(1): \(X_t = Z_t + \theta Z_{t-1}\), \(\{Z_t\} \sim WN(0, \sigma^2)\)
(see photo - 15:13 Oct 3)
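One way to carry out the calculation, using the properties of \(\tilde{P}_n\) listed above and assuming \(\mid \theta \mid < 1\) so that the MA(1) is invertible (a sketch, not necessarily the steps recorded in the photo):
  • Invertibility gives \( Z_n = \sum_{j=0}^\infty (-\theta)^j X_{n-j} \), a linear combination of \(X_j\), \(j \le n\), so \(\tilde{P}_n Z_n = Z_n\); and \(\tilde{P}_n Z_{n+1} = \mathbb{E}(Z_{n+1}) = 0\) since \(Z_{n+1}\) is uncorrelated with \(X_j\), \(j \le n\).
  • Hence \( \tilde{P}_n X_{n+1} = \tilde{P}_n(Z_{n+1} + \theta Z_n) = \theta Z_n = \sum_{j=0}^\infty \theta(-\theta)^j X_{n-j} \), with MSE \( \mathbb{E}(X_{n+1} - \tilde{P}_n X_{n+1})^2 = \mathbb{E}(Z_{n+1}^2) = \sigma^2 \).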