Wangsheng's World
Introduction
Last Update: September 17, 2024
(All text updated. Some graphs are to be inserted.)
0. Brief Review on Moments
  • Expectation: \( \mu_X = \mathbb{E}(X) \)
  • Variance: \( \sigma_X^2 = Var(X) = \mathbb{E}[(X - \mu_X)^2] \)
  • Covariance and correlation:
    • \( \gamma(X, Y) = Cov(X, Y) = \mathbb{E}[(X - \mu_X)(Y - \mu_Y)] \)
    • \( \rho(X, Y) = Cor(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \) if \( 0 \lt \sigma_X, \sigma_Y \lt \infty \)
Properties of Expectation
  • \( \mathbb{E}(aX + b) = a\mathbb{E}(X) + b \)
  • If \( P(X \ge a) = 1 \), then \( \mathbb{E}(X) \ge a \); if \( P(X \le b) = 1 \), then \( \mathbb{E}(X) \le b \)
  • \( \mathbb{E}(X_1 + \dots + X_n) = \mathbb{E}(X_1) + \dots + \mathbb{E}(X_n) \)
  • If \(X\) and \(Y\) are independent, then \(\mathbb{E}(XY) = \mathbb{E}(X) \mathbb{E}(Y) \)
Properties of Variance
  • \( Var(X) = \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 \)
  • \( Var(aX + b) = a^2 Var(X) \)
  • \( Var(X) = 0 \) if and only if there exists a constant \(c\) such that \(P(X = c) = 1 \)
  • If \(X\) and \(Y\) are independent, then \(Var(X + Y) = Var(X) + Var(Y) \)
    Note that independence of \(X\) and \(Y\) does NOT imply \( Var(XY) = Var(X)Var(Y) \) unless \(\mathbb{E}(X) = \mathbb{E}(Y) = 0 \)
Properties of Covariance and Correlation
  • \( \mid Cov(X, Y) \mid \le \sigma_X \sigma_Y \) and hence \( \mid \rho(X, Y) \mid \le 1 \)
  • \( Cov(X, Y) = \mathbb{E}(XY) - \mathbb{E}(X) \mathbb{E}(Y) \)
  • If \( X \) and \( Y \) are independent, then \( Cov(X, Y) = 0 \)
    The converse is NOT true (for example, \( X \sim \mathcal{N}(0,1) \) & \( Y = X^2 \))
  • \( Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) \)
  • \( Cov(aX + bY, cW + dV) = ac Cov(X,W) + ad Cov(X,V) + bc Cov(Y,W) + bd Cov(Y,V) \)
Moments
  • \( \mathbb{E}(X^k) \) is called the \(k^{th}\) moment of \(X\).
    So, expectation is the \(1^{st}\) moment.
  • \( \mathbb{E}[(X - \mu_X)^k] \) is called the \(k^{th}\) central moment of \(X\).
    So, variance is the \(2^{nd}\) central moment.
  • If \(X \sim \mathcal{N}(0,1) \), then \( \mathbb{E}(X^{2k-1}) = 0 \) and \( \mathbb{E}(X^{2k}) = (2k-1)(2k-3) \dotsm 1 \) for \(k = 1, 2, \dotsm \).
Mixed Moments
  • Let \(X_1, \dotsm, X_m\) be \(m\) random variables.
  • For any integers \(k_i \ge 0, i = 1, \dotsm, m\) let \(k = \sum_{i=1}^m k_i\).
  • Then,
    • \( \mathbb{E}(X_1^{k_1} \dots X_m^{k_m}) \) is called the mixed moment of order \(k\).
    • \( \mathbb{E}[(X_1 - \mu_{X_1})^{k_1} \dots (X_m - \mu_{X_m})^{k_m}] \) is called the central mixed moment of order \(k\).
      So, covariance is the central mixed moment of order 2 (with \(m = 2\) and \(k_1 = k_2 = 1\)).
Conditional Expectation & Variance
  • Conditional Expectation: \( \mu_{Y \mid X} = \mathbb{E}(Y \mid X) \)
  • Conditional Variance: \( \sigma_{Y \mid X}^2 = Var(Y \mid X) \overset{def}= \mathbb{E}[(Y - \mathbb{E}(Y \mid X))^2 \mid X] \)
  • Properties:
    • \( \mathbb{E}(Y) = \mathbb{E}[\mathbb{E}(Y \mid X)] \) (iterated expectation)
    • \(Var(Y) = \mathbb{E}[Var(Y \mid X)] + Var[\mathbb{E}(Y \mid X)] \)
Example #1 (p10)
Suppose \(X \sim Unif[0,1] \), and \(Y \mid X \sim Unif[0,X] \).
Find:
  • \(\mathbb{E}(Y \mid X) \)
  • \(\mathbb{E}(Y) \)
  • \( Var(Y \mid X) \)
  • \( Var(Y) \)
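Solution sketch, using the iterated-expectation and variance-decomposition properties above:
  • Since \(Y \mid X \sim Unif[0, X]\), \( \mathbb{E}(Y \mid X) = X/2 \) and \( Var(Y \mid X) = X^2/12 \).
  • \( \mathbb{E}(Y) = \mathbb{E}[\mathbb{E}(Y \mid X)] = \mathbb{E}(X/2) = 1/4 \).
  • \( Var(Y) = \mathbb{E}[Var(Y \mid X)] + Var[\mathbb{E}(Y \mid X)] = \mathbb{E}(X^2)/12 + Var(X)/4 = \frac{1/3}{12} + \frac{1/12}{4} = \frac{7}{144} \).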
Example #2 (p11)
Suppose \(X\) and \(Y\) are two random variables with \(\mathbb{E}(Y) = \mu \) and \( \mathbb{E}(Y^2) \lt \infty \).
  • Show that \(c = \mu\) minimizes \(\mathbb{E}(Y - c)^2\).
  • Show that \(f(X) = \mathbb{E}(Y \mid X) \) minimizes \(\mathbb{E}[(Y-f(X))^2 \mid X]\).
  • Show that \(f(X) = \mathbb{E}(Y \mid X) \) also minimizes \(\mathbb{E}[(Y-f(X))^2]\).
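A sketch for the first part: write \( \mathbb{E}(Y - c)^2 = \mathbb{E}[(Y - \mu) + (\mu - c)]^2 = \mathbb{E}(Y - \mu)^2 + (\mu - c)^2 \), since the cross term \( 2(\mu - c)\mathbb{E}(Y - \mu) \) vanishes; this is minimized exactly at \( c = \mu \). The second part follows by applying the same argument to the conditional distribution of \(Y\) given \(X\), and the third part then follows by iterated expectation.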
1. Examples of Time Series
A time series is a set of observations \(x_t\), each being recorded at a specific time \(t\).
  • \(x_t\) could be discrete or continuous for a given \(t\)
  • \(x_t\) could be univariate or multivariate for a given \(t\)
  • \(t\) could be discrete (discrete-time time series) or continuous (continuous-time time series)
    recording intervals could be regular or irregular
  • \(t\) could be univariate or multivariate
Time Series Plots
We examine a time series plot for:
  • trend over time
  • seasonal/cyclical/periodic component
  • changing variability over time
  • dependence
  • structural breaks
  • missing data, outlying observations, etc.
Example 1
Australian red wine sales; wine.txt
Example 2
Monthly accidental deaths, USA, 1973 - 1978; deaths.txt
Example 3
Dow-Jones Index (closing prices on 251 consecutive trading days, 9/10/93 - 8/26/94); dowj2.csv
Example 4
Population of the USA, 1790-1990; uspop.txt
Nature of Time Series Data
  • Data collected over time are usually dependent.
    • This requires more complicated modeling techniques than those used for analyzing independent data.
    • On the other hand, the dependence can be exploited in predicting future values.
  • Ignoring dependence leads to improper inferences, poor prediction, ...
2. Objectives of Time Series Analysis
Modeling Paradigm (Box-Jenkins Framework)
  • Model Specification:
    set up a family of probability models to best represent the data
  • Parameter Estimation:
    estimate parameters of the chosen model
  • Model Diagnostics:
    check the fitted model for the goodness of fit
Applications of Models
The resulting model
  • provides a compact description of given data, and can be used to interpret features therein
  • can be used for inference, e.g. confidence intervals and hypothesis tests
  • can be used for forecasting
3. Simple Time Series Models
A time series model for the observed data \(\{x_t\}\) is a specification of the joint distributions (or possibly only the means and covariances) of a sequence of random variables \(\{X_t\}\), of which \(\{x_t\}\) is postulated to be a realization.
Complete Probabilistic Model vs. \(2^{nd}\)-order Property Specification
  • Complete probabilistic model
    • specifies all joint distributions of \( (X_1, \dotsm, X_n)' \), \( n = 1, 2, \dots \)
    • is rarely used because it involves far too many parameters to estimate
  • \(2^{nd}\)-order property specification
    • studies the means, \( \mu_t = \mathbb{E}(X_t) \), and the \(2^{nd}\)-order moments, \(\mathbb{E}(X_{t+h}X_t)\), \(h = 0, 1, \dots \)
  • Much of the distributional information can be described by the first two moments.
  • For multivariate normal, the two ways of modeling are equivalent.
Some Zero-Mean Models
  • \(iid\) noise (\(iid\) random variables with zero mean)
  • White Noise
  • Random Walk
  • Moving Average (MA)
  • Autoregression (AR)
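A minimal R sketch (not part of the original notes) simulating these zero-mean models with Gaussian noise; the parameter values 0.5 and 0.7 are just for illustration:
  # simulate the zero-mean models above, using Gaussian noise
  set.seed(1)
  n  <- 200
  z  <- rnorm(n)                        # iid noise; also white noise WN(0, 1)
  rw <- cumsum(z)                       # random walk: S_t = Z_1 + ... + Z_t
  ma <- arima.sim(list(ma = 0.5), n)    # MA(1): X_t = Z_t + 0.5 Z_{t-1}
  ar <- arima.sim(list(ar = 0.7), n)    # AR(1): X_t = 0.7 X_{t-1} + Z_t
  op <- par(mfrow = c(2, 2))
  plot.ts(z); plot.ts(rw); plot.ts(ma); plot.ts(ar)
  par(op)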
Models with Trend
  • \(X_t = m_t + Y_t\), where \(m_t\) is a slowly varying function called the trend function
    • For example, \(m_t = a_0 + a_1 t + a_2 t^2\)
    • Estimation of \(a_i\)'s can be carried out using the least squares method,
      i.e., by minimizing \( \sum_{t=1}^n (x_t - m_t)^2 \)
  • Example: Population of the USA, 1790-1990; uspop.txt
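A small R sketch of the least-squares trend fit; since uspop.txt is not reproduced here, the data are simulated with a known quadratic trend:
  # least-squares fit of a quadratic trend m_t = a0 + a1 t + a2 t^2 (simulated data)
  set.seed(2)
  t   <- 1:100
  x   <- 10 + 0.5 * t + 0.02 * t^2 + rnorm(100, sd = 5)
  fit <- lm(x ~ t + I(t^2))             # minimizes sum_t (x_t - m_t)^2
  plot(t, x, type = "l")
  lines(t, fitted(fit), col = "red")    # estimated trend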
Models with Seasonality
  • \(X_t = s_t + Y_t\), where \(s_t\) is a periodic function of \(t\) with period \(d\) (i.e., \( s_{t+d} = s_t \))
    • Convenient choice:
      \( s_t = a_0 + \sum_{j=1}^k [a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t)] \)
      where \(a_j\), \(b_j\) are unknown parameters, and the \(\lambda_j\) are fixed frequencies, each a multiple of \(2\pi / d\).
    • For monthly data, \(d = 12\), so the \(\lambda_j\)'s are multiples of \( 2 \pi / 12 \).
  • Example: Monthly accidental deaths, USA, 1973-1978; deaths.txt
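A short R sketch of the harmonic regression above with \(d = 12\) and \(k = 1\); the data are simulated, not deaths.txt:
  # harmonic regression s_t = a0 + a1 cos(lambda t) + b1 sin(lambda t), lambda = 2*pi/12
  set.seed(3)
  t   <- 1:72
  lam <- 2 * pi / 12
  x   <- 5 + 3 * cos(lam * t) + 2 * sin(lam * t) + rnorm(72)
  fit <- lm(x ~ cos(lam * t) + sin(lam * t))
  coef(fit)                             # least-squares estimates of a0, a1, b1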
General Steps in Modeling
  • Plot time series and check for, say, changing variability and trend/seasonal component
  • Remove changing variability, trend, and seasonality, if any, to get a stationary residual series
  • Choose a model to fit the stationary series (by Box-Jenkins methodology)
    • The model should capture dependence structure of the series.
  • Conduct inference and forecast
4. Stationary models and ACF
  • We consider modeling serial dependence of a time series in the stationary case.
  • Two versions of stationarity
    • strict stationarity: joint probability distributions do not change with time
    • weak stationarity: first- and second-order moment properties do not change with time
Strict Stationarity
  • \( \{X_t\} \) is strictly stationary if, for any positive integer \(k\) and integers \(t_1, \dotsm, t_k\), and \(h\),
    \((X_{t_1}, X_{t_2}, \dotsm, X_{t_k})' =_d (X_{t_1+h}, X_{t_2+h}, \dotsm, X_{t_k+h})' \)
    where \(=_d\) denotes equality in distribution
  • Using \(k=1\),
    • \(X_1 =_d X_2 =_d X_3 =_d \dotsm \) (identically distributed)
    • means are all identical if they exist (rules out trend and seasonality)
    • variances are all identical if they exist (rules out the changing variability)
  • Using \(k=2\),
    • for all \(t\) and \(h\), \( (X_t, X_{t+1})' =_d (X_{t+h}, X_{t+1+h})'\)
      Hence, \(Cov(X_t, X_{t+1}) = Cov(X_{t+h}, X_{t+1+h}) \) if variances exist.
    • for all \(t\), \(h\), and \(l\), \( (X_t, X_{t+l})' =_d (X_{t+h}, X_{t+l+h})'\)
      Hence, \(Cov(X_t, X_{t+l}) = Cov(X_{t+h}, X_{t+l+h}) \) if variances exist.
  • Using \(k \ge 3\) gets increasingly complicated
  • So, strict stationarity is a very strong modeling assumption
Properties of Strictly Stationary Time Series
  • The \(X_t\)'s are identically distributed
  • \( (X_t, X_{t+h})' \) and \( (X_1, X_{1+h})' \) are identically distributed for all integers \(t\) and \(h\)
  • \(iid\) sequences are strictly stationary
Weak Stationarity
  • Let \(\{X_t\}\) be a time series with \(Var(X_t) \lt \infty\)
  • Let \( \mu_X(t) = \mathbb{E}(X_t) \) denote mean function, and \(\gamma_X(t+h, t) = Cov(X_{t+h}, X_t) \) denote covariance function
  • \(\{X_t\}\) is weakly stationary if
    • \( \mu_X(t) \) is independent of \(t\)
    • \(\gamma_X(t+h, t)\) is independent of \(t\) for each \(h\)
  • To show weak stationarity, we verify that
    • \(Var(X_t) \lt \infty \)
    • \(\mathbb{E}(X_t)\) does not depend on \(t\)
    • \(Cov(X_{t+h}, X_t)\) may depend on \(h\) but not \(t\)
Examples
  • \( X_t = Z_0 \cos{t}, \{Z_t\} \sim WN(0, \sigma^2) \)
    1. \( \mu_X(t) = \mathbb{E}(Z_0) \cos{t} = 0\) (independent of \(t\));
    2. \( \gamma_X(t+h, t) = Cov(Z_0 \cos(t+h), Z_0 \cos{t}) = \sigma^2 \cos(t+h) \cos{t} \) (depends on \(t\); e.g., \( Var(X_t) = \sigma^2 \cos^2{t} \)).
    Hence, it is not weakly stationary.
  • \( X_t = Z_t + 0.5Z_{t-1}, \{Z_t\} \sim WN(0, \sigma^2) \)
    1. \( \mu_X = \mathbb{E}(Z_t) + 0.5\mathbb{E}(Z_{t-1}) = 0 \) (independent of \(t\));
    2. \( \gamma_X = Cov(Z_t + 0.5Z_{t-1}, Z_{t+h} + 0.5Z_{t+h-1}) = Cov(Z_t, Z_{t+h}) + 0.5Cov(Z_t, Z_{t+h-1}) + 0.5Cov(Z_{t-1}, Z_{t+h}) + 0.5^2Cov(Z_{t-1}, Z_{t+h-1}) = \sigma^2(1.25 \mathbb{I}_{h=0} + 0.5 \mathbb{I}_{h= \pm 1}) \)
      (independent of \(t\));
    3. \( Var(X_t) \lt \infty \).
    Hence, \( \{X_t\} \) is weakly stationary.
Remarks
  • White noise sequences are weakly stationary
  • If \( \{X_t\} \) is strictly stationary and \( \mathbb{E}(X_t^2) \lt \infty \), then \( \{X_t\} \) is weakly stationary (Problem 1.3, HW1)
  • Weak stationarity does not imply strict stationarity, except for the Gaussian case
  • From now on, we mean “weak stationarity” when we say stationarity, unless otherwise stated
ACVF and ACF
  • Let \(\{X_t\}\) be a stationary time series
  • Its autocovariance function (ACVF) is defined as
    \( \gamma_X(h) = Cov(X_{t+h}, X_t) \)
  • Its autocorrelation function (ACF) is defined as
    \( \rho_X(h) = Cor(X_{t+h}, X_t) = \frac{\gamma_X(h)}{\gamma_X(0)} \)
Examples
  • \(iid\) noise with finite variance:
    \( \{X_t\} \sim iid (0, \sigma^2) \)
    \( \gamma_X(h) = \sigma^2 \mathbb{I}(h = 0) \)
  • White noise:
    \( \{Z_t\} \sim WN(0, \sigma^2) \)
    \( \gamma_Z(h) = \sigma^2 \mathbb{I}(h = 0) \)
Note:
  • \(iid\) noise with finite variance \(\Rightarrow\) White Noise
  • The converse is NOT true: WN does NOT imply \(iid\) noise (and \(iid\) noise with infinite variance is not WN).
  • Random Walk: \( S_t = Z_1 + \dots + Z_t \), \( \{Z_t\} \sim iid(0, \sigma^2) \); \( Cov(S_{t+h}, S_t) = t\sigma^2 \) for \( h \ge 0 \) depends on \(t\), so \(\{S_t\}\) is not stationary.
  • MA(1) (moving average of order 1) process:
    \(X_t = Z_t + \theta Z_{t-1}, \{Z_t\} \sim WN (0, \sigma^2) \)
  • AR(1) (autoregressive of order 1) process:
    stationary process \(\{X_t\}\) that satisfies the equations:
    \(X_t = \phi X_{t-1} + Z_t, t = 0, \pm 1, \dotsm \)
    where \(\{Z_t\} \sim WN (0, \sigma^2), \mid \phi \mid \lt 1 \) & \(Cov(Z_t, X_s) = 0 \) for each \(s \lt t\).
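For reference, the theoretical ACFs are \( \rho_X(1) = \theta / (1 + \theta^2) \) (and 0 for \( \mid h \mid \ge 2 \)) for MA(1), and \( \rho_X(h) = \phi^{\mid h \mid} \) for AR(1). A quick R check with the base function ARMAacf (the values \( \theta = 0.5 \) and \( \phi = 0.7 \) are just for illustration):
  ARMAacf(ma = 0.5, lag.max = 5)   # MA(1): rho(1) = 0.5 / 1.25 = 0.4, zero afterwards
  ARMAacf(ar = 0.7, lag.max = 5)   # AR(1): rho(h) = 0.7^h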
Sample Mean, ACVF, and ACF
  • Suppose \( x_1, ..., x_n \) are observed data
  • Sample mean: \( \bar{x} = \frac{1}{n} \sum_{t=1}^n x_t \)
  • Sample ACVF: \( \hat{\gamma}(h) = \frac{1}{n} \sum_{t=1}^{n - \mid h \mid} (x_{t + \mid h \mid} - \bar{x})(x_t - \bar{x}) \)
  • Sample ACF: \( \hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)} \)
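A small R sketch (on simulated data) checking the sample ACVF formula against the built-in acf(), which uses the same divisor \(n\):
  set.seed(4)
  x    <- rnorm(200)
  n    <- length(x)
  xbar <- mean(x)
  gamma.hat <- function(h) sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
  gamma.hat(1)                                                    # from the formula above
  acf(x, lag.max = 1, type = "covariance", plot = FALSE)$acf[2]   # same value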
Properties of Sample ACVF & ACF
  • Covariance matrix \( \hat{\Gamma}_n = [\hat{\gamma}(i - j)]_{i, j = 1}^n \) is nonnegative definite (n.n.d.) for all \(n \ge 1 \).
    See \(\S\)2.4.2 for proof.
  • If data are observations from \(iid\) noise with finite \(4^{th}\) moment, then for \(n\) large enough, \(\hat{\rho}(h)\) is approximately \(\mathcal{N}(0, \frac{1}{n})\) for each \(h \ge 1\), and the \(\hat{\rho}(h)\)'s are approximately independent.
Examples
  • Sample ACF of simulated 200 observations from \(iid\) \(\mathcal{N}(0,1) \)
  • Sample ACF of wine.txt
5. Trend and seasonality removal
  • Method 1: estimation (and extraction) of trend and seasonality
    • regression/smoothing
  • Method 2: differencing
Classical Decomposition Model
\[ X_t = m_t + s_t + Y_t \] where
  • \(m_t\) is trend component
  • \(s_t\) is seasonal component with period \(d\)
    • \( s_{t+d} = s_t \) and \( \sum_{t=1}^d s_t = 0 \)
  • \(Y_t\) is random noise component with \( \mathbb{E}(Y_t) = 0\)
A preliminary transformation might be needed first to stabilize the variability
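A brief R sketch of a classical decomposition using the built-in decompose(), which estimates \(m_t\) by a moving average and \(s_t\) by averaging over periods; the monthly series here is simulated, not deaths.txt:
  set.seed(5)
  x   <- ts(10 + 0.05 * (1:96) + 2 * sin(2 * pi * (1:96) / 12) + rnorm(96),
            frequency = 12)
  dec <- decompose(x)     # components: dec$trend, dec$seasonal, dec$random
  plot(dec)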
Example:
Australian red wine sales; wine.txt
Example
Decomposition of monthly accidental deaths, USA, 1973-1978; deaths.txt
Regression/Smoothing
  • Polynomial and periodic regression smoothers
  • Moving average smoother
  • Kernel smoother
Regression Smoothers
  • Example:
    Weekly mortality in Los Angeles County; smort (Shumway & Stoffer)
  • \(X_t = m_t + s_t + Y_t\), where
    • \( m_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3 \)
    • \( s_t = b_1 \cos (2 \pi t / 52) + b_2 \sin (2 \pi t / 52) \)
Moving Average Smoother
  • Consider a model with trend only:
    \(X_t = m_t + Y_t, \mathbb{E}(Y_t) = 0 \)
  • Smoothing by finite moving average: for \(q \ge 0\),
    \( \hat{m}_t = \frac{1}{2 q + 1} \sum_{\mid j \mid \le q} X_{t-j} = \frac{1}{2 q + 1} \sum_{\mid j \mid \le q} m_{t-j} + \frac{1}{2 q + 1} \sum_{\mid j \mid \le q} Y_{t-j} \approx m_t + 0 \)
    if \(m_t\) is approximately linear over \( [t-q, t+q] \).
  • Example:
    Weekly mortality in Los Angeles County; cmort (Shumway & Stoffer)
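A minimal R sketch of the two-sided moving average smoother above (here with \(q = 5\)), applied to a simulated trend-plus-noise series rather than cmort:
  ma.smooth <- function(x, q) stats::filter(x, rep(1 / (2 * q + 1), 2 * q + 1), sides = 2)
  set.seed(6)
  x <- 0.1 * (1:200) + rnorm(200, sd = 3)   # linear trend plus noise
  plot.ts(x)
  lines(ma.smooth(x, 5), col = "red")       # estimated trend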
Kernel Smoother
  • \(\hat{m}_t = \sum_{i=1}^n w_i (t) x_i \)
    where \(w_i(t) = \frac{K((t-i)/b)}{\sum_{j=1}^n K((t-j)/b)} \) are weights with \(K\) being a kernel function and \(b\) being the bandwidth.
  • Example:
    Weekly mortality in Los Angeles County; cmort (Shumway & Stoffer)
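A short R sketch of a kernel smoother via the base function ksmooth() with a normal kernel (bandwidth chosen arbitrarily for illustration), again on simulated data rather than cmort:
  set.seed(7)
  t <- 1:200
  x <- 0.1 * t + rnorm(200, sd = 3)
  plot(t, x, type = "l")
  lines(ksmooth(t, x, kernel = "normal", bandwidth = 10), col = "red")   # estimated trend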
Differencing
  • Backward shift operator \(B\): \( BX_t = X_{t-1} \), \( B^j X_t = X_{t-j} \)
  • Difference operator \( \nabla \overset{def}= 1 - B \): \( \nabla X_t = X_t - X_{t-1} \)
  • Seasonal difference operator \( \nabla_d \overset{def}= 1 - B^d \):
    • \( \nabla_d s_t = (1 - B^d)s_t = s_t - s_{t-d} = 0 \)
    • For a linear trend \( m_t = a_0 + a_1 t \): \( \nabla_d (m_t + s_t) = \nabla_d m_t + \nabla_d s_t = [(a_0 + a_1 t) - (a_0 + a_1 (t - d))] + 0 = d a_1 \)
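In R, both \( \nabla \) and \( \nabla_d \) are available through diff(); a minimal sketch on a simulated monthly series:
  set.seed(8)
  t <- 1:96
  x <- 2 + 0.3 * t + 3 * sin(2 * pi * t / 12) + rnorm(96)
  dx   <- diff(x)                   # \nabla x_t = x_t - x_{t-1}
  d12x <- diff(x, lag = 12)         # \nabla_12 x_t = x_t - x_{t-12}: removes the seasonal component
  dd   <- diff(diff(x, lag = 12))   # \nabla \nabla_12 x_t: also removes the remaining linear trend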
Example
Differencing of monthly accidental deaths, USA, 1973 - 1978; deaths.txt
6. Testing the Estimated Noise Sequence
Suppose \(y_1, ..., y_n\) are observations from a sequence of random variables \(Y_1, ..., Y_n\)
Test of Randomness
  • \( H_0: Y_1, ..., Y_n \) are \(iid\)
  • Graphical method
    • If \(H_0\) holds, then for large \(n\), \( \hat{\rho}(1), \dots, \hat{\rho}(h) \) are approximately \(iid\) \(\mathcal{N}(0, \frac{1}{n})\)
    • So, we check the sample ACF plot of \(y_1, ..., y_n\) to see if \( \mid \hat{\rho}(h) \mid \le \frac{1.96}{\sqrt{n}} \) for \(h \le 40 \)
    • Reject \(H_0\) if noticeably more than 5% of the \( \hat{\rho}(h) \)'s (i.e., more than 2 or 3 out of 40) fall outside these bounds
Portmanteau tests
  • Box-Pierce test:
    • Test statistic: \( Q = n \sum_{h=1}^k \hat{\rho}^2(h) \)
      Usually, \(k=20\) is selected.
    • Under \(H_0\), \(Q \sim \chi_k^2\) approximately
    • Reject \(H_0\) if \(Q \gt \chi_{k, 1-\alpha}^2\) at level \(\alpha\).
    • R command: Box.test
  • Two refinements:
    • Ljung-Box test, which is more accurate for small \(n\);
      R command: Box.test
    • McLeod-Li test, which additionally tests whether \(\{Y_t^2\}\) is \(iid\);
      R command: McLeod.Li.test (in R package 'TSA')
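A quick R illustration of these tests on simulated \(iid\) noise (so \(H_0\) should typically not be rejected); the McLeod-Li line is commented out since it needs the 'TSA' package:
  set.seed(9)
  y <- rnorm(200)
  Box.test(y, lag = 20, type = "Box-Pierce")
  Box.test(y, lag = 20, type = "Ljung-Box")
  # TSA::McLeod.Li.test(y = y)      # portmanteau test applied to y^2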
Other tests:
Turning Point Test, Difference-Sign Test, Rank Test, etc. (see textbook)
Test of Normality
  • \( H_0: Y_1, \dotsm, Y_n \) are normal
  • Graphical method: normal probability plot
  • Shapiro-Wilk test;
    R command: shapiro.test
  • Jarque-Bera test;
    R command: jarque.bera.test (in R package 'tseries')
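A brief R illustration of the normality checks on simulated data; the Jarque-Bera line is commented out since it needs the 'tseries' package:
  set.seed(10)
  y <- rnorm(100)
  qqnorm(y); qqline(y)              # normal probability plot
  shapiro.test(y)
  # tseries::jarque.bera.test(y)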
Examples
  • Simulated 200 observations from \(iid\) \( \mathcal{N}(0, 1) \)
  • Level of Lake Huron, 1875-1972; lake.txt