Wangsheng's World
Introduction
Last Update: September 17, 2024
(All text updated. Some graphs are to be inserted.)
0. Brief Review on Moments
  • Expectation: \( \mu_X = \mathbb{E}(X) \)
  • Variance: \( \sigma_X^2 = Var(X) = \mathbb{E}[(X - \mu_X)^2] \)
  • Covariance and correlation:
    • \( \gamma(X, Y) = Cov(X, Y) = \mathbb{E}[(X - \mu_X)(Y - \mu_Y)] \)
    • \( \rho(X, Y) = Cor(X, Y) = \frac{Cov(X, Y)}{\sigma_X \sigma_Y} \) if \( 0 \lt \sigma_X, \sigma_Y \lt \infty \)
Properties of Expectation
  • \( \mathbb{E}(aX + b) = a\mathbb{E}(X) + b \)
  • If \( P(X \ge a) = 1 \), then \( \mathbb{E}(X) \ge a \); if \( P(X \le b) = 1 \), then \( \mathbb{E}(X) \le b \)
  • \( \mathbb{E}(X_1 + \dots + X_n) = \mathbb{E}(X_1) + \dots + \mathbb{E}(X_n) \)
  • If \(X\) and \(Y\) are independent, then \(\mathbb{E}(XY) = \mathbb{E}(X) \mathbb{E}(Y) \)
Properties of Variance
  • \( Var(X) = \mathbb{E}(X^2) - [\mathbb{E}(X)]^2 \)
  • \( Var(aX + b) = a^2 Var(X) \)
  • \( Var(X) = 0 \) if and only if there exists a constant \(c\) such that \(P(X = c) = 1 \)
  • If \(X\) and \(Y\) are independent, then \(Var(X + Y) = Var(X) + Var(Y) \)
    Note that independence of \(X\) and \(Y\) does NOT imply \( Var(XY) = Var(X)Var(Y) \) unless \(\mathbb{E}(X) = \mathbb{E}(Y) = 0 \)
Properties of Covariance and Correlation
  • \( \mid Cov(X, Y) \mid \le \sigma_X \sigma_Y \) and hence \( \mid \rho(X, Y) \mid \le 1 \)
  • \( Cov(X, Y) = \mathbb{E}(XY) - \mathbb{E}(X) \mathbb{E}(Y) \)
  • If \( X \) and \( Y \) are independent, then \( Cov(X, Y) = 0 \)
    The converse is NOT true (for example, \( X \sim \mathcal{N}(0,1) \) & \( Y = X^2 \))
  • \( Var(X + Y) = Var(X) + Var(Y) + 2Cov(X, Y) \)
  • \( Cov(aX + bY, cW + dV) = ac Cov(X,W) + ad Cov(X,V) + bc Cov(Y,W) + bd Cov(Y,V) \)
Moments
  • \( \mathbb{E}(X^k) \) is called the \(k^{th}\) moment of \(X\).
    So, expectation is the \(1^{st}\) moment.
  • \( \mathbb{E}[(X - \mu_X)^k] \) is called the \(k^{th}\) central moment of \(X\).
    So, variance is the \(2^{nd}\) central moment.
  • If \(X \sim \mathcal{N}(0,1) \), then \( \mathbb{E}(X^{2k-1}) = 0 \) and \( \mathbb{E}(X^{2k}) = (2k-1)(2k-3) \dotsm 1 \) for \(k = 1, 2, \dotsm \).
Mixed Moments
  • Let \(X_1, \dotsm, X_m\) be \(m\) random variables.
  • For any integers \(k_i \ge 0, i = 1, \dotsm, m\) let \(k = \sum_{i=1}^m k_i\).
  • Then,
    • \( \mathbb{E}(X_1^{k_1} \dots X_m^{k_m}) \) is called the mixed moment of order \(k\).
    • \( \mathbb{E}[(X_1 - \mu_{X_1})^{k_1} \dots (X_m - \mu_{X_m})^{k_m}] \) is called the central mixed moment of order \(k\).
      So, covariance is the central mixed moment of order 2 (with \(m = 2\) and \(k_1 = k_2 = 1\)).
Conditional Expectation & Variance
  • Conditional Expectation: \( \mu_{Y \mid X} = \mathbb{E}(Y \mid X) \)
  • Conditional Variance: \( \sigma_{Y \mid X}^2 = Var(Y \mid X) \overset{def}= \mathbb{E}[(Y - \mathbb{E}(Y \mid X))^2 \mid X] \)
  • Properties:
    • \( \mathbb{E}(Y) = \mathbb{E}[\mathbb{E}(Y \mid X)] \) (iterated expectation)
    • \(Var(Y) = \mathbb{E}[Var(Y \mid X)] + Var[\mathbb{E}(Y \mid X)] \)
Example #1 (p10)
Suppose \(X \sim Unif[0,1] \), and \(Y \mid X \sim Unif[0,X] \).
Find:
  • \(\mathbb{E}(Y \mid X) \)
  • \(\mathbb{E}(Y) \)
  • \( Var(Y \mid X) \)
  • \( Var(Y) \)
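Solution sketch, using the iterated-expectation and variance-decomposition properties above:
  • Since \(Y \mid X \sim Unif[0, X]\), \( \mathbb{E}(Y \mid X) = X/2 \) and \( Var(Y \mid X) = X^2/12 \).
  • \( \mathbb{E}(Y) = \mathbb{E}[\mathbb{E}(Y \mid X)] = \mathbb{E}(X/2) = 1/4 \).
  • \( Var(Y) = \mathbb{E}[Var(Y \mid X)] + Var[\mathbb{E}(Y \mid X)] = \mathbb{E}(X^2)/12 + Var(X)/4 = \frac{1/3}{12} + \frac{1/12}{4} = \frac{7}{144} \).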
Example #2 (p11)
Suppose \(X\) and \(Y\) are two random variables with \(\mathbb{E}(Y) = \mu \) and \( \mathbb{E}(Y^2) \lt \infty \).
  • Show that \(c = \mu\) minimizes \(\mathbb{E}(Y - c)^2\).
  • Show that \(f(X) = \mathbb{E}(Y \mid X) \) minimizes \(\mathbb{E}[(Y-f(X))^2 \mid X]\).
  • Show that \(f(X) = \mathbb{E}(Y \mid X) \) also minimizes \(\mathbb{E}[(Y-f(X))^2]\).
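A sketch for the first part: write \( \mathbb{E}(Y - c)^2 = \mathbb{E}[(Y - \mu) + (\mu - c)]^2 = \mathbb{E}(Y - \mu)^2 + (\mu - c)^2 \), since the cross term \( 2(\mu - c)\mathbb{E}(Y - \mu) \) vanishes; this is minimized exactly at \( c = \mu \). The second part follows by applying the same argument to the conditional distribution of \(Y\) given \(X\), and the third part then follows by iterated expectation.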
1. Examples of Time Series
A time series is a set of observations \(x_t\), each being recorded at a specific time \(t\).
  • \(x_t\) could be discrete or continuous for a given \(t\)
  • \(x_t\) could be univariate or multivariate for a given \(t\)
  • \(t\) could be discrete (discrete-time time series) or continuous (continuous-time time series)
    recording intervals could be regular or irregular
  • \(t\) could be univariate or multivariate
Time Series Plots
We examine a time series plot for:
  • trend over time
  • seasonal/cyclical/periodic component
  • changing variability over time
  • dependence
  • structural breaks
  • missing data, outlying observations, etc.
Example 1
Australian red wine sales; wine.txt
Example 2
Monthly accidental deaths, USA, 1973 - 1978; deaths.txt
Example 3
Dow-Jones Index (closing prices on 251 consecutive trading days, 9/10/93 - 8/26/94); dowj2.csv
Example 4
Population of the USA, 1790-1990; uspop.txt
Nature of Time Series Data
  • Data collected over time are usually dependent.
    • This requires more complicated modeling techniques than those used for analyzing independent data.
    • On the other hand, the dependence can be exploited in predicting future values.
  • Ignoring dependence leads to improper inferences, poor prediction, ...
2. Objectives of Time Series Analysis
Modeling Paradigm (Box-Jenkins Framework)
  • Model Specification:
    set up a family of probability models to best represent the data
  • Parameter Estimation:
    estimate parameters of the chosen model
  • Model Diagnostics:
    check the fitted model for the goodness of fit
Applications of Models
The resulting model
  • provides a compact description of given data, and can be used to interpret features therein
  • can be used for inference, e.g. confidence intervals and hypothesis tests
  • can be used for forecasting
3. Simple Time Series Models
A time series model for the observed data \(\{x_t\}\) is a specification of the joint distributions (or possibly only the means and covariances) of a sequence of random variables \(\{X_t\}\), of which \(\{x_t\}\) is postulated to be a realization.
Complete Probabilistic Model vs. \(2^{nd}\)-order Property Specification
  • Complete probabilistic model
    • specifies all joint distributions of \( (X_1, \dotsm, X_n)' \), \( n = 1, 2, \dots \)
    • is rarely used because it involves far too many parameters to estimate
  • \(2^{nd}\)-order property specification
    • studies the means, \( \mu_t = \mathbb{E}(X_t) \), and the \(2^{nd}\)-order moments, \(\mathbb{E}(X_{t+h}X_t)\), \(h = 0, 1, \dots \)
  • Much of the distributional information can be described by the first two moments.
  • For multivariate normal, the two ways of modeling are equivalent.
Some Zero-Mean Models
  • \(iid\) noise (\(iid\) random variables with zero mean)
  • White Noise
  • Random Walk
  • Moving Average (MA)
  • Autoregression (AR)
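A minimal R sketch (not part of the original notes) simulating these zero-mean models with Gaussian noise; the parameter values 0.5 and 0.7 are just for illustration:
  # simulate the zero-mean models above, using Gaussian noise
  set.seed(1)
  n  <- 200
  z  <- rnorm(n)                        # iid noise; also white noise WN(0, 1)
  rw <- cumsum(z)                       # random walk: S_t = Z_1 + ... + Z_t
  ma <- arima.sim(list(ma = 0.5), n)    # MA(1): X_t = Z_t + 0.5 Z_{t-1}
  ar <- arima.sim(list(ar = 0.7), n)    # AR(1): X_t = 0.7 X_{t-1} + Z_t
  op <- par(mfrow = c(2, 2))
  plot.ts(z); plot.ts(rw); plot.ts(ma); plot.ts(ar)
  par(op)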
Models with Trend
  • \(X_t = m_t + Y_t\), where \(m_t\) is a slowly varying function called the trend function
    • For example, \(m_t = a_0 + a_1 t + a_2 t^2\)
    • Estimation of \(a_i\)'s can be carried out using the least squares method,
      i.e., by minimizing \( \sum_{t=1}^n (x_t - m_t)^2 \)
  • Example: Population of the USA, 1790-1990; uspop.txt
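A small R sketch of the least-squares trend fit; since uspop.txt is not reproduced here, the data are simulated with a known quadratic trend:
  # least-squares fit of a quadratic trend m_t = a0 + a1 t + a2 t^2 (simulated data)
  set.seed(2)
  t   <- 1:100
  x   <- 10 + 0.5 * t + 0.02 * t^2 + rnorm(100, sd = 5)
  fit <- lm(x ~ t + I(t^2))             # minimizes sum_t (x_t - m_t)^2
  plot(t, x, type = "l")
  lines(t, fitted(fit), col = "red")    # estimated trend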
Models with Seasonality
  • \(X_t = s_t + Y_t\), where \(s_t\) is a periodic function of \(t\) with period \(d\) (i.e., \( s_{t+d} = s_t \))
    • Convenient choice:
      \( s_t = a_0 + \sum_{j=1}^k [a_j \cos(\lambda_j t) + b_j \sin(\lambda_j t)] \)
      where \(a_j\), \(b_j\) are unknown parameters, and the \(\lambda_j\) are fixed frequencies, each a multiple of \(2\pi / d\).
    • For monthly data, \(d = 12\), so the \(\lambda_j\)'s are multiples of \( 2 \pi / 12 \).
  • Example: Monthly accidental deaths, USA, 1973-1978; deaths.txt
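A short R sketch of the harmonic regression above with \(d = 12\) and \(k = 1\); the data are simulated, not deaths.txt:
  # harmonic regression s_t = a0 + a1 cos(lambda t) + b1 sin(lambda t), lambda = 2*pi/12
  set.seed(3)
  t   <- 1:72
  lam <- 2 * pi / 12
  x   <- 5 + 3 * cos(lam * t) + 2 * sin(lam * t) + rnorm(72)
  fit <- lm(x ~ cos(lam * t) + sin(lam * t))
  coef(fit)                             # least-squares estimates of a0, a1, b1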
General Steps in Modeling
  • Plot time series and check for, say, changing variability and trend/seasonal component
  • Remove changing variability, trend, and seasonality, if any, to get a stationary residual series
  • Choose a model to fit the stationary series (by Box-Jenkins methodology)
    • The model should capture dependence structure of the series.
  • Conduct inference and forecast
4. Stationary models and ACF
  • We consider modeling serial dependence of a time series in the stationary case.
  • Two versions of stationarity
    • strict stationarity: joint probability distributions do not change with time
    • weak stationarity: first- and second-order moment properties do not change with time
Strict Stationarity
  • \( \{X_t\} \) is strictly stationary if, for any positive integer \(k\) and integers \(t_1, \dotsm, t_k\), and \(h\),
    \((X_{t_1}, X_{t_2}, \dotsm, X_{t_k})' =_d (X_{t_1+h}, X_{t_2+h}, \dotsm, X_{t_k+h})' \)
    where \(=_d\) denotes equality in distribution
  • Using \(k=1\),
    • \(X_1 =_d X_2 =_d X_3 =_d \dotsm \) (identically distributed)
    • means are all identical if they exist (rules out trend and seasonality)
    • variances are all identical if they exist (rules out the changing variability)
  • Using \(k=2\),
    • for all \(t\) and \(h\), \( (X_t, X_{t+1})' =_d (X_{t+h}, X_{t+1+h})'\)
      Hence, \(Cov(X_t, X_{t+1}) = Cov(X_{t+h}, X_{t+1+h}) \) if variances exist.
    • for all \(t\), \(h\), and \(l\), \( (X_t, X_{t+l})' =_d (X_{t+h}, X_{t+l+h})'\)
      Hence, \(Cov(X_t, X_{t+l}) = Cov(X_{t+h}, X_{t+l+h}) \) if variances exist.
  • Using \(k \ge 3\) gets increasingly complicated
  • So, strict stationarity is a very strong modeling assumption
Properties of Strictly Stationary Time Series
  • The \(X_t\)'s are identically distributed
  • \( (X_t, X_{t+h})' \) and \( (X_1, X_{1+h})' \) are identically distributed for all integers \(t\) and \(h\)
  • \(iid\) sequences are strictly stationary
Weak Stationarity
  • Let \(\{X_t\}\) be a time series with \(Var(X_t) \lt \infty\)
  • Let \( \mu_X(t) = \mathbb{E}(X_t) \) denote mean function, and \(\gamma_X(t+h, t) = Cov(X_{t+h}, X_t) \) denote covariance function
  • \(\{X_t\}\) is weakly stationary if
    • \( \mu_X(t) \) is independent of \(t\)
    • \(\gamma_X(t+h, t)\) is independent of \(t\) for each \(h\)
  • To show weak stationarity, we verify that
    • \(Var(X_t) \lt \infty \)
    • \(\mathbb{E}(X_t)\) does not depend on \(t\)
    • \(Cov(X_{t+h}, X_t)\) may depend on \(h\) but not \(t\)
Examples
  • \( X_t = Z_0 \cos{t}, \{Z_t\} \sim WN(0, \sigma^2) \)
    1. \( \mu_X(t) = \mathbb{E}(Z_0) \cos{t} = 0\) (independent of \(t\));
    2. \( \gamma_X(t+h, t) = Cov(Z_0 \cos(t+h), Z_0 \cos{t}) = \sigma^2 \cos(t+h) \cos{t} \) (depends on \(t\); e.g., \( Var(X_t) = \sigma^2 \cos^2{t} \)).
    Hence, it is not weakly stationary.
  • \( X_t = Z_t + 0.5Z_{t-1}, \{Z_t\} \sim WN(0, \sigma^2) \)
    1. \( \mu_X = \mathbb{E}(Z_t) + 0.5\mathbb{E}(Z_{t-1}) = 0 \) (independent of \(t\));
    2. \( \gamma_X = Cov(Z_t + 0.5Z_{t-1}, Z_{t+h} + 0.5Z_{t+h-1}) = Cov(Z_t, Z_{t+h}) + 0.5Cov(Z_t, Z_{t+h-1}) + 0.5Cov(Z_{t-1}, Z_{t+h}) + 0.5^2Cov(Z_{t-1}, Z_{t+h-1}) = \sigma^2(1.25 \mathbb{I}_{h=0} + 0.5 \mathbb{I}_{h= \pm 1}) \)
      (independent of \(t\));
    3. \( Var(X_t) \lt \infty \).
    Hence, \( \{X_t\} \) is weakly stationary.
Remarks
  • White noise sequences are weakly stationary
  • If \( \{X_t\} \) is strictly stationary and \( \mathbb{E}(X_t^2) \lt \infty \), then \( \{X_t\} \) is weakly stationary (Problem 1.3, HW1)
  • Weak stationarity does not imply strict stationarity, except for the Gaussian case
  • From now on, we mean “weak stationarity” when we say stationarity, unless otherwise stated
ACVF and ACF
  • Let \(\{X_t\}\) be a stationary time series
  • Its autocovariance function (ACVF) is defined as
    \( \gamma_X(h) = Cov(X_{t+h}, X_t) \)
  • Its autocorrelation function (ACF) is defined as
    \( \rho_X(h) = Cor(X_{t+h}, X_t) = \frac{\gamma_X(h)}{\gamma_X(0)} \)
Examples
  • \(iid\) noise with finite variance:
    \( \{X_t\} \sim iid (0, \sigma^2) \)
    \( \gamma_X(h) = \sigma^2 \mathbb{I}(h = 0) \)
  • White noise:
    \( \{Z_t\} \sim WN(0, \sigma^2) \)
    \( \gamma_Z(h) = \sigma^2 \mathbb{I}(h = 0) \)
Note:
  • \(iid\) noise with finite variance \(\Rightarrow\) White Noise
  • The converse is NOT true: WN does NOT imply \(iid\) noise (and \(iid\) noise with infinite variance is not WN).
  • Random Walk: \( S_t = Z_1 + \dots + Z_t \), \( \{Z_t\} \sim iid(0, \sigma^2) \); \( Cov(S_{t+h}, S_t) = t\sigma^2 \) for \( h \ge 0 \) depends on \(t\), so \(\{S_t\}\) is not stationary.
  • MA(1) (moving average of order 1) process:
    \(X_t = Z_t + \theta Z_{t-1}, \{Z_t\} \sim WN (0, \sigma^2) \)
  • AR(1) (autoregressive of order 1) process:
    stationary process \(\{X_t\}\) that satisfies the equations:
    \(X_t = \phi X_{t-1} + Z_t, t = 0, \pm 1, \dotsm \)
    where \(\{Z_t\} \sim WN (0, \sigma^2), \mid \phi \mid \lt 1 \) & \(Cov(Z_t, X_s) = 0 \) for each \(s \lt t\).
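For reference, the theoretical ACFs are \( \rho_X(1) = \theta / (1 + \theta^2) \) (and 0 for \( \mid h \mid \ge 2 \)) for MA(1), and \( \rho_X(h) = \phi^{\mid h \mid} \) for AR(1). A quick R check with the base function ARMAacf (the values \( \theta = 0.5 \) and \( \phi = 0.7 \) are just for illustration):
  ARMAacf(ma = 0.5, lag.max = 5)   # MA(1): rho(1) = 0.5 / 1.25 = 0.4, zero afterwards
  ARMAacf(ar = 0.7, lag.max = 5)   # AR(1): rho(h) = 0.7^h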
Sample Mean, ACVF, and ACF
  • Suppose \( x_1, ..., x_n \) are observed data
  • Sample mean: \( \bar{x} = \frac{1}{n} \sum_{t=1}^n x_t \)
  • Sample ACVF: \( \hat{\gamma}(h) = \frac{1}{n} \sum_{t=1}^{n - \mid h \mid} (x_{t + \mid h \mid} - \bar{x})(x_t - \bar{x}) \)
  • Sample ACF: \( \hat{\rho}(h) = \frac{\hat{\gamma}(h)}{\hat{\gamma}(0)} \)
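A small R sketch (on simulated data) checking the sample ACVF formula against the built-in acf(), which uses the same divisor \(n\):
  set.seed(4)
  x    <- rnorm(200)
  n    <- length(x)
  xbar <- mean(x)
  gamma.hat <- function(h) sum((x[(1 + h):n] - xbar) * (x[1:(n - h)] - xbar)) / n
  gamma.hat(1)                                                    # from the formula above
  acf(x, lag.max = 1, type = "covariance", plot = FALSE)$acf[2]   # same value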
Properties of Sample ACVF & ACF
  • Covariance matrix \( \hat{\Gamma}_n = [\hat{\gamma}(i - j)]_{i, j = 1}^n \) is nonnegative definite (n.n.d.) for all \(n \ge 1 \).
    See \(\S\)2.4.2 for proof.
  • If data are observations from \(iid\) noise with finite \(4^{th}\) moment, then for \(n\) large enough, \(\hat{\rho}(h)\) is approximately \(\mathcal{N}(0, \frac{1}{n})\) for each \(h \ge 1\), and the \(\hat{\rho}(h)\)'s are approximately independent.
Examples
  • Sample ACF of simulated 200 observations from \(iid\) \(\mathcal{N}(0,1) \)
  • Sample ACF of wine.txt
5. Trend and seasonality removal
  • Method 1: estimation (and extraction) of trend and seasonality
    • regression/smoothing
  • Method 2: differencing
Classical Decomposition Model
\[ X_t = m_t + s_t + Y_t \] where
  • \(m_t\) is trend component
  • \(s_t\) is seasonal component with period \(d\)
    • \( s_{t+d} = s_t \) and \( \sum_{t=1}^d s_t = 0 \)
  • \(Y_t\) is random noise component with \( \mathbb{E}(Y_t) = 0\)
A preliminary transformation might be needed first to stabilize the variability
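A brief R sketch of a classical decomposition using the built-in decompose(), which estimates \(m_t\) by a moving average and \(s_t\) by averaging over periods; the monthly series here is simulated, not deaths.txt:
  set.seed(5)
  x   <- ts(10 + 0.05 * (1:96) + 2 * sin(2 * pi * (1:96) / 12) + rnorm(96),
            frequency = 12)
  dec <- decompose(x)     # components: dec$trend, dec$seasonal, dec$random
  plot(dec)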
Example:
Australian red wine sales; wine.txt
Example
Decomposition of monthly accidental deaths, USA, 1973-1978; deaths.txt
Regression/Smoothing
  • Polynomial and periodic regression smoothers
  • Moving average smoother
  • Kernel smoother
Regression Smoothers
  • Example:
    Weekly mortality in Los Angeles County; smort (Shumway & Stoffer)
  • \(X_t = m_t + s_t + Y_t\), where
    • \( m_t = a_0 + a_1 t + a_2 t^2 + a_3 t^3 \)
    • \( s_t = b_1 \cos (2 \pi t / 52) + b_2 \sin (2 \pi t / 52) \)
Moving Average Smoother
  • Consider a model with trend only:
    \(X_t = m_t + Y_t, \mathbb{E}(Y_t) = 0 \)
  • Smoothing by finite moving average: for \(q \ge 0\),
    \( \hat{m}_t = \frac{1}{2 q + 1} \sum_{\mid j \mid \le q} X_{t-j} = \frac{1}{2 q + 1} \sum_{\mid j \mid \le q} m_{t-j} + \frac{1}{2 q + 1} \sum_{\mid j \mid \le q} Y_{t-j} \approx m_t + 0 \)
    if \(m_t\) is approximately linear over \( [t-q, t+q] \).
  • Example:
    Weekly mortality in Los Angeles County; cmort (Shumway & Stoffer)
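A minimal R sketch of the two-sided moving average smoother above (here with \(q = 5\)), applied to a simulated trend-plus-noise series rather than cmort:
  ma.smooth <- function(x, q) stats::filter(x, rep(1 / (2 * q + 1), 2 * q + 1), sides = 2)
  set.seed(6)
  x <- 0.1 * (1:200) + rnorm(200, sd = 3)   # linear trend plus noise
  plot.ts(x)
  lines(ma.smooth(x, 5), col = "red")       # estimated trend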
Kernel Smoother
  • \(\hat{m}_t = \sum_{i=1}^n w_i (t) x_i \)
    where \(w_i(t) = \frac{K((t-i)/b)}{\sum_{j=1}^n K((t-j)/b)} \) are weights with \(K\) being a kernel function and \(b\) being the bandwidth.
  • Example:
    Weekly mortality in Los Angeles County; cmort (Shumway & Stoffer)
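A short R sketch of a kernel smoother via the base function ksmooth() with a normal kernel (bandwidth chosen arbitrarily for illustration), again on simulated data rather than cmort:
  set.seed(7)
  t <- 1:200
  x <- 0.1 * t + rnorm(200, sd = 3)
  plot(t, x, type = "l")
  lines(ksmooth(t, x, kernel = "normal", bandwidth = 10), col = "red")   # estimated trend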
Differencing
  • Backward shift operator \(B\): \( BX_t = X_{t-1} \), \( B^j X_t = X_{t-j} \)
  • Difference operator \( \nabla \overset{def}= 1 - B \): \( \nabla X_t = X_t - X_{t-1} \)
  • Seasonal difference operator \( \nabla_d \overset{def}= 1 - B^d \):
    • \( \nabla_d s_t = (1 - B^d)s_t = s_t - s_{t-d} = 0 \)
    • For a linear trend \( m_t = a_0 + a_1 t \): \( \nabla_d (m_t + s_t) = \nabla_d m_t + \nabla_d s_t = [(a_0 + a_1 t) - (a_0 + a_1 (t - d))] + 0 = d a_1 \)
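In R, both \( \nabla \) and \( \nabla_d \) are available through diff(); a minimal sketch on a simulated monthly series:
  set.seed(8)
  t <- 1:96
  x <- 2 + 0.3 * t + 3 * sin(2 * pi * t / 12) + rnorm(96)
  dx   <- diff(x)                   # \nabla x_t = x_t - x_{t-1}
  d12x <- diff(x, lag = 12)         # \nabla_12 x_t = x_t - x_{t-12}: removes the seasonal component
  dd   <- diff(diff(x, lag = 12))   # \nabla \nabla_12 x_t: also removes the remaining linear trend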
Example
Differencing of monthly accidental deaths, USA, 1973 - 1978; deaths.txt
6. Testing the Estimated Noise Sequence
Suppose \(y_1, ..., y_n\) are observations from a sequence of random variables \(Y_1, ..., Y_n\)
Test of Randomness
  • \( H_0: Y_1, ..., Y_n \) are \(iid\)
  • Graphical method
    • If \(H_0\) holds, then for large \(n\), \( \hat{\rho}(1), \dots, \hat{\rho}(h) \) are approximately \(iid\) \(\mathcal{N}(0, \frac{1}{n})\)
    • So, we check the sample ACF plot of \(y_1, ..., y_n\) to see if \( \mid \hat{\rho}(h) \mid \le \frac{1.96}{\sqrt{n}} \) for \(h \le 40 \)
    • Reject \(H_0\) if noticeably more than 5% of the \( \hat{\rho}(h) \)'s (i.e., more than 2 or 3 out of 40) fall outside these bounds
Portmanteau tests
  • Box-Pierce test:
    • Test statistic: \( Q = n \sum_{h=1}^k \hat{\rho}^2(h) \)
      Usually, \(k=20\) is selected.
    • Under \(H_0\), \(Q \sim \chi_k^2\) approximately
    • Reject \(H_0\) if \(Q \gt \chi_{k, 1-\alpha}^2\) at level \(\alpha\).
    • R command: Box.test
  • Two refinements:
    • Ljung-Box test, which is more accurate for small \(n\);
      R command: Box.test
    • McLeod-Li test, which additionally tests whether \(\{Y_t^2\}\) is \(iid\);
      R command: McLeod.Li.test (in R package 'TSA')
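A quick R illustration of these tests on simulated \(iid\) noise (so \(H_0\) should typically not be rejected); the McLeod-Li line is commented out since it needs the 'TSA' package:
  set.seed(9)
  y <- rnorm(200)
  Box.test(y, lag = 20, type = "Box-Pierce")
  Box.test(y, lag = 20, type = "Ljung-Box")
  # TSA::McLeod.Li.test(y = y)      # portmanteau test applied to y^2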
Other tests:
Turning Point Test, Difference-Sign Test, Rank Test, etc. (see textbook)
Test of Normality
  • \( H_0: Y_1, \dotsm, Y_n \) are normal
  • Graphical method: normal probability plot
  • Shapiro-Wilk test;
    R command: shapiro.test
  • Jarque-Bera test;
    R command: jarque.bera.test (in R package 'tseries')
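A brief R illustration of the normality checks on simulated data; the Jarque-Bera line is commented out since it needs the 'tseries' package:
  set.seed(10)
  y <- rnorm(100)
  qqnorm(y); qqline(y)              # normal probability plot
  shapiro.test(y)
  # tseries::jarque.bera.test(y)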
Examples
  • Simulated 200 observations from \(iid\) \( \mathcal{N}(0, 1) \)
  • Level of Lake Huron, 1875-1972; lake.txt