10. Time Series Analysis (Part 2 of 3)

Introduction
Sampling Issues (Aliasing, leakage and smearing)
Data windowing (tapering): dealing with leakage and smearing
Autospectral methods
    Single Taper Methods
        Blackman-Tukey Method
        Maximum Entropy Method
    Multi-Taper Method


Introduction

In previous lectures we learned methods to estimate the variance of a data set and the correlation between two variables. If the variability in the data set is time independent and lacks a specific frequency structure, then simple statistics such as the variance and correlation coefficient will be adequate to describe the relationships inherent in the data and between variables. If, however, the data exhibit distinct serial correlation and this persistence arises from the interaction of multiple periodic processes, then statistics such as the variance and the correlation coefficient are inadequate to capture the richness of information in the data set. What we would like is a method that partitions or decomposes the variance in the data set into independent (orthogonal) components.

We shall see that the various forms of spectral analysis provide a means of achieving this objective with some degree of success. These methods are referred to as spectral analysis because they decompose the variance within the record as a function of frequency. This frequency spectrum is referred to as a power spectrum or the spectral density function of the time series.

The basis for these methods is the Fourier transform, which allows us to relate a time series in the time domain to a spectrum in the frequency domain (see notes for Week 9). Put simply, it is possible to represent a continuous time series as the sum of many periodic sinusoids of infinite length. (A basic assumption of the methods is "stationarity", which assumes that the process that has been sampled repeats infinitely.) In Fourier analysis, each Fourier frequency is associated with one sine and one cosine function. The frequency representation of a pure sinusoid is mathematically referred to as a Dirac delta function. This spectral line has infinite height and zero width. Parseval's theorem states that the sum of the squared coefficients of these sinusoids is proportional to the total variance of the record. Thus, spectral decomposition by Fourier analysis provides us with a means of characterizing the relative importance of the multiple processes with different frequencies that contribute to the total variance of a time series.

We can easily define the Fourier frequencies associated with the longest and shortest period waves that contribute variance to the time series. In practice, if the time series is N points long, the longest period wave that can be reliably resolved is about half the record length (T = N/2, in units of the sampling interval dt). The associated frequency of this wave is f = 1/(N/2) = 2/N. As stated in the previous notes (see notes for Week 9), the shortest period wave that can be observed has a period that is twice the sampling interval (T = 2dt). The frequency of this wave is called the Nyquist frequency: fn = 1/(2dt). For example, with daily sampling (dt = 1 day), fn = 0.5 cycles per day.

Before we can discuss the various forms of spectral analysis in detail, we must consider how taking discrete samples from a time series influences our ability to estimate the power spectrum.



Sampling Issues (Aliasing, leakage and smearing)

Taking discrete samples from a continuous process affects the time series in three important ways. These are referred to as aliasing, leakage and smearing.

Aliasing occurs when the sampling resolution that we employ is too coarse in time or space to capture the highest frequency process operating on the data (see aliasing figure). This is perhaps the most dangerous problem that we face because, once the record has been sampled, there is absolutely no way to recover the information at higher frequencies that has been lost, short of collecting more data. Moreover, aliasing may lead us to incorrect conclusions regarding the character of the processes at work, because variance at frequencies above the Nyquist frequency is folded back onto lower frequencies. The only way to avoid aliasing is to sample at a rate that is at least twice the highest frequency of interest (i.e. so that the Nyquist frequency lies above that frequency). If we are conducting a planned experiment and the highest frequency to be measured is known beforehand, then we can calculate an appropriate sampling interval from the Nyquist frequency. If we do not know the appropriate frequency, then we can oversample the record by selecting a sampling frequency that we suspect is much higher than necessary, collect some sample data, analyze its spectrum, and determine the Nyquist frequency and sampling interval empirically before continuing our data collection efforts. If we are working with observational data, then the second approach is most appropriate unless prior knowledge of the system and the processes at work is available.
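The folding of high-frequency variance onto lower frequencies can be illustrated with a short numerical sketch. The example below is purely illustrative (the 0.9 Hz signal frequency and 1 Hz sampling rate are arbitrary choices): a sinusoid above the Nyquist frequency produces exactly the same samples as a lower-frequency alias.

```python
import numpy as np

# Illustrative sketch of aliasing: a 0.9 Hz sinusoid sampled once per second
# (Nyquist frequency = 0.5 Hz) is indistinguishable from a 0.1 Hz sinusoid.
true_freq = 0.9              # Hz, above the Nyquist frequency for dt = 1 s
dt = 1.0                     # sampling interval in seconds -> fn = 1/(2*dt) = 0.5 Hz
t = np.arange(0.0, 100.0, dt)
samples = np.sin(2 * np.pi * true_freq * t)

# The alias appears at |0.9 - 1.0| = 0.1 Hz, the distance from the true
# frequency to the nearest integer multiple of the sampling frequency.
alias = np.sin(2 * np.pi * 0.1 * t)
print(np.allclose(samples, -alias))   # True: the sampled values coincide (up to sign)
```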

Leakage is the transfer of power (variance) from a discrete spectral peak to frequencies above and below the true frequency of the process (see leakage figure). The transferred variance shows up as smaller discrete peaks (referred to as side lobes). In severe cases there may be many progressively smaller lobes at frequencies ever further from the true peak. Leakage is a numerical consequence of the finite record length that arises during the transformation of the data from the time domain to the frequency domain. If the side lobes fold into frequencies that contain true variance, the result will be a biased estimate of the variance at those frequencies.

Smearing causes a narrow, discrete spectral peak to increase in width, or "smear", into directly adjacent frequencies (see smearing figure). If two true spectral peaks are closely spaced, smearing can make it difficult to distinguish them. Like leakage, this is a numerical consequence of the finite record length that arises during the transformation of the data from the time domain to the frequency domain.

The most fundamental means of dealing with leakage and smearing is to increase the record length. This gives us more realizations of the process of interest and thus helps to better constrain the spectrum. We shall see in the next lecture that increasing the record length is also necessary to decrease the statistical confidence interval of the spectral estimate. Unlike aliasing, however, there is also a numerical means of dealing with leakage and smearing in addition to collecting longer records: windowing the data. Unfortunately, the two sources of error are linked in such a way that decreasing leakage in the spectrum results in greater smearing. Attempts to deal with these problems have provided much of the motivation for the development of the methods described below.



Data windowing (tapering): Dealing with leakage and smearing

The most fundamental of the spectral methods is the direct method. This method is based on the calculation of the Fourier transform of the data themselves, with the variance partitioned among frequencies according to Parseval's theorem. Unfortunately, the spectrum generated by this method (referred to as the periodogram) has serious drawbacks: severe leakage and the associated formation of spurious peaks. These drawbacks arise from the limitations imposed by discrete sampling and finite record length. In the time domain, we can think of the operation of discrete, finite sampling as multiplying a continuous time series by a series of binary operators (i.e. 1's at the sample locations and 0's everywhere else). Just as the time series itself has a frequency representation, the sampling function also manifests itself in the frequency domain. In the frequency domain, the time series and the sampling function are convolved; convolution is the frequency domain analog of multiplication of two series in the time domain. The act of applying the sampling function to the time series (multiplication in the time domain, convolution in the frequency domain) is referred to as "windowing" the data. The sampling function is referred to as a "window", or a "taper".
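As a concrete illustration, a minimal periodogram can be computed directly from the FFT. This is only a sketch (the function name and normalization are illustrative choices, and an even record length is assumed); the normalization is chosen so that the spectral estimates integrate to the variance of the record, in the spirit of Parseval's theorem.

```python
import numpy as np

def periodogram(x, dt=1.0):
    """Raw periodogram (direct method) for an evenly sampled record x.
    Normalized so that sum(power) * df equals the variance of x."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()                      # remove the mean (zero-frequency component)
    n = len(x)                            # assumed even for the simple folding below
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(n, d=dt)      # 0 ... Nyquist frequency 1/(2*dt)
    power = (np.abs(X) ** 2) * dt / n     # one-sided spectral density
    power[1:-1] *= 2.0                    # fold negative-frequency variance onto positive frequencies
    return freqs, power
```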

The periodogram fails as an effective spectral estimator because of the poor frequency characteristics of the sampling window. Because the sampling window is rectangular in shape in the time domain, it is often referred to as the "boxcar" window. The sharp edges of the boxcar window are the downfall of the periodogram: the frequency transform of the boxcar has large side lobes associated with the high frequencies needed to represent those sharp edges.

This realization led researchers to develop new methods of spectral estimation that first transform the data before calculation of the frequency spectrum. These methods also make use of more sophisticated data windows (see figure) that are "tapered" at the edges to decrease leakage. Because these methods do not operate directly on the raw data, they are referred to as indirect methods.
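The difference between the boxcar and a tapered window can be seen by comparing their frequency responses. The sketch below is illustrative (the window length and amount of zero-padding are arbitrary choices): the tapered Hamming window suppresses side lobes (less leakage) at the cost of a wider main lobe (more smearing).

```python
import numpy as np

n = 64                                    # window length (illustrative)
boxcar = np.ones(n)
hamming = np.hamming(n)

def response_db(w, nfft=4096):
    """Frequency response of a window, in dB relative to its main lobe."""
    W = np.abs(np.fft.rfft(w, nfft))
    W /= W.max()
    return 20.0 * np.log10(np.maximum(W, 1e-12))

box_db = response_db(boxcar)
ham_db = response_db(hamming)
# The boxcar's first side lobe sits near -13 dB; the Hamming window's near -43 dB,
# but the Hamming main lobe is roughly twice as wide (less leakage, more smearing).
```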



Autospectral methods

There are now three widely used methods of determining the autospectrum of a time series. These methods are referred to as "autospectral" because they involve determination of the spectrum of a single time series. Although I have seen these methods referred to as "single spectrum" methods, this terminology should be avoided because it can lead to confusion with "Singular Spectrum Analysis", a recently developed temporal filtering method that is related to, and useful in conjunction with, spectral methods. That method will be discussed further in the final time series lecture. As an aside, if one time series is compared against a second time series, then the analysis is referred to as cross-spectral. Each of the methods we will discuss below can be modified to generate cross spectra. This will be a topic of the final time series lecture.

Before the spectrum of a time series can be generated several pre-processing steps must be employed:

High frequency removal: It is useful to remove any noisy outliers from the data set as these can greatly influence the spectrum. A simple means of doing this is to use three passes of a 3-standard-deviation filter that removes samples more than 3 standard deviations from the mean, as sketched below. Alternative methods are to smooth the data using a running average or a spline fit. Disadvantages of the latter two methods are that outliers and noise will potentially influence the running average, and that the smoothing applied may remove much of the higher frequency variability in the record that may be true signal. Finally, these methods can affect the phase of the time series and its frequency components (i.e. the timing of the peaks and troughs in the record). A new and exciting approach to the problem of noise reduction is to employ Singular Spectrum Analysis. The advantage of this method, as we will see next week, is that it provides a reasonably objective means of separating noise from signal and has the added benefit that it does not affect the phase of the record.
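A minimal sketch of the three-pass, 3-standard-deviation filter might look as follows (the function name is illustrative; flagged points are returned as NaN and could subsequently be interpolated over):

```python
import numpy as np

def despike_3sigma(x, n_passes=3, n_sigma=3.0):
    """Iteratively flag points more than n_sigma standard deviations from the
    mean of the currently retained data; flagged points are set to NaN."""
    y = np.asarray(x, dtype=float).copy()
    for _ in range(n_passes):
        good = ~np.isnan(y)
        mu = y[good].mean()
        sigma = y[good].std()
        y[good & (np.abs(y - mu) > n_sigma * sigma)] = np.nan
    return y
```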

Low frequency removal: The presence of trends in the data can greatly affect the shape of the spectrum, as the spectral methods interpret trends as low frequency cycles with periods longer than the record length. In addition, the presence of a trend violates the assumption of stationarity in the record, namely that the record is infinitely periodic. Interestingly, the mean of the data set can be thought of as the lowest (zero) frequency component of the data set. Accordingly, the mean must be removed from the time series prior to spectral transformation. Any remaining trends in the data are generally removed by one of two methods: detrending or pre-whitening (see the sketch below). Detrending uses simple linear regression to calculate a least squares fit of the record against time. This trend is then subtracted from the record. Pre-whitening employs a first difference filter (i.e. the value at t-1 is subtracted from the value at t) to remove low frequency information from the record. Both methods produce similar results, although the spectrum of a pre-whitened record must be "post-colored" to recover the true spectral shape in the frequency domain.
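Both operations are straightforward to implement. The sketch below (function names are illustrative) removes the mean and a least-squares linear trend, and applies a first-difference pre-whitening filter; post-coloring of the resulting spectrum is not shown.

```python
import numpy as np

def detrend_linear(x):
    """Remove the mean and a least-squares linear trend from x."""
    x = np.asarray(x, dtype=float)
    t = np.arange(len(x), dtype=float)
    slope, intercept = np.polyfit(t, x, 1)   # least-squares fit of x against time
    return x - (slope * t + intercept)

def prewhiten(x):
    """First-difference filter: x[t] - x[t-1]. The spectrum of the filtered
    record must be 'post-colored' afterwards to recover the true spectral shape."""
    x = np.asarray(x, dtype=float)
    return np.diff(x)
```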

Equal sampling and record length considerations: The methods discussed below require that the process be sampled at equal intervals. This means that some form of interpolation is required if the sampling interval is not constant to begin with (see the sketch below). Unfortunately, interpolation of the record can introduce some noise. Finally, the performance of these methods is greatly enhanced by longer record lengths. We will discuss the reason for this in the final lecture on time series analysis.
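A simple way to obtain an evenly sampled record is linear interpolation onto a uniform grid, as sketched below (the function name and the choice of grid spacing are illustrative; more elaborate interpolation schemes are of course possible):

```python
import numpy as np

def resample_uniform(t, x, dt):
    """Linearly interpolate an unevenly sampled record (t, x) onto a uniform
    grid with spacing dt. Choosing dt close to the original median sampling
    interval limits the amount of interpolation noise introduced."""
    t = np.asarray(t, dtype=float)
    x = np.asarray(x, dtype=float)
    t_uniform = np.arange(t[0], t[-1] + 0.5 * dt, dt)
    return t_uniform, np.interp(t_uniform, t, x)
```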

Single Taper Methods

The two traditional indirect methods of spectral estimation make use of a single window or data taper. The first important step in these methods is a transformation of the data that helps to decrease the effect of leakage and smearing on the spectral estimate. The basic transformation employed in both of these methods is the autocorrelation function, which was discussed previously (see Week 9). We see once again the importance of the fundamental statistical measures and how more complicated procedures are built from simple statistical building blocks.



Blackman-Tukey Method

The oldest of the indirect autospectral estimation methods is the Blackman-Tukey method (BT method). This method is a non-parametric or data adaptive method because it makes no assumption regarding the PDF of the time series. The Blackman-Tukey method begins with the calculation of the autocorrelation function (ACF) of the time series (see Week 9). The spectrum is then calculated from the Fourier transform of the ACF. There are three primary considerations that must be addressed with this method. The first is to determine the number of lags at which to truncate the autocorrelation function. Truncation acts as a filter of the high frequency information in the record: the smaller the number of lags at which we truncate, the smoother the resulting spectrum. In practice, a general rule of thumb is to use a maximum lag that is less than or equal to one-third of the record length. The second consideration is the choice of an appropriate data window; in practice a tapered window such as the Hamming window provides a reasonable compromise between smearing and leakage. Finally, the resulting spectral estimates must be averaged to decrease the variance of the estimate. There are several ways that this can be done; they will be addressed in the third time series lecture.
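The recipe can be summarized in a few lines of code. The sketch below is illustrative rather than a production implementation (the function name, the biased autocovariance estimator, and the one-sided Hamming lag window are assumptions of the sketch); it truncates the ACF at one-third of the record length and transforms the windowed ACF to a spectrum. Averaging of the resulting estimates is not shown.

```python
import numpy as np

def blackman_tukey(x, dt=1.0, max_lag=None):
    """Blackman-Tukey spectral estimate: truncated, windowed autocovariance
    function followed by a Fourier (cosine) transform."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    if max_lag is None:
        max_lag = n // 3                                   # rule of thumb: <= 1/3 of record length
    # Biased autocovariance at lags 0 ... max_lag
    acov = np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(max_lag + 1)])
    lag_window = np.hamming(2 * max_lag + 1)[max_lag:]     # one-sided Hamming taper on the lags
    acov_w = acov * lag_window
    # S(f) = dt * [acov_w(0) + 2 * sum_k acov_w(k) * cos(2*pi*f*k*dt)]
    freqs = np.fft.rfftfreq(2 * max_lag, d=dt)
    lags = np.arange(1, max_lag + 1)
    spectrum = dt * (acov_w[0]
                     + 2.0 * np.cos(2.0 * np.pi * np.outer(freqs, lags) * dt) @ acov_w[1:])
    return freqs, spectrum
```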



Maximum Entropy Method

The Maximum Entropy Method (or MEM) also makes use of the ACF, but it differs from the BT method in that the time series is modeled as an autoregressive (AR) process whose coefficients are estimated from the ACF. Unlike the BT method, the MEM method is thus a parametric method. The order of the AR process determines the degree of smoothing applied to the spectrum. The advantage of the MEM method over the BT method is that the spectra produced by MEM are smoother than BT spectra. The disadvantage of the method is that the time series may not be well fit by the AR model. The result is inadequate separation of spectral peaks in low-order MEM spectra, and the notorious generation of spurious peaks in higher-order MEM spectra. A recent advance to correct this problem is to use SSA as a means of pre-filtering the time series before calculating the MEM spectrum. When employed properly, this can effectively remove spurious peaks and allow the separation of peaks in low-order MEM spectra.
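A minimal MEM estimate can be sketched with the Yule-Walker equations: the AR coefficients are obtained from the autocovariance at lags 1 through p, and the spectrum is evaluated from the fitted model. The function name, the biased autocovariance estimator, and the use of scipy's Toeplitz solver are assumptions of this sketch; the AR order must still be chosen by the analyst.

```python
import numpy as np
from scipy.linalg import solve_toeplitz

def mem_spectrum(x, order, dt=1.0, nfreq=256):
    """Maximum entropy (autoregressive) spectral estimate of the given order:
    AR coefficients from the Yule-Walker equations, spectrum from the AR model."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased autocovariance at lags 0 ... order
    acov = np.array([np.sum(x[:n - k] * x[k:]) / n for k in range(order + 1)])
    # Yule-Walker equations: Toeplitz(acov[0..p-1]) a = acov[1..p]
    a = solve_toeplitz(acov[:order], acov[1:order + 1])
    sigma2 = acov[0] - a @ acov[1:order + 1]           # innovation (white noise) variance
    freqs = np.linspace(0.0, 1.0 / (2.0 * dt), nfreq)  # 0 ... Nyquist frequency
    k = np.arange(1, order + 1)
    # S(f) = sigma2 * dt / |1 - sum_k a_k exp(-2*pi*i*f*k*dt)|^2
    denom = np.abs(1.0 - np.exp(-2j * np.pi * np.outer(freqs, k) * dt) @ a) ** 2
    return freqs, sigma2 * dt / denom
```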



Multi-Taper Method

The Multi-Taper Method (MTM) is the most recently devised of the spectral estimation tools. Like the BT method it is a non-parametric or data adaptive method. Unlike the BT method, the MTM employs multiple tapers or windows, rather than a single taper, hence its name. These tapers are referred to as discrete prolate spheroidal sequences (DPSS), or Slepian tapers after one of the pioneers of the method. The tapers are the eigenvectors of a spectral concentration problem defined by the record length and a chosen bandwidth, and they form an optimal set of windows designed to minimize spectral leakage. Once the tapers are determined, each is multiplied by the original time series, and power spectra are derived from these multiple windowed versions of the time series. A final ensemble MTM spectrum is then calculated as the average of the individual spectra (see the sketch below).
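A bare-bones version of this procedure is sketched below, using scipy's DPSS taper routine; the time-bandwidth parameter NW = 3 is an illustrative choice, and the equal-weight average is a simplification (full MTM implementations typically weight the eigenspectra adaptively).

```python
import numpy as np
from scipy.signal.windows import dpss

def multitaper_spectrum(x, dt=1.0, NW=3):
    """Simple multi-taper estimate: average of K = 2*NW - 1 eigenspectra
    computed with Slepian (DPSS) tapers."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    K = 2 * NW - 1                              # number of tapers (rule of thumb, see below)
    tapers = dpss(n, NW, Kmax=K)                # shape (K, n): the Slepian tapers
    freqs = np.fft.rfftfreq(n, d=dt)
    eigenspectra = np.abs(np.fft.rfft(tapers * x, axis=1)) ** 2 * dt
    return freqs, eigenspectra.mean(axis=0)     # equal-weight ensemble average
```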
One of the advantages of the MTM is that it allows the generation of spectra with much higher frequency resolution than BT or MEM spectra. The use of multiple windows also provides spectra that are more stable than raw BT and MEM spectra.
The principal tradeoff in the MTM is between enhanced frequency resolution and the stability of the resulting spectrum (which is related to the number of tapers employed). It can be shown that for a given time-bandwidth parameter (OmegaN; the product of the bandwidth and the record length), the optimal number of tapers is always less than twice the time-bandwidth parameter. In practice it is sufficient to set K = 2*OmegaN - 1 to obtain good results. The time-bandwidth parameter generally takes integer values ranging from 2 to 4. The associated number of tapers for each case is thus:
 

Time-bandwidth parameter (OmegaN)    Number of tapers (K)
2                                    3
3                                    5
4                                    7

Use of a time-bandwidth parameter of OmegaN = 1 would result in the selection of a single taper, which defeats the advantages of the method. The larger the time-bandwidth parameter, the more tapers can be averaged and the more stable the resulting spectrum, but the broader the spectral bandwidth. Such a spectrum is well suited to characterizing the high frequency, broadband portion of the spectrum but may overly smooth the low frequency components of the record. Time-bandwidth parameters greater than 4 are rarely useful because the loss of frequency resolution begins to outweigh the gain in stability.

Other advantages of MTM spectra over the traditional methods are extensions that provide statistical tests to determine if the variance at a particular frequency arises from a pure sinusoidal line process, and significance testing of spectral peaks against a red noise background.