11. Time Series Analysis (Part 3 of 3)

Extensions to Spectral Analysis
Confidence intervals
Significance Tests
Related Methods
    Singular Spectrum Analysis (SSA)
    Cross-spectral analysis
        Coherence
        Phase



Extensions to autospectral analysis

We have already been introduced to the fundamental concepts behind spectral analysis, and have discussed several means of calculating power spectra. One of the key remaining questions is "How do we assign confidence intervals and significance levels to the spectra resulting from these methods?"  Our answer to this question will let us place the spectral results in a more complete statistical context.



Confidence intervals

We begin by noting that each Fourier frequency is associated with only two degrees of freedom: one sine and one cosine wave. This is true regardless of the record length, because the sinusoids for each Fourier frequency are assumed to be purely periodic and thus must run the entire length of the record and beyond.  Because each Fourier frequency has so few degrees of freedom, the uncertainty in the estimated power spectral density at each frequency is very large. In a raw spectrum of this type, it is in fact as great as the estimated variance itself (i.e. a confidence interval of up to +/- 100%!). Because the estimated variance changes by several orders of magnitude across the frequency range, and the confidence interval is proportional to the estimate, the width of the interval varies as a function of frequency when plotted on a linear scale. If, however, the spectral density (power) is plotted on a logarithmic scale, the confidence interval has the same width at every frequency. Spectra are often plotted on a semi-log y axis and a linear frequency axis for this very reason.
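
The width of these intervals can be made concrete with the usual chi-squared model for spectral estimates, in which an estimate with v degrees of freedom is distributed as the true value times chi-squared(v)/v. The sketch below (Python with NumPy/SciPy; the function name spectral_confidence_factors is illustrative, not a library routine) computes the multiplicative confidence factors for a given number of degrees of freedom.

from scipy.stats import chi2

def spectral_confidence_factors(dof, level=0.95):
    """Multiplicative confidence-interval factors for a spectral estimate.

    Assumes the estimate with `dof` degrees of freedom follows the usual
    chi-squared model, so the true spectrum lies between lower*S_hat and
    upper*S_hat with the stated probability.
    """
    alpha = 1.0 - level
    lower = dof / chi2.ppf(1.0 - alpha / 2.0, dof)
    upper = dof / chi2.ppf(alpha / 2.0, dof)
    return lower, upper

# A raw spectrum has only 2 degrees of freedom per Fourier frequency, so the
# 95% interval spans roughly a factor of 40 above the estimate; pooling more
# degrees of freedom tightens it considerably.
print(spectral_confidence_factors(2))
print(spectral_confidence_factors(10))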

What is needed is a means of increasing the degrees of freedom associated with the Fourier frequencies in the power spectrum. But how? Two strategies have been employed with single taper methods to deal with this problem: the first is referred to as band averaging, the second as ensemble averaging. Each method has advantages and disadvantages.

Band Averaging: In the band averaging method, an autospectrum is calculated and then adjacent Fourier frequencies are averaged to produce a band averaged spectrum. This allows us to decrease the width of the confidence interval on the spectrum by pooling the degrees of freedom from nearby frequencies. As a rule of thumb, once we average 3 adjacent frequencies, additional averaging does not appreciably decrease the confidence interval and thus costs spectral resolution with no added benefit. The disadvantage of this method is that it costs us spectral resolution and thus can potentially bias our interpretation of the record. If we band average across a spectral doublet (two peaks that are very close in frequency) and perceive it as a single peak, we may develop a hypothesis to explain the results that is simply wrong.
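
A minimal sketch of band averaging (Python with NumPy/SciPy; the function band_average and the synthetic record are purely illustrative) might look like this:

import numpy as np
from scipy.signal import periodogram

def band_average(freqs, pxx, nband=3):
    """Average `nband` adjacent Fourier frequencies of a raw spectrum,
    pooling their degrees of freedom at the cost of a wider bandwidth."""
    nkeep = (len(freqs) // nband) * nband          # drop leftover bins
    f_band = freqs[:nkeep].reshape(-1, nband).mean(axis=1)
    p_band = pxx[:nkeep].reshape(-1, nband).mean(axis=1)
    return f_band, p_band

# Illustrative use on a synthetic noisy sinusoid.
rng = np.random.default_rng(0)
t = np.arange(1024)
x = np.sin(2 * np.pi * t / 64) + rng.standard_normal(t.size)
f, pxx = periodogram(x)                            # raw autospectrum
f3, p3 = band_average(f, pxx, nband=3)             # 3-point band average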

Ensemble Averaging: In the ensemble averaging method, we begin by cutting the original record into a number of sub-records. A separate spectrum is then calculated for each sub-record, and these spectra are averaged to produce an ensemble averaged spectrum. The advantage of the ensemble averaged spectrum over the band averaged spectrum is that there is no loss of spectral resolution (bandwidth). As with band averaging, averaging more than about five sub-records leads to diminishing returns. In general, when using single taper methods, the total record is split into 3, 4 or 5 sub-records depending on the original record length. The disadvantages of the ensemble averaging method are that it is more sensitive to any non-stationarity in the record than band averaging, and that by cutting the record into shorter sub-records, the longest periods (lowest frequencies) that can be analyzed are lost. As an aside, the Multi-Taper Method (MTM) also produces an ensemble averaged spectrum: rather than cutting the record into sub-records, it applies several orthogonal tapers to the full-length record, yielding nearly independent spectral estimates that are then averaged. Typically between 3 and 7 such estimates are used in the MTM, depending on the choice of the time-bandwidth parameter, OmegaN.
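
As a sketch of the procedure (Python with NumPy/SciPy; the function ensemble_average_spectrum is illustrative, and scipy.signal.welch performs essentially the same operation with overlapping, tapered segments):

import numpy as np
from scipy.signal import periodogram

def ensemble_average_spectrum(x, nseg=4):
    """Cut the record into `nseg` non-overlapping sub-records, compute a
    spectrum for each, and average the spectra."""
    seglen = len(x) // nseg
    spectra = []
    for i in range(nseg):
        seg = x[i * seglen:(i + 1) * seglen]
        f, pxx = periodogram(seg, detrend='linear')
        spectra.append(pxx)
    # The lowest resolvable frequency is now nseg times higher than for the
    # full record, but the average carries roughly 2*nseg degrees of freedom.
    return f, np.mean(spectra, axis=0)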

Significance tests for peak amplitude and line frequencies

Now that we have discussed three means of decreasing the confidence limits on the spectrum (band averaging, ensemble averaging, MTM), it is appropriate to consider tests for the significance of spectral peaks.  Doing so requires knowledge of the spectral density of the peak, the magnitude of the confidence intervals, and a noise spectrum (or continuum) against which the peak can be tested.

The Peak Amplitude test

Choosing the appropriate continuum requires an assumption about the processes in operation and is thus the most difficult part of this procedure. The simplest choice is a white noise background with mean or median power identical to that of the data record. If the confidence interval is added to this background spectrum, we can test whether any peaks rise above the resulting limit. As with other hypothesis testing methods, the user specifies the level of the confidence interval (e.g. 80%, 90%, 95%). The disadvantage of this method is that the underlying process may not be white at all. When that is the case, we will greatly overestimate the significance of peaks in the spectrum. In processes such as those affecting the climate system, this is exactly the situation.  For climate records, it is thus appropriate to test against a red noise spectrum with the same median power as the record.
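
A sketch of such a test against a red noise (AR(1)) continuum follows (Python with NumPy/SciPy; the function red_noise_background is an illustrative name, and the degrees of freedom should reflect whatever band or ensemble averaging has been applied):

import numpy as np
from scipy.signal import periodogram
from scipy.stats import chi2

def red_noise_background(x, dof=2, level=0.95):
    """Theoretical AR(1) ('red noise') continuum scaled to the record's mean
    power, plus the threshold a peak must exceed at the chosen level."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    rho = np.corrcoef(x[:-1], x[1:])[0, 1]           # lag-1 autocorrelation
    f, pxx = periodogram(x)                          # f in cycles per sample
    background = (1 - rho**2) / (1 - 2 * rho * np.cos(2 * np.pi * f) + rho**2)
    background *= pxx.mean() / background.mean()     # match the record's mean power
    threshold = background * chi2.ppf(level, dof) / dof
    return f, pxx, background, threshold

# Replacing the fitted rho with zero gives a flat background, i.e. the
# white noise test described above.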

The Line Frequency Test

White or red noise tests for amplitude significance can be employed with the MTM just as with the single taper methods.  The MTM also provides a second type of significance test that has led to some confusion in the climate literature: the F-test for spectral lines, or line frequencies.  This test uses an F statistic to determine the probability that the spectral density at any given frequency is associated with a discrete line process isolated at that particular frequency. It thus assesses the character of the distribution of power within the spectrum, not whether the amplitude of a spectral peak rises significantly above the background noise.  In practice, the F-test for line frequencies is often applied to all frequencies in the spectrum, and the entire F-spectrum is then reported superimposed on the power spectrum.  Results plotted in this way can be difficult to interpret. A better approach is to report F-values only for spectral peaks with significant amplitudes.
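
For reference, a sketch of the harmonic F-test in the form usually attributed to Thomson is given below (Python with NumPy/SciPy; the function mtm_harmonic_ftest, the default time-bandwidth product, and the number of tapers are illustrative choices, not taken from the text):

import numpy as np
from scipy.signal.windows import dpss
from scipy.stats import f as f_dist

def mtm_harmonic_ftest(x, NW=3, K=5):
    """Sketch of the multitaper harmonic F-test for line components.

    Returns the frequencies, the F statistic at each frequency (2 and 2K-2
    degrees of freedom), and the corresponding p-values.
    """
    x = np.asarray(x, dtype=float)
    N = x.size
    tapers = dpss(N, NW, Kmax=K)           # K orthogonal tapers, shape (K, N)
    yk = np.fft.rfft(tapers * x, axis=1)   # eigencoefficients, shape (K, nfreq)
    Uk0 = tapers.sum(axis=1)               # zero-frequency value of each taper
    # Least-squares estimate of the amplitude of a line at each frequency.
    mu = (Uk0[:, None] * yk).sum(axis=0) / np.sum(Uk0**2)
    resid = np.abs(yk - mu[None, :] * Uk0[:, None])**2
    F = (K - 1) * np.abs(mu)**2 * np.sum(Uk0**2) / resid.sum(axis=0)
    freqs = np.fft.rfftfreq(N)
    return freqs, F, f_dist.sf(F, 2, 2 * K - 2)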



Related Methods

We have seen in earlier examples how simple statistical methods can be combined to generate more complex statistical tools. Here we will discuss Singular Spectrum Analysis and cross-spectral analysis, and how they are related to spectral analysis.

Singular Spectrum Analysis (SSA)

Singular Spectrum Analysis (SSA) is a recently developed method that can act as a powerful extension to spectral analysis. SSA is a matrix-based method similar to Principal Component Analysis (PCA) and R-mode factor analysis. SSA takes its name from the fact that, just as in modern applications of factor analysis, singular value decomposition (SVD) is usually employed to find the singular values (square roots of the eigenvalues) and the eigenvectors of the data matrix.  A more descriptive name for this method would have been time-lagged PCA.

At this point, you may ask: "How can a matrix-based method (which must require multiple variables) be used to analyze a single time series?" The answer to this question displays the elegance of the SSA method.  Recall that PCA is based on finding the eigenvalues and eigenvectors of the correlation matrix of a number of variables.  The "variables" employed in SSA are the lagged versions of the original time series that are used to calculate the lagged autocorrelation function.  By comparing each lagged copy of the time series against all of the others, a temporally lagged correlation matrix can be built:
 

r(0,1) r(1,1) r(2,1) r(3,1) r(4,1) r(5,1)
r(0,2) r(1,2) r(2,2) r(3,2) r(4,2) r(5,2)
r(0,3) r(1,3) r(2,3) r(3,3) r(4,3) r(5,3)
r(0,4) r(1,4) r(2,4) r(3,4) r(4,4) r(5,4)
r(0,5) r(1,5) r(2,5) r(3,5) r(4,5) r(5,5)

where r(l,m) represents the correlation between the lth and mth lagged versions of the record. The first column in the matrix above thus corresponds to the autocorrelation function of the time series (i.e. lag l = 0 compared against lags m = 1 ... k). The eigenvalues and eigenvectors of this temporally lagged correlation matrix are then found by SVD. The user must specify the maximum lag k (sometimes referred to as the window length) over which to calculate the lagged autocorrelation functions.  Once the singular values (and thus the eigenvalues) are determined by SVD, the user must decide how many eigenvalues to retain, as in factor analysis.  This provides an effective and objective means of filtering signal from noise. The eigenvalues are associated with temporal principal components (TPCs). These TPCs can be projected back onto the original time series to obtain "Reconstructed Components" (akin to factors) that can be summed to reconstruct the filtered record.  An advantage of the RCs is that they retain the same phase information as the original record and so can be compared directly with the original record, or with potential forcing functions, to establish phase relationships relative to the forcing.
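
The following sketch outlines the procedure (Python with NumPy; the function ssa and its arguments are illustrative). It applies the SVD directly to the matrix of lagged copies of the record, which is closely related to diagonalizing the lagged correlation matrix shown above, and recovers the reconstructed components by averaging along the anti-diagonals:

import numpy as np

def ssa(x, window, ncomp=10):
    """Sketch of basic SSA: SVD of the matrix of lagged copies of the record,
    then diagonal averaging to obtain reconstructed components (RCs)."""
    x = np.asarray(x, dtype=float)
    N = x.size
    L = window                              # window length (maximum lag)
    K = N - L + 1
    # Each column is a lagged copy of the record, so X[i, j] = x[i + j].
    X = np.column_stack([x[j:j + L] for j in range(K)])
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    ncomp = min(ncomp, len(s))
    rcs = np.zeros((ncomp, N))
    for k in range(ncomp):
        Xk = s[k] * np.outer(U[:, k], Vt[k])        # rank-1 piece of X
        for n in range(N):                          # average each anti-diagonal
            i = np.arange(max(0, n - K + 1), min(L, n + 1))
            rcs[k, n] = Xk[i, n - i].mean()
    return s, rcs

# Summing the leading RCs (largest singular values) yields a filtered record
# that preserves the phase of the original series.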

SSA provides a means of (1) objectively filtering short, noisy time series, and (2) decomposing a time series into orthogonal components that need not be purely sinusoidal. Because of its noise filtering capabilities, SSA can be used to pre-filter the noise from a time series before spectral estimation. This leads to more stable spectra with fewer spurious peaks. Another advantage is that the method is data adaptive, or non-parametric. Recall that in spectral analysis we decompose the time series using a basis set consisting of infinite sinusoids centered on the Fourier frequencies. This assumes that the processes generating the record are purely periodic and requires a strict assumption of stationarity.  In SSA, the basis set is a series of temporal principal components derived from the structure of the time series itself. While these TPCs are orthogonal, they need not be strictly periodic, so the assumption of stationarity is somewhat relaxed. This can be helpful, as many records do not exhibit strict stationarity. The cost of this freedom, however, is that because the method is distribution independent, significance tests based on assumptions of normality are inappropriate.  Fortunately, new means of estimating the significance of the TPCs based on comparison against many realizations of random processes (Monte Carlo methods) are being developed. A second disadvantage is that the method provides no direct means of calculating the dominant periodicity of the TPCs; a traditional spectral analysis must be employed for this purpose. We will discuss phase relationships further in the sections that follow.



Cross-spectral analysis

Thus far, we have discussed autospectral methods only.  These methods allow us to break down the total variance in a time series into frequency components, and are thus the next logical step following determination of the variance of a single time series.  Often, however, we are interested in comparing two or more variables.  Previously, we learned that covariance and cross-correlation analysis allowed us to study how two variables may share common information, or co-variance. Those methods, however, were applied without regard to how the shared variance is distributed in frequency.

Cross-spectral analysis provides us with a means of extending the frequency decomposition gained from autospectral analysis to study the frequency information contained in the covariance of two time series.

Coherence Spectrum

We learned earlier that the autocorrelation function is the basis for the indirect spectral estimation methods.  The cross-correlation function is, in the same way, the basis for cross-spectral analysis. Mathematically, this method is essentially identical to autocorrelation analysis, but instead of comparing the record against lagged copies of itself, the record is compared against lagged copies of a different time series. The resulting spectrum is referred to as a coherence spectrum. The coherence spectrum can be thought of as the frequency resolved extension of the squared correlation coefficient. While the squared correlation coefficient describes how much variance is shared between two variables, the coherence spectrum estimates how this shared variance is distributed as a function of frequency. Because the two records have different variances, the results are normalized to a scale that ranges between 0 and 1. A value of zero indicates no common spectral density in the two records at that frequency. A value of 1 indicates that all of the spectral density at that frequency is shared by the two records.
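
A brief sketch of a coherence estimate (Python with NumPy/SciPy; scipy.signal.coherence estimates the squared coherence by Welch-style segment averaging, and the synthetic records here are purely illustrative):

import numpy as np
from scipy.signal import coherence

# Two synthetic records that share an oscillation with a 64-sample period.
rng = np.random.default_rng(1)
t = np.arange(4096)
shared = np.sin(2 * np.pi * t / 64)
x = shared + rng.standard_normal(t.size)
y = 0.5 * shared + rng.standard_normal(t.size)

# Squared coherence, between 0 and 1 at each frequency; the segment averaging
# (512-sample segments here) is what makes the estimate stable.
f, Cxy = coherence(x, y, nperseg=512)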

Phase Spectrum

Cross-spectral analysis provides us with even more useful information about the two time series: the phase spectrum.  This spectrum tells us whether cycles at a given frequency in one record happen earlier or later than cycles at the same frequency in the other record. Information of this type is critical for evaluating statistical cause-and-effect hypotheses. If, for example, variable A is believed to cause variable B, then changes in variable A must occur before changes in variable B. Moreover, this variance must be coherent between the two records. Thus the coherence and phase spectra are powerful tools for testing the relationships between the variables that comprise a mechanistic system. In the field of paleoclimate, for example, this type of methodology enabled the development of a mechanistic and quantitative "Earth Systems" approach to the study of glacial-interglacial cyclicity.
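
A sketch of a phase estimate from the cross-spectrum (Python with NumPy/SciPy; the synthetic records and the imposed 8-sample lag are illustrative, and the sign convention of the phase depends on how the library defines the cross-spectrum):

import numpy as np
from scipy.signal import csd

# y carries the same 64-sample oscillation as x, delayed by 8 samples, i.e.
# one eighth of a cycle (a phase offset of about 0.79 radians).
rng = np.random.default_rng(2)
t = np.arange(4096)
x = np.sin(2 * np.pi * t / 64) + rng.standard_normal(t.size)
y = np.sin(2 * np.pi * (t - 8) / 64) + rng.standard_normal(t.size)

f, Pxy = csd(x, y, nperseg=512)          # complex cross-spectrum
phase = np.angle(Pxy)                    # phase in radians at each frequency
idx = np.argmin(np.abs(f - 1 / 64))      # frequency bin of the shared cycle
# |phase[idx]| should be near 0.79; which record "leads" for a positive phase
# depends on the library's cross-spectrum convention, so it is worth checking
# against a known lag, as done here.
print(phase[idx])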