Assessment of Diagnostics for the Presence of Seasonality
Catherine C. Harvill Hood
Catherine Hood Consulting,
Email: cath@catherinechhood.net
Abstract
An important criterion when assessing the quality of a seasonally adjusted series is the presence of residual seasonality. X12ARIMA automatically checks for residual seasonality using spectral diagnostics which are generally considered reliable if there are at least 60 points in the series. However, this is longer than most quarterly series. This paper uses a large sample of both real and simulated quarterly and monthly series to look at various options available in X12 for spectral estimation and other diagnostics that are sometimes used to detect seasonality. The spectral diagnostics worked well for short quarterly series, especially with some of the newer options available in X12.
Keywords: seasonal adjustment, spectral graphs
1. Introduction
One of the most important criteria when assessing the quality of a seasonally adjusted series, and more generally of a seasonal adjustment method, is the presence of residual seasonality. Residual seasonality can result from inadequacies in the adjustment procedures or ARIMA models chosen, and also possibly from seasonal effects that are difficult to estimate, such as series with highly variable effects. With indirectly adjusted aggregate series, residual seasonality can result when some of the component series are inadequately adjusted. Testing for the presence of seasonality is also useful when looking at raw data to see if seasonal adjustment is even necessary.
Residual seasonality is not only an important issue; when it appears in the seasonally adjusted series or the irregular component, it can almost always be eliminated by using different seasonal adjustment options. In most cases, residual seasonality can be removed by shortening the span of the data used for seasonal adjustment or by changing the seasonal filters. For very long series, the same seasonal filter or model may not be able to adequately estimate the seasonality through the entire length of the series.
So far, X12ARIMA (U.S. Census Bureau, 2005) is the only software that automatically checks for residual seasonality. The test is done using spectral diagnostics, but spectral diagnostics could be used for any seasonal adjustment method. Without X12 or X13, a basic, but potentially costly, test to check for residual seasonality is to simply rerun a seasonal adjustment. If the software detects some seasonality, then there is a problem in the adjusted series.
Though the spectral diagnostics in X12ARIMA are very useful, they are generally considered reliable only if there are at least 60 points in the series. For quarterly series, this would mean at least 15 years of data. Many economic time series do not have 15 years of data. Therefore, estimating residual seasonality for quarterly series presents a challenge. The default estimator for the spectrum in the currently released version of X12ARIMA is the 30th-order autoregressive, AR(30), model. This paper looks at using the 10th-order autoregressive, AR(10), model instead. Generally the AR(10) model would give too many false peaks for most monthly series, but could be practical for use in quarterly series.
Because it is a common, but not entirely recommended, practice to use the M & Q diagnostics available in X11ARIMA/X12 for the detection of residual seasonality, I also looked at these diagnostics. Another possible diagnostic for the detection of residual seasonality is a chi-square test on a set of fixed seasonal effects (seasonal dummy regressors). This is similar to the test in SEATS (Gomez and Maravall, 1997) where SEATS looks at the significance of individual months using individual t-test values (Maravall, 2004). These diagnostics are dependent on the ARIMA model used, so I investigated these diagnostics under many different ARIMA models.
2. Descriptions of the Diagnostics
2.1 Spectral Graphs
Spectral graphs of the detrended seasonally adjusted series and the detrended irregular component are included automatically with X12ARIMA output as the primary method to detect residual seasonality. For series that are not too short, the spectral graph is the most sensitive of all the tests for residual seasonality.
Any time series can be analyzed from two different points of view: time and frequency. The two approaches are complementary, giving us two different views of the same information. To analyze a series in the frequency domain, we measure the strength of the different frequencies in decibels. The graph of the frequencies versus the decibels is called the periodogram or the spectrum. Spectral analysis allows us to see the relationships between the frequencies. We can quantify the importance of certain frequencies of interest relative to frequencies of other components.
For a typical monthly economic time series with a significant seasonal component, there can be several different kinds of effects; for example, there could be a quarterly effect, a biannual effect, and an annual effect in the same series. For quarterly series, however, there are far fewer possibilities because of the timing of the data collection. In a quarterly time series with a significant seasonal component, the amplitudes that dominate the spectral graph are amplitudes associated with components that repeat every year (i.e., four quarters) or every two quarters. In X12ARIMA, the seasonal frequencies at ¼ cycles per quarter (an annual effect) and ½ cycles per quarter (a biannual effect) are marked in the graphs. If the series is seasonal, there would be peaks at one or both of the two seasonal frequencies.
In a spectral graph of a time series, the low frequencies at the left of the graph correspond to slowly changing components, like the trend, while the higher frequencies correspond to rapidly changing components, like the irregular. If a series has strong long-term trend movements, the low frequencies associated with those movements can have amplitudes that dominate the spectrum. Therefore, some kind of prior detrending, such as differencing, is required before the spectrum calculations so that we can clearly see the amplitudes at the seasonal frequencies.
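The effect of detrending can be illustrated with a small sketch. It is not the X12ARIMA calculation: it uses a raw periodogram from NumPy's FFT, and the series, the function name `periodogram_db`, and all parameter values are hypothetical choices made only for illustration.

```python
import numpy as np

def periodogram_db(x):
    """Raw periodogram in decibels at the nonzero Fourier frequencies."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0)  # cycles per quarter for quarterly data
    # Drop frequency 0, whose power is zero once the mean is removed.
    return freqs[1:], 10.0 * np.log10(power[1:] + 1e-12)

# Hypothetical quarterly series: linear trend + annual cycle + noise.
rng = np.random.default_rng(1)
t = np.arange(65)
series = 2.0 * t + 5.0 * np.cos(2 * np.pi * t / 4) + rng.normal(size=t.size)

f_raw, db_raw = periodogram_db(series)              # trend dominates
f_diff, db_diff = periodogram_db(np.diff(series))   # seasonal peak visible

print("largest ordinate before differencing:", f_raw[np.argmax(db_raw)])
print("largest ordinate after differencing: ", f_diff[np.argmax(db_diff)])
```

In this constructed example the largest ordinate of the undifferenced series sits at the lowest frequency, while after a first difference it sits at ¼ cycles per quarter.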
To have a spectral estimator with enough resolution so that peaks can be sharply defined, X12ARIMA, by default, uses an AR(30) model fit to each output series and evaluated at 61 frequencies between 0 and ½ cycles/quarter. In the latest version of X12ARIMA (Version 0.3), the order of the AR model for the spectral estimate can be changed from the default.
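A minimal sketch of an autoregressive spectrum estimate of this kind follows. It assumes the AR coefficients are obtained from the Yule-Walker equations, which may differ from the fitting method X12ARIMA actually uses; the function name `ar_spectrum_db` and the test series are hypothetical.

```python
import numpy as np

def ar_spectrum_db(x, order, n_freq=61):
    """AR(order) spectrum estimate in decibels at n_freq frequencies in [0, 0.5]."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    n = len(x)
    # Biased sample autocovariances r[0], ..., r[order].
    r = np.array([np.dot(x[: n - k], x[k:]) / n for k in range(order + 1)])
    # Yule-Walker equations R a = r[1:], with R a Toeplitz matrix (assumption).
    R = np.array([[r[abs(i - j)] for j in range(order)] for i in range(order)])
    a = np.linalg.solve(R, r[1:])
    sigma2 = r[0] - np.dot(a, r[1:])  # innovation variance
    freqs = np.linspace(0.0, 0.5, n_freq)
    lags = np.arange(1, order + 1)
    # Squared modulus of 1 - sum_k a_k exp(-2 pi i f k) at each frequency.
    transfer = 1.0 - np.exp(-2j * np.pi * np.outer(freqs, lags)) @ a
    return freqs, 10.0 * np.log10(sigma2 / np.abs(transfer) ** 2)

# Hypothetical 12-year quarterly series (fewer than 60 points), differenced first.
rng = np.random.default_rng(0)
t = np.arange(48)
series = 0.5 * t + 3.0 * np.cos(2 * np.pi * t / 4) + rng.normal(size=t.size)
freqs, db10 = ar_spectrum_db(np.diff(series), order=10)
_, db30 = ar_spectrum_db(np.diff(series), order=30)
```

With 61 evenly spaced frequencies, the quarterly seasonal frequencies ¼ and ½ fall at indices 30 and 60 of the returned arrays, so the two orders can be compared directly at those positions.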
The rule for a seasonal peak being flagged as “visually significant” by X12ARIMA is that the value at the seasonal frequency must be six “stars” or asterisks higher than either neighboring frequency in the plot found in the output file of X12ARIMA. (For the plot, the range of the plotted spectrum values is divided into 52 parts, so each asterisk in the plot equals 1/52nd of the range.) The higher the peak is above its neighbors, the more important the peak. In addition to the “six-star” rule, a seasonal peak also must be higher than the median of the values in the graph for X12ARIMA to flag the peak. For more information on how spectrum peaks are flagged, see Soukup and Findley (1999).
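The flagging rule can be sketched as follows. This is an illustration of the rule as described here, not X12ARIMA's code: the function name `visually_significant` is hypothetical, "six stars higher" is read as "at least six stars above the higher neighbor", and at the last frequency (½ cycles per quarter) only the single available neighbor is used. All three are assumptions.

```python
import numpy as np

def visually_significant(db, idx, min_stars=6.0, n_steps=52):
    """Check the six-star rule for the spectrum value at position idx."""
    db = np.asarray(db, dtype=float)
    star = (db.max() - db.min()) / n_steps  # one star = 1/52 of the plotted range
    if star == 0.0:
        return False  # flat spectrum: nothing to flag
    # Neighboring frequencies; an endpoint has only one neighbor (assumption).
    neighbors = [db[j] for j in (idx - 1, idx + 1) if 0 <= j < len(db)]
    stars_above = (db[idx] - max(neighbors)) / star
    return bool(stars_above >= min_stars and db[idx] > np.median(db))

# A single sharp peak at index 30 (1/4 cycles per quarter on a 61-point grid).
spiky = np.zeros(61)
spiky[30] = 52.0
print(visually_significant(spiky, 30))
```

Because a star is defined relative to the range of the whole plot, the same peak can pass or fail depending on how large the other features of the spectrum are.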
For more information on how the spectral diagnostics are used in both SEATS and X12ARIMA, see the papers by Maravall (2004) and Findley and Hood (2000).
2.2 D8 F-test and M7 of X12ARIMA
Included after Table D8 in the X12ARIMA output are some ANOVA tables to help measure stable and moving seasonality. Generally for F-tests, one would use a cutoff value of 4.0, but because the data in a time series are generally correlated, a cutoff value of 7.0 is recommended by the authors of X11ARIMA (Lothian and Morry, 1978).
The first ANOVA table given tests for the “presence of seasonality assuming stability” using the variation between months. The F-value from this table is referred to in Table F2.I as the “F-test for stable seasonality from Table D 8” and so has come to be known as the “D8 F-test”. It is also sometimes referenced as FS, the F-value to test for seasonality. A second ANOVA table after Table D 8 is labeled as the “moving seasonality test.” Its F-value is sometimes referenced as FM, the F-value to test for moving seasonality. M7 is a descriptive statistic based on FS (the D8 F-test) and FM. M7 was designed by the authors of X11ARIMA to determine whether seasonality can or cannot be identified by X11 (Lothian and Morry, 1978). Its interpretation is straightforward: a value greater than 1.0 indicates no identifiable seasonality.
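The two statistics can be sketched as below. The F computation is an ordinary one-way ANOVA across quarters applied to a hypothetical set of seasonal-irregular ratios, which is an assumption about the form of the test. The M7 expression is the form commonly cited from Lothian and Morry (1978), M7 = sqrt(½ (7/FS + 3 FM/FS)); it is not stated in this paper, and the function names are hypothetical.

```python
import math
import numpy as np

def stable_seasonality_f(si, period=4):
    """One-way ANOVA F-value for differences between seasonal means."""
    si = np.asarray(si, dtype=float)
    n = len(si)
    groups = [si[s::period] for s in range(period)]  # one group per quarter
    grand_mean = si.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (period - 1)) / (ss_within / (n - period))

def m7(fs, fm):
    """M7 from the stable (fs) and moving (fm) seasonality F-values."""
    return math.sqrt(0.5 * (7.0 / fs + 3.0 * fm / fs))

# Hypothetical SI ratios for ten years of a clearly seasonal quarterly series.
rng = np.random.default_rng(3)
si_ratios = np.tile([1.2, 0.9, 0.8, 1.1], 10) + rng.normal(scale=0.02, size=40)
fs = stable_seasonality_f(si_ratios)
print(fs > 7.0, m7(fs, fm=1.0) < 1.0)
```

The expression shows why M7 falls as FS grows: both terms under the square root are divided by FS, so strong stable seasonality pushes M7 below 1.0 unless FM is large.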
It is a common misconception that the M7 and D8 F-test available in X12 are for the detection of residual seasonality. Both diagnostics were meant to help the user decide if the original series is seasonal, and some users run the seasonally adjusted series back through X12ARIMA to look at the values for M7 and the D8 F, though the M7 diagnostic was not designed for this purpose. Also, the M7 is susceptible to the model used and to the type of adjustment used by X12ARIMA, so to be used for checking residual seasonality, it would require careful modeling of the seasonally adjusted series. The D8 F-test is somewhat more stable than the M7 diagnostic, but is still dependent on the extreme values chosen in the X11 procedure.
2.3 Other Tests
In SEATS, Agustín Maravall has used a significance test on the individual months or quarters to determine if there is enough seasonality in the original series to warrant seasonal adjustment. A test of the individual months could also be used to test for residual seasonality. X12ARIMA also contains built-in seasonal dummy variables. The regression produces t-tests on the individual months or quarters and a chi-square test for the significance of the set of seasonal regressors. Though not commonly used by X12ARIMA users, both the t-tests and chi-square values could be used. These tests also have the potential to be susceptible to changes in the model, and would require modeling of the already adjusted series. Therefore, research is required to test the stability of the diagnostic against changes in the model.
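A joint test on seasonal regressors can be sketched with ordinary least squares. This is a simplification under stated assumptions, not the regARIMA estimation in X12ARIMA: the series is first-differenced and the errors are treated as white noise (a (0 1 0) model), simple indicator dummies stand in for the program's seasonal regressors, and the function name `seasonal_dummy_chi2` is hypothetical.

```python
import numpy as np

CHI2_CRIT_5PCT_DF3 = 7.815  # 5% critical value, chi-square with 3 degrees of freedom

def seasonal_dummy_chi2(x, period=4):
    """Wald chi-square for the joint significance of seasonal dummies."""
    y = np.diff(np.asarray(x, dtype=float))  # first difference (assumed model)
    n = len(y)
    season = np.arange(n) % period
    # Intercept plus period - 1 indicator columns.
    X = np.column_stack(
        [np.ones(n)] + [(season == s).astype(float) for s in range(1, period)]
    )
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - period)
    cov = sigma2 * np.linalg.inv(X.T @ X)
    b, V = beta[1:], cov[1:, 1:]  # seasonal coefficients and their covariance
    return float(b @ np.linalg.solve(V, b)), period - 1

# Hypothetical quarterly series with a fixed seasonal pattern.
rng = np.random.default_rng(2)
t = np.arange(40)
pattern = np.tile([5.0, -2.0, -4.0, 1.0], 10)
seasonal_series = 0.3 * t + pattern + rng.normal(scale=0.5, size=t.size)
chi2_value, df = seasonal_dummy_chi2(seasonal_series)
print(chi2_value > CHI2_CRIT_5PCT_DF3)
```

Because the statistic depends on the residual variance, changing the assumed model (here, the differencing and the white-noise errors) changes the result, which is the model dependence discussed above.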
Agustín Maravall in his 2004 paper suggests the use of the Kendall-Ord nonparametric test (Kendall and Ord, 1990) and mentions some success with the diagnostic. However, nonparametric tests of this sort usually are not feasible for short or quarterly series.
3. Methods
3.1 Running the Series
I began the research with 314 seasonal quarterly economic time series that were long enough (more than four years) to run in X12ARIMA and SEATS. I ran all 314 series through X12ARIMA using seasonal dummy regressors along with the following nonseasonal ARIMA models: (2 1 0), (0 1 2), (0 1 0), (0 0 0), and the results from the Version 0.3 automatic modeling selection including the regressors. For comparison, I also looked at the results using the model from the automatic modeling procedure using no regressors. For all 6 models above, I used both the AR(30) spectral estimator and the AR(10) spectral estimator. I looked at spectral results using both the outlier-adjusted original series and the prior-adjusted original series.
I collected the spectrum peak “star” values, M7, Q2, D8 F-value, and the Chi-square p-value for the seasonal regressors. Though I did look at spectral graphs for the series, I depended more on the number of stars listed in the diagnostic output file. I used Excel for most of the analysis, including X12Data (Feldpausch, 2003) and the Excel version of X12Rvw.
3.2 Simulated Series
To test the sensitivity of the diagnostics on various levels of residual seasonality, I constructed simulated series from a set of known trends, seasonal components and irregular components (Hood, Ashley, and Findley, 2000). This allows control of the various levels of each component, so I constructed some series with strong seasonality, some with no seasonality, and some with very little seasonality similar to the level of seasonality that we’d like to detect when looking for residual seasonality. The impact of changes in the models and options for the diagnostics can then be compared to what we know is the correct answer for the given series.
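The construction can be sketched as follows. The components here are synthetic and multiplicative by assumption; the study itself used a set of known trend, seasonal, and irregular components (Hood, Ashley, and Findley, 2000), and the function name `simulate_quarterly` and all parameter values are hypothetical.

```python
import numpy as np

def simulate_quarterly(n_years, seasonal_factors, trend_start=100.0,
                       growth=0.01, irregular_sd=0.02, seed=0):
    """Build a quarterly series as trend * seasonal * irregular (assumed form)."""
    rng = np.random.default_rng(seed)
    n = 4 * n_years
    trend = trend_start * (1.0 + growth) ** np.arange(n)
    seasonal = np.tile(np.asarray(seasonal_factors, dtype=float), n_years)
    irregular = np.exp(rng.normal(0.0, irregular_sd, size=n))
    return trend * seasonal * irregular

# Three illustrative levels of seasonality with the same trend and irregular.
strong = simulate_quarterly(10, [0.5, 1.5, 1.2, 0.8])
weak = simulate_quarterly(10, [0.99, 1.01, 1.005, 0.995])
none = simulate_quarterly(10, [1.0, 1.0, 1.0, 1.0])
```

Holding the seed fixed means the three series differ only in their seasonal component, so any difference in a diagnostic can be attributed to the level of seasonality.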
The focus for this part of the study was on the spectral diagnostics along with the tests for the seasonal regressors because these will be the diagnostics most useful generally. For example, applying the M7 diagnostic designed for the raw, original series in X12ARIMA, to a SEATS adjustment seemed a bit dubious.
4. Results
4.1 Results for the Real Series
For seasonal series, we would expect to see peaks at the seasonal frequencies. Using the AR(30) estimator with the series, because there are fewer than 60 points, the peaks were often very short and flat and difficult to distinguish. The peaks are much easier to distinguish using the AR(10) estimator. Assuming that the AR order is held constant, the results for the outlier-adjusted series and the prior-adjusted series were almost identical, so I focused on the prior-adjusted spectrum.
The results in the table below show there is also an effect from the model used. For the (0 0 0) model, which would generally be inappropriate for these series, the outlier sets chosen automatically by the program were not ideal, and the spectrum results generally suffered. The peaks are labelled as either S1 (meaning ¼ cycles per quarter) or S2 (meaning ½ cycles per quarter). The ARIMA models given in Table 1 are the nonseasonal part of the model. In this table and in all subsequent tables, the models include fixed seasonal regressors for the seasonal part of the model.
Table 1. Results for Spectral Peaks

| ARIMA model* | Spectral AR order | Peak | # >6 stars | # 5-6 stars | # 0-5 stars | Average stars (total # of peaks) |
|---|---|---|---|---|---|---|
| (2 1 0) | 10 | S1 | 71 | 21 | 95 | 5.3 (187) |
| (2 1 0) | 10 | S2 | 98 | 12 | 133 | 5.4 (243) |
| (2 1 0) | 30 | S1 | 9 | 9 | 81 | 2.6 (99) |
| (2 1 0) | 30 | S2 | 29 | 6 | 107 | 3.9 (142) |
| (0 0 0) | 10 | S1 | 55 | 9 | 112 | 4.1 (176) |
| (0 0 0) | 10 | S2 | 69 | 14 | 148 | 4.7 (231) |
| (0 0 0) | 30 | S1 | 3 | 10 | 94 | 2.2 (107) |
| (0 0 0) | 30 | S2 | 28 | 3 | 111 | 3.6 (142) |

* Nonseasonal part of the ARIMA model. Fixed seasonal regressors used for all runs.
The peak information given in Table 1 is not mutually exclusive, meaning that some series would have had a peak for both S1 and S2 and some had only 1 peak. Results from the other models, i.e., (0 1 2) and (0 1 0), were similar to the results given above using (2 1 0).
All of the regression and ANOVA-based diagnostics (M7, D8 F-test, and Chi-square test on the seasonal regressors), as with the spectrum, are dependent on the model chosen. The results for any of the diagnostics were inadequate using seasonal regressors and the (0 0 0) model, which I knew would be a bad model for seasonal series.
For these series, I expected the diagnostics would say the series were seasonal. I looked first at the number of the 314 seasonal series that would be rejected as being nonseasonal with the various models. It can be seen clearly in Table 2 below that a bad model, (0 0 0), affects all the diagnostics. There were fewer false rejections with the Chi-square test and the D8 F-test using a cutoff value of 4.0.
Table 2. Results for M7, D8 F-test, and Chi-square Test (# of series chosen as nonseasonal)

| ARIMA model | M7 > 1 | D8 F < 7 | D8 F < 4 | Chi-sq p > 5% |
|---|---|---|---|---|
| (2 1 0) | 28 | 41 | 16 | 13 |
| (2 1 2) | 37 | 45 | 18 | 21 |
| (0 1 2) | 33 | 43 | 17 | 13 |
| (0 1 0) | 36 | 43 | 23 | 31 |
| (0 0 0) | 210 | 200 | 154 | 236 |
I also counted the number of series for which a diagnostic gave conflicting results under different models, excluding the (0 0 0) model. For most series and most models, there was very little disagreement as to whether or not a series was seasonal. An example of one such comparison is given in Table 3.
Table 3. Results for M7, D8 F-test, and Chi-square Test (# of series in disagreement)

| Models compared | M7 | D8 F > 7 | D8 F > 4 | Chi-sq |
|---|---|---|---|---|
| (2 1 0) & (2 1 2) | 14 | 16 | 3 | 16 |
| (2 1 0) & (0 1 2) | 10 | 6 | 3 | 16 |
| (2 1 0) & (0 1 0) | 12 | 14 | 6 | 24 |
Though I ran all models for all series, in the tables that follow I will only list the (2 1 0) model as an example of the performance of the diagnostics under a nonseasonal model that contains a first difference.
4.2 Results for the Simulated Series
4.2.1 Results for AR(10) and AR(30) Spectral Estimators
For the series with strong seasonality, the AR(10) estimate in the spectrum gave us the best results, finding that all 144 series were in fact seasonal. Even with seasonal factors on the order of 50% to 150%, the AR(30) estimate did not always identify the seasonality present.
Table 4. Results for AR(10) and AR(30) estimators in series with strong seasonality

| Model | AR order | # of seasonal series | # of series with visually significant peaks |
|---|---|---|---|
| (2 1 0) | 10 | 144 | 144 (100%) |
| (2 1 0) | 30 | 144 | 134 (93%) |
| (0 0 0) | 10 | 144 | 144 (100%) |
| (0 0 0) | 30 | 144 | 122 (85%) |
For the series with no seasonality, I had somewhat expected more false positives with the AR(10) estimator. In fact, the opposite was true: as Table 5 shows, the AR(30) estimator flagged more of the nonseasonal series. Therefore, for the rest of the study I focused on the AR(10) estimator.
Table 5. Results for AR(10) and AR(30) estimators in series with no seasonality

| Model | AR order | # of seasonal series | # of series with visually significant peaks |
|---|---|---|---|
| (2 1 0) | 10 | 0 | 57 (40%) |
| (2 1 0) | 30 | 0 | 65 (45%) |
| (0 0 0) | 10 | 0 | 51 (35%) |
| (0 0 0) | 30 | 0 | 72 (50%) |
4.2.2 Comparison of the Diagnostics
Comparing the spectral diagnostics to the D8 F-test and the Chi-square test on the seasonal regressors, we can see that the model does have some effect on the results. For the strongly seasonal series, with a (0 0 0) plus seasonal regressors model, the D8 F-test and the Chi-square test both have trouble identifying the seasonality that is present, as shown in Table 6 below.
Table 6. Results for series with strong seasonality (by model used)

| Diagnostic | (2 1 0) | (0 0 0) |
|---|---|---|
| D8 F-test | 144 (100%) | 49 (34%) |
| Chi-square | 144 (100%) | 0 (0%) |
| AR(10) spectrum | 144 (100%) | 144 (100%) |
For the series with no seasonality, using a model with AR or MA terms, such as the (2 1 0) model plus the seasonal regressors, causes some of the series to be identified as seasonal. As seen in both Tables 6 and 7, the spectrum is less susceptible to the model than the other diagnostics.
Table 7. Results for series with no seasonality (by model used)

| Diagnostic | (2 1 0) | (0 0 0) |
|---|---|---|
| D8 F-test | 56 (39%) | 0 (0%) |
| Chi-square | 90 (62%) | 0 (0%) |
| AR(10) spectrum | 57 (40%) | 51 (35%) |
For the series with weak seasonality, I had hoped that all the diagnostics would identify the seasonality. As seen in Table 8, with a (2 1 0) plus seasonal regressors model, the D8 F-test and Chi-square test did a reasonably good job, but they failed to find any seasonality using a (0 0 0) plus seasonal regressors model. Therefore, the success or failure of these tests in finding residual seasonality would depend on the model selection. The spectrum diagnostic was less susceptible to the model, but it did not identify the weak seasonality as often as I had hoped, finding seasonality in only 60% of the series.
Table 8. Results for series with weak seasonality (by model used)

| Diagnostic | (2 1 0) | (0 0 0) |
|---|---|---|
| D8 F-test | 111 (77%) | 0 (0%) |
| Chi-square | 124 (86%) | 0 (0%) |
| AR(10) spectrum | 86 (60%) | 86 (60%) |
4.2.3 Monthly Series
There was also a very limited study on some simulated monthly series. Using the AR(10) estimator for the spectrum, I did have quite a few more false positives, as expected, for nonseasonal series.
5. Conclusions
For quarterly series with six to 15 years of data, an AR(10) estimator for the spectrum gives both fewer false positives (peaks for nonseasonal series) and more true positives (peaks for seasonal series) than the default AR(30) estimator. Therefore, the AR(10) estimator should be considered for quarterly series.
There has been some research on the “six star” rule for monthly series. Similar research might be useful for AR(10) estimators with quarterly series.
I also found the diagnostics were fairly stable between various ARIMA models. However, there can be a lot of disagreement in the diagnostics when comparing runs using an inappropriate model. The practice of using M7 and the D8 F-test to look for residual seasonality in seasonally adjusted series is very much dependent on the model choice, and the practice should be discouraged.
Acknowledgements
The author wishes to thank the following people for their invaluable assistance on this project: Roxanne Feldpausch of the U.S. Census Bureau for her programming assistance and for providing updated copies of X12Data; Kathy McDonald-Johnson of the U.S. Census Bureau for the use of her program X12Rvw and for programming assistance in working with diagnostic files and Excel; Brian Monsell of the U.S. Census Bureau for providing the necessary modifications to X12ARIMA for testing the various spectrum options; David Findley of the U.S. Census Bureau for his advice and support; and Leo Harvill, formerly of the Quillen College of Medicine, for his support on this research project and other research through the years. Thank you.
References
Gómez, V. and A. Maravall (1997), Program TRAMO and SEATS: Instructions for the User, Beta Version, Banco de España.
Feldpausch, R. (2003), “X12Data: A Program to Convert Excel Spreadsheet Data in a Format X12 Can Handle,” Working paper, U.S. Census Bureau.
Findley, D.F. and C.C.H. Hood (2000), “X12ARIMA and Its Application to Some Italian Indicator Series,” Seasonal Adjustment Procedures – Experiences and Perspectives, Istituto Nazionale di Statistica,
Hood, C.C.H., J.D. Ashley, and D.F. Findley (2000), “An Empirical Evaluation of the Performance of TRAMO/SEATS on Simulated Series,” in Proceedings of the American Statistical Association, Business and Economic Statistics Section, pp. 171-176.
Lothian, J. and M. Morry (1978), "A Test of Quality Control Statistics for the X11ARIMA Seasonal Adjustment Program," Research Paper, Seasonal Adjustment and Time Series Staff, Statistics Canada.
Maravall, A. (2004), “An Application of the TRAMO-SEATS Automatic Procedure: Direct versus Indirect Adjustment,” Banco de España working paper.
Soukup, R.J. and D.F. Findley (1999), “On the Spectrum Diagnostics Used by X12ARIMA to Indicate the Presence of Trading Day Effects after Modeling or Adjustment,” Proceedings of the Business and Economics Section,