Anselmo Chaves Neto
Universidade Federal do Paraná, Brazil
E-mail: anselmo@ufpr.br
Thais Mariane Biembengut Faria
Universidade Regional de Blumenau, Brazil
E-mail: thaismarianeb@gmail.com
Submission: 14/06/2014
Revision: 25/06/2014
Accept: 13/07/2014
ABSTRACT
The identification of the order (p, q) of ARMA models is a critical step in time-series modelling. In the classic Box and Jenkins identification method, the autocorrelation function (ACF) and the partial autocorrelation function (PACF) must be estimated, but the classical expressions used to measure the variability of the respective estimators are based on asymptotic results. In addition, when only a few observations are available, the traditional confidence intervals used to test the null hypotheses perform poorly. The bootstrap method may be an alternative for identifying the order of ARMA models, since it allows one to obtain an approximation of the distribution of the statistics involved in this step. It is therefore possible to obtain more accurate confidence intervals than those produced by the classical identification method. In this paper we propose a bootstrap procedure to identify the order of ARMA models. The algorithm was tested on time series simulated from models with the structures AR(1), AR(2), AR(3), MA(1), MA(2), MA(3), ARMA(1,1) and ARMA(2,2). In this way we determined the sampling distributions of the ACF and PACF free from the Gaussian assumption. The examples show that the bootstrap performs well for samples of all sizes and that it is superior to the asymptotic method for small samples.
Let $\{Z_t,\ t \in \mathbb{Z}\}$ be a stationary stochastic process in which

$E(Z_t) = \mu$ and $\operatorname{Var}(Z_t) = \sigma_Z^2$ for all $t$.     (1)

The associated mean-corrected series $\tilde{Z}_t = Z_t - \mu$ is said to follow an ARMA(p, q) model when

$\tilde{Z}_t = \phi_1 \tilde{Z}_{t-1} + \dots + \phi_p \tilde{Z}_{t-p} + a_t - \theta_1 a_{t-1} - \dots - \theta_q a_{t-q}$,     (2)

where $\phi_1, \dots, \phi_p$ and $\theta_1, \dots, \theta_q$ are the autoregressive and moving-average parameters and $\{a_t\}$ is a white-noise sequence.
ARMA(p, q) models are the subclass of ARIMA(p, d, q) models that describes univariate, stationary, non-seasonal time series. These models are used in hydrology, econometrics and other fields, and are widely employed to predict the behavior of economic and industrial time series.
The popular ARIMA methodology was introduced by Box and Jenkins (1976), and the technique consists of an iterative cycle of three steps: identification, model fitting and adequacy checking.
The identification of the order (p, q) of the model is a delicate procedure that requires a considerable amount of data and reasonable experience from the analyst. In this step the sample correlograms are compared with the theoretical correlograms of various structures, looking for properties that point to a possible model for the time series. To do so, the autocorrelation function (ACF) and the partial autocorrelation function (PACF) must be estimated, but the classical expressions used to measure the variability of the respective estimators are based on asymptotic results. In addition, when only a few observations are available, the traditional confidence intervals used to test the null hypotheses perform poorly.
Another
problem is the difficulty in recognizing patterns in the ACF and PACF using the
Box and Jenkins method, so several alternative methods have been proposed in
the literature over the past decades.
Choi (1992) evaluated and compared different procedures for the identification of models, such as the Corner method and the extended sample autocorrelation function (ESACF) and smallest canonical correlation (SCAN) methods. The main feature of these identification methods is to point out a set of candidate models for a subsequent careful analysis. A major problem resides in the fact that the distribution of the statistics involved in the identification of the order of the model is rarely known and, in some procedures, the asymptotic variance is estimated by Bartlett's formula under the Gaussian assumption.
Recent studies have employed neural networks and genetic algorithms as alternatives for identifying models that are free of assumptions about the distribution of the statistics involved. Minerva and Poli (2001) and Ong, Huang and Tzeng (2005) proposed genetic algorithms for the identification of ARMA models. Rolf and Sprave (1997) used evolutionary algorithms to identify and estimate the parameters of the model. Machado and Souza (2012) compared a neuro-fuzzy back-propagation algorithm with automatic procedures for identifying Box and Jenkins models.
The bootstrap method may be an alternative for identifying the order of ARMA models, since it allows one to obtain an approximation of the distribution of the statistics involved in this step. It is therefore possible to obtain more accurate confidence intervals than those produced by the classical identification method.
In the last decades several studies have applied the bootstrap method to time series with the objective of assessing the variability of the statistics needed to fit ARMA(p, q) models and also of building prediction intervals (SAAVEDRA; CAO, 1999; CAVALIERE; TAYLOR, 2008; SENSIER; VAN DIJK, 2004; COSKUN et al., 2013).
Although the bootstrap method is well known, few studies have applied it to identify the order of ARMA(p, q) models. Paparoditis and Streitberg (1992) studied the identification of models by considering the vector of autocorrelations and applying the bootstrap to evaluate the sampling distributions of the corresponding statistics. Chaves Neto (1991) identified the region of the parameter space of low-order ARMA models where the classical method performs poorly, and proposed the bootstrap as an alternative to identify these models.
In this work a moving blocks bootstrap algorithm was applied to obtain information about the distribution of the ACF and PACF statistics involved in the identification of ARMA models. In this way, confidence intervals were constructed that are free of the Gaussian assumption classically imposed to obtain the variability of these statistics.
A simulation study evaluated the performance of the proposed algorithm in identifying the structure, comparing it with the classical Box and Jenkins method.
In the classic procedure for the identification of the order of ARMA(p, q) models proposed by Box, Jenkins and Reinsel (1994), the autocorrelation and partial autocorrelation functions are estimated from the time series.
The sample autocorrelation function (ACF) at lag k is defined by

$r_k = \frac{\sum_{t=1}^{n-k} (Z_t - \bar{Z})(Z_{t+k} - \bar{Z})}{\sum_{t=1}^{n} (Z_t - \bar{Z})^2}$,     (3)

where $\bar{Z}$ is the sample mean and n is the length of the series. The sample partial autocorrelation function (PACF) at lag k, denoted $\hat{\phi}_{kk}$, is the estimate of the last autoregressive coefficient in an AR(k) fit, obtained from the Yule-Walker equations.
In the identification procedure, we compare the sample correlograms of the ACF and PACF with the theoretical correlograms of various structures, looking for properties that identify a possible model for the series (MORETTIN; TOLOI, 2006).
In addition to the difficulty in recognizing patterns in the sample correlograms, another problem of this procedure is to verify, by means of a hypothesis test, whether the sample ACF or PACF is zero beyond a certain lag k. The probability distributions of the statistics $r_k$ and $\hat{\phi}_{kk}$ are, in general, not known exactly.

Under the assumption that the theoretical autocorrelations are null beyond lag q, the asymptotic variance of $r_k$ can be calculated by Bartlett's formula

$\operatorname{Var}(r_k) \approx \frac{1}{n}\left(1 + 2\sum_{j=1}^{q} \rho_j^2\right)$, for $k > q$,     (4)

which, for the case of zero theoretical correlations ($\rho_j = 0$ for all j), reduces to

$\operatorname{Var}(r_k) \approx \frac{1}{n}$.     (5)

For the partial autocorrelations, Quenouille's result gives $\operatorname{Var}(\hat{\phi}_{kk}) \approx 1/n$ for $k > p$, and if the size of the series is large, it is assumed that the estimators are approximately Gaussian,

$r_k \sim N(0, 1/n)$ and $\hat{\phi}_{kk} \sim N(0, 1/n)$.     (6)

On the basis of these classical asymptotic results, we employ confidence intervals at the level $1 - \alpha$,

$\left(-z_{1-\alpha/2}/\sqrt{n}\ ,\ z_{1-\alpha/2}/\sqrt{n}\right)$ for $r_k$,     (7)

$\left(-z_{1-\alpha/2}/\sqrt{n}\ ,\ z_{1-\alpha/2}/\sqrt{n}\right)$ for $\hat{\phi}_{kk}$,     (8)

with the objective of verifying whether the ACF and PACF are zero from a certain lag k.
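As a minimal sketch of this classical check (in Python with numpy; the function names are illustrative, not from the paper), the code below computes the sample ACF of (3), the sample PACF via the Durbin-Levinson recursion, and flags the lags whose estimates fall outside the asymptotic bounds in (7) and (8).

```python
import numpy as np

def sample_acf(z, max_lag):
    """Sample autocorrelations r_1, ..., r_max_lag as in equation (3)."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    zc = z - z.mean()
    denom = np.sum(zc ** 2)
    return np.array([np.sum(zc[:n - k] * zc[k:]) / denom for k in range(1, max_lag + 1)])

def sample_pacf(z, max_lag):
    """Sample partial autocorrelations via the Durbin-Levinson recursion."""
    r = np.concatenate(([1.0], sample_acf(z, max_lag)))
    pacf, phi = np.zeros(max_lag), np.zeros(0)
    for k in range(1, max_lag + 1):
        num = r[k] - np.dot(phi, r[k - 1:0:-1])
        den = 1.0 - np.dot(phi, r[1:k])
        phi_kk = num / den
        phi = np.concatenate((phi - phi_kk * phi[::-1], [phi_kk]))
        pacf[k - 1] = phi_kk
    return pacf

def significant_lags_asymptotic(z, max_lag, z_crit=1.96):
    """Lags whose sample ACF / PACF lie outside the asymptotic bound z_crit / sqrt(n)."""
    bound = z_crit / np.sqrt(len(z))
    r, p = sample_acf(z, max_lag), sample_pacf(z, max_lag)
    return np.where(np.abs(r) > bound)[0] + 1, np.where(np.abs(p) > bound)[0] + 1
```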
The bootstrap method, introduced in Efron's work (1979), is based on the construction of sampling distributions by resampling a single existing sample. As is well known, the technique consists in replacing the unknown distribution F of the original data by an estimator $\hat{F}$, in general the empirical distribution function of the sample.

Suppose a random sample $\mathbf{x} = (x_1, x_2, \dots, x_n)$ drawn from the unknown distribution F. A bootstrap sample is obtained by drawing n observations from the original sample, with replacement and with probability 1/n each,

$\mathbf{x}^{*} = (x_1^{*}, x_2^{*}, \dots, x_n^{*})$.     (9)

The statistic of interest is then recomputed on each bootstrap sample, and the collection of these replications approximates its sampling distribution.
To apply the bootstrap to time series, it is necessary to have an algorithm that preserves the correlation structure of the series, such as moving blocks (EFRON, 1979). With this technique the observations of the time series are grouped into blocks of length l. The bootstrap samples are obtained by resampling these blocks with replacement, forming samples of the same size as the original series. The algorithm described below, based on moving blocks, was used here to obtain the bootstrap sampling distributions of the ACF and PACF.
From the historical data series $Z_1, Z_2, \dots, Z_n$, the overlapping blocks of length l are formed and resampled with replacement until a series with the length of the original one is reconstructed. In this way we have the i-th bootstrap replication of the series,

$\mathbf{Z}^{*(i)} = (Z_1^{*(i)}, Z_2^{*(i)}, \dots, Z_n^{*(i)})$, for $i = 1, \dots, B$,     (10)

from which the estimates $r_k^{*(i)}$ and $\hat{\phi}_{kk}^{*(i)}$ of the ACF and PACF are computed.
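A minimal sketch of this moving-blocks scheme, assuming the series is held in a numpy array z and reusing sample_acf from the previous sketch (the block length and all names are illustrative choices, not values from the paper):

```python
import numpy as np

def moving_blocks_bootstrap(z, block_length, n_boot, max_lag, rng=None):
    """Bootstrap replications of the sample ACF obtained by resampling,
    with replacement, overlapping blocks of length `block_length`."""
    rng = np.random.default_rng() if rng is None else rng
    z = np.asarray(z, dtype=float)
    n = len(z)
    # all overlapping (moving) blocks of the original series
    blocks = np.array([z[i:i + block_length] for i in range(n - block_length + 1)])
    n_needed = int(np.ceil(n / block_length))
    acf_star = np.empty((n_boot, max_lag))
    for b in range(n_boot):
        idx = rng.integers(0, len(blocks), size=n_needed)
        z_star = np.concatenate(blocks[idx])[:n]   # bootstrap series with the original length
        acf_star[b] = sample_acf(z_star, max_lag)  # sample_acf from the previous sketch
    return acf_star
```

The bootstrap distribution of the PACF is obtained in the same way, by also storing sample_pacf(z_star, max_lag) for each replication.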
The bootstrap standard errors of the ACF and PACF at lag k are then estimated from the B replications by

$\widehat{se}_B(r_k) = \left[\frac{1}{B-1}\sum_{i=1}^{B}\left(r_k^{*(i)} - \bar{r}_k^{*}\right)^2\right]^{1/2}$, with $\bar{r}_k^{*} = \frac{1}{B}\sum_{i=1}^{B} r_k^{*(i)}$,     (11)

$\widehat{se}_B(\hat{\phi}_{kk}) = \left[\frac{1}{B-1}\sum_{i=1}^{B}\left(\hat{\phi}_{kk}^{*(i)} - \bar{\phi}_{kk}^{*}\right)^2\right]^{1/2}$, with $\bar{\phi}_{kk}^{*} = \frac{1}{B}\sum_{i=1}^{B} \hat{\phi}_{kk}^{*(i)}$.     (12)

By means of the bootstrap distribution of $r_k^{*}$ (and, analogously, of $\hat{\phi}_{kk}^{*}$), the percentile confidence interval at the level $1 - \alpha$ is

$\left[\, r_k^{*(\alpha/2)} \,,\; r_k^{*(1-\alpha/2)} \,\right]$,     (13)

with $r_k^{*(\gamma)}$ denoting the $100\gamma\%$ percentile of the bootstrap distribution, and the bias-corrected (BC) percentile interval is

$\left[\, r_k^{*(\Phi(2z_0 - z_{1-\alpha/2}))} \,,\; r_k^{*(\Phi(2z_0 + z_{1-\alpha/2}))} \,\right]$,     (14)

for $z_0 = \Phi^{-1}\left(\#\{ r_k^{*(i)} < r_k \} / B\right)$, where $\Phi$ is the standard normal distribution function (EFRON, 1986).
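A sketch of how the intervals in (13) and (14) can be obtained from the B bootstrap replications of a generic statistic (theta_star holds the replications, theta_hat the estimate from the original series; the clipping of the bias-correction proportion is our own safeguard against an infinite z0):

```python
import numpy as np
from scipy.stats import norm

def percentile_interval(theta_star, alpha=0.05):
    """Bootstrap percentile confidence interval, equation (13)."""
    return np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])

def bc_interval(theta_star, theta_hat, alpha=0.05):
    """Bias-corrected (BC) percentile interval, equation (14)."""
    B = len(theta_star)
    # proportion of replications below the original estimate -> bias-correction constant z0
    p0 = np.clip(np.mean(theta_star < theta_hat), 1.0 / B, 1.0 - 1.0 / B)
    z0 = norm.ppf(p0)
    z_crit = norm.ppf(1 - alpha / 2)
    lower, upper = norm.cdf(2 * z0 - z_crit), norm.cdf(2 * z0 + z_crit)
    return np.quantile(theta_star, [lower, upper])
```

A lag k is judged non-null when zero falls outside the corresponding interval.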
In order to evaluate the performance of the bootstrap procedure, comparing it with the asymptotic method, we simulated time series from ARMA models. The residuals of the synthetic series are Gaussian with fixed variance $\sigma_a^2$.
Series were simulated from each of 15 models with the structures AR(1), MA(1), AR(2), MA(2), AR(3), MA(3), ARMA(1,1) and ARMA(2,2), some of them with parameters chosen so that the theoretical ACF and PACF assume low values.
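For concreteness, one Monte Carlo repetition of this experiment could look like the sketch below, which reuses the functions from the previous sketches; the MA(2) parameter values, the block length and the random seed are placeholders chosen for illustration, not the ones used in the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def simulate_ma2(n, theta1, theta2, sigma=1.0):
    """Simulate Z_t = a_t - theta1 * a_{t-1} - theta2 * a_{t-2} with Gaussian residuals."""
    a = rng.normal(0.0, sigma, size=n + 2)
    return a[2:] - theta1 * a[1:-1] - theta2 * a[:-2]

n, max_lag, B = 50, 4, 1000
z = simulate_ma2(n, theta1=0.6, theta2=-0.3)           # placeholder parameters
acf_star = moving_blocks_bootstrap(z, block_length=5, n_boot=B, max_lag=max_lag)
boot_sd = acf_star.std(axis=0, ddof=1)                 # bootstrap standard errors, eq. (11)
asymptotic_sd = np.full(max_lag, 1.0 / np.sqrt(n))     # asymptotic value under rho_k = 0, eq. (5)
print(boot_sd, asymptotic_sd)
```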
Consider, as a first example, a model with the MA(2) structure. The asymptotic estimates are calculated through the expressions of Bartlett and Quenouille. The bootstrap algorithm was applied with B = 1000 for each Monte Carlo repetition. Tables 1 and 2 show the average values of the estimated standard deviations for lags k = 1, 2, 3, 4.
Table 1: Estimates of the standard deviation of the ACF and PACF for the first simulated model.

|        |        n = 30         |        n = 50         |        n = 100        |
|        | exact | asymp. | boot. | exact | asymp. | boot. | exact | asymp. | boot. |
| r_1    | 0.129 | 0.182  | 0.168 | 0.094 | 0.141  | 0.138 | 0.072 | 0.100  | 0.098 |
| phi_11 | 0.129 | 0.182  | 0.168 | 0.094 | 0.141  | 0.138 | 0.072 | 0.100  | 0.098 |
| r_2    | 0.139 | 0.187  | 0.172 | 0.122 | 0.144  | 0.138 | 0.072 | 0.101  | 0.101 |
| phi_22 | 0.139 | 0.182  | 0.178 | 0.114 | 0.141  | 0.141 | 0.069 | 0.100  | 0.103 |
| r_3    | 0.193 | 0.210  | 0.157 | 0.144 | 0.163  | 0.126 | 0.118 | 0.116  | 0.096 |
| phi_33 | 0.149 | 0.182  | 0.169 | 0.115 | 0.141  | 0.138 | 0.091 | 0.100  | 0.102 |
| r_4    | 0.186 | 0.216  | 0.156 | 0.158 | 0.166  | 0.127 | 0.107 | 0.117  | 0.094 |
| phi_44 | 0.159 | 0.182  | 0.170 | 0.129 | 0.141  | 0.133 | 0.093 | 0.100  | 0.103 |
Table 2: Estimates of the standard deviation of the ACF and PACF for the second simulated model.

|        |        n = 30         |        n = 50         |        n = 100        |
|        | exact | asymp. | boot. | exact | asymp. | boot. | exact | asymp. | boot. |
| r_1    | 0.164 | 0.183  | 0.174 | 0.137 | 0.141  | 0.142 | 0.103 | 0.100  | 0.105 |
| phi_11 | 0.164 | 0.183  | 0.174 | 0.137 | 0.141  | 0.142 | 0.097 | 0.100  | 0.096 |
| r_2    | 0.173 | 0.208  | 0.155 | 0.148 | 0.166  | 0.127 | 0.107 | 0.120  | 0.096 |
| phi_22 | 0.164 | 0.183  | 0.189 | 0.132 | 0.141  | 0.149 | 0.089 | 0.100  | 0.106 |
| r_3    | 0.183 | 0.221  | 0.155 | 0.159 | 0.173  | 0.129 | 0.123 | 0.122  | 0.099 |
| phi_33 | 0.158 | 0.183  | 0.164 | 0.125 | 0.141  | 0.128 | 0.095 | 0.100  | 0.103 |
| r_4    | 0.174 | 0.227  | 0.156 | 0.146 | 0.181  | 0.132 | 0.109 | 0.128  | 0.100 |
| phi_44 | 0.154 | 0.183  | 0.160 | 0.129 | 0.141  | 0.126 | 0.096 | 0.100  | 0.107 |
In both experiments we observed that the bootstrap estimates display good behavior in comparison with the asymptotic estimates, especially in samples of size n = 30 and n = 50. In these cases, as the lag k increases, the asymptotic estimates become more biased than the bootstrap estimates.
Consider the estimation of percentiles of the distribution of the autocorrelation function. In Figure 1, the dotted lines represent the 5% and 95% percentiles of the exact distribution; the corresponding bootstrap and asymptotic estimates are also shown. We observe that the bootstrap estimates reflect the sampling distribution of the partial autocorrelation function more adequately than the asymptotic method does. Particularly in cases where the distribution is not symmetric, the bootstrap provides more accurate estimates.
Figure 1: 5% and 95% percentiles of the exact, bootstrap and asymptotic distributions.
The hypotheses associated with the assumption set out in (6) were also tested for each Monte Carlo repetition, and in this way we could evaluate the coverage probability of the null parameter for the asymptotic intervals (7) and (8). The percentile bootstrap confidence interval (13) and the bias-corrected percentile interval (14) were constructed to test the equivalent null hypotheses that the autocorrelations and partial autocorrelations are null.
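In code, these tests amount to checking whether zero lies inside each interval; a sketch of this check for one simulated series, reusing the functions defined in the earlier sketches (the model and its parameters are again placeholders), could be:

```python
import numpy as np

z = simulate_ma2(50, theta1=0.6, theta2=-0.3)    # placeholder series
r = sample_acf(z, max_lag=4)
acf_star = moving_blocks_bootstrap(z, block_length=5, n_boot=1000, max_lag=4)

for k in range(4):
    intervals = {
        "asymptotic": (-1.96 / np.sqrt(len(z)), 1.96 / np.sqrt(len(z))),  # interval (7)
        "percentile": percentile_interval(acf_star[:, k]),                # interval (13)
        "BC": bc_interval(acf_star[:, k], r[k]),                          # interval (14)
    }
    covers_zero = {name: bool(lo <= 0.0 <= hi) for name, (lo, hi) in intervals.items()}
    print("lag", k + 1, covers_zero)  # True -> the null hypothesis is not rejected at that lag
```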
The hypotheses were tested for the first four lags of the ACF and PACF for each of the referred structures. The confidence level of all intervals is 95%. Table 3 displays the coverage probability of zero for the confidence intervals for the AR(3) model.
Table 3: Coverage probability of zero by the asymptotic (A), percentile (B) and bias-corrected bootstrap (BC) intervals for the AR(3) model.

|        |       n = 30       |       n = 50       |      n = 100       |
|        |  A   |  B   |  BC  |  A   |  B   |  BC  |  A   |  B   |  BC  |
| r_1    | 0.37 | 0.38 | 0.35 | 0.17 | 0.27 | 0.16 | 0.00 | 0.00 | 0.00 |
| phi_11 | 0.37 | 0.38 | 0.35 | 0.17 | 0.27 | 0.16 | 0.00 | 0.00 | 0.00 |
| r_2    | 0.81 | 0.48 | 0.47 | 0.65 | 0.50 | 0.47 | 0.77 | 0.64 | 0.60 |
| phi_22 | 0.56 | 0.42 | 0.39 | 0.38 | 0.27 | 0.19 | 0.22 | 0.25 | 0.30 |
| r_3    | 0.81 | 0.63 | 0.60 | 0.76 | 0.68 | 0.54 | 0.31 | 0.37 | 0.20 |
| phi_33 | 0.47 | 0.50 | 0.32 | 0.15 | 0.40 | 0.17 | 0.00 | 0.14 | 0.03 |
| r_4    | 0.84 | 0.46 | 0.42 | 0.50 | 0.39 | 0.25 | 0.08 | 0.06 | 0.05 |
| phi_44 | 0.92 | 0.93 | 0.95 | 0.91 | 0.98 | 1.00 | 0.86 | 0.99 | 1.00 |
The results presented in Table 3 reveal that, in samples of size n = 30 and n = 50, the bootstrap intervals, especially the BC interval, have higher empirical power to reject the null hypothesis, i.e., they better detect parameters that are not null.
When the series are simulated from AR(3) models, we expect the partial autocorrelations to be non-null up to lag 3 and null from lag 4 on. A major difficulty lies in the identification of the order of the model from the series simulated with parameters chosen so that the theoretical ACF and PACF assume low values.
As a result, the classical technique treats the process as white noise instead of identifying a model with low values for the ACF and PACF. In these cases the bootstrap performance is also superior, especially in samples of size n = 30 and n = 50, because the coverage probability of zero is lower for both of the analyzed bootstrap intervals.
In this paper we propose a bootstrap procedure to identify the order of ARMA models. The algorithm was tested on time series simulated from models with the structures AR(1), AR(2), AR(3), MA(1), MA(2), MA(3), ARMA(1,1) and ARMA(2,2). In this way we determined, free from the Gaussian assumption, the sampling distributions of the autocorrelation and partial autocorrelation functions classically used in the identification of this type of structure. The examples show that the bootstrap performs well for samples of all sizes and that it is superior to the asymptotic method for small samples. The bootstrap estimates are more accurate, i.e., they display less variability than the asymptotic estimates.
In the identification of models with low values for the ACF and PACF, the classic method is ineffective for samples of any size, since it treats the process as white noise. The bootstrap can be an alternative for this type of structure, because its confidence intervals have a lower coverage probability of the null parameter, i.e., more power to reject the null hypotheses.
These results were repeated in the series simulated from all the models studied, and this consistency may be explained by the bootstrap distributions of the ACF and PACF. The comparison among the percentiles of the exact, asymptotic and bootstrap distributions shows that the bootstrap reproduces the true distribution of the autocorrelation and partial autocorrelation functions more satisfactorily.
As a suggestion for further research, it would be interesting to apply the technique presented here to real time-series data with the objective of identifying ARMA(p, q) models which, once fitted, could be employed, for instance, to predict the behavior of economic or industrial series.
REFERENCES

ANDERSON, R. (1942) Distribution of the serial correlation coefficient. Annals of Mathematical Statistics, n. 13, p. 1-13.
BARTLETT,
M. S. (1946) On the theoretical specification and sampling properties of
autocorrelated time-series. Journal of
the Royal Statistical Society, v.
8, n. 27, p. 27-41.
BOX, G. E. P.; JENKINS, G. M. (1976) Time Series Analysis: Forecasting and Control. Holden-Day: San Francisco.
BOX,
G. E. P.; JENKINS, G.; REINSEL, G. C. (1994) Time Series Analysis. Prentice Hall: New Jersey.
CAVALIERE,
G.; TAYLOR, R. (2008) Bootstrap unit root tests for time series with nonstationary
volatility. Econometric Theory, n. 24,
p. 43-71.
CHOI,
B. S. (1992) Identification of ARMA
Models. Springer: New York.
COSKUN,
A.; CEYHAN, E.; INAL, T. C.; SERTESER, M; UNSAL, I. (2013) The comparison of
parametric and nonparametric bootstrap methods for reference interval
computation in small sample size groups. Accred
Qual Assur, n. 18, p. 51-60.
EFRON, B. (1979) Bootstrap methods: another look at the jackknife. Annals of Statistics, v. 7, n. 1, p. 1-26.
EFRON, B. (1986) Bootstrap methods for standard errors, confidence intervals, and other measures of statistical accuracy. Statistical Science, v. 1, n. 1, p. 54-77.
MACHADO,
M. A. S; SOUZA, R. C. (2012) Box & Jenkins model identification: A
comparison
of methodologies. Independent Journal of
Management & Production, v. 3,
n. 2, p. 54-61.
MINERVA,
T.; POLI, I. (2001) ARMA models with genetic algorithms, in: Applications of Evolutionary Computing.
Springer: New York, p. 335-342.
MORETTIN, P.; TOLOI, C. M. C. (2006) Análise de séries temporais. Blucher: São Paulo.
MULLER, U.; SCHICK, A.; WEFELMEYER, W. (2005) Weighted residual-based density estimators for nonlinear autoregressive models. Statistica Sinica, n. 15, p. 177-195.
CHAVES NETO, A. (1991) Bootstrap em séries temporais. Thesis (PhD in Electrical Engineering), PUC: Rio de Janeiro.
ONG, C. S.; HUANG, J. J.; TZENG, G. H. (2005) Model identification of ARIMA family using genetic algorithms. Applied Mathematics and Computation, v. 164, n. 3, p. 885-912.
PAPARODITIS,
E.; STREITBERG, B. (1992) Order identification statistics in stationary autoregressive
moving-average models: vector autocorrelations and the bootstrap. Journal of Time Series Analysis, v. 13, n. 5, p. 415-434.
QUENOUILLE,
M. H. (1949) Approximate tests of correlation in time-series. Journal of Statistical Computation and
Simulation, n. 8, p. 75-80.
ROLF, S.; SPRAVE, J. (1997) Model identification and parameter estimation of ARMA models by means of evolutionary algorithms. Computational Intelligence for Financial Engineering (CIFEr), v. 1, p. 237-243.
SAAVEDRA, A.; CAO, R. (1999) Rate of convergence of a convolution-type estimator of the marginal density of an MA(1) process. Stochastic Processes and their Applications, n. 80, p. 129-155.
SENSIER, M.; VAN DIJK, D. (2004) Testing for volatility changes in U.S. macroeconomic time series. Review of Economics and Statistics, n. 86, p. 833-839.
SILVA, D. (1995) O método bootstrap e aplicações a regressão múltipla. Dissertation
(Master in Statistics), Unicamp: Campinas.