TIME SERIES FORECASTING BY USING A NEURAL ARIMA MODEL BASED ON WAVELET DECOMPOSITION

In the prediction of (stochastic) time series, it has been common to suppose that an individual predictive method – for instance, an Auto-Regressive Integrated Moving Average (ARIMA) model – produces residuals like a white noise process. However, mainly due to the auto-dependence structures not mapped by a given individual predictive method, this assumption may easily be violated in practice, as pointed out by Firmino et al. (2015). In order to correct it (and, accordingly, to produce forecasts with greater accuracy), this paper puts forward a Wavelet Hybrid Forecaster (WHF) that integrates the following numerical techniques: wavelet decomposition; ARIMA models; Artificial Neural Networks (ANNs); and linear combination of forecasts. Basically, the proposed WHF can map simultaneously linear – by means of a linear combination of ARIMA forecasts – and non-linear – through a linear combination of ANN forecasts – auto-dependence structures exhibited by a given time series. Differently from other hybrid methodologies existing in the literature, the WHF forecasts are produced taking implicitly into account the spectral information from the wavelet components (WCs).


INTRODUCTION
Over the years, several forecasting methods have been proposed with the aim of producing increasingly accurate predictions of (stochastic) time series. In general, they can be split into two large exclusive categories: the individual (or single) predictive methods, such as the well-known Auto-Regressive Integrated Moving Average (ARIMA) models, as in Hamilton (1994); and the combinations of individual predictive methods, proposed initially by Bates and Granger (1969). Indeed, the collection of single predictive methods may itself be regrouped into two exclusive classes: the statistical one (here lie, for instance, the (linear) ARIMA models) and the machine learning one (here lie the (non-linear) Artificial Neural Networks (ANNs), as in HAYKIN, 2001). By hybrid forecasting methods, one means those that carry out the modelling of a given time series y_t according to the following three steps: in Step 1, a single forecasting method from the statistical class (or machine learning class) is applied to y_t to produce its forecasts as well as its residuals; in Step 2, the forecasting errors generated in Step 1 are predicted by using an individual forecaster from the machine learning class (or statistical class); and, in Step 3, the forecasts provided in Step 1 are "corrected" by the predictions of the residuals produced in Step 2, so as to generate the hybrid forecasts of the underlying time series y_t. In effect, a hybrid predictive method can be referred to as a particular case of combined forecasters.
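The three hybrid steps above can be sketched numerically. The following is a minimal, self-contained illustration in Python/numpy, under stated assumptions: an ordinary-least-squares AR(4) fit stands in for the statistical-class ARIMA model, and a degree-2 polynomial regression on lagged residuals stands in for the machine-learning-class non-linear forecaster (an MLP-ANN in the paper); the toy series and all names are hypothetical.

```python
import numpy as np

def lag_matrix(x, p):
    """Design matrix whose row for time t holds [1, x[t-1], ..., x[t-p]]."""
    cols = [x[p - 1 - i:len(x) - 1 - i] for i in range(p)]
    return np.column_stack([np.ones(len(x) - p)] + cols)

def fit_ols(X, y):
    """Ordinary least squares fit."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

# a toy series with both linear and non-linear auto-dependence
rng = np.random.default_rng(0)
t = np.arange(300)
y = np.sin(0.3 * t) + 0.4 * np.sin(0.3 * t) ** 2 + 0.05 * rng.standard_normal(300)

# Step 1: linear (statistical-class) forecaster -- AR(4) stand-in for ARIMA
p = 4
X = lag_matrix(y, p)
beta = fit_ols(X, y[p:])
linear_fc = X @ beta
residuals = y[p:] - linear_fc

# Step 2: non-linear (machine-learning-class) forecaster of the residuals --
# degree-2 polynomial in lagged residuals as a stand-in for an MLP-ANN
q = 4
Xr = lag_matrix(residuals, q)
Xr = np.column_stack([Xr, Xr[:, 1:] ** 2])  # add squared lags (non-linear terms)
gamma = fit_ols(Xr, residuals[q:])
residual_fc = Xr @ gamma

# Step 3: "correct" the linear forecasts with the residual predictions
hybrid_fc = linear_fc[q:] + residual_fc

mse_linear = np.mean(residuals[q:] ** 2)
mse_hybrid = np.mean((y[p + q:] - hybrid_fc) ** 2)
```

In this in-sample sketch the hybrid MSE can never exceed the linear MSE, which mirrors the accuracy gains reported by Zhang (2003).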
In the process of predicting time series, it has often been assumed that a single predictive method produces residuals like a white noise process (i.e., unpredictable). However, mainly due to the linear or non-linear structures of auto-dependence not captured by the single predictive method adopted by the decision maker, this supposition may be trivially broken in most real-world applications. For instance, the (linear) ARIMA models are able to statistically guarantee, based on a significance level, only uncorrelated residuals (but not statistical independence, as is often assumed); this is because, from a mathematical point of view, they may be visualized as linear filters, as pointed out by Hamilton (1994). In turn, Zhang (2003) shows in his numerical experiments – in which three very popular time series were forecasted by employing a hybrid forecaster – that the residuals produced by the ARIMA models could be efficiently predicted by using non-linear forecasters (namely, the Multi-Layer Perceptron ANNs (MLP-ANNs), as in HAYKIN, 2001). In his research, it can be verified that the ARIMA model forecasts were in fact remarkably improved when added to the predictions of their respective residuals.

INDEPENDENT JOURNAL OF MANAGEMENT & PRODUCTION (IJM&P)
http://www.ijmp.jor.br, v. 7, n. 1, January - March 2016, ISSN: 2236-269X, DOI: 10.14807/ijmp.v7i1.324
Finally, in the numerical experiments in Firmino et al. (2015), each time series was modelled by means of several plausible ARIMA models, whose forecasts were added to the predictions of their residuals produced by different ANNs; the results achieved exhibit a remarkable accuracy gain of the hybrid forecasts when compared with other traditional methods, across all adherence statistics.
In turn, wavelet decompositions of level r have shown to be very efficient in dealing with time series forecasting due to their multi-scaling property: the underlying time series is mapped into r+1 components lying in the so-called "wavelet subspaces" (defined in Section 2.1). In this way, accuracy gains may be achieved when wavelet methods (such as the wavelet decomposition) are adopted in the forecasting process.

Thus, the Wavelet Hybrid Forecaster (WHF) proposed here aims to produce hybrid forecasts aggregating different information from different sources (i.e., numerical methods) with a higher level of accuracy than other methodologies presented in the literature. The WHF integrates the following approaches: wavelet decomposition of level r, in Section 2.1; ARIMA models, in Section 2.2; and ANN methods, in Section 2.3. Section 3 describes in detail all the steps needed to carry out the WHF (in particular, it presents the Linear Combination (LC) of forecasts proposed here). In general lines, the proposed WHF can capture, at the same time, linear (by means of an LC of ARIMA models) and non-linear (by using an LC of ANNs) auto-dependence structures exhibited by a given time series to be forecasted, as well as its spectral frequency information. Differently from other hybrid predictive methods in the time series literature, the WHF hybrid forecasts implicitly incorporate spectral information through the WCs and perform an alternative linear combination of forecasts, as will be seen. In order to illustrate the proposed methodology, the Canadian lynx and the British pound/US dollar exchange rate time series are modelled and hence predicted, following Zhang (2003).
The current paper is split into five sections. Beyond this introduction (Section 1), Section 2 presents a review of the methodologies used; Section 3 describes the steps of the proposed WHF; Section 4 displays and comments upon the main numerical results; and, finally, Section 5 concludes the paper.

REVIEW OF LITERATURE
The purpose of this section is to present a brief review of some basic concepts needed for defining the WHF method, described in Section 3. It starts, in Section 2.1, by describing the wavelet decomposition of level r, which is the algorithm adopted in the initial step of the ARIMA and ANN modelling. This is followed by the basic concepts of the ARIMA and ANN models, in Sections 2.2 and 2.3, respectively.

Wavelet decomposition of level r
Based on Kubrusly and Levan (2006) and Teixeira Jr. et al. (2015), a time series y_t of length T admits an (exact) orthonormal wavelet expansion, as in Equation (1), and, by truncating the inner sums, a wavelet decomposition of level r, as in Equation (2):

y_t = Σ_k a_{r,k} φ_{r,k}(t) + Σ_{j≥r} Σ_k d_{j,k} ψ_{j,k}(t)    (1)

y_t ≈ Σ_k ã_{r,k} φ_{r,k}(t) + Σ_{j=r}^{r*} Σ_k d̃_{j,k} ψ_{j,k}(t)    (2)

where r is the level parameter; φ_{r,k} and ψ_{j,k} are, respectively, the scale and wavelet functions spanning the "wavelet subspaces"; ã_{r,k} and d̃_{j,k} consist, respectively, of the approximations of the WCs a_{r,k} (of approximation) and d_{j,k} (of detail) from Equation (1); k is a translation parameter that takes integer values; and r* is the truncation level.

Importantly, if T is not an integer power of 2, it is common to pad the underlying time series with zeros until its length is increased up to the next integer power of 2. This procedure may always be carried out because the added zeros do not affect the calculation of the WCs in Equation (2), which means that the auto-correlation previously exhibited by y_t, as well as by its WCs, is preserved (HAVEN; LIU; SHEN, 2012).
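A wavelet decomposition of this kind, including the zero-padding step, can be sketched in a few lines. The following is a minimal Python/numpy illustration with the Haar basis (the basis actually used in Section 4); the function name and the toy input are hypothetical.

```python
import numpy as np

def haar_decompose(y, r):
    """Haar wavelet decomposition of level r.
    Returns the approximation coefficients at level r and a list with the
    detail coefficients obtained at each of the r cascade steps."""
    y = np.asarray(y, dtype=float)
    # zero-pad up to the next integer power of 2, as described above
    m = 1 << max(0, (len(y) - 1).bit_length())
    a = np.concatenate([y, np.zeros(m - len(y))])
    details = []
    for _ in range(r):
        # one Haar cascade step: pairwise averages and differences
        a, d = (a[0::2] + a[1::2]) / np.sqrt(2.0), (a[0::2] - a[1::2]) / np.sqrt(2.0)
        details.append(d)
    return a, details

approx, details = haar_decompose([1, 2, 3, 4, 5], r=2)
# the Haar basis is orthonormal, so the total "energy" is preserved
energy = np.sum(approx ** 2) + sum(np.sum(d ** 2) for d in details)
```

Because the basis is orthonormal, the energy of the padded series is exactly preserved across the WCs, which is one way to check the implementation.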

ARIMA models
Let y_t be a time series exhibiting a structure of auto-correlation.
According to Liu (2006), y_t is an ARIMA(p,d,q) process if, and only if, it can be represented as in Equation (3):

φ(B) Δ^d y_t = θ(B) ε_t    (3)

where B is the backward operator, defined by B^k y_t = y_{t-k}, wherein k runs over the set of non-negative integers; Δ^d = (1 - B)^d is the difference operator, with d representing its order; φ(B) = 1 - φ_1 B - … - φ_p B^p and θ(B) = 1 - θ_1 B - … - θ_q B^q are the polynomials of model parameters, which must satisfy both the invertibility and the stationarity conditions (as in HAMILTON, 1994); ε_t is the innovation, consisting of a state of the random variable of a white noise stochastic process with zero mean, constant variance, and null auto-covariance at non-zero lags; and p and q are, respectively, the orders of the auto-regressive part, φ(B), and of the moving average part, θ(B).
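The backward and difference operators in Equation (3) are easy to illustrate numerically. A small sketch in Python/numpy (the toy series is hypothetical):

```python
import numpy as np

y = np.array([1.0, 4.0, 9.0, 16.0, 25.0, 36.0])  # y_t = t**2 for t = 1..6

# backward operator B: (B y)_t = y_{t-1}
By = y[:-1]

# difference operator Delta = (1 - B): (Delta y)_t = y_t - y_{t-1}
d1 = y[1:] - By          # equivalently np.diff(y, n=1)

# Delta^2 removes a quadratic trend: the second difference of t**2 is constant
d2 = np.diff(y, n=2)
```

This is exactly why the order d is chosen so that Δ^d y_t becomes stationary before the ARMA part is fitted.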
According to Liu (2006), in order to obtain the best possible ARIMA model, three basic steps should be carried out: (i) test the plausible values of the parameters p, d, and q in Equation (3), which can be obtained through the profile analysis of the plots of the simple and partial auto-correlation functions of the residuals (as in HAMILTON, 1994); (ii) define the method to be used to estimate the ARIMA parameters (the most common is the Maximum Likelihood Estimation method, as in Liu (2006)); and (iii) make a diagnostic check to choose the most parsimonious and adequate model to be used for generating both the in-sample and the out-of-sample forecasts.

Artificial Neural Network
Artificial Neural Networks (ANNs) are well-known to be flexible computing frameworks for modeling and forecasting a broad range of stochastic time series exhibiting either linear or non-linear auto-dependence structures. Contrary to many linear statistical forecasting models, stationarity is not required by ANN methods (see e.g. TEIXEIRA JR. et al., 2015).
Another important aspect of ANNs is that they are universal approximators of functions with compact (i.e., closed and bounded) support, as shown in Khashei and Bijari (2011). In effect, since observations from a time series that exhibits dependency on past values may be seen as points of the domain of an unknown compact-support function, it follows that ANNs are capable of approximating them (for modelling or forecasting) with a high degree of accuracy.
According to Zhang (2003), the predictive power of ANNs comes from the parallel processing of the information exhibited by the data. In addition, ANN models are largely determined by the stochastic characteristics inherent in the time series.
In this context, the feedforward multilayer perceptron ANNs (see e.g. HAYKIN, 2001) are the most widely used neural prediction models for time series forecasting.
Particularly, consider an artificial network composed of three layers (namely, the input, hidden and output layers) of simple processing units connected by acyclic links. The relationship between the output y_t and the L lagged inputs y_{t-1}, …, y_{t-L} is given by Equation (4):

y_t = w_0 + Σ_{j=1}^{Q} w_j g(w_{0,j} + Σ_{i=1}^{L} w_{i,j} y_{t-i}) + ε_t    (4)

where w_j (j = 0, 1, …, Q) and w_{i,j} (i = 0, 1, …, L; j = 1, …, Q) are the ANN parameters, called connection weights; L is the number of input nodes; Q is the number of hidden nodes; ε_t is the approximation error at time t; and g is the transfer function – here, a logistic function, although it is possible to adopt other functions (see, e.g., HAYKIN, 2001). The logistic function is widely employed as the hidden-layer transfer function in neural network forecasting and is mathematically defined by Equation (5):

g(x) = 1 / (1 + e^{-x}),  x ∈ ℝ    (5)

where e^{-x} is the exponential function with Euler's number as base (as in HAYKIN, 2001). Due to g being a non-linear transfer function, the ANN model in Equation (4) in fact performs a non-linear mapping of the past observations to produce a forecast of y_t.
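The forward pass of Equations (4) and (5) can be written compactly. A minimal Python/numpy sketch follows; the toy weights and function names are hypothetical (a trained network would obtain them by, e.g., backpropagation, which is not shown here).

```python
import numpy as np

def logistic(x):
    """Transfer function of Equation (5): g(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def mlp_forecast(lags, w_in, b_in, w_out, b_out):
    """Forward pass of the three-layer perceptron of Equation (4):
    output = b_out + sum_j w_out[j] * g(b_in[j] + sum_i w_in[i, j] * lags[i])."""
    hidden = logistic(b_in + lags @ w_in)   # hidden-layer activations
    return b_out + hidden @ w_out           # linear output layer

# toy example: L = 3 lagged inputs, Q = 2 hidden neurons (arbitrary weights)
lags = np.array([0.5, -0.1, 0.2])
w_in = np.array([[0.4, -0.3], [0.1, 0.2], [-0.2, 0.5]])
b_in = np.array([0.0, 0.1])
w_out = np.array([0.7, -0.4])
b_out = 0.05
forecast = mlp_forecast(lags, w_in, b_in, w_out, b_out)
```

Note that with all hidden-to-output weights set to zero the network reduces to the constant b_out, which makes the role of each term in Equation (4) explicit.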

PROPOSED METHODOLOGY
Let y_t be a time series for which k-steps-ahead forecasts are required over a forecasting horizon h. For this purpose, the WHF proposed here is carried out according to the following six steps.
Step 1: A wavelet decomposition of level r of y_t is performed, producing its r+1 WCs – namely, one WC of approximation at level r and r WCs of detail.
Step 2: The r+1 WCs generated in Step 1 are individually modelled through ARIMA models in order to yield their lists of in-sample and out-of-sample forecasts, where h represents the forecasting horizon and a number of degrees of freedom is lost in the ARIMA modelling.
Step 3: The forecasts of the r+1 WCs from Step 2 are linearly combined by means of the WLC (wavelet linear combination), defined in Equation (6).

ŷ_t = λ_0 ŷ_{0,t} + λ_1 ŷ_{1,t} + … + λ_r ŷ_{r,t}    (6)
where ŷ_t is the combined linear forecast of y_t; ŷ_{0,t}, …, ŷ_{r,t} are the forecasts of the WCs from Step 2; and λ_0, λ_1, …, λ_r are the optimal WLC parameters, which are obtained by solving the NLP (RAGSDALE, 2004) described below.

Objective: minimize the MSE of the in-sample combined forecasts. Subject to: the constraints imposed on the combination weights.

where the MSE is computed over the list of (in-sample) forecasting errors, and the weights λ_0, …, λ_r are the decision variables (to be optimized and then substituted into Equation (6)).
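When the weight constraints are not binding, minimizing the in-sample MSE over the combination weights of Equation (6) reduces to an ordinary least-squares problem. A minimal Python/numpy sketch under that assumption (the paper instead solves the constrained NLP with the Excel Solver; the toy forecast matrix below is hypothetical):

```python
import numpy as np

rng = np.random.default_rng(1)

# columns = in-sample forecasts of the r + 1 = 3 WC-based ARIMA models (toy data)
F = rng.standard_normal((200, 3))
true_w = np.array([0.5, 0.3, 0.2])
y = F @ true_w + 0.01 * rng.standard_normal(200)  # actual series + small noise

# minimizing the in-sample MSE over the weights is a least-squares problem
w, *_ = np.linalg.lstsq(F, y, rcond=None)
mse = np.mean((y - F @ w) ** 2)
```

With informative individual forecasts, the estimated weights recover the contribution of each WC model, which is the aggregation role the WLC plays in Step 3.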
Step 4: The list of in-sample forecasting errors is decomposed by a wavelet decomposition of level k, providing its k+1 WCs – i.e., one WC of approximation and k WCs of detail.
Step 5: Each of the k+1 WCs produced in Step 4 is used as input patterns in an ANN in order to generate the corresponding lists of in-sample and out-of-sample forecasts of the errors, accounting for the degrees of freedom lost up to this point.
Step 6: The forecasts produced in Steps 3 and 5 are summed, generating the hybrid forecasts of y_t.
Following the above six steps of the WHF, it is worth noting some important aspects. Firstly, Steps 1 and 4 aim to obtain a finite number of temporal subseries (WCs) with a better stochastic pattern than the underlying time series. This is possible because each WC has a stationary spectral frequency, as noted in Mallat (2009), while y_t can be seen as the result of the sum of spectral components with different frequencies, as pointed out by Levan and Kubrusly (2003).

Secondly, in Step 2, the goal is to identify a plausible linear auto-dependence structure (i.e., a linear forecaster) for each WC; the ARIMA models were chosen here, as they are widely adopted for this purpose, as highlighted by Hamilton (1994). Thirdly, in Step 3, after identifying a plausible linear structure for each WC, their predictions are linearly combined by means of the WLC, generating the combined linear forecasts, which decode the linear information exhibited by y_t.
Consequently, the combined forecast ŷ_t is in fact a forecast of y_t and can be interpreted as an aggregator of distinct linear information from different linear sources (i.e., the r+1 different linear ARIMA models).

Fourthly, in Step 4, each WC produced here is endowed with noise (like a white noise) and with the non-linear auto-dependence structure of y_t, since the forecasting errors are, following Hamilton (1994), outcomes of a linear filter (in this case, the WLC of ARIMA models, in Step 3).
The WLC of ARIMA models is not able to map properly non-linear auto-dependence structures. In effect, it is easily noticed that each WC of the residuals needs to be modelled by a non-linear forecaster; otherwise, important information could be lost in the predictive process. Thus, in Step 5, each WC was modelled by a (non-linear) ANN model. The software library used here allows the parameters to be set by the decision maker.

Fifthly, in Step 6, the forecasts produced in Step 5 generate the list of in-sample and out-of-sample combined forecasts that provide information on the non-linear auto-dependence structure of y_t. Similarly to the combined linear forecast, this combined non-linear forecast can be interpreted as an aggregator of non-linear information from different non-linear sources (namely, the k+1 ANN models).
Sixthly, the hybrid forecasts, endowed with both linear and non-linear information, can be seen as a version of y_t filtered by both a linear filter (namely, the WLC of ARIMA models, in Step 3) and a non-linear filter (i.e., the WLC of ANNs, in Step 5). Accordingly, the resulting list of forecasting errors can in fact be classified as a noise process.
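The Step 6 aggregation itself is a simple elementwise sum. A tiny Python/numpy sketch (all values below are hypothetical toy numbers, only meant to show the alignment of the two forecast lists):

```python
import numpy as np

# hypothetical aligned out-of-sample forecasts (toy values)
linear_fc = np.array([10.2, 11.0, 9.8])    # combined ARIMA forecasts (Step 3)
nonlinear_fc = np.array([0.3, -0.2, 0.1])  # combined ANN forecasts of the errors (Step 5)
actual = np.array([10.6, 10.7, 10.0])

hybrid_fc = linear_fc + nonlinear_fc       # Step 6: hybrid forecasts
errors = actual - hybrid_fc                # should behave like a noise process
```

If both filters have captured their respective structures, the remaining errors should show no systematic pattern.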

NUMERICAL RESULTS
For evaluating the proposed hybrid methodology, an annual sunspot time series covering the period from 1700 to 1987 (resulting in 288 points) is considered. In this paper, a comparison between the actual values and their forecasts is performed over an out-of-sample forecasting horizon, which is used for all approaches considered here. The E-Views 8 software was used for the ARIMA modelling, while the MLP ANN modelling was performed in MATLAB R2013a. In turn, the adjustment of the WLC parameters in Step 3 was done through the Solver package of Excel 2013. Finally, the wavelet decompositions were carried out with the Haar orthonormal basis.

Figure 4: WCs of the training sample of the Sunspot annual time series: (a) WC of approximation at level 2; (b) WC of detail at level 2; (c) WC of detail at level 3.
Regarding Step 2, one has: (a) an ARIMA(28,1,32) model, with logarithmic transformation, for predicting the WC of approximation at level 2, taking, for each instant t, the first difference of that WC; and analogous ARIMA predictors for the WCs of detail in items (b) and (c), algebraically defined by Equations (7) to (9).

The method used here for estimating the model parameters in items (a), (b) and (c) above was the Maximum Likelihood approach, as in Hamilton (1994).
The optimal adaptive parameters from Step 3 were obtained by solving the NLP described in Section 3.

Following, in Step 4, the prediction errors produced in Step 3 were decomposed by means of a wavelet decomposition of level 2, with the Haar orthonormal basis (see MALLAT, 2009). The best configuration of the ANN required in Step 5 to model the three WCs of the forecasting errors has five observations of each decomposition in the input layer and seven neurons in the hidden layer. Finally, in Step 6, the forecasts produced in Steps 3 and 5 are simply summed, generating the hybrid forecasts. The models and parameters for the exchange rate time series are given in the corresponding tables.

The necessary information for the application of the proposed method to the Canadian lynx time series is shown in Table 4. The Canadian lynx error series was decomposed by means of a wavelet decomposition of level 2, with the Haar orthonormal basis. The ANN input pattern has five observations of each decomposition, and the hidden layer has five neurons. The results can be seen in Figure 7 and the adherence statistics in Table 5.

Comparisons in Tables 1, 3 and 5 clearly show that the proposed WHF method achieved remarkably better results than any of the other predictive methods cited in this paper, on out-of-sample performance measures.
According to Figures 5, 6 and 7, the observed values and the predictions produced by the proposed method over the out-of-sample period are strongly correlated, meaning that a high predictive power was achieved in the Sunspot, exchange rate and Canadian lynx applications.
It is also worth pointing out that, despite the relative complexity of the mathematical techniques that integrate the proposed methodology, described in Section 2, its implementation is relatively straightforward with the use of appropriate software such as E-Views 8 and MATLAB R2013a.