BOOTSTRAP INTERVAL ESTIMATES FOR TIME SERIES FORECASTS OBTAINED WITH THE THETA MODEL

 

Daniel Steffen

Community University of Chapeco Region, Brazil

E-mail: daniel_steffen@unochapeco.edu.br

 

Anselmo Chaves Neto

Federal University of Paraná, Brazil

E-mail: anselmo@ufpr.br

 

Submission: 04/05/2016

Revision: 20/05/2016

Accept: 23/08/2016

 

ABSTRACT

In this work, an experimental computer program was developed in Matlab language version 7.1 implementing the univariate time series forecasting method called Theta, together with the computer-intensive resampling technique known as "bootstrap", in order to estimate, by confidence intervals, the point forecasts obtained by this method. To solve this problem, an algorithm that uses Monte Carlo simulation was built to obtain the interval estimates for the forecasts. The Theta model presented in this work was very efficient in the M3 competition of Makridakis, in which 3003 series were tested. It is based on the concept of modifying the local curvature of the time series through a coefficient theta (Θ). In its simplest approach the time series is decomposed into two theta lines representing long-term and short-term components. The prediction is made by combining the forecasts obtained by extrapolating the lines produced by the theta decomposition. The MAPE errors obtained for the estimates confirm the favorable results of the M3 competition, showing the method to be a good alternative for time series forecasting.

 Keywords: Forecasting; Time Series; Bootstrap; Theta Model

1.     INTRODUCTION

            Forecasting models are of great importance in both academic and applied settings because of their wide applicability in scientific, industrial, commercial and service areas. Such forecasts are made by companies to estimate the demand for their products and, consequently, to plan the production schedule, purchasing and other activities, including identifying when and where to focus marketing efforts (LIEBEL, 2004).

            An interesting illustration is the construction of a hydroelectric plant, in which knowledge of the time series of the flow of the river that supplies the dam is essential to the development of the project. Currently, industries need to plan in detail the production and the stocks kept at the disposal of operations. Thus, the application of time series is essential.

            Time series forecasting techniques produce predictions from sequences of past values, in other words, from the observations of the series [Z_{t-1}, Z_{t-2}, Z_{t-3}, ..., Z_1]. The objectives of time series forecasting techniques are:

·         forecasting future values of the time series;

·         describing only the behavior of the series (checking for trend and seasonality);

·         identifying the mechanism that generates the series (the stochastic process that generated it).

            The modeling of the series is based on a single realization, and this requires the underlying stochastic process to be ergodic. The predictions resulting from the application of these techniques may rely only on the information contained in the historical series of interest (classical statistical methods) or, in addition to this information, may incorporate other supposedly relevant information not contained in the series analyzed (methods based on Bayesian statistics).

            According to Boucher and Elsayed (1994), time series forecasting techniques can be divided into two categories: qualitative and quantitative. Quantitative techniques are the most used when a data set is available, and various statistical methods can be applied in order to extrapolate these data and obtain probable future values.

            Among the most widespread prediction models we can mention the Box and Jenkins models, which have the advantage of a systematic methodology that provides both point and interval estimates and is based on a series of hypotheses and statistical tests for acceptance of the models; however, they should only be applied to linear series, stationary or not (GUTIÉRREZ, 2003). This is a very powerful procedure, but it requires very specific knowledge.

            Assimakopoulos and Nikolopoulos (2000) proposed a new univariate forecasting model called Theta, which is relatively simple to apply and presented one of the best performances in time series forecasting. It was one of the methods tested in the M3-Competition of Makridakis and Hibon (2000). The model consists of decomposing the time series into two so-called theta lines; each line is extrapolated separately, by linear regression and simple exponential smoothing (SES) respectively, and the two forecasts are then combined with equal weights, yielding the Theta forecast.

            As a disadvantage, however, one may cite the lack of confidence intervals for the estimates in the work of Assimakopoulos and Nikolopoulos (2000). Confidence intervals are very important because they provide a reliable estimate of the size of the error that may be incurred when producing a point estimate.

            The purpose of this work is to apply the computer-intensive resampling technique known as "bootstrap" to estimate, by confidence intervals, the point forecasts obtained with the Theta model. The bootstrap method resamples a set of data, either directly or through a fitted model, in order to create replicates of the data, from which the variability of the quantities of interest can be evaluated without the use of analytical calculations.

            Therefore, this technique is particularly useful when the calculation of estimators by analytical methods is complicated. Resampling provides alternative ways of obtaining standard deviations and confidence intervals from the analysis of a single data set (DAVISON; HINKLEY, 1997).

2.     REVIEW OF LITERATURE

2.1.  Theta Model

            According to Nikolopoulos et al. (2011), the Theta model is a time series forecasting model developed from the idea that a single extrapolative method is practically unable to capture efficiently all the information hidden in a time series. The Theta model sparked interest in academia due to its remarkable forecasting performance in the M3-Competition (MAKRIDAKIS; HIBON, 2000).

            This model can be understood, according to the analysis of Hyndman and Billah (2003), as being equivalent to simple exponential smoothing with drift. However, Nikolopoulos and Assimakopoulos (2005) disagree with this view and claim that the Theta model is more general than simple exponential smoothing, because it is an approach based on the decomposition of the data and the extrapolation can rely on any forecasting method.

            The Theta model is based on the modification of the local curvature of the seasonally adjusted time series through a coefficient theta. This coefficient is applied directly to the second differences of the series (ASSIMAKOPOULOS; NIKOLOPOULOS, 2008). This application results in new series called "theta lines", which maintain the mean and the slope of the original data, but not their curvatures.

            The general formulation of the Theta model is based on the following steps:

·         Decomposition of the initial series into two or more theta lines;

·         Each theta line is extrapolated separately and the forecasts are simply combined with equal weights.

            The formulation of the model that was tested in the M3-Competition, and which performed best, is the decomposition of the time series into two theta lines. In this case each observation of the series is decomposed as follows:

Z_t = (1/2) [L_t(Θ = 0) + L_t(Θ = 2)],   t = 1, 2, ..., n                                   (1)

            where L_t(Θ = 0) is the linear regression line fitted to the data and L_t(Θ = 2) is obtained by the following expression:

L_t(Θ = 2) = 2 Z_t - L_t(Θ = 0).                                                            (2)

            The line L(Θ = 0) describes the series as a linear trend, while L(Θ = 2) doubles the local curvatures, amplifying the short-term behavior. For the extrapolation of L(Θ = 2) the method of simple exponential smoothing is applied. The final forecast of the Theta model is obtained by combining the extrapolations of the two lines with equal weights,

Ẑ_{t+h} = (1/2) [L_{t+h}(Θ = 0) + L_{t+h}(Θ = 2)].                                          (3)

            In practice the method can be easily implemented using a spreadsheet such as EXCEL. Nikolopoulos and Assimakopoulos (2005) suggest the following steps for its implementation:

·        Step 0: Seasonal adjustment of the data by the classical multiplicative decomposition method, if necessary;

·        Step 1: Fit a linear regression to the data and prepare L(Θ = 0) and its forecasts;

·        Step 2: Compute the values of L(Θ = 2) using formula (2);

·        Step 3: Extrapolate L(Θ = 2) with either SES (simple exponential smoothing) or another smoothing method, such as moving averages;

·        Step 4: Combine the forecasts from SES and LR (linear regression) with equal weights.

            The Theta model is simple and requires no extensive training. According to the results of the competition of Makridakis and Hibon (2000), the method obtained good predictions for monthly series, whether stationary or with trend or seasonality. Petropoulos and Nikolopoulos (2013) argue for the use of more theta lines, Θ ∈ {-1, 0, 1, 2, 3}, so as to extract even more information from the data. These lines can be extrapolated with other exponential smoothing methods, such as Holt's and Brown's exponential smoothing.
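            As an illustration of Steps 0 to 4 above, a minimal sketch in Python is shown below (the authors' own implementation was written in Matlab 7.1). The function name theta_forecast, the fixed smoothing constant alpha and the omission of the seasonal adjustment of Step 0 are simplifying assumptions made here for illustration only.

```python
import numpy as np

def theta_forecast(z, h, alpha=0.3):
    """Two-line Theta forecast: L(theta=0) by linear regression,
    L(theta=2) extrapolated by simple exponential smoothing (SES),
    combined with equal weights (formulas (1)-(3))."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    t = np.arange(1, n + 1)

    # Step 1: fit the trend line L(theta=0) = b0 + b1*t by ordinary least squares
    b1, b0 = np.polyfit(t, z, 1)
    l0_fit = b0 + b1 * t
    l0_fcst = b0 + b1 * (n + np.arange(1, h + 1))   # linear extrapolation

    # Step 2: L(theta=2) = 2*Z_t - L(theta=0), formula (2)
    l2 = 2.0 * z - l0_fit

    # Step 3: SES on L(theta=2); the forecast is flat at the last smoothed level
    # (alpha is fixed here for simplicity; in practice it would be optimized)
    level = l2[0]
    for x in l2[1:]:
        level = alpha * x + (1.0 - alpha) * level
    l2_fcst = np.full(h, level)

    # Step 4: equal-weight combination of the two extrapolations, formula (3)
    return 0.5 * (l0_fcst + l2_fcst)
```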

2.2.  “Bootstrap” Method

            "Bootstrap" is a computer-intensive method developed by Bradley Efron (1979) for estimating the variability of statistics. Generally speaking, the "bootstrap" is a technique whose objective is the point or confidence interval estimation of parameters of interest using resampling of the original data. It should be used when the classical methods for this purpose are asymptotic, difficult to implement or simply do not exist for specific statistics.

            As already mentioned, the "bootstrap" is a computationally intensive method that uses Monte Carlo simulation to estimate standard errors and confidence intervals. According to Chaves Neto (1991), the "bootstrap" is a computationally intensive non-parametric statistical technique that allows the variability of statistics to be evaluated from the data of a single existing sample.

            The basic idea of the "bootstrap" is to resample the set of observations of the original sample, directly or via a fitted model, in order to create replicates of the data, from which the variability of statistics can be evaluated without the use of analytical methods.

            Thus, given a random sample of size n, x' = [x1, x2, x3, ..., xn], NBS samples of size n are drawn with replacement from the original sample, each one called a "bootstrap" sample and denoted x*. Computing the statistic of interest T on each of the NBS "bootstrap" samples yields the set of "bootstrap" estimates T*_i, i = 1, 2, ..., NBS. These values form an approximation of the true sampling distribution of T.

            Resampling is based on the empirical distribution, in other words, a probability mass equal to 1/n is placed on each sample point, so that the empirical distribution assigns mass 1/n to each observation. The key point of the method is the sampling with replacement, which allows as many resamples as desired to be drawn.

            The goal is to see how the statistic computed from the resamples varies due to random sampling. In cases of parameter estimation in which the sampling distribution of the statistic (estimator) is unknown, the "bootstrap" is very helpful.

            Hesterberg et al. (2003) state that the original sample represents the population from which it was drawn. The resamples represent what would be obtained if many samples were taken from the original population. The "bootstrap" distribution of a statistic, based on many resamples, represents an approximation of its true sampling distribution. In order to obtain reliable results, thousands of "bootstrap" samples should be taken from the original sample.
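            As a simple illustration of the resampling idea, the sketch below (in Python, for illustration only) draws NBS "bootstrap" samples, computes a statistic on each (here the sample mean, chosen only as an example), and derives the "bootstrap" standard error and a 95% percentile interval. The function name bootstrap_statistics and the example data are assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_statistics(x, stat=np.mean, nbs=2000):
    """Draw nbs resamples of size n with replacement (mass 1/n on each point)
    and return the bootstrap replicates T*_i of the statistic."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    return np.array([stat(rng.choice(x, size=n, replace=True))
                     for _ in range(nbs)])

# Illustrative sample; any statistic of interest could be used instead of the mean
x = rng.normal(loc=45.0, scale=0.5, size=36)
t_star = bootstrap_statistics(x)
se_boot = t_star.std(ddof=1)                     # bootstrap standard error
lo, up = np.percentile(t_star, [2.5, 97.5])      # 95% percentile interval
```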

 

 

3.     PROPOSED METHODOLOGY

            The time series analyzed was obtained with a time series generator program (GST), developed experimentally in the Pascal language. It was chosen to generate a series with 36 observations following the structure of an AR(1) model, with the autoregressive parameter φ1 set inside the stationarity region -1 < φ1 < 1. The constant term was set to d = 45 and the noise variance was fixed at V(a_t) = 0.2. The programs used to achieve the objectives of this work were developed in Matlab version 7.1.
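            The exact AR(1) equation and the value of the autoregressive parameter used by the GST program are not reproduced here; the Python sketch below only illustrates how a stationary AR(1) series of 36 observations fluctuating around a level of 45 could be simulated. The function generate_ar1, the parameterization through the level mu and the value phi = 0.2 are assumptions made for illustration.

```python
import numpy as np

def generate_ar1(n=36, mu=45.0, phi=0.2, noise_var=0.2, seed=1):
    """Simulate a stationary AR(1) series fluctuating around the level mu:
    Z_t = c + phi * Z_{t-1} + a_t, with c = mu * (1 - phi) and
    a_t ~ N(0, noise_var); stationarity requires |phi| < 1."""
    rng = np.random.default_rng(seed)
    c = mu * (1.0 - phi)
    z = np.empty(n)
    prev = mu                          # start the recursion at the process mean
    for t in range(n):
        z[t] = c + phi * prev + rng.normal(0.0, np.sqrt(noise_var))
        prev = z[t]
    return z

series = generate_ar1()                # 36 simulated observations
```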

3.1. Adaptation of the models and computational implementation

3.1.1. Theta Forecasts

            Given the time series Z_t, t = 1, 2, ..., n, a linear regression model is fitted to the series by the method of ordinary least squares (OLS), obtaining the estimates of the regression coefficients and the vector of fitted values, which will be designated L(Θ = 0).

To obtain the other theta line, Z_t is substituted in the equation:

L_t(Θ = 2) = 2 Z_t - L_t(Θ = 0)                                                             (4)

            L(Θ = 2) is extrapolated by simple exponential smoothing (SES). The combination of the two lines with equal weights at horizon h gives the final forecast of the Theta model:

Ẑ_{t+h} = (1/2) [L_{t+h}(Θ = 0) + L_{t+h}(Θ = 2)]                                           (5)

3.1.2. "Bootstrap" Confidence Interval

            In the present study the residuals are obtained from the sample Z_1, Z_2, ..., Z_n after combining, with equal weights, L(Θ = 0) and L(Θ = 2), the latter after application of the exponential smoothing method.

            The problem considered uses a linear model to estimate the sampling distribution of the statistic used as estimator of the forecast. The general regression model, adapted to the prediction context, is:

Z = Xβ + ε                                                                                  (6)

where:

Z: response vector with dimension n;

X: model matrix of order n×p;

β: parameter vector with dimension p;

ε: residual vector with dimension n.

            Once the linear regression model has been fitted, it is used to generate L(Θ = 2), obtained by the equation:

L_t(Θ = 2) = 2 Z_t - L_t(Θ = 0)                                                             (7)

            To generate the fitted line Ẑ_t, a combination with equal weights is made between L(Θ = 0) and L(Θ = 2), the latter after the application of simple exponential smoothing (SES):

Ẑ_t = (1/2) [L_t(Θ = 0) + L_t(Θ = 2)]                                                       (8)

            For the purposes of the "bootstrap", the residuals obtained from this combination will be used, so that:

ê_t = Z_t - Ẑ_t                                                                             (9)

where:

Z_t: original series, generated by the simulator;

Ẑ_t: series estimated by the Theta model.

            The steps of the computational implementation to obtain confidence intervals for the forecast h periods ahead are:

1) Fit a regression model by ordinary least squares (OLS), obtaining the estimates of the regression coefficients and the theta lines L(Θ = 0) and L(Θ = 2).

2) Apply an exponential smoothing model to L(Θ = 2) and obtain the fitted values of the Theta model by the equation

Ẑ_t = (1/2) [L_t(Θ = 0) + L_t(Θ = 2)],                                                      (10)

obtaining the vector of estimated residuals ê, which shall be considered the original sample for the purposes of the "bootstrap".

3) Select B random samples of size n from the residuals ê obtained in step (2), using resampling with replacement, with probability 1/n for each selected residual:

ê*_11, ê*_12, ..., ê*_1n

ê*_21, ê*_22, ..., ê*_2n

...

ê*_B1, ê*_B2, ..., ê*_Bn,     each ê*_bi drawn from the empirical distribution of the residuals.

4) Generate a pseudo-series from each "bootstrap" sample by the equation:

Z*_t = Ẑ_t + ê*_t                                                                           (11)

5) Fit the model again by ordinary least squares to each pseudo-series, obtaining the "bootstrap" estimates of the regression coefficients and the "bootstrap" forecast of the Theta model, Ẑ*(h).

6) Store Ẑ*(h) in a vector of dimension B×1.
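            A compact version of steps 1) to 6) above is sketched below in Python (illustrative only; the original implementation was written in Matlab 7.1). The helper names theta_fit_and_forecast and bootstrap_theta_forecasts and the fixed smoothing constant alpha are assumptions of this sketch; in practice the smoothing constant would be optimized rather than fixed.

```python
import numpy as np

rng = np.random.default_rng(2)

def theta_fit_and_forecast(z, h, alpha=0.3):
    """Return the in-sample fitted Theta values and the h-step Theta forecast."""
    z = np.asarray(z, dtype=float)
    n = len(z)
    t = np.arange(1, n + 1)
    b1, b0 = np.polyfit(t, z, 1)                   # step 1): OLS trend line L(theta=0)
    l0_fit = b0 + b1 * t
    l0_fcst = b0 + b1 * (n + np.arange(1, h + 1))
    l2 = 2.0 * z - l0_fit                          # L(theta=2) = 2*Z - L(theta=0)
    smooth = np.empty(n)                           # step 2): SES applied to L(theta=2)
    smooth[0] = l2[0]
    for i in range(1, n):
        smooth[i] = alpha * l2[i] + (1.0 - alpha) * smooth[i - 1]
    fitted = 0.5 * (l0_fit + smooth)               # in-sample Theta values, eq. (10)
    forecast = 0.5 * (l0_fcst + np.full(h, smooth[-1]))
    return fitted, forecast

def bootstrap_theta_forecasts(z, h, B=1000):
    """Residual bootstrap of the Theta forecasts, steps 1) to 6).
    Returns a (B, h) matrix with one bootstrap forecast path per row."""
    z = np.asarray(z, dtype=float)
    fitted, _ = theta_fit_and_forecast(z, h)
    resid = z - fitted                             # original sample of residuals, eq. (9)
    boot = np.empty((B, h))
    for b in range(B):
        eps_star = rng.choice(resid, size=len(z), replace=True)   # step 3)
        z_star = fitted + eps_star                 # step 4): pseudo-series, eq. (11)
        _, boot[b] = theta_fit_and_forecast(z_star, h)            # steps 5) and 6)
    return boot
```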

            The flowchart in Figure 1 shows the steps of the algorithm.

 

Figure 1: Flowchart of the "bootstrap" algorithm, whose output is the set {Ẑ*_b(h); b = 1, 2, 3, ..., B}, the "bootstrap" distribution of Ẑ*(h) that simulates the sampling distribution of Ẑ(h).

            From this "bootstrap" distribution of Ẑ*(h) one can calculate the "bootstrap" standard deviation and the "bootstrap" intervals for Ẑ(h). To obtain a percentile confidence interval with a confidence level of 95% for Ẑ(h), the "bootstrap" distribution is ordered in ascending order, Ẑ*_(1)(h) ≤ ... ≤ Ẑ*_(B)(h), and the values at the 2.5% and 97.5% empirical percentiles are used, respectively, as the lower and upper limits of the confidence interval for Ẑ(h).
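            Continuing the sketch above, the "bootstrap" standard error and the 95% percentile limits for each horizon can be read directly from the stored matrix of "bootstrap" forecasts (column h - 1 of the matrix corresponds to horizon h); the variable series refers to the simulated series from the earlier sketch.

```python
# Continuation of the sketch above: percentile limits per forecast horizon
boot = bootstrap_theta_forecasts(series, h=6, B=1000)      # B forecasts per horizon
se_boot = boot.std(axis=0, ddof=1)                          # bootstrap standard error
lower, upper = np.percentile(boot, [2.5, 97.5], axis=0)     # 95% percentile limits
```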

4.     NUMERICAL RESULTS      

            Table 1, below, shows the series generated by the GST simulator, with 36 observations. The last six values of the series were stored for validation and performance testing.

Table 1: Time series generated by simulator

Month    | Year 1 | Year 2 | Year 3 (test)
month 1  | 45.08  | 44.84  | 45.05
month 2  | 44.69  | 44.68  | 45.14
month 3  | 44.61  | 44.60  | 44.87
month 4  | 44.90  | 44.70  | 45.04
month 5  | 45.21  | 44.50  | 45.24
month 6  | 45.13  | 45.06  | 45.25
month 7  | 45.15  | 45.12  | 44.93
month 8  | 44.99  | 44.85  | 45.21
month 9  | 45.06  | 44.93  | 45.10
month 10 | 44.89  | 44.60  | 45.18
month 11 | 44.78  | 44.83  | 45.09
month 12 | 44.79  | 44.75  | 45.15

Source: The authors.

            The series is decomposed into two theta lines, L(Θ = 0) and L(Θ = 2), extrapolated by linear trend and simple exponential smoothing, respectively.

            Table 2, below, shows the six series values stored for performance testing, the forecasts for L(Θ = 0) and L(Θ = 2), the final forecast of the Theta model, and the evaluation of the quality of the prediction according to the MAPE, MSE and RMSE criteria.

Table 2: Period, observed value, regression and smoothing lines, Theta forecast, MAPE, MSE and RMSE

Period (h) | Observed (test) | L(Θ = 0) (LR) | L(Θ = 2) (SES) | Theta forecast | MAPE (%) | MSE
1          | 44.93           | 44.9652       | 45.5130        | 45.2391        | 0.6880   | 0.0955
2          | 45.21           | 44.9687       | 45.5130        | 45.2409        | 0.0682   | 0.0010
3          | 45.10           | 44.9722       | 45.5130        | 45.2426        | 0.3162   | 0.0203
4          | 45.18           | 44.9757       | 45.5130        | 45.2443        | 0.1424   | 0.0041
5          | 45.09           | 44.9792       | 45.5130        | 45.2461        | 0.3462   | 0.0244
6          | 45.15           | 44.9826       | 45.5130        | 45.2478        | 0.2167   | 0.0096
Mean       |                 |               |                |                | 0.2963   | 0.0258
RMSE       |                 |               |                |                |          | 0.1606

Source: The authors.

            The RMSE for the series decomposed into two lines and extrapolated by LR and SES, respectively, is 0.1606. The mean performance according to the MAPE criterion is 0.2963%. Figure 2, below, shows the time series analyzed, the linear regression line L(Θ = 0), the exponential smoothing of L(Θ = 2), and the forecasts of the Theta method.

Figure 2: Time series, L(Θ = 0) by LR, L(Θ = 2) by SES, and the Theta forecast.
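            For reference, the error criteria reported in Table 2 can be reproduced with the short sketch below (illustrative Python, using the observed test values and Theta forecasts taken from Table 2).

```python
import numpy as np

# Observed test values and Theta forecasts taken from Table 2
observed = np.array([44.93, 45.21, 45.10, 45.18, 45.09, 45.15])
forecast = np.array([45.2391, 45.2409, 45.2426, 45.2443, 45.2461, 45.2478])

ape  = 100.0 * np.abs(observed - forecast) / observed   # absolute percentage errors
mape = ape.mean()                                        # about 0.2963 %
mse  = ((observed - forecast) ** 2).mean()               # about 0.0258
rmse = np.sqrt(mse)                                      # about 0.1606
```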

            Table 3, below, shows the forecast results obtained by the Theta method and also the predictions obtained with the Statgraphics software, using traditional methods optimized for the lowest RMSE, with their respective MAPE errors.

Table 3: Period, Theta and Box & Jenkins forecasts and their MAPE values

 

Period (h) | Theta forecast | MAPE    | Box & Jenkins ARMA(1,0) | MAPE
1          | 45.2391        | 0.6880% | 45.0868                 | 0.3489%
2          | 45.2409        | 0.0682% | 45.0054                 | 0.4525%
3          | 45.2426        | 0.3162% | 44.9648                 | 0.2997%
4          | 45.2443        | 0.1424% | 44.9445                 | 0.5212%
5          | 45.2461        | 0.3462% | 44.9344                 | 0.3450%
6          | 45.2478        | 0.2167% | 44.9293                 | 0.4888%
Mean       |                | 0.2963% |                         | 0.4095%

Source: The authors.

            Table 3 allows us to affirm that the Theta model, in its simplest application with L(Θ = 0) and L(Θ = 2) and using the simple exponential smoothing (SES) method for the extrapolation of L(Θ = 2), obtained the best performance, with a mean absolute percentage error of 0.2963% against 0.4095% achieved by the Box and Jenkins methodology.

            Table 4, below, shows the observed values, the Theta model forecasts, the "bootstrap" standard deviation, the "bootstrap" MSE, and the lower and upper confidence limits.

Table 4: Real value, theta forecast, “bootstrap” standard error, “bootstrap” MSE, confidence interval

Period (h) | Observed | Theta forecast | "Bootstrap" standard error | "Bootstrap" MSE | Lower limit 95% | Upper limit 95%
1          | 44.93    | 45.2391        | 0.0020                     | 0.09×10⁻⁴       | 45.2329         | 45.2406
2          | 45.21    | 45.2409        | 0.0040                     | 0.37×10⁻⁴       | 45.2286         | 45.2439
3          | 45.10    | 45.2426        | 0.0062                     | 0.86×10⁻⁴       | 45.2228         | 45.2488
4          | 45.18    | 45.2443        | 0.0085                     | 1.49×10⁻⁴       | 45.2200         | 45.2522
5          | 45.09    | 45.2461        | 0.0108                     | 2.42×10⁻⁴       | 45.2141         | 45.2561
6          | 45.15    | 45.2478        | 0.0125                     | 3.12×10⁻⁴       | 45.2123         | 45.2600
Mean       |          |                |                            | 1.39×10⁻⁴       |                 |

Source: The authors.

            Analyzing Table 4, it can be noted that the "bootstrap" MSE of the forecasts appears in ascending order, meaning that at each additional forecast horizon the confidence limits span a wider range, making the predictions less reliable as the forecast horizon increases.

            To obtain the "bootstrap" interval for the forecast horizon of h periods ahead, the "bootstrap" method was applied with B = 1000 replications. Figure 3, below, shows the histogram of the "bootstrap" estimates for the first forecast horizon, that is, the "bootstrap" distribution of Ẑ*(1). A high degree of symmetry is observed in the figure, which suggests a Gaussian model for these values.

Figure 3: "Bootstrap" distribution of Ẑ*(1).

5.     CONCLUSIONS

            This work adopted the method known as "bootstrap" to estimate confidence intervals for the forecasts of the Theta model. The Theta model also proved to be a good alternative for time series forecasting, because the results indicate a significant advantage over conventional methods.

            The method is simple and requires no extensive training, only basic statistics. From the results obtained, it is noted that greater complexity of a model does not necessarily result in better modeling of the data. In its simplest application, with the series decomposed into a linear regression line and a simple exponential smoothing line, results at least equivalent to those of other automated methods are obtained.

            The MAPE errors confirm the favorable results of the M3-Competition of Makridakis and Hibon (2000). For the construction of the confidence intervals, the computer-intensive "bootstrap" method was used, from which 95% percentile confidence intervals were obtained.

            The "bootstrap" distribution of the data showed symmetrical behavior, which suggests a normally distributed estimator with 1000 replications. The problem of determining how many replications B are required to obtain good estimates of the lower and upper limits of the confidence intervals by the "bootstrap" is discussed in Efron and Tibshirani (1993).

REFERENCES

ASSIMAKOPOULOS, V.; NIKOLOPOULOS, K. (2000). The theta model: a decomposition approach to forecasting. International Journal of Forecasting, v. 16, p. 521-530.

ASSIMAKOPOULOS, V.; NIKOLOPOULOS, K. (2008). Advances in the theta model. University of Peloponnese, Department of Economics.

BOUCHER, T. O.; ELSAYED, E. A. (1994). Analysis and control of production systems. 2nd ed. Prentice Hall, New Jersey.

CHAVES NETO, A. (1991). "Bootstrap" em Séries Temporais. Thesis (PhD in Electrical Engineering) – Pontifícia Universidade Católica do Rio de Janeiro, PUC-RJ.

DAVISON, A. C.; HINKLEY, D. V. (1997). Bootstrap Methods and their Application. Cambridge University Press.

EFRON, B. (1979). Bootstrap methods: another look at the jackknife. Annals of Statistics, v. 7, n. 1, p. 1-26.

EFRON, B.; TIBSHIRANI, R. J. (1993). An introduction to the “bootstrap”. Chapman and Hall, New York.

GUTIÉRREZ, J. L. C. (2003). Monitoramento da Instrumentação da Barragem de Corumbá-I por Redes Neurais e Modelos de Box e Jenkins. Dissertation (Master in Civil Engineering), PUC-RIO.

HESTERBERG, T.; MOORE, D. S.; MONAGHAN, S.; CLIPSON, A.; EPSTEIN, R. (2003). Bootstrap methods and permutation tests. In: The practice of business statistics. New York: W. H. Freeman.

HYNDMAN, R. J.; BILLAH, B. (2003). Unmasking the Theta method. International Journal of Forecasting, v.19, n. 2, p. 287-290.

LIEBEL, M. J. (2004). Previsão de Receitas Tributárias – O caso do ICMS no Estado do Paraná. Dissertation (Professional Master’s degree in Engineering) – Universidade Federal do Rio Grande do Sul – RS.

MAKRIDAKIS, S.; HIBON, M. (2000). The M3-Competition: results, conclusions and implications. International Journal of Forecasting, v. 16, p. 451-476.

NIKOLOPOULOS, K.; ASSIMAKOPOULOS, V. (2005) Fathoming the Theta model. In: 25th International Symposium on Forecasting, ISF, San Antonio, Texas, USA.

NIKOLOPOULOS, K.; ASSIMAKOPOULOS, V.; BOUGIOUKOS, N.; LITSA, A.; PETROPOULOS, F. (2011). The Theta model: An essential Forecasting Tool for Supply Chain Planning. Advances in Automation and Robotics, Lecture Notes in Electrical Engineering, n. 123, p. 431-437.

PETROPOULOS, F.; NIKOLOPOULOS, K. (2013). Optimizing Theta model for monthly data. In: Proceedings of the 5th International Conference on Agents and Artificial Intelligence.