PRINCIPAL COMPONENTS IN MULTIVARIATE CONTROL CHARTS APPLIED TO DATA INSTRUMENTATION OF DAMS

Hydroelectric plants are monitored by a large number of instruments that assess various quality characteristics of interest, each with inherent variability. The readings of these instruments generate time series that are often correlated. Each dam project has characteristics that make it unique. Given the need to establish statistical control limits for instrumentation data, this article takes a multivariate statistical approach and proposes a model that uses principal components control charts and the $T^2$ and $Q$ statistics to explain variability and to establish a monitoring method for controlling future observations. The model is validated with an application to section E of the Itaipu hydroelectric plant. The results show that the method is appropriate and can help identify the type of outliers, reduce false alarms and reveal the instruments that contribute most to the variability.


INTRODUCTION
The control chart is one of the most widely used statistical quality control techniques and can be very useful in controlling the instrumentation of a dam. The ease of supplying data online at high frequency yields a huge mass of data, which in turn generates a series of control charts. However, these data must be interpreted to produce knowledge beyond simple time series. Hotelling (1947) introduced multivariate control charts at the time of the Second World War, and the technique developed greatly once computers made its implementation practical. Its use has spread because quality control of several variables is not adequately handled by univariate tools such as Shewhart control charts, especially when the variables are correlated.
When variables do not behave independently of one another, they should be considered together and not separately (MASON; YOUNG, 2002). Monitoring each instrument individually may be impractical, or may even lead to a high rate of false alarms and discredit the monitoring system. The operation and maintenance of a dam require continuous attention to safety, and the tool for this is the instrumentation of the dam. Because of the particularities of each hydropower plant, there are no universal procedures for evaluating instrumentation that apply to all dams. The evaluation and judgment of the available information by an experienced engineering team is the best way to support a decision and choose the best action to implement (USACE, United States Army Corps of Engineers, 1995).
The main sources of variability in the readings of dam monitoring instruments are attributed to temperature, reservoir level and aging (ROSSO et al., 1995; CHENG; ZHENG, 2013; NEDUSHAN, 2002). Two models were created to relate instrument readings to environmental variables in a dam in China. The methods were able to reduce the rate of false alarms and detect defective instruments (CHENG; ZHENG, 2013).
Gu et al. (2011) diagnosed singular values in dam safety monitoring, with a case study of a hydroelectric plant in China, via multivariate principal component analysis and Hotelling control charts. At another hydroelectric plant in China, a model was applied that extracts the principal components of the instrumentation data and establishes a hydrostatic-seasonal-time model relating the principal components to reservoir level, temperature and time effects (YU et al., 2010). Control values are used to monitor the performance of the structures through the readings taken, and serve as warning signs of abnormal structural behavior. Values determined earlier, in the design phase and during the filling of the reservoir, are in many cases no longer applicable during the operation phase. It is therefore necessary to establish operational control values for the instrumentation of hydroelectric plants.
This article proposes a multivariate statistical model for dam monitoring instruments based on control charts and principal components analysis. It seeks to separate the effect of environmental variables on instrument readings from other sources of variability by means of the $T^2$ and $Q$ statistics, and to establish control values for monitoring future observations. The method is evaluated in a case study with real monitoring data from the dam of a hydroelectric plant.
Section 2 presents the theoretical basis of the control charts and statistics used in this work, as well as the site of the case study and its significance. Section 3 describes the data and methods used in the developed model. Section 4 presents and discusses the results of the case study. Section 5 presents some conclusions and considerations.

Control charts
The field of multivariate analysis consists of statistical techniques that consider two or more random variables related to a single entity, in an attempt to produce an overall result that takes into account the relationship between the variables (JACKSON, 1991). Multivariate process control is a methodology based on control charts used to monitor the stability of a multivariate process. Stability is achieved when one or more parameters of interest remain stable across samples (MASON; YOUNG, 2002).
One of the first studies to examine correlated variables from the perspective of statistical control, using multivariate procedures for military purposes, was by Hotelling (HOTELLING, 1947). This control procedure was based on a statistic that generalizes Student's $t$ statistic and later received the name Hotelling's $T^2$.
The application of univariate control charts to such data can lead to erroneous and misleading interpretations, and multivariate methods are a good alternative (MONTGOMERY, 2013; JOHNSON; WICHERN, 2007). If the variables are correlated, there is an increase both in the probability of false alarms and in the probability of not receiving an alert when the multivariate process is out of control (RYAN, 2011).

INDEPENDENT JOURNAL OF MANAGEMENT & PRODUCTION (IJM&P)
http://www.ijmp.jor.br v. 7, n. 1, January-March 2016 ISSN: 2236-269X DOI: 10.14807/ijmp.v7i1.369

Statistical quality control distinguishes two phases. In phase I (retrospective), the control limits are established and tested against the available data, which should be at levels considered statistically in control. In phase II (prospective), the control limits established from the same preliminary data are used to monitor future data (MONTGOMERY, 2013; RYAN, 2011).
For the instrumentation readings of dam monitoring, which are of interest in this work, the value of the $T^2$ statistic evaluated in phase I (MONTGOMERY, 2013; JOHNSON; WICHERN, 2007; RYAN, 2011), where an observation vector $\mathbf{x}_i$ is not independent of the estimators $\bar{\mathbf{x}}$ and $\mathbf{S}$, is given by equation (3):
$$T^2 = (\mathbf{x}_i - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_i - \bar{\mathbf{x}}) \qquad (3)$$
In this case, the upper control limit ($UCL$) is recommended to be calculated based on a beta distribution (MASON; YOUNG, 2002):
$$UCL = \frac{(n-1)^2}{n}\, \beta_{\alpha;\, p/2,\, (n-p-1)/2} \qquad (4)$$
where $\beta_{\alpha;\, p/2,\, (n-p-1)/2}$ represents the upper $\alpha$ quantile of the beta distribution with parameters $p/2$ and $(n-p-1)/2$. The upper control limit of phase II, when the parameters are estimated from a previous sample and an observation vector $\mathbf{x}$ is independent of the estimators $\bar{\mathbf{x}}$ and $\mathbf{S}$, is given by
$$UCL = \frac{p(n+1)(n-1)}{n(n-p)}\, F_{\alpha;\, p,\, n-p} \qquad (5)$$
where $F_{\alpha;\, p,\, n-p}$ represents the upper $\alpha$ quantile of the $F$ distribution with $p$ and $n-p$ degrees of freedom. The $p$-dimensional prediction ellipsoid of a future observation is given by all vectors $\mathbf{x}$ satisfying
$$(\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}}) \le \frac{p(n+1)(n-1)}{n(n-p)}\, F_{\alpha;\, p,\, n-p} \qquad (6)$$
where $n$ is the number of samples (time) and $p$ is the number of variables.
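As a minimal sketch of how the $T^2$ statistic of an individual observation and the phase II limit might be computed, the following pure-Python example assumes the standard forms $T^2 = (\mathbf{x} - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x} - \bar{\mathbf{x}})$ and $UCL = \frac{p(n+1)(n-1)}{n(n-p)} F$. All function names are hypothetical, and the $F$ quantile is passed in as a number taken from tables or a statistics package, since the standard library does not provide it.

```python
def mean_vector(rows):
    """Column means of a list of observation vectors."""
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]


def covariance(rows):
    """Usual sample covariance matrix (divisor n - 1)."""
    n = len(rows)
    m = mean_vector(rows)
    p = len(m)
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [x - mu for x, mu in zip(r, m)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b] / (n - 1)
    return cov


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting.

    Assumes `a` is non-singular; a near-singular covariance matrix
    would make this step numerically unreliable.
    """
    n = len(b)
    aug = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            f = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= f * aug[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def t2(x, mean, cov):
    """Hotelling T^2 of one observation: d' S^-1 d with d = x - mean."""
    d = [xi - mi for xi, mi in zip(x, mean)]
    w = solve(cov, d)  # w = S^-1 d, without forming the inverse
    return sum(di * wi for di, wi in zip(d, w))


def ucl_phase2(n, p, f_quantile):
    """Phase II limit; f_quantile is the upper-alpha F(p, n - p) quantile."""
    return p * (n + 1) * (n - 1) / (n * (n - p)) * f_quantile


# Illustrative data only (not the piezometer readings).
rows = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
m, s = mean_vector(rows), covariance(rows)
t2_values = [t2(r, m, s) for r in rows]
```

Solving the linear system instead of inverting the covariance matrix is a design choice of this sketch; it gives the same quadratic form with less arithmetic.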
An important issue when dealing with individual observations is how to estimate the covariance matrix. The usual estimator is given by equation (2); however, there are various other ways to estimate the covariance matrix. For example, the covariance matrix estimated by successive differences is given by
$$\mathbf{S}_d = \frac{1}{2(n-1)} \sum_{i=1}^{n-1} \mathbf{v}_i \mathbf{v}_i', \qquad \mathbf{v}_i = \mathbf{x}_{i+1} - \mathbf{x}_i \qquad (7)$$
This matrix was proposed by Holmes and Mergen (1993). Chou, Mason and Young (1999) compared five types of covariance matrix estimates and showed that the usual estimator is preferred for outlier detection.
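The difference between the two estimators can be illustrated with a short sketch. It assumes the usual estimator with divisor $n-1$ and the successive-differences form $\frac{1}{2(n-1)} \sum \mathbf{v}_i \mathbf{v}_i'$ with $\mathbf{v}_i = \mathbf{x}_{i+1} - \mathbf{x}_i$; the function names and the trend data are hypothetical.

```python
def usual_cov(rows):
    """Usual sample covariance matrix (divisor n - 1)."""
    n, p = len(rows), len(rows[0])
    mean = [sum(col) / n for col in zip(*rows)]
    cov = [[0.0] * p for _ in range(p)]
    for r in rows:
        d = [x - m for x, m in zip(r, mean)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += d[a] * d[b] / (n - 1)
    return cov


def successive_diff_cov(rows):
    """Covariance estimated from differences of consecutive observations."""
    n, p = len(rows), len(rows[0])
    cov = [[0.0] * p for _ in range(p)]
    for prev, cur in zip(rows, rows[1:]):
        v = [c - q for c, q in zip(cur, prev)]
        for a in range(p):
            for b in range(p):
                cov[a][b] += v[a] * v[b] / (2 * (n - 1))
    return cov


# A single variable with a steady trend: the usual estimator absorbs the
# trend into the variance, the successive-differences estimator does not.
trend = [[0.0], [1.0], [2.0], [3.0], [4.0]]
print(usual_cov(trend)[0][0], successive_diff_cov(trend)[0][0])
```

With trending data the successive-differences estimate is much smaller, so a $T^2$ chart built on it reacts more strongly to slow shifts in the mean.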

Principal components analysis (PCA)
As the number of variables to be analyzed increases, the average run length (the average number of samples required to detect a shift in the process) deteriorates. Thus, if the process variability is suspected not to be equally distributed among all variables, it is useful to use other methods (MONTGOMERY, 2013).
Data 'reduction' techniques are based on the principle of creating sets of latent variables that capture the significant variation 'hidden' in the data. The variation that the sets of latent variables extract from the process variables is of fundamental importance.
A method that can extract features from the data can be useful in dam safety studies. Since instrumentation readings result from the combination of several factors, multivariate data analysis methods offer the following advantages: 1) greater economy, by reducing the number of individual analyses; 2) greater ability to separate the variability attributable to a specific cause from random variability, since random variabilities are, by definition, uncorrelated from one instrument to another; and 3) the identification of patterns of behavior (NEDUSHAN, 2002).
PCA is a multivariate data analysis technique in which a number of related variables are transformed into a set of uncorrelated variables that are linear combinations of the original ones, with the expectation of explaining the variability of the data with a smaller number of variables (JACKSON, 1991).
Its industrial application has contributed to multivariate statistical process control, since only a few multivariate control charts are needed to serve as an index of process performance. PCA improves the early detection of failures relative to univariate charts (KOURTI, 2005).
The eigenvectors of the covariance matrix $\mathbf{S}$ form the columns of the orthogonal matrix $\mathbf{U}$ of the spectral decomposition of $\mathbf{S}$, so that
$$\mathbf{U}' \mathbf{S} \mathbf{U} = \mathbf{L} \qquad (8)$$
where $\mathbf{L}$ is a diagonal matrix of eigenvalues representing the variance of each principal component (JACKSON, 1991). Therefore, one can transform $p$ correlated variables into $p$ new uncorrelated variables through the transformation
$$\mathbf{z} = \mathbf{U}' (\mathbf{x} - \bar{\mathbf{x}}) \qquad (9)$$
It is also true that
$$\mathbf{x} = \bar{\mathbf{x}} + \mathbf{U} \mathbf{z} \qquad (10)$$
However, when only $k < p$ principal components are used, one takes the sub-matrix $\mathbf{U}_k$ of order $p \times k$ of $\mathbf{U}$ and the sub-vector $\mathbf{z}_k$ of order $k$ of $\mathbf{z}$ in the product $\mathbf{U}_k \mathbf{z}_k$, yielding an approximation to $\mathbf{x}$ that will be represented by $\hat{\mathbf{x}}$.
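The projection onto principal components and the approximation with a reduced number of components can be sketched as follows. The example assumes the eigenvectors have already been obtained (here a hypothetical $2 \times 2$ orthonormal set, not the one from the case study) and that scores are computed as $\mathbf{z} = \mathbf{U}'(\mathbf{x} - \bar{\mathbf{x}})$; the function names are illustrative.

```python
def scores(x, mean, eigvecs):
    """Principal component scores z = U'(x - mean).

    `eigvecs` is a list of eigenvectors (the columns of U), ordered by
    decreasing eigenvalue.
    """
    centered = [xi - mi for xi, mi in zip(x, mean)]
    return [sum(u_j * c_j for u_j, c_j in zip(u, centered)) for u in eigvecs]


def reconstruct(z, mean, eigvecs, k):
    """Approximation x_hat = mean + U_k z_k using the first k components."""
    x_hat = list(mean)
    for z_i, u in zip(z[:k], eigvecs[:k]):
        for j, u_j in enumerate(u):
            x_hat[j] += z_i * u_j
    return x_hat


# Hypothetical orthonormal eigenvectors and a standardized observation.
U = [[0.6, 0.8], [-0.8, 0.6]]
mean = [0.0, 0.0]
x = [1.0, 2.0]

z = scores(x, mean, U)
x_full = reconstruct(z, mean, U, k=2)  # all components: recovers x
x_hat = reconstruct(z, mean, U, k=1)   # one component: an approximation
```

Keeping all components reproduces the observation exactly, while dropping components leaves a residual $\mathbf{x} - \hat{\mathbf{x}}$.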
There is no consensus in the literature on the number of components to retain or on the criteria for determining it. A series of criteria is presented in Jackson (1991); in this work the choice was based on the percentage of variance explained and on the ability to detect out-of-limit values as compared with the $T^2$ control chart. Here this choice is less critical, because the components that are not retained are also evaluated through the $Q$ statistic.
The fact that PCA produces uncorrelated (and, under normality, independent) variables has the advantage of making it possible to compare the false alarm rate of multivariate statistical quality control procedures with that of univariate procedures such as Shewhart charts. According to Montgomery (2013), the true probability of type I error for the joint control procedure, if the variables are independent, is $\alpha' = 1 - (1 - \alpha)^p$, where $p$ is the number of variables; otherwise there is no closed formula.
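A short numerical sketch of the joint type I error for independent variables, assuming the form $\alpha' = 1 - (1 - \alpha)^p$; the function name is hypothetical, and the values $\alpha = 0.0027$ and $p = 7$ are used only as an illustration.

```python
def overall_false_alarm(alpha, p):
    """Joint type I error of p independent charts, each with error alpha."""
    return 1.0 - (1.0 - alpha) ** p


# Seven independent 3-sigma charts: the joint rate is roughly seven times
# the individual rate.
alpha_joint = overall_false_alarm(0.0027, 7)
print(round(alpha_joint, 4))
```

This shows why a set of univariate charts raises alarms more often than any single chart, even when every variable is in control.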

Regression analysis for missing data
When working with large databases, it is relatively common not to have all the desired data, for several reasons. In automatic data acquisition, electronic problems can cause the loss or unreliability of the information received. In non-automated acquisition, various forms of human error can prevent part of the data from being evaluated.
A measure of the suitability of a model to a time series is the mean square error (MSE), given by
$$MSE = \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n - r} \qquad (12)$$
where $y_i$ is the observed value, $\hat{y}_i$ is the predicted value, $n$ is the number of observations and $r$ is the number of parameters of the model or the number of independent variables used in the linear regression model.
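The model comparison by mean square error can be sketched as below, assuming the MSE is the sum of squared residuals divided by the number of observations minus the number of model parameters. The function name and the data are hypothetical.

```python
def mse(observed, predicted, n_params):
    """Mean square error with divisor n - n_params."""
    n = len(observed)
    if n <= n_params:
        raise ValueError("need more observations than parameters")
    sse = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    return sse / (n - n_params)


# Two hypothetical candidate models for the same readings; the one with
# the lower MSE would be preferred for filling in missing data.
observed = [1.0, 2.0, 3.0, 4.0]
model_a = [1.5, 2.0, 2.5, 4.0]
model_b = [2.0, 2.0, 2.0, 4.0]
best = min(("a", mse(observed, model_a, 2)), ("b", mse(observed, model_b, 2)),
           key=lambda item: item[1])
```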

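The Q statistic
When a principal components model is formulated in which the data have been standardized, an observation consisting of a vector $\mathbf{x}$ of $p$ variables can be written as $\mathbf{x} = \mathbf{U}\mathbf{z}$. If not all principal components are taken, only an approximation $\hat{\mathbf{x}} = \mathbf{U}_k \mathbf{z}_k$ is available, so that
$$\mathbf{x} = \hat{\mathbf{x}} + (\mathbf{x} - \hat{\mathbf{x}}) \qquad (13)$$
The first term on the right side of equation (13) is the part of the observation explained by the $k$ retained principal components, and the second term is the residual. The $Q$ statistic is the sum of squares of the residuals:
$$Q = (\mathbf{x} - \hat{\mathbf{x}})' (\mathbf{x} - \hat{\mathbf{x}}) \qquad (14)$$
The upper control limit of $Q$, denoted by $Q_\alpha$, according to Mudholkar and Jackson (1979), is
$$Q_\alpha = \theta_1 \left[ \frac{c_\alpha \sqrt{2 \theta_2 h_0^2}}{\theta_1} + \frac{\theta_2 h_0 (h_0 - 1)}{\theta_1^2} + 1 \right]^{1/h_0} \qquad (15)$$
where $c_\alpha$ is the value that corresponds to the $(1-\alpha)$ percentile of the standard normal distribution, $\alpha$ is the probability of type I error (fixed at 0.0027 here) and
$$\theta_i = \sum_{j=k+1}^{p} \lambda_j^{\,i} \qquad (16)$$
for $i = 1, 2, 3$, where $\lambda_j$ is the $j$th eigenvalue of the covariance matrix, and
$$h_0 = 1 - \frac{2 \theta_1 \theta_3}{3 \theta_2^2} \qquad (17)$$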
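A minimal sketch of the $Q$ statistic and its control limit, assuming the residual sum of squares $Q = (\mathbf{x} - \hat{\mathbf{x}})'(\mathbf{x} - \hat{\mathbf{x}})$ and the Jackson and Mudholkar approximation with $\theta_i = \sum_{j>k} \lambda_j^i$ and $h_0 = 1 - 2\theta_1\theta_3 / (3\theta_2^2)$. Function names and the eigenvalues are hypothetical; the normal quantile comes from the standard library.

```python
import math
from statistics import NormalDist


def q_statistic(x, x_hat):
    """Sum of squared residuals between an observation and its PCA fit."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat))


def q_limit(residual_eigenvalues, alpha=0.0027):
    """Upper control limit of Q from the eigenvalues NOT retained.

    Assumes h0 > 0 and a positive bracketed term, which holds for typical
    eigenvalue spectra; other cases are not handled in this sketch.
    """
    theta1 = sum(residual_eigenvalues)
    theta2 = sum(l ** 2 for l in residual_eigenvalues)
    theta3 = sum(l ** 3 for l in residual_eigenvalues)
    h0 = 1.0 - 2.0 * theta1 * theta3 / (3.0 * theta2 ** 2)
    c_alpha = NormalDist().inv_cdf(1.0 - alpha)
    term = (c_alpha * math.sqrt(2.0 * theta2 * h0 ** 2) / theta1
            + theta2 * h0 * (h0 - 1.0) / theta1 ** 2
            + 1.0)
    return theta1 * term ** (1.0 / h0)


# Hypothetical case: three discarded components with equal eigenvalues.
limit = q_limit([0.5, 0.5, 0.5])
q_obs = q_statistic([1.0, 2.0], [1.32, 1.76])
out_of_control = q_obs > limit
```

An observation is flagged when its $Q$ value exceeds the limit, which indicates variation that the retained components do not explain.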

DATA AND METHODS
This work was developed with real data from the instrumentation of section E of the Itaipu hydroelectric plant, shown in Figure 2. Among the various instruments present, the piezometers were chosen because of their number and the importance of their role in measuring uplift pressures in the dam.
Section E has seven piezometers. Since the frequency of readings has varied since the period of reservoir filling, the analysis was restricted to a period in which the readings were approximately biweekly for all piezometers. The reading of each piezometer is treated as an independent and identically distributed random variable. For phase I, the retrospective fitting of the model, 300 readings were selected, and the remaining readings were used for phase II, the prospective validation of the model. It is important to mention that the phase I data are considered to be in control from a statistical point of view. One instrument had some missing data during phase I. Linear regression and time series regression were therefore applied, and equation (12) was used to choose the best model for filling in the missing data. The method consists of the following steps:
- Test multivariate normality;
- Choose the type I error probability and the estimator of the covariance matrix;
- Construct the $T^2$ chart for the full set of variables;
- Extract the principal components, select the number to retain and test normality;
- Construct the $T^2$ chart and the control ellipse of the principal components in phases I and II;
- Calculate the residuals and the $Q$ statistics;
- Construct the $Q$ chart;
- Interpret the results.

RESULTS
Among the forecast models for the missing data, the model with the lowest MSE compared with the other regression models was chosen, and linear regression on the other variables was used to fill in the missing data between the observations fitted by the former model. To evaluate the hypothesis of multivariate normality, a goodness-of-fit test described in Mingoti (2005) and Johnson and Wichern (2007) was used. For each vector $\mathbf{x}_j$ containing the standardized readings of the piezometers, the following was evaluated:
$$d_j^2 = (\mathbf{x}_j - \bar{\mathbf{x}})' \mathbf{S}^{-1} (\mathbf{x}_j - \bar{\mathbf{x}}) \le \chi^2_{p}(q) \qquad (18)$$
where $\chi^2_{p}(q)$ corresponds to the chosen percentile $q$ of the chi-square distribution with $p$ degrees of freedom. Since the proportion of phase I observations that satisfied the condition agreed with what is expected under normality, the hypothesis that the data come from a multivariate normal distribution can be accepted.
For comparison, univariate Shewhart control charts for the mean were built for phase I, with control limits $\bar{x} \pm 3\hat{\sigma}$, where $\hat{\sigma}$ is, according to Montgomery (2013), an unbiased estimator of $\sigma$ given by $\hat{\sigma} = \bar{R}/d_2$, where $\bar{R}$ is the average range and $d_2$ is the corresponding tabulated constant, so that the false alarm rate is $\alpha = 0.0027$. Table 1 shows the number of observations outside the control limits (OCL) on the univariate Shewhart chart of each instrument for the 300 data of phase I. Note that univariate treatment is impractical given the high number of OCL observations.
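The univariate limits can be sketched as follows. This is an illustration under stated assumptions rather than the exact procedure of the case study: it assumes individual observations, moving ranges of two consecutive readings, the tabulated constant $d_2 = 1.128$ for ranges of size two, and 3-sigma limits. The function names and the readings are hypothetical.

```python
def individuals_limits(readings, d2=1.128, width=3.0):
    """Center line and control limits from the average moving range.

    d2 = 1.128 is the tabulated constant for ranges of two observations
    (an assumption of this sketch).
    """
    center = sum(readings) / len(readings)
    moving_ranges = [abs(b - a) for a, b in zip(readings, readings[1:])]
    sigma_hat = (sum(moving_ranges) / len(moving_ranges)) / d2
    return center - width * sigma_hat, center, center + width * sigma_hat


def count_ocl(readings, lcl, ucl):
    """Number of observations outside the control limits (OCL)."""
    return sum(1 for r in readings if r < lcl or r > ucl)


# Hypothetical phase I readings for one instrument.
phase1 = [1.0, 2.0, 3.0, 4.0, 5.0]
lcl, center, ucl = individuals_limits(phase1)
n_ocl = count_ocl(phase1, lcl, ucl)
```

Repeating this for every instrument gives one chart per variable, which is the situation the multivariate approach is meant to avoid.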
To select the covariance matrix estimator to be used in the $T^2$ chart for the standardized data of the seven piezometers, the usual covariance matrix (2) and the successive differences matrix (7) were tested. Table 2 shows the number of observations outside the control limit (OCL) on the $T^2$ chart for each type of covariance matrix tested, with the type I error probability $\alpha$ for every variable fixed at the value corresponding to the limits of the Shewhart charts. According to Montgomery (2013), the real probability of type I error for the whole control procedure, if the variables are independent, is then $\alpha' = 1 - (1 - \alpha)^p$. It should be noted, of course, that the original variables are not independent.
Nevertheless, the multivariate treatment becomes feasible: if the data originate from a multivariate normal distribution, the probability that 8 or more of the 300 observations fall above the control limit by chance exceeds 5%, so the hypothesis that the process is in statistical control cannot be rejected at 95% confidence. Because the matrix (7) is more sensitive to small deviations from the mean, the matrix (2) was selected for the purpose of this study.
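The probability of observing a given number of out-of-limit points purely by chance can be sketched with a binomial tail, assuming independent observations that each exceed the limit with the same probability. The function name is hypothetical, and the exceedance probability used in the example is an illustrative assumption, not a value taken from the tables of the case study.

```python
from math import comb


def prob_at_least(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p)."""
    below = sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k))
    return 1.0 - below


# Illustrative assumption: joint false alarm rate of seven independent
# 3-sigma charts, applied to 300 phase I observations.
p_exceed = 1.0 - (1.0 - 0.0027) ** 7
p_value = prob_at_least(8, 300, p_exceed)
in_control_not_rejected = p_value > 0.05
```

If the resulting probability is above 5%, the observed count of out-of-limit points is compatible with a process in statistical control at 95% confidence.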

After extracting the principal components of the phase I data set, a simulation was performed by selecting numbers of principal components that explain the percentages of variability shown in Table 3. The eigenvectors forming the matrix $\mathbf{U}$ of the spectral decomposition of $\mathbf{S}$ are shown in Table 4. The scores of the retained principal components were tested for multivariate normality, and the hypothesis of normality was accepted, according to the second column of Table 5.

For example, the upper control limit of the $T^2$ statistic of the PC scores in phase I was calculated from equation (4), and no observation of this phase exceeded it. For phase II, the upper control limit of the $T^2$ statistic, calculated from equation (5), is given in Table 5. The fifth and ninth columns of Table 5 contain the number of observations outside the control limits (OCL) for phase I and for phase II, and these observations are listed in the sixth and tenth columns for the $T^2$ charts in phases I and II. Finally, the seventh and eleventh columns of Table 5 contain the probability of obtaining that number of OCL observations when the process is in statistical control; values below 5% lead to rejecting the hypothesis that the data are under control.
If $k$ of the principal components are used, then taking the sub-matrix $\mathbf{U}_k$ of order $p \times k$ of $\mathbf{U}$ and the sub-vector $\mathbf{z}_k$ of order $k$ in the product $\mathbf{U}_k \mathbf{z}_k$ yields an approximation for $\mathbf{x}$, denoted by $\hat{\mathbf{x}}$, using equations (8), (9) and (10). The upper control limit of the $Q$ statistic was obtained using equations (16), (17) and (15); its values for each number $k$ of retained components are in the second column of Table 6. The remaining columns of Table 6 show the number of OCL observations and which observations they are. The control chart of the $T^2$ statistic, including the data of the two phases, is shown in Figure 3. Figure 4 displays the control ellipse for the first two principal components; the point in red represents the single observation out of control, which lies outside the control ellipse for these two components. These plots were constructed with the usual covariance matrix, equation (2). Figure 5 shows the behavior of the $Q$ statistic over the period of analysis, calculated by equation (14).

CONCLUSIONS
This paper sought to establish a method for applying control charts to dam monitoring instruments. In practice, given the large number of instruments in a large dam and the correlation between them, monitoring each instrument individually can be unfeasible, either because of the excessive number of charts to analyze or because of the large number of false alarms, which can discredit the system. The proposed method uses multivariate analysis to summarize a set of instruments in the $T^2$ and $Q$ statistics, combined with PCA, to explain, respectively, the inherent variability (assignable causes) and the random sources in the system. The objective was to reduce the workload through multivariate analysis, reduce false alarms to levels consistent with statistical control, and identify differences between the observations outside the control limits of the $T^2$ and $Q$ statistics.
It is worth mentioning that, had seven univariate Shewhart charts for the mean been chosen for the analysis instead, the control limits would be obtained between
The results show that the principal components model combined with the $Q$ statistic best fits the phase I data when at least four principal components are taken, because in this case the observations listed as out of control on the $T^2$ chart also appear as out of control on the $T^2$ chart of the principal components or on the $Q$ chart (compare Table 2, Table 5 and Table 6). They also show that, in the case study of multivariate monitoring of the piezometers located in section E of the Itaipu hydroelectric plant, the system is in statistical control at 95% confidence, regardless of the model adopted, i.e., the $T^2$ chart or principal components combined with the $Q$ statistic.
Another benefit of the combined use of these statistics with 4 principal components is that a distinction can be made among the observations identified as out of control. Among the observations given as out of control by the $Q$ statistic for 4 PCs (see Table 6), observation 25 is related to the global maximum of one instrument, observation 140 is associated with a local maximum of an instrument (both outliers are apparent), and observation 249 is associated with the global maximum of the instrument with the smallest variance.
The adopted model allows the PCs to be interpreted as variability arising from environmental factors (inherent to the model). It is essential both to understand that the variability of the principal components originates from these factors and to control the random variability that may be linked to outliers ($Q$ statistic).
The use of principal components has another advantage: it overcomes the problem of singularities. In this case study, for example, the determinant of the covariance matrix of all the original variables was close to zero. Singularities are associated with eigenvalues near zero, which can cause computational problems in the inversion of the covariance matrix and in the consequent calculation of the $T^2$ statistic.
The first four principal components explain more than 90% of the variability. It was observed that the first principal component can be interpreted as an average contribution of each instrument to the overall variability, and that this contribution depends on the elevation and the layer in which the instrument is located. According to the first column of Table 4, the instrument with the greatest contribution to this principal component is located before a concrete injection curtain and at a lower elevation, i.e., at the location theoretically most susceptible to uplift pressures. The instrument with the second greatest contribution to this principal component is located in a joint at a lower elevation, as shown in Figure 6. The instrument that has almost no effect on this component is located after the injection curtain, at the highest elevation among the instruments studied. This confirms the efficiency of the concrete injection curtain at the dam. Suggestions for future work with this type of approach include the use of non-parametric statistics, the variation of the false alarm rate and the analysis of other instruments, which may also enable the discovery of new knowledge, as well as the search for an interpretation of the other principal components.