Preprint / Version 1

Prediction of indicators through machine learning and anomaly detection: a case study in the supplementary health system in Brazil

Authors

  • Mirele Marques Borges, Universidade Federal do Rio Grande do Sul
  • Cláudio José Müller, Universidade Federal do Rio Grande do Sul

Keywords:

Machine Learning, Indicators, Anomaly Detection, Feature Engineering, Supplementary Health System

Abstract

This research investigated the stages of building a Machine Learning model to predict an indicator, the number of medical appointments per day in the supplementary health sector in the region of Porto Alegre / RS, Brazil, and proposed a metric for anomaly detection. The methodology combined a literature review with an applied case study, and the statistical software R was used to prepare the data and build the model. The stages of the case study were: database extraction; division of the data into training and test sets; creation of functions and feature engineering; variable selection and correlation analysis; choice of algorithms with cross-validation and tuning; training of the models; application of the models to the test data; selection of the best model; and proposal of the metric for anomaly detection. At the end of these stages, the best model in terms of MAE (Mean Absolute Error) was Random Forest, which outperformed Linear Regression and Neural Network. It was also possible to identify nine anomaly points and thirty-eight warning points using the standard deviation metric. The proposed methodology and the results obtained led to the conclusion that the feature engineering and variable selection steps were essential for creating and selecting the model. In addition, the proposed metric achieved its objective of generating alerts on the indicator, highlighting cases with possible problems or opportunities.
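The abstract names two quantitative pieces: model selection by MAE and a standard deviation metric that separates warning points from anomaly points. The sketch below illustrates one way these could fit together. It is not the authors' implementation. The study used R, whereas this example uses Python's standard library. The abstract does not state the thresholds or what the deviation is measured on, so the function names, the use of prediction residuals, and the cutoffs of 2 and 3 standard deviations (`warn_k`, `anomaly_k`) are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def mean_absolute_error(actual, predicted):
    """MAE: average absolute gap between observed and predicted values."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def flag_points(actual, predicted, warn_k=2.0, anomaly_k=3.0):
    """Label each observation by how far its residual lies from the mean residual.

    warn_k and anomaly_k are hypothetical thresholds, expressed in standard
    deviations of the residuals; the source does not specify its cutoffs.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    center = mean(residuals)
    spread = pstdev(residuals)  # population standard deviation of residuals

    labels = []
    for r in residuals:
        distance = abs(r - center)
        if spread > 0 and distance > anomaly_k * spread:
            labels.append("anomaly")
        elif spread > 0 and distance > warn_k * spread:
            labels.append("warning")
        else:
            labels.append("normal")
    return labels


if __name__ == "__main__":
    # Made-up daily appointment counts, purely for demonstration.
    predicted = [100] * 20
    actual = [100] * 19 + [200]
    print("MAE:", mean_absolute_error(actual, predicted))
    print(flag_points(actual, predicted))
```

Under these assumptions, the model with the lowest MAE on the test data would be kept, and its residuals would then feed the flagging step. Using two thresholds gives a graded alert: a point past the lower cutoff merits attention, while one past the higher cutoff is treated as an anomaly.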

Posted

2021-01-25