Prediction of indicators through machine learning and anomaly detection: a case study in the supplementary health system in Brazil
Abstract
This research investigated the stages of building a machine learning model to predict an indicator, the number of medical appointments per day in the supplementary health sector in the region of Porto Alegre, RS, Brazil, and proposed a metric for anomaly detection. The methodology combined a literature review with an applied case study; the statistical software R was used to prepare the data and build the model. The stages of the case study were: database extraction; division of the data into training and test sets; creation of functions and feature engineering; variable selection and correlation analysis; choice of algorithms with cross-validation and tuning; model training; application of the models to the test data; selection of the best model; and proposal of the anomaly detection metric. At the end of these stages, Random Forest was selected as the best model in terms of MAE (Mean Absolute Error), outperforming Linear Regression and Neural Network. The standard deviation metric also made it possible to identify nine anomaly points and thirty-eight warning points. It was concluded, from the proposed methodology and the results obtained, that the feature engineering and variable selection steps were essential for creating and selecting the model. In addition, the proposed metric achieved its objective of generating alerts on the indicator, flagging cases with possible problems or opportunities.
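The last two stages, selecting the best model by MAE and flagging points with a standard deviation metric, can be illustrated with a minimal sketch. It is written in Python rather than the R used in the study, and it is not the authors' implementation. The function names, the model names in the usage note, and the thresholds of 2 and 3 standard deviations for warning and anomaly points are assumptions made for illustration; the abstract does not state the exact thresholds.

```python
from statistics import mean, stdev


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and predicted values."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def select_best_model(actual, predictions_by_model):
    """Return (name, mae) for the model with the lowest MAE on the test data.

    predictions_by_model maps a model name (hypothetical labels) to its
    list of predictions for the same days as `actual`.
    """
    scores = {
        name: mean_absolute_error(actual, predicted)
        for name, predicted in predictions_by_model.items()
    }
    best = min(scores, key=scores.get)
    return best, scores[best]


def classify_residuals(actual, predicted, warn_k=2.0, anomaly_k=3.0):
    """Label each day as 'normal', 'warning', or 'anomaly'.

    A day is labeled by how far its residual (actual - predicted) lies
    from the mean residual, measured in sample standard deviations.
    warn_k and anomaly_k are assumed thresholds, not values from the paper.
    """
    residuals = [a - p for a, p in zip(actual, predicted)]
    center = mean(residuals)
    spread = stdev(residuals)  # requires at least two data points
    labels = []
    for r in residuals:
        distance = abs(r - center)
        if spread > 0 and distance > anomaly_k * spread:
            labels.append("anomaly")
        elif spread > 0 and distance > warn_k * spread:
            labels.append("warning")
        else:
            labels.append("normal")
    return labels
```

In use, `select_best_model` would receive the test-set appointment counts together with each candidate's predictions (for example under the keys `"random_forest"`, `"linear_regression"`, and `"neural_network"`), and `classify_residuals` would then be applied to the winning model's predictions to produce the daily alerts.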
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
1. Policy Proposal for Open Access Journals
Authors who publish in this journal agree to the following terms:
a. Authors retain copyright and grant the journal the right of first publication, with the work simultaneously licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 license, which allows the work to be shared with acknowledgement of its authorship and of its initial publication in this journal.
b. Authors may enter into separate, additional contracts for the non-exclusive distribution of the version of the work published in this journal (e.g., depositing it in an institutional repository or publishing it as a book chapter), with acknowledgement of its authorship and of its initial publication in this journal.
c. Authors are permitted and encouraged to publish and distribute their work online (e.g., in institutional repositories or on their personal page) at any point before or during the publishing process, since this can lead to productive exchanges as well as increase the impact and citation of the published work (see The Effect of Open Access).
d. The journal allows open dissemination under certain conditions. Authors may disseminate their articles in open access, subject to specific conditions set by the publisher concerning:
The version of the article that may be deposited in a repository:
Pre-print: the version before peer review.
Post-print: the version after peer review, which can be:
The author's version that has been accepted for publication.
The publisher's version, that is, the article as published in the journal.
When the article may be made openly accessible: before it is published in the journal, immediately afterwards, or after an embargo period, which can range from six months to several years.
Where it may be made openly available: on the author's personal web page, on departmental websites only, in the institution's repository, in the archive of the research funding agency, among others.