BIG DATA ANALYTICS: ACHIEVEMENTS, CHALLENGES, AND RESEARCH TRENDS

 

Andre Coelho Vaz Henriques

Fundacao Getulio Vargas, Brazil

E-mail: acvhenriques@gmail.com

 

Fernando de Souza Meirelles

Fundacao Getulio Vargas, Brazil

E-mail: fernando.meirelles@fgv.br

 

Maria Alexandra Viegas Cortez da Cunha

Fundacao Getulio Vargas, Brazil

E-mail: alexandra.cunha@fgv.br

 

Submission: 7/19/2019

Revision: 9/18/2019

Accept: 10/2/2019

 

ABSTRACT

Big data applications combined with analytical tools foster prediction techniques that impact societal, economic, and political changes. After almost a decade of studies, this paper proposes to identify major debates on big data analytics, presenting its evolution over the past years and identifying its research tendencies. We limited our research to the top eight journals in information systems. Our findings suggest that big data analytics is apparently reaching a plateau, which might be confirmed by publications in the following years. The paper contributes to the current debate on big data by identifying ongoing studies in the research community. In addition, it provides a critical analysis of the field development, from its perceived benefits to its unimagined consequences. Finally, we conclude that other perspectives on big data analytics might include a new wave of studies and that new paths beyond productivity gains can be explored.

Keywords: Big data. Analytics. Business Intelligence. Datification. Data Science.

1.       INTRODUCTION

            The term big data refers to data whose size goes beyond the ability of regular database software to capture, store, manage, and analyze (MANYIKA et al., 2011). Big data applications combined with analytical tools (or big data analytics) foster prediction techniques that influence societal, economic, and political changes. After almost a decade of studies, this paper proposes to identify major debates regarding big data, presenting big data’s evolution over the past years and identifying its research tendencies.

            In the big data era people are computer-mediated in their daily activities, generating a very large amount of digital records. Additionally, in a socioeconomic environment heavily influenced by mobile applications (apps), each transaction involving any text, digital procedure, tactile command, voice, and other user inputs in an app is data. This context presents a myriad of possibilities to take advantage of big data analytics. The success of companies such as Google, eBay, Facebook, and Amazon arouses interest and draws attention to the big data phenomenon both in the academic and business worlds. These corporations, just to name a few, are the hallmarks of big data applications.

            But the advances originating from big data analytics (BDA) technologies raise new issues. On the political side, the 2016 elections in the United States were strongly affected by media resources based on BDA, and the same techniques were employed more recently in Brazil, exerting a major influence on the results of the 2018 elections. According to The Economist (2018), the Chinese government is working on a surveillance system based on facial recognition, including factors such as emotions and sexuality, aiming to control its population in an unprecedented way. On the other hand, San Francisco, California, the center of the technology revolution, recently banned the use of facial recognition by police and certain agencies (THE NEW YORK TIMES, 2019). These types of developments evoke questions, such as the limits of privacy over other interests, which need further discussion.

            While the strategic value of data processed by algorithms promotes great efficiency to corporations, the implications for society and individuals are not clear. Decision models leveraged by sophisticated algorithms can replace the judgements of complex analyses, invading knowledge occupation professions. Therefore, jobs, institutions, and industries established today might be affected in uncertain ways. These enabling technologies may modify markets all over the world, leading to impacts that are still unknown. Thus, the technology’s many benefits can lead to negative consequences.

            In the current scenario, a systematic analysis of the field’s evolution, clarifying topics that have already been investigated and pointing out issues that still need further research, is lacking. To fill that gap, the following research question is asked: what are the current debates in the big data analytics field, and what are its research trends? The study synthesizes major challenges and concerns regarding BDA, presents the field’s development over time, and points out gaps that need further investigation. Although research has been conducted in this area, the present analysis, based on the eight major journals on information systems (IS), provides a new perspective.

            From an academic point of view, this study presents a clear picture of BDA development over time and uncovers gaps that have not yet been addressed. Such analysis provides a better understanding of big data analytics applications and their consequences, generating reflections on the possibilities and boundaries in the field. For practitioners, the study brings together BDA techniques, models, and a mindset that have been successfully applied, and at the same time, it provides a warning regarding BDA limits and brings attention to its applications.

            To achieve this, we first present a theoretical foundation, introducing the concepts of big data analytics. Then, we describe the method applied in the research. Next, we discuss the production of articles, the impact and challenges generated by BDA, a retrospective of major contributions, and expectations of new studies on the topic. Finally, we present our conclusions, including the limitations of the study and future research suggestions.

2.       EXISTING THEORETICAL FOUNDATIONS

            The propagation of the web, social media, mobile apps, and sensor networks, in addition to the cost reduction in storage and computing resources, has given rise to ubiquitous and increasing digital computer records termed big data (MULLER et al., 2016), while the use of analytics to extract value from big data has given rise to big data analytics (MULLER; FAY; BROCKE, 2018).

            Computers embedded in products such as cars, vacuum cleaners, or video consoles have given rise to large amounts of digitized data (LOEBBECKE; PICOT, 2015). Location-based processes and the internet of things also contribute to data generation (LYYTINEN; GROVER, 2017); therefore, with all these resources, technology provides the opportunity to transform data into ‘actionable insights’ (SABOO; KUMAR; PARK, 2016; KITCHENS et al., 2018).

            That is, the term BDA emerged to describe both large and complex amounts of data and the analytical technologies required to manage them.

            The term intelligence has been used in academic literature since the 1950s, but only in the 1990s did it become popular in business and IT communities (CHEN; CHIANG; STOREY, 2012). Hoping not to commit a heresy, we understand the big data analytics concept as being very similar to the more famous (and ‘less sexy’, as put by Newell and Marabelli (2015)) business intelligence (BI).

            Hence, one can consider big data as a close successor of business intelligence (ABBASI; SARKER; CHIANG, 2016). In other words, the term ‘big data analytics’ (or just big data) has been adopted to refer to data sets and analytical techniques for large and advanced applications, requiring complex techniques in their usage.

            We are going through a transition period in which limited volume, regular velocity, and small variety are being replaced with a new concept of information that is very different from the traditional one, as precisely described by Abbasi, Sarker and Chiang (2016).

            In this context, the central role is played by structured data, which is stored in data centers employing relational database management systems (RDBMS). In this arrangement, many organizations integrate structured data sources in data warehouses and data marts that use extract, transform, and load (ETL) technologies. The stored data are analyzed by data analysts and programmers using structured query language (SQL), and the results are fed into BI tools, report generators, or analytical models employed in predictive technologies.
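
            As a minimal sketch of the arrangement described above, the following Python snippet uses the standard-library sqlite3 module as a stand-in for a data warehouse. The table name, columns, and sample records are hypothetical and serve only to illustrate the extract, transform, and load steps followed by an SQL query; a production environment would rely on dedicated RDBMS and ETL tooling.

```python
import sqlite3

# Hypothetical raw records as they might arrive from an operational system.
raw_sales = [
    {"region": " north ", "amount": "120.50"},
    {"region": "South", "amount": "80.00"},
    {"region": "north", "amount": "30.25"},
]


def extract():
    """Extract: read records from the (hypothetical) source system."""
    return list(raw_sales)


def transform(records):
    """Transform: normalize region names and cast amounts to numbers."""
    return [(r["region"].strip().lower(), float(r["amount"])) for r in records]


def load(conn, rows):
    """Load: write the cleaned rows into a warehouse table."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (region TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    conn.commit()


def total_by_region(conn):
    """Analysis: an SQL aggregate whose result could feed a BI report."""
    cur = conn.execute(
        "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
    )
    return cur.fetchall()


conn = sqlite3.connect(":memory:")  # in-memory database, nothing is persisted
load(conn, transform(extract()))
report = total_by_region(conn)
print(report)
```

            The three functions are kept separate to mirror the ETL stages named in the text; the final query plays the role of the analyst's SQL step.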

            Conversely, in the knowledge stage of the value chain, there are direct interactions among the information enablers and decision makers, that is, the consumers and producers of information. In this scenario, technologies such as knowledge management systems, corporate wikis, BI dashboards, reporting tools, and expert systems convey knowledge through technologies such as decision support systems (DSS), collaboration tools, and recommender systems that guide decision-making processes by analysts and managers (ABBASI; SARKER; CHIANG, 2016).

            Thus, BDA has advanced significantly from its early stage of business intelligence 1.0, marked by structured data, dashboards, data mining, OLAP, and statistical analyses; to 2.0, distinguished by unstructured online data, social network analyses, web analytics and intelligence, and social media analytics; and on to the current 3.0 era, characterized by mobile and sensor-based content, mobile analytics, and location- and context-relevant analyses (CHEN; CHIANG; STOREY, 2012; GROVER et al., 2018).

            Therefore, BDA today not only involves well-structured traditional data stored in traditional databases and data warehouses (BAESENS et al., 2016) but also implies large, diverse, and dynamic sets of digital traces and user-generated content in addition to analytics methods (MULLER et al., 2016). It encompasses public, proprietary, and purchased sources of unstructured data, including documents, web content, video, image, audio, and sensor data (GROVER et al., 2018) whose development is far from trivial (CONSTANTIOU; KALLINIKOS, 2015).

            It involves the analyses and interpretations of all kinds of digital information (LOEBBECKE; PICOT, 2015) and arises from major sources, including large-scale enterprise systems, online social graphs, mobile devices, the internet of things, and open data (BAESENS et al., 2016). It borrows techniques grounded in statistics, machine learning, and econometrics, among others.

            There are some big data features and functionalities that are commonly called management ‘V’s: volume, variety, and velocity (CHEN; PRESTON; SWINK, 2015; MULLER et al., 2016; CLARKE, 2016; ABBASI; SARKER; CHIANG, 2016; HAN; PARK; OH, 2016), which means data that are too large, fast, or hard to process. Volume refers to the enormous amount of data to be processed.

            Velocity refers to the speed at which data must be processed, from their generation to their use. One of the most challenging aspects of this chain may be the time from data extraction until the generation of value from the data, that is, when the data become useful or relevant (CONSTANTIOU; KALLINIKOS, 2015; BAESENS et al., 2016).

            Variety is related to the great diversity of origins, forms, and formats of data, which makes them difficult to categorize and tabulate. It involves not only traditional data but also user-generated text, videos, images, social network data, web and mobile clickstreams, sensor-based data, and spatial-temporal data (MCAFEE; BRYNJOLFSSON, 2012; ABBASI; SARKER; CHIANG, 2016).

            More recently, some authors have added other ‘V’s to this list, such as ‘variability’ and ‘value’. Variability is related to the susceptibility of the data to changes, such as when they are translated into another language (NUAMI et al., 2015). In terms of value, the concept of big data involves not only a vast amount of data but also the process by which organizations derive value from them, which inevitably varies across organizations, situations, and managers (LYYTINEN; GROVER, 2017), e.g., improving organizational decision making, promoting service innovation, and ensuring higher satisfaction and retention (GROVER et al., 2018). Last, Abbasi, Sarker and Chiang (2016) and others consider another ‘V’ in the information value chain: veracity, which refers to the truthfulness of the data.

            In BDA, advanced technologies are employed to analyze data to discover useful information that is hidden, such as unknown correlations, or to uncover patterns (CHEN; PRESTON; SWINK, 2015), providing answers to questions that have not even been considered (GROVER et al., 2018). In contrast to research in which data are collected for a specific end and measured by validated instruments, big data often just happens (MULLER et al., 2016).

            Since large samples are becoming more common in the IS field, researchers are increasingly working with big data (CHATLA; SHMUELI, 2017). However, Zuboff (2015) criticizes the passive position assumed regarding the topic, saying that the literature’s view of BDA as a technological phenomenon disregards its social origin. In Zuboff’s view, big data have an intentional sense and severe consequences, predicting and modifying human behavior through a logic that she refers to as ‘surveillance capitalism’.

            In fact, aspects such as privacy, surveillance, and democracy arouse debates that still need further investigation. In this sense, digitization and big data analytics, or ‘datification’ (GALLIERS et al., 2015; NEWELL; MARABELLI, 2015; LOEBBECKE; PICOT, 2015), are embedded in all areas of life. Interactions with objects with sensors and IP addresses provide a mass of data sources, and humans have become ‘walking data generators’ (MCAFEE; BRYNJOLFSSON, 2012; LOEBBECKE; PICOT, 2015).

3.       RESEARCH APPROACH

            To address the aim of this paper, we carried out an analysis of the field’s evolution, clarifying topics that have already been investigated and pointing out issues that still need further research. The literature review adopted was concept-centric. According to Webster and Watson (2002), in this kind of review the concepts determine the organizing framework used to synthesize the literature.

            To provide a systematic review, we limited our research to the top eight journals based on the Association for Information Systems (AIS) Senior Academic Collegiate, the most respected and recognized journals in the Information Systems field. The list includes the European Journal of Information Systems, the Information Systems Journal, Information Systems Research, the Journal of AIS, the Journal of Information Technology, the Journal of MIS, the Journal of Strategic Information Systems, and MIS Quarterly. Instead of a longitudinal analysis of a vast number of papers, the aim was to analyze in depth the articles, which cover geographical, methodological, and topic diversity.

            The list comprises mature and established knowledge and is representative of the IS field. Thus, the ‘Basket of Eight’ journals can reflect the core body of knowledge in IS, serving as a data source for investigating the field’s development.

            We assumed that not all the papers discussing ‘big data’ adopt that specific term. Therefore, we first expanded our search by looking for papers containing the terms ‘analytics’ and ‘intelligence’ (CHEN; CHIANG; STOREY, 2012; LUVIZAN; DINIZ, 2017) in the keywords, title, or abstract. However, the analysis of the papers showed that ‘datification’ and ‘data science’ are quite common in related fields, which led us to include both these terms in our search.

            In an initial analysis, the papers were selected, discarded, or subjected to a fine-grained examination. Articles explicitly containing the term ‘big data’ in any of the search fields were included in the study, while all the others went through a verification process. The articles that did not adopt specific ‘big data’ nomenclature but that combined large, diverse, and dynamic data (the conditions that, by broad academic consensus, determine when big data emerges) were included. As technology progresses over time, the size of the dataset classified as big data increases.
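
            The screening logic described above can be sketched in a few lines of Python. This is an illustration only, not the instrument used in the study: the `Paper` structure, the `triage` function, and the sample records are hypothetical, and the ‘verify’ outcome merely flags a paper for the manual, fine-grained reading described in the text.

```python
from dataclasses import dataclass

# Search terms named in the text; matching is case-insensitive.
SEARCH_TERMS = ("big data", "analytics", "intelligence", "datification", "data science")


@dataclass
class Paper:
    """Hypothetical record holding the three search fields."""
    title: str
    abstract: str
    keywords: tuple


def searchable_text(paper):
    """Concatenate title, abstract, and keywords into one lowercase string."""
    return " ".join([paper.title, paper.abstract, *paper.keywords]).lower()


def triage(paper):
    """Return 'include', 'verify', or 'discard' for a candidate paper."""
    text = searchable_text(paper)
    if "big data" in text:
        return "include"   # explicit term: included directly
    if any(term in text for term in SEARCH_TERMS):
        return "verify"    # related term only: needs manual examination
    return "discard"       # assumption: no search term matched at all


# Hypothetical sample records, not taken from the reviewed literature.
candidates = [
    Paper("Big Data in Retail", "A case study.", ("retail",)),
    Paper("Text Analytics for Blogs", "We mine posts.", ("text mining",)),
    Paper("ERP Adoption", "A survey of firms.", ("enterprise systems",)),
]
decisions = [triage(p) for p in candidates]
print(decisions)
```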

            For this reason, authors like Manyika et al. (2011) understand that it would not be reasonable to define a quantity of data that characterizes it. In line with these authors, the criterion adopted to identify a large volume of data was that it could not be managed by regular software, such as Excel and the like. Articles discussing regular databases, enterprise systems (e.g., ERP, CRM, e-commerce) and traditional predictive analytics, among others, were discarded, and those that somehow contributed to clarifying big data aspects (e.g., advanced text analytics tools) were included. All the selected articles were subjected to a rigorous reading and were classified in a spreadsheet.

            The time limit defined was the year 2010, since we understand that the big data phenomenon only emerged after enabling technologies arose from that year onwards. The papers were collected between November 1st of 2018 and January 15th of 2019, and the searches covered the articles published between 2010 and 2018.

            However, no articles published in 2010 or 2011 were found. Out of the 135 candidates in the initial pool, we selected 41 papers that met the selection criteria. Our analyses focused on summarizing the main findings of the papers, highlighting current debates, and finding aspects that could characterize and classify the articles. Nevertheless, the intention of this study was not to merely describe the area but to actually contribute to new research, pointing out gaps and trends in the literature.

4.       DISCUSSION

4.1.          Global Production

            The first factor that determined the selection of articles in this research was the country with which the authors were associated at the time of publication. As we can see in the next figure, the publications come almost entirely from the northern hemisphere; Australia is the only exception. We see no publications on BDA at all from South America or the entire African continent, and we find that authors whose institutions are based in the United States (26 of 60) produced nearly half the publications.

World Map

Figure 1: ‘Basket of Eight’ BDA World Production

Source: created by the authors

            After the United States, the countries of China, the United Kingdom, and the Netherlands are tied with four publications, followed by Denmark, Liechtenstein, and Taiwan, with three publications. Germany, Hong Kong, India, Israel, and South Korea follow with two, and finally, there are Australia, Belgium, and Switzerland with one publication each. We assume that this scenario—in which publications originating from the United States and European countries prevail—is probably not unique to BDA publications but reflects the continuum of global production in the ‘Basket of Eight’.
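
            The tally reported above can be checked with a short Python sketch. The counts are transcribed from the text; the variable names are illustrative, and the figures count country affiliations as described in this section.

```python
# Publication counts per country, transcribed from the paragraph above.
publications = {
    "United States": 26,
    "China": 4, "United Kingdom": 4, "Netherlands": 4,
    "Denmark": 3, "Liechtenstein": 3, "Taiwan": 3,
    "Germany": 2, "Hong Kong": 2, "India": 2, "Israel": 2, "South Korea": 2,
    "Australia": 1, "Belgium": 1, "Switzerland": 1,
}

total = sum(publications.values())
us_share = publications["United States"] / total

print(total)                      # overall number of country affiliations
print(round(100 * us_share, 1))   # United States share, in percent
```

            The counts add up to the 60 affiliations mentioned earlier, and the United States accounts for roughly 43% of them, which is consistent with the ‘nearly half’ stated in the text.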

            One fact, however, attracts our attention. Apart from China and India, the other members of BRICS (Brazil, Russia, and South Africa) have no publications on the topic in the leading journals. This is striking considering the size and economic influence of the countries that compose the BRICS, since one major potential of BDA is precisely the fostering of economic gains, not to mention all the social and political aspects.

            Conversely, countries with a more modest global presence, such as Liechtenstein, Taiwan, and Israel, share the stage with large, developed nations. The notable accomplishments of these countries may encourage professionals and academics from the rest of the world who have not yet reached such a ‘title’ in the field.

            Another interesting fact observed regards the institutions with which the authors were associated when publishing their articles related to BDA. The only institution that published four times in the leading journals in the field was the University of Liechtenstein, from the Principality of Liechtenstein.

            The monarchy is situated between Austria and Switzerland and has a population of nearly 38,000, and its university is the leading producer of BDA research in the ‘Basket of Eight’. The intention behind this global overview of big data publications in leading journals is to provide a big picture of the field and of how efforts could be directed or rethought.

4.2.          Impacts and Challenges Generated by Big Data Analytics

            Data quality, analytical tools, and human analytics talent are some of the enablers of BDA that generate insights and valuable knowledge for decision making. Moreover, while BDA opens new opportunities, it also introduces new challenges, such as the impact on the labor market and privacy concerns. In this section, we analyze the major impacts and challenges of big data analytics identified in the papers from the ‘Basket of Eight’ journals.

            To provide a big picture of the main issues faced by the field, we first present the following figure with keywords extracted from the selected articles in this research. The keywords are ranked according to their frequency and displayed in a tag cloud visualization. The terms ‘big data’, ‘big data analytics’, ‘analytics’, and ‘business intelligence’ were removed to highlight topics published in the articles selected in this research.

 

 

Figure 2: Keywords Tag Cloud

Source: created by the authors

            The diversity of keywords may reflect the ramifications of the topic, from which different paths emerge. Nevertheless, some terms also attract attention, such as data quality, privacy, social data, and sentiment analysis.
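
            The frequency ranking behind a tag cloud such as Figure 2 can be sketched as follows. This is a minimal illustration, not the procedure actually used by the authors: the function name and the sample keyword lists are hypothetical, and only the four excluded terms come from the text.

```python
from collections import Counter

# Terms removed before ranking, as stated in the text.
EXCLUDED = {"big data", "big data analytics", "analytics", "business intelligence"}


def keyword_frequencies(keyword_lists):
    """Count normalized keywords across articles, skipping excluded terms."""
    counts = Counter()
    for keywords in keyword_lists:
        for keyword in keywords:
            normalized = keyword.strip().lower()
            if normalized not in EXCLUDED:
                counts[normalized] += 1
    # most_common() sorts by descending frequency.
    return counts.most_common()


# Hypothetical keyword lists, one per article.
sample = [
    ["Big Data", "Privacy", "Data Quality"],
    ["Analytics", "Privacy", "Sentiment Analysis"],
    ["Business Intelligence", "Privacy", "Data Quality"],
]
ranking = keyword_frequencies(sample)
print(ranking)
```

            A tag cloud tool would then scale each term's font size by its count.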

            In the following, we highlight some challenges and concerns in the field, based on the common topics referred to in the selected articles.

4.2.1.     Qualified Professionals

            The benefits promoted by big data applications are diverse. However, in addition to technology, BDA adoption requires qualified professionals. Both academic and nonacademic literature point out that the shortage of professionals and individuals capable of using the big data potential may be one of the major difficulties in its application and development. Qualified professionals with experience and expertise are key to developing and implementing BDA strategies, including data scientists, programmers, developers, and analysts (GROVER et al., 2018).

            According to Baesens et al. (2016), most universities do not offer mature programs and classes on BDA, and even worse, many professors do not have the necessary knowledge to effectively deliver big data education. To solve this issue, the authors conclude that alliances between the academy and the business world could help provide good quality education programs.

4.2.2.     Privacy

            Although the advantages of the network economy are well known, concerns about privacy have emerged in research. Big data analytics technologies move faster than the chain of systems that preserves privacy and information security (LOWRY; DINEV; WILLISON, 2017).

            Zuboff (2015) points out that today, we have data from several sensors embedded in objects, bodies, and places. The author draws attention to the fact that some technology companies put innovation first and disregard the consequences—e.g., exhibiting a photograph of a private property without license. 

            Apparently, users have been persuaded to ignore the dark side of datification and its package of digital traces because the benefits are higher than the costs. Therefore, it seems that individuals perceive that it is better to be able to look for something specific on Google (and thus support the algorithm that knows about what we want and about us) than to simply not use it (NEWELL; MARABELLI, 2015).

            In line with these concerns, the General Data Protection Regulation (GDPR), enacted in 2016, is a European law that enhances data protection for Europe’s citizens and thus ensures that small, medium, and large companies alike will have to invest in cybersecurity. In addition to local companies, companies all over the world that do business with Europe need to adjust to this regulation. Similarly, the ‘right to be forgotten’ was sanctioned by the European Union court in 2014; that is, links to ‘irrelevant’ or ‘outdated’ information may be deleted whenever requested by citizens of the European Union.

4.2.3.     Little Data

            While big data are data originating from indiscriminate groups with the logic of decision-making algorithms, a recent phenomenon called ‘little data’ might be emerging (NEWELL; MARABELLI, 2015). It uses big data to direct knowledge in a targeted way that is potentially unfair, predicting the behavior of a particular individual. These kinds of actions might have a serious impact, giving rise to questions about the boundaries of BDA in ethical and privacy domains. Digitized devices that are able to track and record individuals’ actions permeate our lives and pose relevant questions that still need to be addressed.

4.2.4.     The Labor Market

            The replacement of humans by machines in basic and routine activities is not a recent phenomenon. Now, machines are progressively starting to replace humans in cognitive tasks, since big data-based systems are becoming more cost effective and have a higher hit rate (LOEBBECKE; PICOT, 2015). The consequences of this change are still obscure, but it seems that it will dramatically modify the current configuration of several professions.

 

4.2.5.     Algorithm Complexity

            Although algorithms are sometimes very good at predictions, in several cases they are incomprehensible (MULLER et al., 2016). It is necessary to understand the relation between data and the analyzed phenomenon (LYYTINEN; GROVER, 2017). Consequently, highly advanced algorithms composed of complex formulas closed in black boxes are unlikely to be adopted to support key strategic business areas, such as fraud detection, credit risk measurement, or medical diagnosis (BAESENS et al., 2016).

4.2.6.     Infrastructure

            Big data analytics infrastructure involves collecting different types of data, sharing data, and integrating data sources. In addition to human talent, organizations need to invest in analytics portfolios and big data assets to promote their development.

            BDA infrastructure encompasses data sources (e.g., clickstream, transactional, user-generated, social media) and proper platforms to collect, integrate, share, process, and manage big data, especially those dealing with unstructured data in multiple formats (GROVER et al., 2018). More recently, data lakes have become the current best practice solution to data collection and integration (KITCHENS et al., 2018).

            Data lakes consist of vast repositories in which organizations store data in their native format until they analyze the data and extract value from them. This solution reduces costs for sharing data within a firm and promotes experimentation and discoveries. In addition, due to the large size of data, a growing number of outsourced firms act as servers in the so-called ‘cloud’ (LOWRY; DINEV; WILLISON, 2017).
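
            The idea of keeping data in their native format until analysis, sometimes called schema-on-read, can be illustrated with a small Python sketch. It is a toy model under stated assumptions: a temporary directory stands in for the data lake, and the file names, payloads, and the `ingest` and `read_records` helpers are hypothetical.

```python
import csv
import io
import json
import tempfile
from pathlib import Path


def ingest(lake_dir, name, raw_bytes):
    """Store a payload unchanged, in its native format."""
    path = Path(lake_dir) / name
    path.write_bytes(raw_bytes)
    return path


def read_records(path):
    """Choose a parser only when the data are actually analyzed."""
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".json":
        return json.loads(text)
    if path.suffix == ".csv":
        return list(csv.DictReader(io.StringIO(text)))
    raise ValueError(f"no parser registered for {path.suffix}")


# A temporary directory plays the role of the lake in this sketch.
with tempfile.TemporaryDirectory() as lake:
    ingest(lake, "clicks.json", b'[{"user": "a", "page": "/home"}]')
    ingest(lake, "orders.csv", b"user,total\na,19.90\n")
    records = {p.name: read_records(p) for p in sorted(Path(lake).iterdir())}

print(records)
```

            Nothing is converted at ingestion time, so new sources can be added without agreeing on a schema first; the cost is that every consumer must know how to interpret each format.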

4.2.7.     Data Quality

            Park et al. (2012) highlight the fact that academic research has paid little attention to problems regarding erroneous data, although institutional agencies such as the US Census Bureau are making efforts in this area. Without proper data quality, resources will inevitably be misallocated (CLARKE, 2016). More often than not, data are noisy, erroneous, or missing; due to exponential growth, ensuring trustworthy sources of data and information is difficult (GROVER et al., 2018).

4.3.          Retrospective of Major Contributions

            To the best of our knowledge, the publication of the remarkable article by Chen, Chiang and Storey (2012) is a hallmark of big data in IS, clarifying concepts, channeling the term, and providing guidance for future studies. This paper identifies the evolution, applications, and emerging research areas of BI&A (1.0, 2.0 and 3.0). In the same year, Chau and Xu (2012) developed a technique to effectively collect, extract, and analyze blogs related to a specific topic, and Park et al. (2012) created an inference model based on patterns of social ties that assess the validity of self-reported customer profiles.

            In the following years, the big data analytics potential was explored in business. This exploration started to show the implications for strategy making (CONSTANTIOU; KALLINIKOS, 2015) and demonstrated that its adoption influences business growth (CHEN; PRESTON; SWINK, 2015).

            Moreover, Constantiou and Kallinikos (2015) focused attention on unstructured data—such as the media of text, image, and sound—which cross the alphanumeric systems that have prevailed in organization management. Additionally, in 2015, the first studies pointing out big data analytics consequences were published, and the terms ‘datification’ and ‘digitization’ emerged.

            In this sense, Loebbecke and Picot (2015) demonstrate the side effects of big data analytics in business and society; Newell and Marabelli (2015) show the economic, legal, organizational, ethical, cultural, and psychological consequences of digitization—including issues related to privacy, control and dependence; and Zuboff (2015) questions the new global architecture of computer mediation.

            However, the year of big data was 2016. Almost 40% of the articles in the ‘Basket of Eight’ were published in 2016. Considering the growth of publications and interest in the topic, several studies guiding BDA research gained space.

            In an editorial in the Journal of the Association for Information Systems, Abbasi, Sarker and Chiang (2016) discuss the emerging implications for theory and methodology arising from big data’s disruptive effects. In line with this, in that very year, MIS Quarterly published its second editorial related to big data (RAI, 2016), outlining opportunities for IS research, and published a special issue on BDA, increasing the number of articles on the topic.

            Moreover, Ketter et al. (2016) present a conceptual and methodological approach by which IS research can address BDA issues, while Baesens et al. (2016) provide a perspective on emerging research opportunities regarding big data, and Muller et al. (2016) set guidelines for conducting BDA studies in IS.

            At the same time, 2016 is also marked by studies introducing new models and techniques. In this regard, we highlight the works of Brynjolfsson, Geva and Reichman (2016), who demonstrate a crowd-squared approach for predicting search trend data; Lash and Zhao (2016), who create a system able to predict movie profitability in the preproduction stage; and Shi, Lee and Whinston (2016), whose works enhance decision making in mergers and acquisitions through BDA techniques.

            Furthermore, Menon and Sarkar (2016) present a scalable approach to solve privacy concerns when sharing transactional databases, and Li, Chen and Nunamaker (2016) develop a system that is capable of identifying underground economy sellers. Finally, Clarke (2016) draws attention to the moral and legal responsibilities of computing researchers and professionals.

            Apparently, big data analytics publications had reached their peak by 2017. The large number of publications was replaced by a reduced (but not less notable) quantity of articles. In fact, the works produced in 2017 brought novel insights. Kelly and Noonan (2017), studying the Indian public health service, show how systematic practices of working with data prevail and how the challenge of conceiving new forms of data continues to appear in familiar ways.

            Furthermore, Guo et al. (2017) innovate with a system framework capable of extracting a small number of articles that could represent the diversified content generated on an organizational blogging platform. Finally, Gunther et al. (2017) clarify how organizations realize value from big data, a concept further investigated by Müller, Fay and Brocke (2018), who provide objective estimations of BDA business value.

            The publications of 2018 were marked by a few unconventional studies and novel contributions. In this regard, we mention the work of Aversa, Cabantous and Haefliger (2018), in which, drawing on a Formula 1 race, the authors determine that the potential for decision support system (DSS) failure is exacerbated under pressure and time constraints.

            Additionally, Deng et al. (2018) and Li, Dalen and Rees (2018) analyze sentiment within big data. The former authors show the influence of microblog sentiment on stock returns, while the latter verify that stock microblog features serve as proxies for market sentiment. Furthermore, Lehrer et al. (2018) clarify how BDA technologies enable service innovation, and Zhou et al. (2018) identify the limits of BDA; they verify that increasing review volume reduces customer agility.

            Based on the number of publications and exploration of the field, it seems that BDA is reaching a plateau—which might indicate its maturity level. The next figure shows the number of publications per year, demonstrating the evolution of the field in terms of articles published in the ‘Basket of Eight’.

Figure 3: BDA Production in the ‘Basket of Eight’

Source: created by the authors

            Given the context presented, we opted to compare this evidence within an expanded scenario. Searching for the term ‘Big Data Analytics’ on Google Trends, we found signs that corroborate the possible plateau the technology might be reaching. As shown in the next figure, based on data between 2010 and 2018, an inflection point takes place in 2017, followed by a declining tendency, which might be confirmed in the following years.

Figure 4: BDA Production according to Google Trends

Source: created by the authors—based on Google Trends data

            The fact that it might be reaching a plateau does not mean, however, that the field is fully explored. Instead, it might show the maturity of the technology.

            Based on the published content since 2010 in the ‘Basket of Eight’, it is possible to identify different waves of BDA, as shown in the following figure. The analyses clarify diverse events that include BDA’s first studies, potential for business, social media data, consequences, research concerns, information security and privacy concerns, new models and techniques, sentiment analyses, and finally (what appears to be) a plateau.

Figure 5: BDA Evolution

Source: created by the authors

4.4.          Research Trends

            Several highlights from articles from a few years ago have already been addressed, which leads us to focus on those ideas we consider more relevant and that are still in need of further research. We also choose not to highlight issues regarding specific topics from other areas (e.g., mergers and acquisitions, the stock market, customer behavior); without denying the value of these formidable works, their scope goes beyond the IS field.

            Therefore, we try to indicate future studies in a broader sense, bringing out findings that may be applicable in the information systems field as a whole. Similarly, we do not focus on broader variations of similar studies (e.g., allowing the generalizability of research results, enhancing study validity, or approaching other—but similar—dimensions or domains). Rather, we mostly choose insights that we believe somehow shake up BDA in the IS field. In the following table, we compile promising research opportunities on this topic based on the analysis of the selected articles.

Table 1: Research Opportunities

Research Opportunities | Brief Description | Authors

Theories and Methods | BDA is not merely a data process change but is highly disruptive for academic studies, making it necessary to reassess our research methodologies, assumptions, and substantive questions. | ABBASI; SARKER; CHIANG (2016); BAESENS et al. (2016); LYYTINEN; GROVER (2017).

Interdisciplinary Studies | Researchers should consider collaborating with other areas, which could result in advancing the IS field through the introduction of new methodological tools. | AVERSA; CABANTOUS; HAEFLIGER (2018); BREUKER et al. (2016); GUNTHER et al. (2017); LOEBBECKE; PICOT (2015); MULLER et al. (2016).

Privacy, Ethics, Security, and Surveillance | There is a need for studies on surveillance by private and public authorities, which includes the protection of individual rights, privacy, ethical issues, and risk concerns. | BREUKER et al. (2016); GUNTHER et al. (2017); LOWRY; DINEV; WILLISON (2017); ZUBOFF (2015).

Service Innovation | Studies approaching BDA materiality and how it enables service innovation are missing. | KELLY; NOONAN (2017); LEHRER et al. (2018).

New BDA Applications | There are several research opportunities regarding BDA applications, including sentiment, perspectives from outside the data, and the meaning and relevance of images and videos. | AVERSA; CABANTOUS; HAEFLIGER (2018); CONSTANTIOU; KALLINIKOS (2015); DENG et al. (2018); GUO et al. (2017); SABOO; KUMAR; PARK (2016); KITCHENS et al. (2018).

Governance | There is a need to broaden our understanding of information governance, identifying how antecedents (enablers or inhibitors) apply to it and how it affects organizational performance. | TALLON; RAMIREZ; SHORT (2014).

Social Impacts | Studies on the broad social issues raised by BDA are missing, including how digitization (as an actor) affects social relationships. | LOEBBECKE; PICOT (2015); NEWELL; MARABELLI (2015).

BDA Value | There is still a gap in reliable empirical evidence on BDA’s business value, making it necessary to explore how organizations effectively convert big data potential into economic and social value. | ABBASI; SARKER; CHIANG (2016); GROVER et al. (2018); GUNTHER et al. (2017); MULLER; FAY; BROCKE (2018).

Source: created by the authors

5.       CONCLUSION

            After a peak in publications in 2016, it appears that BDA will soon reach a plateau—which might be confirmed by publications in the following years. In part, it may be that BDA is being replaced by new terminologies (e.g., data science, datification) but mostly it is being transformed to have new, complex and deep ramifications.

            It seems that we are arriving at a land of big data impacts. We are going through a transition in which a new analytical mindset is taking shape, and the boundaries of what we can and cannot do are still obscure.

            Given the availability of data, different kinds of devices, machine learning, algorithms, sensors, and data clouds provide endless possibilities. Many solutions have been found. Perhaps other perspectives can now be more deeply explored. According to the papers analyzed in this research, there are topics that still need further investigation. In this regard, we highlight privacy concerns and ethical aspects, impacts on society, new applications, and interdisciplinary research that might constitute new waves of studies, defining and limiting BDA boundaries.

            Concerning the privacy and ethical aspects, no one wants to live in a ‘Big Brother’ environment, but we all want the privileges that the ‘sharing’ of data allows. There is a need to revisit various aspects of the social pact with technology, considering where more transparency and information are needed. What is our relationship to data and how can it help or harm us? People need to understand where they are heading and what big data means for the market. When accepting cookies to access certain data, for instance, how many people actually know what a cookie is? There is a need to educate and inform people to make them understand the tradeoffs that come with the data that they provide.

            Furthermore, most of the related works focus on increasing efficiency, mainly on supporting the private sector. Perhaps opportunities to explore the benefits for society and other areas are being left behind. How can BDA effectively help people’s lives in cities? How might BDA help with water consumption in less developed regions, agriculture, or governments—how can it generate value for society? In part, these results may have been found because of the nature and purpose of the searched journals.

            However, the above are still issues that might be more deeply explored. That is, studies on BDA could explore how to improve people’s quality of life, not just how to increase business results. We mean that big data analytics can go beyond cost reduction, optimization, productivity gains, increased efficiency, and so on—by providing analyses from a social perspective.

            In addition, it seems that new techniques will form a continuum in BDA, especially in congregating data. We understand that integrating silos of data might be a fruitful path to explore. Future works might expand the area through collaborations of IS academics and professionals in other fields, integrating advances such as machine learning and human interaction and developing systems to integrate others.

            In addition, the complete absence of publications from South America and Africa, as well as the modest participation of the BRICS countries, among which Brazil, Russia, and South Africa are still absent from the leading journals, is alarming. Professionals, researchers, and even government agents from these large nations might lose the opportunity to explore a field full of possibilities. We hope that this finding encourages them to expand their research in this area. At the same time, the University of Liechtenstein, for example, might be an outstanding place for the development of data science professionals.

            This study contributes to the academy by synthesizing major challenges and concerns regarding big data analytics, presenting its evolutionary waves and development over time and indicating research tendencies that can be further explored—and that go beyond business efficiency. For practitioners, it presents techniques and models that have been successfully applied and that are rapidly being disseminated. At the same time, it warns about the limits of BDA and draws attention to issues that should be considered.

            The more that technologies develop, the more possibilities there are. This might be an endless race: each time faster, each time better. Big data analytics are good for those who produce and for those who consume. However, this does not give us the right to ignore the impact that the technology generates.

            Debates regarding machines taking our jobs are pertinent and essential, of course, but this is yet another chapter of the industrial revolution—which is now taking place by means of other kinds of technology. Further debates and studies are needed to understand (and forecast) changes and to define proper boundaries—whether through ethical, cultural, legal, or other means. When the elevator was invented, the obligatory position of the elevator operator was created. Disruptive technologies go through this process of acceptance in various spheres of society.

            Last, although this research accomplishes the aim of providing a broad picture of BDA in the most acknowledged journals, this study is limited by the method adopted, as it analyzes only the eight major journals in IS. More studies expanding this perspective could provide a broader view of the field. Additionally, future studies could adopt other methods of content analysis to treat the data collected, such as semantic, morphological, structural, or syntactic analysis, among others.

REFERENCES

ABBASI, A.; SARKER, S.; CHIANG, R. H. L. (2016) Big Data Research in Information Systems: Toward an Inclusive Research Agenda. Journal of the Association for Information Systems, v. 17, n. 2, p. 1–32.

AVERSA, P.; CABANTOUS, L.; HAEFLIGER, S. (2018) When Decision Support Systems Fail: Insights for Strategic Information Systems from Formula 1. Journal of Strategic Information Systems, v. 27, n. 3, p. 221–236.

BAESENS, B.; BAPNA, R.; MARSDEN, J. R.; VANTHIENEN, J.; ZHAO, J. L. (2016) Transformational Issues of Big Data and Analytics in Networked Business. MIS Quarterly, v. 40, n. 4, p. 807-818.

BREUKER, D.; MATZNER, M.; DELFMANN, P.; BECKER, J. (2016) Comprehensible Predictive Models for Business Processes. MIS Quarterly, v. 40, n. 4, p. 1009-1034.

BRYNJOLFSSON, E.; GEVA, T.; REICHMAN, S. (2016) Crowd-Squared: Amplifying the Predictive Power of Search Trend Data. MIS Quarterly, v. 40, n. 4, p. 941-961.

CHATLA, S. B.; SHMUELI, G. (2017) An Extensive Examination of Regression Models with a Binary Outcome Variable. Journal of the Association for Information Systems, v. 18, n. 4, p. 340–371.

CHAU, M.; XU, J. (2012) Business Intelligence in Blogs: Understanding Consumer Interactions and Communities. MIS Quarterly, v. 36, n. 4, p. 1189-1216.

CHEN, M.; WANG, P. (2018) A Roadmap to Determine the Important Factors of the House Value: A case study by using actual price registration data of Taipei housing transactions. Independent Journal of Management and Production, v. 9, n. 1, p. 245-261.

CHEN, D. Q.; PRESTON, D. S.; SWINK, M. (2015) How the Use of Big Data Analytics Affects Value Creation in Supply Chain Management. Journal of Management Information Systems, v. 32, n. 4, p. 4–39.

CHEN, H.; CHIANG, R. H. L.; STOREY, V. C. (2012) Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly, v. 36, n. 4, p. 1165-1188.

CLARKE, R. (2016) Big Data, Big Risks. Information Systems Journal, v. 26, n. 1, p. 77–90.

CONSTANTIOU, I. D.; KALLINIKOS, J. (2015) New Games, New Rules: Big Data and the Changing Context of Strategy. Journal of Information Technology, v. 30, n. 1, p. 44–57.

DENG, S.; HUANG, Z.; SINHA, A. P.; ZHAO, H. (2018) The Interaction Between Microblog Sentiment and Stock Returns: An Empirical Examination. MIS Quarterly, v. 42, n. 3, p. 895–918.

GALLIERS, R. D.; NEWELL, S.; SHANKS, G.; TOPI, H. (2017) Datification and its Human, Organizational and Societal Effects: The Strategic Opportunities and Challenges of Algorithmic Decision-Making. Journal of Strategic Information Systems, v. 26, n. 3, p. 185–190.

GROVER, V.; CHIANG, R. H. L.; LIANG, T.; ZHANG, D. (2018) Creating Strategic Business Value from Big Data Analytics: a research Framework. Journal of Management Information Systems, v. 35, n. 2, p. 388–423.

GUNTHER, W.; MEHRIZI, M.; HUYSMAN, M.; FELDBERG, F. (2017) Debating Big Data: A Literature Review on Realizing Value from Big Data. Journal of Strategic Information Systems, v. 26, n. 3, p. 191–209.

GUO, X.; WEI, Q.; CHEN, G.; ZHANG, J.; QIAO, D. (2017) Extracting Representative Information on Intra-Organizational Blogging Platforms. MIS Quarterly, v. 41, n. 4, p. 1105-1127.

HAN, S.; PARK, S.; OH, W. (2016) Mobile App Analytics: A Multiple Discrete-Continuous Choice Framework. MIS Quarterly, v. 40, n. 4, p. 983-1008.

KELLY, S.; NOONAN, C. (2017) The Doing of Datafication (and What this Doing Does). Journal of the Association for Information Systems, v. 18, n. 12, p. 872–899.

KETTER, W.; PETERS, M.; COLLINS, J.; GUPTA, A. (2016) Competitive Benchmarking: An IS Research Approach to Address Wicked Problems with Big Data Analytics. MIS Quarterly, v. 40, n. 4, p. 1057–1080.

KITCHENS, B.; DOBOLYI, D.; LI, J.; ABBASI, A. (2018) Advanced Customer Analytics: Strategic Value Through Integration of Relationship-Oriented Big Data. Journal of Management Information Systems, v. 35, n. 2, p. 540–574.

LASH, M. T.; ZHAO, K. (2016) Early Predictions of Movie Success: The Who, What, and When of Profitability. Journal of Management Information Systems, v. 33, n. 3, p. 874–903.

LEHRER, C.; WIENEKE, A.; BROCKE, J. V.; JUNG, R.; SEIDEL, S. (2018) How Big Data Analytics Enables Service Innovation: Materiality, Affordance, and the Individualization of Service. Journal of Management Information Systems, v. 35, n. 2, p. 424–460.

LI, W.; CHEN, H.; NUNAMAKER, J. F. (2016) Identifying and Profiling Key Sellers in Cyber Carding Community: AZSecure Text Mining System. Journal of Management Information Systems, v. 33, n. 4, p. 1059–1086.

LI, T.; VAN DALEN, J.; VAN REES, P. J. (2018) More than just noise? Examining the information content of stock microblogs on financial markets. Journal of Information Technology, v. 33, n. 1, p. 50–69.

LOEBBECKE, C.; PICOT, A. (2015) Reflections on Societal and Business Model Transformation Arising from Digitization and Big Data Analytics: a research agenda. Journal of Strategic Information Systems, v. 24, n. 3, p. 149–157.

LOWRY, P. B.; DINEV, T.; WILLISON, R. (2017) Why Security and Privacy Research Lies at the Center of the Information Systems (IS) Artefact: Proposing a Bold Research Agenda. European Journal of Information Systems, v. 26, n. 6, p. 546–563.

LYYTINEN, K.; GROVER, V. (2017) Management Misinformation Systems: A Time to Revisit? Journal of the Association for Information Systems, v. 18, n. 3, p. 1–44.

LUVIZAN, S.; DINIZ, E. (2017) Big Data e o Uso Secundário de Dados: Desafios para a Qualidade de Dados e a Inovação. In: Encontro da Associação Nacional de Pós-Graduação e Pesquisa em Administração, XLI, Sao Paulo, Proceedings. Sao Paulo: ENANPAD, 2018. 

MANYIKA, J.; CHUI, M.; BROWN, B.; BUGHIN, J.; DOBBS, R.; ROXBURGH, C.; BYERS, A. H. (2011) Big Data: The Next Frontier for Innovation, Competition, And Productivity. McKinsey Global Institute.

MENON, S.; SARKAR, S. (2016) Privacy and Big Data: Scalable Approaches to Sanitize Large Transactional Databases for Sharing. MIS Quarterly, v. 40, n. 4, p. 963-981.

MCAFEE, A.; BRYNJOLFSSON, E. (2012) Big Data: The Management Revolution. Harvard Business Review, p. 1–9.

MULLER, O.; JUNGLAS, I.; VOM BROCKE, J.; DEBORTOLI, S. (2016) Utilizing Big Data Analytics for Information Systems Research: Challenges, Promises and Guidelines. European Journal of Information Systems, v. 25, n. 4, p. 289–302.

MULLER, O.; FAY, M.; VOM BROCKE, J. (2018) The Effect of Big Data Analytics on Firm Performance: An Econometric Analysis Considering Temporal Dynamics and Industry Characteristics. Journal of Management Information Systems, v. 35, n. 2, p. 488–509.

NUAIMI, E.; NEYADI, H.; MOHAMED, N.; AL-JAROODI, J. (2015) Applications of big data to smart cities. Journal of Internet Services and Applications, v. 6, n. 25, p. 1–15.

NEWELL, S.; MARABELLI, M. (2015) Strategic Opportunities (and Challenges) of Algorithmic Decision-Making: A Call for Action on the Long-Term Societal Effects of ‘Datification’. Journal of Strategic Information Systems, v. 24, n. 1, p. 3–14.

PARK, S.; HUH, S.; OH, W.; HAN, S.P. (2012) A Social Network-Based Inference Model for Validating Customer Profile Data. MIS Quarterly, v. 36, n. 4, p. 1217–1237.

RAI, A. (2016) Synergies Between Big Data and Theory. MIS Quarterly, v. 40, n. 2, p. iii–ix.

SABOO, A. R.; KUMAR, V.; PARK, I. (2016) Using Big Data to Model Time-Varying Effects for Marketing Resource (Re) Allocation. MIS Quarterly, v. 40, n. 4, p. 911–939.

SHI, Z.; LEE, G.; WHINSTON, A. (2016) Toward a Better Measure of Business Proximity: Topic Modeling for Industry Intelligence. MIS Quarterly, v. 40, n. 4, p. 1035-1056.

TALLON, P.; RAMIREZ, R.; SHORT, J. (2014) The Information Artifact in IT Governance: Toward a Theory of Information Governance. Journal of Management Information Systems, v. 30, n. 3, p. 141–177.

The Economist (2018) Does China’s digital police state have echoes in the West? Special Report on Leaders, May 31st. Accessed on 12/07/2018. <https://www.economist.com/leaders/2018/05/31/does-chinas-digital-police-state-have-echoes-in-the-west>

The New York Times (2019) San Francisco Bans Facial Recognition Technology. Accessed on 10/06/2019. <https://www.nytimes.com/2019/05/14/us/facial-recognition-ban-san-francisco.html>

WEBSTER, J.; WATSON, R. T. (2002) Analyzing the Past to Prepare for the Future: Writing a Literature Review. MIS Quarterly, v. 26, n. 2, p. xiii–xxiii.

ZHOU, S.; QIAO, Z.; DU, Q.; WANG, G. A.; FAN, W.; YAN, X. (2018) Measuring Customer Agility from Online Reviews Using Big Data Text Analytics. Journal of Management Information Systems, v. 35, n. 2, p. 510–539.

ZUBOFF, S. (2015) Big Other: Surveillance Capitalism and the Prospects of an Information Civilization. Journal of Information Technology, v. 30, p. 75–89.