The Potential Pitfalls of Big Data in Healthcare – An Evolutionary Tale

Big data is being successfully leveraged in many industries, both by traditional information companies and by companies that would not normally be thought of that way. There are many examples of successful uses of big data, such as Target’s pregnancy-prediction score and Macy’s ability to price its product offerings on demand.

The literature is full of potential uses of big data in healthcare, but there are not as many published success stories, especially at the provider level, which includes hospitals and physician practices. There are some success stories, such as HCA’s development of a data center and the University of Pittsburgh Medical Center’s (UPMC) investment in analytics.

Why are some providers successful in this space while others seem to struggle? What is the evolution of analytics in the provider space?

Defining Big Data

The best definition of big data that I have seen is this: any time the volume, velocity or variety of an institution’s data overwhelms that institution, the data is “big” … otherwise, it’s just data.

Hospitals and physician practices now have access to electronic patient treatment information in the Electronic Medical Record (EMR), treatment delivery systems, billing systems, imaging systems and many other systems that collect data as part of treating, managing or marketing to the patient.

The actual volume of data may not be overwhelming, but either the velocity or variety is overwhelming to the local provider. In other words, the treatment and management systems collect the data, but the ability to consolidate and analyze it either does not exist or is very early in the evolutionary process.

There appear to be several areas in the hospital culture that need to evolve to be successful in the big data space.

Technology as an Investment

First is the shift from viewing technology as an expense to viewing it as an investment. Likely due to narrow margins and the competitive marketplace, providers attempt to limit their expenditures on the technology required to develop an analytics environment.

The providers that find success collecting and analyzing big data strategically invest in the infrastructure required to store, process, clean and analyze the data. The investment may start small, such as appropriate analytics software or a staff member with analytics ability. Even if the investment is initially small, the project should be strategically important so it receives appropriate attention through the process.

Consolidating Analytics

The second area of evolution is moving from a culture where each department or unit has its own analyst, software and reporting metrics to a consolidated analytics unit that provides consistent, branded reports to its clients both internally and externally. This requires a change in the culture within the provider.

Obviously, this structure has benefits. No individual department’s resources match the pooled total, whether in technology investments or in the automation of consistent reports. A strategic investment that pools the resources of several departments will usually serve the business better than many small, limited investments.

The evolution to a consolidated data center requires a champion, sometimes called a chief data officer or chief analytics officer, who should not be placed below the chief financial officer or chief technology officer.

Rather, this individual should be seen as a peer within the C-suite who develops the reporting and analysis strategy as part of the overall strategic plan, which includes quality of care, patient satisfaction and many other metric-driven initiatives.

The creation and development of an analytics culture and department also requires the identification of individuals with the correct skill set. Early in the evolution, analytics staff members will need to serve many roles. They will need to interact with many different systems to extract and clean data while also serving as consultants who identify the opportunities, the appropriate sources of data, the correct analytical methodology and the most appropriate way to report back.

Luckily, there are education programs that help develop staff who can immediately contribute to the organization, such as Kennesaw State University’s Department of Statistics and Analytical Sciences.

As the organization matures, the center’s staff can be specialized in areas such as programming, reporting and consulting to increase capacity and sophistication.

Using Existing Infrastructure

Another area of the evolution requires learning to identify specific problems that analytics can address and the current infrastructure can support. It is not possible to “boil the ocean” by collecting and processing all of the available data that could be leveraged for a specific problem. Instead, use the existing resources and infrastructure to identify problems that could benefit from targeted analytics.

For instance, if there are issues with sepsis infections after admission, then use the information in the EMR to identify those patients within the treated population. This assumes that the data can be extracted from the EMR and that there is a consistent way to identify patients with the condition. Data mining techniques can then be used to identify patterns of risk for those patients during the admission process.
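As a minimal sketch of that cohort-identification step, the following Python assumes a hypothetical EMR extract with made-up field names and illustrative ICD-10 codes; a real extract and a real code list would differ and would be validated with clinical staff.

```python
# Hedged sketch: identify a sepsis cohort from an EMR extract and
# compare a candidate admission-time risk factor across cohorts.
# Table fields and the code list are hypothetical examples only.

SEPSIS_ICD10 = {"A40.0", "A41.9", "R65.20", "R65.21"}  # illustrative codes

def is_sepsis(encounter):
    """Flag an encounter whose diagnosis list contains a sepsis code."""
    return bool(SEPSIS_ICD10 & set(encounter["diagnosis_codes"]))

def risk_factor_rate(encounters, factor):
    """Share of encounters where an admission-time factor was present."""
    if not encounters:
        return 0.0
    return sum(e["admission"][factor] for e in encounters) / len(encounters)

# Synthetic stand-in for rows extracted from the EMR back end.
extract = [
    {"patient_id": 1, "diagnosis_codes": ["A41.9"],  "admission": {"icu": 1}},
    {"patient_id": 2, "diagnosis_codes": ["I10"],    "admission": {"icu": 0}},
    {"patient_id": 3, "diagnosis_codes": ["R65.20"], "admission": {"icu": 1}},
    {"patient_id": 4, "diagnosis_codes": ["E11.9"],  "admission": {"icu": 0}},
    {"patient_id": 5, "diagnosis_codes": ["I10"],    "admission": {"icu": 1}},
]

sepsis = [e for e in extract if is_sepsis(e)]
others = [e for e in extract if not is_sepsis(e)]

print(len(sepsis))                      # 2 sepsis encounters
print(risk_factor_rate(sepsis, "icu"))  # 1.0
print(risk_factor_rate(others, "icu"))  # 0.33...
```

In practice, this simple rate comparison would be replaced by a proper data mining model, but the structure, a consistent cohort definition first and pattern analysis second, is the point.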

It is tempting to create a huge list of problems to work on, but the list should be prioritized and matched against the available resources.

Useful Data?

The final area is ensuring the data is useful. In my experience, this is the most challenging area, since it requires standardizing the terms used and the treatment protocols, identifying specific places to record specific items, and other tasks that affect the point-of-care workflow.

I was working with a large physician practice’s EMR. It took nearly a year to create standard templates in the EMR and train the nurses to use them. It then took another year of training to ensure the data was consistently captured across all of the practices. The quality assurance checks required a place to store the data so it could be processed. Initially, this was done with text files until a process was created using SAS that could access the backend of the EMR directly.
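A quality-assurance check of that kind can be sketched in a few lines of Python. The practice names, required fields and 95 percent threshold below are all hypothetical; the idea is simply to measure how consistently each practice captures the standardized template fields and to flag the ones that fall short.

```python
# Hedged sketch: a quality-assurance check on template capture rates.
# Practice names, field names and the threshold are hypothetical.

REQUIRED_FIELDS = ["blood_pressure", "smoking_status", "pain_score"]

def completeness_by_practice(records, fields, threshold=0.95):
    """Return per-practice capture rates and the practices below threshold."""
    by_practice = {}
    for rec in records:
        by_practice.setdefault(rec["practice"], []).append(rec)
    rates, flagged = {}, []
    for practice, recs in by_practice.items():
        filled = sum(1 for r in recs for f in fields if r.get(f) is not None)
        rate = filled / (len(recs) * len(fields))
        rates[practice] = rate
        if rate < threshold:
            flagged.append(practice)
    return rates, flagged

# Synthetic visit records standing in for the nightly EMR extract.
records = [
    {"practice": "North", "blood_pressure": "120/80",
     "smoking_status": "never", "pain_score": 2},
    {"practice": "North", "blood_pressure": "130/85",
     "smoking_status": "former", "pain_score": 4},
    {"practice": "South", "blood_pressure": "118/76",
     "smoking_status": None, "pain_score": None},
]

rates, flagged = completeness_by_practice(records, REQUIRED_FIELDS)
print(rates["North"])  # 1.0
print(flagged)         # ['South']
```

Running a check like this against each nightly extract makes it easy to see which sites still need training before the data can be trusted for analysis.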

The initial overhead included staff who understood healthcare data, EMRs and the analytical software needed to process, analyze and report the information. The next stage of the evolution includes creating a data lake that will collect and preprocess data from across the enterprise.

EMR vendors also see an opportunity to provide an additional service that helps speed up this evolution. Providers spent large sums of money to purchase and integrate the EMR into their practices. Then they spent additional resources training staff and configuring the EMR to minimize the impact on the workflow of nurses and physicians. The implementation of the EMR was a huge investment that only collected and stored patients’ information.

So Epic, as well as other EMR vendors, is partnering with various dashboard providers to incorporate reporting tools based on the patient’s experience into the EMR. This additional resource could feed the evolution described above by funding a few strategically important research projects with broad impact on patient safety, revenue or workflow.

The Bottom Line

Although the creation of an analytics culture within a healthcare provider requires several key items to be successful, the underlying message is to start small with strategically targeted analytics. Continued success requires a transition from seeing technology as an expense to seeing it as an investment. The providers also need to create an enterprise-wide data culture that develops consistent reporting and metrics.

As part of the development of the analytics culture, a champion needs to be identified who can wear many hats, especially early in the evolution. Success also requires identifying strategically important analytics that can be built with the given infrastructure.

Finally, it requires the ability to identify specific issues with data collection, appropriate remedies and the implementation of those remedies. It is not easy, but it can have huge impacts on the practice, including cost savings and patient satisfaction.


Kennesaw State University offers a Ph.D. in Analytics and Data Science, a full-time traditional program that includes 78 credit hours of courses, internships, research and dissertation. The faculty members have educational and professional backgrounds from varied disciplines and domains, including epidemiology, quality control, engineering, consumer finance and biostatistics. The curriculum is a compelling blend of Statistics, Mathematics and Computer Science, and it emphasizes programming, data mining, statistical modeling and the mathematical foundations supporting these concepts. Kennesaw State is just north of Atlanta, Ga., and doctoral students work with organizations in the public and private sectors to solve real problems using real data beginning in their first year of study.



Sponsored article written by Herman E. Ray, Ph.D. for icrunchdata News Atlanta, Georgia USA




About the Author
Dr. Herman “Gene” Ray is an Associate Professor of Statistics at Kennesaw State University. Prior to his arrival at Kennesaw State, he worked as a research scientist at Thomson Reuters. His academic credentials include a BS and MS in Mathematics from Middle Tennessee State University and a Ph.D. in Biostatistics from the University of Louisville.