In the world of big data, “bad data” exists. You’ve probably heard the term “bad data” before, but how would you define it?
It seems that the definition of “bad data” is evolving and debatable.
Many blogs already address the topic. In fact, entire books, and complete chapters within books, are dedicated to the idea and to defining the term.
For example, in her blog, “What is Bad Data and its Side-Effects,” Mahak Vasudev defines bad data as “Dirty Data” and “information that can be erroneous, misleading and without general formatting.” And, in his "Bad Data Handbook," author Q. Ethan McCallum devotes the entire first chapter to the task of defining bad data, which he says isn’t an easy one.
McCallum states: “It’s tough to nail down a precise definition of ‘Bad Data.’ Some people consider it a purely hands-on, technical phenomenon: missing values, malformed records and cranky file formats. Sure, that’s part of the picture, but Bad Data is so much more. It includes data that eats up your time, causes you to stay late at the office, drives you to tear out your hair in frustration. It’s data that you can’t access, data that you had and then lost, data that’s not the same today as it was yesterday…”
Many technology organizations today are attempting to define what constitutes bad data within the broader contexts of big data, data management and data quality. They are communicating warnings about bad data and its associated costs, and are coming to the market with new IT solutions designed to help organizations rein in their bad data and realize the full value of their organizational data.
Though the marketplace is full of messages about the pitfalls of organizational bad data, some, like Roger M. Stein, have proposed that bad data can sometimes lead to good decisions.
Regardless of whether data is defined as big, good, bad or ugly, how often do you consider the good and bad uses and applications of data?
The hashtag #DataForGood has been trending lately with wonderful examples of how big data is being analyzed and used for the greater good. It may be somewhat unpopular to consider, but what about the ways in which data is being analyzed and used for less-than-aboveboard reasons?
In his blog, “The Duality of Big Data: The Angel and the Demon,” Daniel Riedel addresses the yin-and-yang nature of big data and cites several examples of how data can be used to benefit society at large as well as for corrupt purposes, including fraud by individuals and governments around the world.
If you haven’t been thinking about the potential bad and good applications of big data, maybe you should – especially since you’re likely generating your own personal, steady stream of data into the ocean of big data. As stated in this video promo for journalist Rick Smolan’s book "The Human Face of Big Data":
“Our smart devices are turning each of us into human sensors. We now leave a trail of digital exhaust, a perpetual stream of texts, location data and other information that will live on forever. But who owns the data we generate? Who profits from it? And, why are governments and corporations the only ones thinking about the impact of big data?”
As we move into the future, you may want to give big data and its applications deeper personal thought and consideration. For now, which hashtag do you gravitate towards? #DataForGood, #DataForBad or both?