Incoming - How to Handle the Influx of Unstructured Data

Big data is everywhere, and it's growing. By 2020, companies will need to manage 50 times more data with only a tiny (1.5%) increase in IT staffing. What's more, unstructured data will far outpace its structured counterpart. More than 90% of corporate data is already unstructured, and the explosion of social media metrics, wearable devices and sophisticated data mining tools will generate an influx that makes previous information volumes seem like a slow trickle by comparison. How can companies handle the incoming data storm?

Key Differences

To manage the flood, it's important to understand the differences between structured and unstructured data. Think of it like this: structured data is anything placed in a database on purpose, stored because it falls within a certain set of values or has a specific set of characteristics. This is the data companies think they want: simple, straightforward and easy to analyze. Unstructured data, meanwhile, is everything else: every document, email, presentation file or customer survey, and every piece of data that lacks a specific “home” or an easily articulated purpose. While it's tempting to dismiss this information as mere runoff from structured data collection, it holds massive untapped potential.

Collection of these data types also differs. The health care publication Clinical Leader compares discovering structured and unstructured data to a doctor's office visit. Structured information is easily gathered by taking a patient's temperature or blood pressure, while unstructured data requires conversation: the ability to respond to, interpret and effectively categorize physical symptoms or other signs of medical need.

Big Volume

As noted above, the sheer volume of big data, both structured and unstructured, is set to skyrocket over the next five years. Two factors drive this trend: the need for more data and the rise of wirelessly connected devices. Having seen how data analysis improves decision-making, companies are understandably eager to get their hands on more information. Wireless sensors make this kind of collection possible, capturing everything from temperature readings to the use of physical office space and location-based tracking.

As a result, companies need an effective way to handle increased volume without drowning out meaningful insight. Achieving this aim demands a data analysis platform capable of intelligently sorting large volumes of information, coupled with a corporate big data policy focused on finding relevant data sources rather than trying to hold and analyze everything that comes your way. Simply put, you can't have it all.

Managing Risk

The rise of unstructured data also presents an element of risk. By nature, unstructured data falls outside the categorization system your business already uses. This may be due to its origin, such as customer responses to a survey or focus group, or to its purpose: do HR spreadsheets need to be stored indefinitely? If so, should they be mined for critical insight?

Tapping unstructured data therefore carries inherent risk, since the value of any piece of information is subject to interpretation. If your company assigns high value to a skewed survey, for example, any insight based on that data may also be flawed, and acting on it could be costly.

Best bet? Trial and error with limited scope. Get a feel for what matters, how it should be classified and what line-of-business (LoB) insights it offers.

Potential Insight

Despite the challenges, unstructured data comes with big potential. As the data and IT publication Wikibon notes, tapping this potential requires adding at least some structure to big data through active classification, the addition of metadata and active processes to prevent data duplication. This makes it a natural task for the cloud. Eighty-one percent of IT pros say cloud computing will be necessary for at least some of their big data projects, and unstructured data is a strong candidate: the on-demand resources and off-site storage of cloud services make them well suited to adding just enough “structure” to put unstructured information to use.
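To make "adding some structure" more concrete, the classification, metadata and deduplication steps can be sketched in a few lines of Python. This is a minimal, hypothetical illustration, not Wikibon's method or any vendor's pipeline: the category names and keyword rules in `CATEGORY_KEYWORDS`, and the `Record`, `classify` and `ingest` names, are assumptions made for this example. Simple keyword matching stands in for a real classifier, and the SHA-256 content hash only catches duplicates that are identical after whitespace and case normalization.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime, timezone

# Hypothetical keyword rules; a real deployment would use its own taxonomy
# or a trained classifier rather than simple keyword matching.
CATEGORY_KEYWORDS = {
    "customer_feedback": ("survey", "feedback", "complaint"),
    "hr": ("payroll", "employee", "benefits"),
    "finance": ("invoice", "budget", "revenue"),
}


@dataclass
class Record:
    """An unstructured item plus the metadata added to make it findable."""
    text: str
    source: str
    category: str
    content_hash: str
    ingested_at: str
    tags: list[str] = field(default_factory=list)


def classify(text: str) -> tuple[str, list[str]]:
    """Return the first matching category and every keyword that matched."""
    lowered = text.lower()
    category = "uncategorized"
    matched: list[str] = []
    for name, keywords in CATEGORY_KEYWORDS.items():
        hits = [kw for kw in keywords if kw in lowered]
        if hits:
            if category == "uncategorized":
                category = name
            matched.extend(hits)
    return category, matched


def ingest(items: list[tuple[str, str]]) -> list[Record]:
    """Attach metadata to (source, text) pairs, dropping exact duplicates."""
    seen: set[str] = set()
    records: list[Record] = []
    for source, text in items:
        # Normalize whitespace and case so trivially different copies collide.
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        if digest in seen:
            continue  # duplicate of something already stored; skip it
        seen.add(digest)
        category, tags = classify(text)
        records.append(Record(
            text=text,
            source=source,
            category=category,
            content_hash=digest,
            ingested_at=datetime.now(timezone.utc).isoformat(),
            tags=tags,
        ))
    return records


if __name__ == "__main__":
    docs = [
        ("email", "Customer survey results: mostly positive feedback."),
        ("share", "Customer survey results:  mostly positive FEEDBACK."),
        ("hr-drive", "Q3 payroll summary for each employee."),
        ("notes", "Ideas for the office party."),
    ]
    for r in ingest(docs):
        print(r.source, r.category, r.tags)
```

In this sketch, the second document is dropped as a duplicate of the first because it differs only in spacing and capitalization, and the last item is kept but tagged `uncategorized` so it can be reviewed later rather than discarded.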

Ready to get a handle on unstructured data? Know what is different, then get ready for big volumes and possible risks by tapping the cloud.

Article written by Sheldon Smith