It's well known that organizations and individuals deal with a large number of text documents on a daily basis. This unstructured data is more than four times the amount of structured data stored or managed by organizations. Is it valuable? Absolutely.
However, when we need to define why this unstructured data has strategic value for an organization, the generic ways we describe this value (to gain insight, to increase our knowledge, to provide the full picture, etc.) aren't very helpful.
In this post I will try to clear the fog and present some concrete examples of WHAT you should look for in unstructured data and HOW you should calculate its value.
We know that there are golden nuggets contained inside the documents we store and (should) read that can help us make better decisions or be more effective in our work.
People like to share the things they know with others (or maybe sometimes just the things they think they know) and traditionally the best form has always been in writing. To quantitatively evaluate the value of this knowledge, we need to be more specific and describe exactly what we are looking for in each scenario.
So what are some examples of this strategic "knowledge" that is buried in the corporate or publicly accessible news repositories?
Take for instance any one of these scenarios:
You could go on and on, but in these scenarios each person would have a pretty clear idea of what information they need and would want access only to that information in the shortest possible time.
Achieving this would require a person or a piece of software to read all the content for them up front, so that they could quickly sift through the material and separate the essential from the unimportant. This is strategic knowledge.
How can you determine its value? Start by analyzing the cost of acquiring that knowledge (initiating new research, hiring a consultant, etc.), and then consider the cost of the risks you take by pursuing these tasks and goals without the knowledge you need. A thorough analysis will quickly show that these costs are staggering, even when estimated conservatively.
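To make this estimate concrete, here is a minimal sketch in Python. It assumes one simple way of adding things up: the cost of acquiring the knowledge some other way, plus the expected cost of each risk (its probability times its impact). The function name `knowledge_value`, the probability weighting and all the figures are hypothetical, chosen only for illustration.

```python
def knowledge_value(acquisition_cost, risks):
    """Rough estimate of what a piece of strategic knowledge is worth.

    acquisition_cost: cost of obtaining the knowledge another way
        (new research, a consultant, etc.).
    risks: (probability, impact) pairs, one per bad outcome that could
        occur if you act without the knowledge. Weighting impact by
        probability is an assumption of this sketch.
    """
    expected_risk_cost = sum(probability * impact for probability, impact in risks)
    return acquisition_cost + expected_risk_cost


# Hypothetical figures, for illustration only.
estimate = knowledge_value(
    acquisition_cost=40_000,
    risks=[(0.25, 200_000), (0.5, 30_000)],
)
print(estimate)
```

Even with modest probabilities, the risk term can easily outweigh the acquisition cost, which is why conservative estimates still add up quickly.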
Here things start to get even more interesting. Documents also contain data that actually looks like data: it comes in the form of numbers and can be assigned to a clearly defined variable. For example, the paragraphs inside annual reports add valuable data to the tabled P&L and balance sheets, and academic papers offer valuable data points in PDFs or Word documents rather than in a database. As in the case of knowledge, a good text analytics tool can help turn this "hidden" data into a database for further analysis.
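As a toy illustration of turning "hidden" numbers into rows, the sketch below pulls dollar amounts out of a sentence with a regular expression. The sample sentence, the pattern and the field names are all hypothetical, and a real text analytics tool would rely on far more robust language understanding than a single hand-written pattern.

```python
import re

# Hypothetical sentence of the kind found in the narrative part of an annual report.
text = (
    "Revenue grew to $4.2 million in 2019, while operating costs "
    "fell to $1.8 million."
)

# Assumed pattern: a one- or two-word label, a verb of change,
# then a dollar amount expressed in millions.
PATTERN = re.compile(
    r"(?P<label>[A-Za-z]+(?: [a-z]+)?) (?:grew|fell|rose|dropped) to "
    r"\$(?P<amount>\d+(?:\.\d+)?) million",
    re.IGNORECASE,
)


def extract_figures(text):
    """Return one dict per figure found, ready to load into a table."""
    return [
        {"metric": m.group("label").lower(), "usd_millions": float(m.group("amount"))}
        for m in PATTERN.finditer(text)
    ]


for row in extract_figures(text):
    print(row)
```

Each dictionary corresponds to one row that could be inserted into a database table for further analysis.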
And that's not all. For example, different social media can be analyzed to identify which attributes customers are assigning to a brand (is it cool, modern, etc.), which communications between investors could help predict how a stock will fluctuate and which dialogues between suspects can help avoid a terrorist attack.
In each of these situations, data does not present itself in the form of numbers as in the previous cases. Instead, it is derived from the analysis of text and turned into a format (such as 1s and 0s) that can be understood by a predictive model. This kind of data can be even more valuable.
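A minimal sketch of what "turning text into 1s and 0s" can look like is shown below. It assumes the simplest possible approach, plain keyword matching against a hypothetical list of brand attributes; the posts are invented, and real tools would use semantic analysis rather than exact word lookups.

```python
import re

# Hypothetical brand attributes we want to detect in customer posts.
ATTRIBUTES = ["cool", "modern", "expensive", "reliable"]


def to_binary_features(post, vocabulary=ATTRIBUTES):
    """Return a list of 1s and 0s: 1 if the attribute word appears in the post."""
    words = set(re.findall(r"[a-z']+", post.lower()))
    return [1 if term in words else 0 for term in vocabulary]


# Invented social media posts, for illustration only.
posts = [
    "Love the new design, so modern and cool!",
    "Reliable, but way too expensive.",
]

features = [to_binary_features(post) for post in posts]
for post, row in zip(posts, features):
    print(row, post)
```

Each row has one position per attribute, which is the kind of fixed-length numeric input a predictive model can consume.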
If you take any of these scenarios, it is not very complicated to associate a quantitative value to the availability of this data. What is the value of monitoring whether the perception of your brand is in line with your marketing strategy? How valuable is it to learn that today instead of three months from now?
Start by considering the investment required to change this perception, as well as the reduced sales deriving from this misalignment. Once again, the numbers go up pretty quickly. Then add that to the cost of not accessing the knowledge you have.
You might argue that this is just another aspect of knowledge, but I beg to differ. Getting answers to questions addresses different needs. When you look for knowledge, your needs are much broader and less defined, and you need to expand your understanding. You are essentially looking for a combination of elements that do not answer a single question. Often what you or your organization need are just simple answers.
Think about manuals, wikis, etc. Customer care specialists, or customers themselves, might look for an answer about how to do something. They don't want to learn everything there is to know about how a piece of hardware or an application works; they just want the one piece of information they need.
Once again, documents are great sources of answers, and a simple question can help isolate them. This is another very specific capability of text analytics and cognitive computing software: it can read both the content and the question for you, understand the facts and serve them back to match the requirements. The value is measurable in terms of productivity. Compared to the two previous points, it is even easier to calculate the value of a tool that enables you to address your need for answers.
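To give a feel for the idea of matching a question to the one passage that answers it, here is a deliberately naive sketch. It assumes a simple word-overlap score in place of the semantic understanding a real cognitive computing product would apply; the manual excerpts, the stopword list and the function names are hypothetical.

```python
import re

# Small, hand-picked list of words to ignore when comparing texts (an assumption).
STOPWORDS = {"a", "an", "the", "do", "i", "how", "to", "is", "of", "my", "in", "and"}


def tokens(text):
    """Lowercase the text and return its set of meaningful words."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def best_answer(question, passages):
    """Return the passage sharing the most words with the question."""
    question_words = tokens(question)
    return max(passages, key=lambda p: len(question_words & tokens(p)))


# Invented excerpts from a product manual, for illustration only.
manual = [
    "To reset the router, hold the power button for ten seconds.",
    "To change the Wi-Fi password, open the admin page and select Security.",
    "The warranty covers hardware defects for two years.",
]

print(best_answer("How do I change my Wi-Fi password?", manual))
```

The person asking gets back a single passage instead of the whole manual, which is where the productivity gain comes from.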
In summary, when people talk about unstructured information, they are often vague about why this information is strategic. We identified three very specific outputs deriving from understanding the content of a document. Today each of these can be automated. A person will always be more precise than a machine in reading a document, but a person can't read millions of documents per day.
Today, complexity requires you to do so just to be effective. The next time you ask yourself what the value of a semantic, text analytics or cognitive computing platform could be, refer back to your specific needs. They can probably be classified in one of these three categories. Calculating the return on the investment with this structured process will be easier than you think.