The interesting thing about data is that when we gather huge amounts of information (aka Big Data), the precision of the individual data points declines, but the probabilities we can derive from it become more reliable. An example of Big Data would be tweets about the NY Yankees, together with Facebook data (and other sources) relating to the team, analysed to determine fan satisfaction.
This kind of data is ‘Big’. If it involves scores of data sources and record counts in the nine figures, it is ‘Big’. But is there a different kind of data that goes beyond what we call ‘Big’ and would be better called ‘Span Data’?
If you had a data asset of 100 people where 100% of those people bought a blue sports shirt, you could conclude that 100% of people buy blue sports shirts. You would be wrong.
The only conclusion you can draw is that 100% of the 100 people in your data set bought a blue sports shirt. Your sample is very small. If you knew that your data set was made up entirely of males from a ‘Blues’ fan club (say, a Manchester City fan club), then a result of 100% is not surprising and probably not very useful. That extra ‘data point’ effectively nullifies the relevance of the sample population.
If your data covered 100,000 people, spread equally across all genders and ages, and you noticed that 100% of them bought a blue sports shirt, could you then conclude that 100% of people buy blue sports shirts? You should probably look at the other factors that might influence your findings:
Now, what if your sample population is massive and the data points are hugely diverse, say at a national level in the USA? You have all genders, ages and ethnicities, local and regional factors, and a host of other factors. You then notice that 65% bought blue shirts in the last 6 months. What if you could also rule out any special event that may have occurred during that period? What can you conclude then? Probably that around 65% will buy blue shirts again in the same period the following year.
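One way to put a rough number on why a national-level 65% is more trustworthy than a figure from a small sample is the textbook margin of error for a proportion. The sketch below is only an illustration: it assumes a simple random sample, uses the normal (Wald) approximation at roughly 95% confidence, and the sample sizes are hypothetical rather than taken from any real data set. Note that it captures random sampling error only; it cannot fix a biased sample such as the fan-club example above.

```python
import math


def proportion_margin_of_error(p: float, n: int, z: float = 1.96) -> float:
    """Half-width of the normal-approximation (Wald) interval for a proportion.

    Assumes a simple random sample. It measures random sampling error only and
    says nothing about bias, e.g. a sample drawn entirely from one fan club.
    """
    if n <= 0:
        raise ValueError("sample size must be positive")
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a proportion between 0 and 1")
    return z * math.sqrt(p * (1 - p) / n)


# Hypothetical sample sizes, all observing the same 65% blue-shirt share.
for n in (100, 100_000, 100_000_000):
    moe = proportion_margin_of_error(0.65, n)
    print(f"n={n:>11,}: 65% +/- {moe * 100:.2f} percentage points")
```

The margin shrinks with the square root of the sample size, so going from 100 people to a national-scale population narrows the uncertainty around 65% from several percentage points to a small fraction of one.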
The benefit of the USA-level population over the 100-person sample is that you can draw a more accurate probability from the data. Critically, the data can now be viewed across a number of criteria. For example, if you sell blue shirts, you may want to know which regions or ethnic groups don’t currently buy many blue shirts. This could point you towards new markets.
With that insight, would you make a strategic decision? Perhaps, after some investigation, you would decide to offer red shirts to that target population (or whichever shirt would most likely penetrate that market), or perhaps you would use targeted advertising to drive blue shirt sales into that market.
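Viewing the data across criteria to find under-penetrated segments can be sketched as a simple group-and-compare. This is a minimal, hypothetical example: the `Purchase` record, its `region` field, the 0.5 threshold and the sample records are all made up for illustration, and a real analysis would segment on many more fields (age, ethnicity, local factors) at far larger volumes.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, NamedTuple


class Purchase(NamedTuple):
    region: str  # hypothetical segment field; could equally be age band, ethnicity, etc.
    colour: str  # colour of the shirt bought


def blue_share_by_region(purchases: Iterable[Purchase]) -> Dict[str, float]:
    """Fraction of shirt purchases in each region that were blue."""
    totals: Dict[str, int] = defaultdict(int)
    blues: Dict[str, int] = defaultdict(int)
    for p in purchases:
        totals[p.region] += 1
        if p.colour == "blue":
            blues[p.region] += 1
    return {region: blues[region] / totals[region] for region in totals}


def under_penetrated(shares: Dict[str, float], threshold: float) -> List[str]:
    """Regions whose blue share is below the threshold, lowest share first."""
    low = [region for region, share in shares.items() if share < threshold]
    return sorted(low, key=lambda region: shares[region])


# Made-up records purely for illustration.
sample = [
    Purchase("Northeast", "blue"), Purchase("Northeast", "blue"), Purchase("Northeast", "red"),
    Purchase("South", "red"), Purchase("South", "red"), Purchase("South", "blue"),
    Purchase("Midwest", "green"), Purchase("Midwest", "red"), Purchase("Midwest", "red"),
]
shares = blue_share_by_region(sample)
print(shares)
print(under_penetrated(shares, threshold=0.5))
```

The regions flagged here are candidates for further investigation, not conclusions in themselves; with small segments the margin-of-error concerns discussed earlier still apply.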
You may well be thinking that this is ‘Big Data’ in action. I would argue that, strictly speaking, it is not, even if it is similar.
Big Data tends to be a whole load of data pulled from a variety of sources and then analysed to find whatever ‘interesting patterns’ emerge.
It is not always pure serendipity, but there is a large element of serendipity in most Big Data activity. Some operations have much better control, but I am suggesting that a further evolution is possible: the child of ‘Big Data’, which I call ‘Span Data’. ‘Wide Data’ might be an even more accurate name, but a ‘Wide Data’ definition already exists at the record/table level.
Span Data differs from Big Data for the following reasons:
An example would be a tyre manufacturer that decides to enter a new market. Here, Big Data would be all about the impact on the Internet and social media, which is useful but does not cover the full requirement. The traditional approach weighs market size and potential revenue projections against set-up, launch, growth and running costs over a period. But what if the following were conducted on a suitably large market, say the USA, for all tyre sectors?
The key gain from the ‘Span Data’ set is that greater benefit and insight may be derived. It aids key strategic decisions, where the probability of success can be better anticipated, or at least efforts can be focused more surgically.
Big Data is an important part of this, but it is not the only data ‘type’ that needs to be considered. Even though the data set is ‘Big’, more importantly it ‘Spans’ a series of critical pillars to provide even more actionable insight.