When I talk to CIOs, many of them worry that their CEOs and CFOs will start complaining that the information garnered from their Big Data initiatives has not made their businesses more money. It seems clear that Big Data cannot continue to be merely a science project.
Clearly, their worries are related to classical project errors around prioritization and initiative scoping. But they also relate to the maturity of the technology used to power their Big Data initiatives in the first place.
To prevent what technology analysts like to call the “trough of disillusionment”, your Big Data initiative needs to avoid the common reasons these projects fail: taking too long to collect data, collecting data of poor or suspect quality (which erodes trust), and allowing lax security and data misuse (or, worse yet, a breach).
I believe that successful Big Data initiatives are built upon three capabilities:
Big Data offers the opportunity to flip the business intelligence investment equation. With reasonable business requirements and a data hypothesis, Big Data lets you try before you buy.
Historically, there was both cost and risk in business intelligence. An enterprise architect put the problem to me this way: in traditional business intelligence, he would spend a lot of time with customers capturing their requirements, and then ETL developers would spend many months building the data integration. When the customer finally saw the data, they would say, “this is not what I wanted,” and the team would have to rebuild the data integration.
Today, we have the opportunity to put all of the relevant data into a Data Lake before making a significant investment in it. With this change, we can let end users manage the process of creating business intelligence. At the same time, we can let the data speak to the user directly through self-service tools (instead of building analytics only from an a priori premise) and then, with this knowledge, allow them to determine which analytics make business sense.
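To make the “let the data speak” idea concrete, here is a minimal schema-on-read sketch: raw events land in the lake without an enforced schema, and an analyst profiles which fields actually appear before anyone builds formal ETL around them. The event shapes and field names below are hypothetical, not from any real system.

```python
import json
from collections import Counter

# Hypothetical raw events as they might land in a Data Lake landing zone:
# no schema is enforced at write time (schema-on-read).
raw_events = [
    '{"user": "a1", "action": "view", "sku": "X100"}',
    '{"user": "b2", "action": "purchase", "sku": "X100", "amount": 29.99}',
    '{"user": "a1", "action": "view"}',  # fields may be missing
]

def profile(events):
    """Count how often each field appears, so an analyst can see which
    attributes are dense enough to be worth building analytics on."""
    counts = Counter()
    for line in events:
        counts.update(json.loads(line).keys())
    return dict(counts)

print(profile(raw_events))
# {'user': 3, 'action': 3, 'sku': 2, 'amount': 1}
```

A profile like this is the evidence an end user needs before deciding which analytics merit real investment.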
This is why the move to Hadoop-based analytical solutions is only the first step. The opportunity in front of us is to ingest the full body of data into a Data Lake that blends a Hadoop cluster with traditional data warehousing approaches.
At the same time, Big Data integration offers the potential to deliver high-throughput data ingestion and at-scale processing, so business analysts can make better decisions using next-generation analytics tools.
Put together, Big Data integration helps businesses gain better insights from Big Data because it:
In the early days of Big Data, most investment clearly went into Hadoop and data analysis, and less attention was paid to governing the data being put into Hadoop-based Data Lakes. As we move beyond science projects, we need to mature our approaches to data governance. As businesses have begun building complex Big Data architectures, challenges around data governance and data privacy have only increased.
As Big Data strategies mature, we see more interest in comprehensive Big Data management platforms that handle data integration, data governance and data security for multiple projects across the enterprise. End-to-end Big Data governance and quality mean business and IT users can be confident in the data they’re using.
Look for comprehensive data governance that includes:
Governance matters more with Big Data because of the larger volume and variety of data that originates from multiple sources, is collected in one place (typically a Hadoop-based Data Lake) and is proliferated across many target systems. All of this data needs to be governed.
Part of this process also involves investing in data once its business value has been proven. A key element of investing in data is fixing its quality so that business users can trust it. It can also involve integrating, consolidating or mastering data so that it makes sense to business users.
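As a minimal sketch of what “fixing data quality” can start with, the snippet below runs simple validation rules over a record set and reports failures per rule. The field names and rules (a well-formed email, a non-empty country) are illustrative assumptions, not a real schema or policy.

```python
import re

# Hypothetical customer records; field names are assumptions for illustration.
records = [
    {"email": "ana@example.com", "country": "US"},
    {"email": "not-an-email", "country": "US"},
    {"email": "li@example.com", "country": ""},
]

# A deliberately simple email shape check, not a full RFC validator.
EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

def quality_report(rows):
    """Return failure counts per rule: a first step toward deciding
    whether the data is trustworthy enough to invest in."""
    failures = {"bad_email": 0, "missing_country": 0}
    for row in rows:
        if not EMAIL_RE.match(row.get("email", "")):
            failures["bad_email"] += 1
        if not row.get("country"):
            failures["missing_country"] += 1
    return failures

print(quality_report(records))
# {'bad_email': 1, 'missing_country': 1}
```

A report like this tells you whether the data has earned further investment, or whether cleansing and mastering must come first.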
Big Data security needs to analyze all data in order to quickly detect and act upon risks and vulnerabilities. This requires a 360-degree view of sensitive data, supported by risk analytics and policy-based protection of data at risk. Big Data security should de-identify information controlled by corporate policies and industry regulations. Risk-centric Big Data security must enable:
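One common de-identification technique is pseudonymization: replacing direct identifiers with a keyed hash so records stay joinable without exposing the raw values. The sketch below shows the idea only; the key, the field list and the truncated token length are assumptions, and a real deployment would manage keys and protection policies centrally rather than in code.

```python
import hashlib
import hmac

# Assumption: the key would come from a key-management service,
# never from source control as shown here.
SECRET_KEY = b"rotate-me-outside-source-control"
SENSITIVE_FIELDS = {"ssn", "email"}  # illustrative policy, not a real one

def de_identify(record):
    """Replace sensitive field values with a deterministic keyed hash,
    leaving non-sensitive fields untouched."""
    out = {}
    for field, value in record.items():
        if field in SENSITIVE_FIELDS:
            digest = hmac.new(SECRET_KEY, str(value).encode(), hashlib.sha256)
            out[field] = digest.hexdigest()[:16]  # truncated pseudonym token
        else:
            out[field] = value
    return out

row = {"ssn": "123-45-6789", "email": "ana@example.com", "state": "CA"}
print(de_identify(row))
```

Because the hash is deterministic under one key, the same person yields the same token across datasets, which preserves joins for analytics while keeping the raw identifier out of the lake.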
As you can see, Big Data is about much more than the so-called three Vs (volume, variety and velocity). It is time for Big Data to mature and move into the mainstream. If you would like to read more about avoiding the trough of disillusionment, please click here for more details on the topic.