3 Critical Capabilities to Propel Your Big Data Initiative Past the Trough of Disillusionment

When I talk to CIOs, many of them worry that their CEOs and CFOs are going to start complaining that the information garnered from their Big Data initiatives has not made their businesses more money. It seems clear that Big Data cannot continue to be merely a science project.

Clearly, their worries are related to classical project errors around prioritization and initiative scoping. But they also relate to the maturity of the technology used to power their Big Data initiatives in the first place.

To stay out of what technology analysts like to call the “trough of disillusionment”, your Big Data initiative needs to avoid the reasons that these projects fail. These include taking too long to collect data, collecting data of poor or suspect quality, which reduces trust and security, and allowing data misuse (or, worse yet, a breach).

I believe that successful Big Data initiatives are built upon three capabilities:

  1. The ability to defer investment in data assets and products until their value has been established
  2. The ability to govern all data across an enterprise’s information management environment
  3. The ability to manage and secure sensitive data to minimize business risks

Deferring Investment in Data Assets and Products until the Value is Established

Big Data offers the opportunity to flip the business intelligence investment equation. With reasonable business requirements and data hypotheses, it lets you try before you buy.

Historically, there was both cost and risk in business intelligence. An enterprise architect put the problem to me this way: in traditional business intelligence, he would spend a lot of time with customers capturing their requirements, and then ETL developers would spend many months building the data integration. When customers finally saw the data, they would say, “this is not what I wanted,” and the team would have to rebuild the data integration.

Today, we have the opportunity to put all of the relevant data into a Data Lake before making a significant investment in that data. With this change, we can enable end users to manage the process of creating business intelligence. At the same time, we can let the data speak to users directly through self-service tools (instead of building analytics based only on a priori premises) and then, with this knowledge, allow them to determine which analytics make business sense.
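To make the "try before you buy" idea concrete, here is a minimal sketch of profiling raw records before any modeling work is done. The sample records, the field names and the `profile` function are all hypothetical, and a real Data Lake would use a distributed engine rather than an in-memory list.

```python
import json
from collections import Counter

# Hypothetical raw events as they might land in a data lake: no schema enforced.
RAW_LINES = [
    '{"customer_id": 1, "region": "EMEA", "spend": 120.5}',
    '{"customer_id": 2, "region": "APAC"}',
    '{"customer_id": 3, "spend": "n/a", "channel": "web"}',
]

def profile(lines):
    """Count how often each field appears and which value types it carries."""
    field_counts = Counter()
    type_counts = Counter()
    for line in lines:
        record = json.loads(line)
        for field, value in record.items():
            field_counts[field] += 1
            type_counts[(field, type(value).__name__)] += 1
    return field_counts, type_counts

field_counts, type_counts = profile(RAW_LINES)
for field, count in sorted(field_counts.items()):
    print(f"{field}: present in {count} of {len(RAW_LINES)} records")
```

A quick profile like this shows which fields are sparse or inconsistently typed, which is the kind of evidence a business user would want before asking for a curated data set.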

This is why the move to Hadoop-based analytical solutions is only the first step. The opportunity in front of us is to ingest the body of data into a Data Lake, which is a mixture of a Hadoop cluster and traditional data warehousing approaches.

At the same time, Big Data integration offers the potential to deliver high throughput data ingestion and at-scale processing so business analysts can make better decisions using next generation analytics tools.

Put together, Big Data integration helps businesses gain better insights from Big Data because it:

  • Speeds up development, leverages existing IT skills better and simplifies maintenance through the use of a simple visual interface supported by easy-to-use templates
  • Increases performance and resource utilization by optimizing data processing execution and providing flexible, hybrid deployment across a variety of platforms
  • Handles a wide variety of data sources through hundreds of pre-built transformers and connectors, and orchestrates data flows by using broker-based data ingestion

Governing Data across the Enterprise Information Management Environment

In the early days of Big Data, most investment went into Hadoop and data analysis, and less attention was paid to governing the data being put into a Hadoop-based Data Lake. As we move beyond science projects, we need to mature our Big Data approaches to data governance. As businesses have begun building complex architectures for Big Data, challenges around data governance and data privacy have only increased.

As Big Data strategies mature, we see more interest in comprehensive Big Data management platforms that handle data integration, data governance and data security for multiple projects across enterprises. End-to-end Big Data governance and quality means business and IT users can be confident in the data they’re using.

Look for comprehensive data governance that includes:

  • Formal data quality assessments to detect data anomalies sooner
  • Pre-built data quality rules to ensure data is “fit-for-purpose”
  • Universal metadata catalog to facilitate search and automate data processing
  • Entity matching and linking to enrich master data for customers
  • End-to-end data lineage for data provenance, traceability and compliance audits
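As an illustration of the first two items, the sketch below applies a small set of data quality rules to records and reports which records fail each rule. The `Rule` class, the rule names and the sample records are assumptions made for this example and do not represent any particular product's rule library.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Rule:
    name: str
    check: Callable[[dict], bool]  # returns True when the record passes

# Hypothetical "fit-for-purpose" rules for a customer record.
RULES = [
    Rule("email_present", lambda r: bool(r.get("email"))),
    Rule("age_in_range",
         lambda r: isinstance(r.get("age"), int) and 0 <= r["age"] <= 120),
]

def assess(records, rules):
    """Return, per rule, the indexes of the records that fail it."""
    failures = {rule.name: [] for rule in rules}
    for i, record in enumerate(records):
        for rule in rules:
            if not rule.check(record):
                failures[rule.name].append(i)
    return failures

RECORDS = [
    {"email": "a@example.com", "age": 34},
    {"email": "", "age": 29},
    {"email": "c@example.com", "age": 212},
]

report = assess(RECORDS, RULES)
print(report)
```

Keeping rules as named, reusable objects is one way to run the same assessment across many data sets, so anomalies surface early instead of after a report has been built.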

Governance matters with Big Data because of the larger volume and variety of data, which originates from multiple sources, is collected in one place (typically a Hadoop-based data lake) and is then proliferated across many target systems. All of this data needs to be governed.

Part of this process also involves investing in data once its business value has been proven. A key element of investing in data is fixing data quality so that it is trustworthy for business users. It can also involve integrating, consolidating or mastering data so it makes sense to business users.

Managing and Securing Sensitive Data to Minimize Business Risk

Big Data security needs to analyze all data in order to quickly detect and act upon risks and vulnerabilities. This requires a 360-degree view of sensitive data, supported by risk analytics and policy-based protection of data at risk. Big Data security should de-identify information controlled by corporate policies and industry regulations. Risk-centric Big Data security must enable:

  • “Single pane of glass” monitoring of sensitive data to provide visibility into the locations of sensitive data
  • Sensitive data discovery and classification for a comprehensive 360-degree view of sensitive data
  • Usage and proliferation analysis for a precise understanding of data risk
  • Risk assessment to help prioritize investments in security programs
  • Non-intrusive, persistent and dynamic data masking to protect sensitive data in development and production environments to help minimize the risk of a security breach
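To show what discovery, classification and masking can look like at the smallest scale, here is a sketch that uses regular expressions to label and mask sensitive values in free text. The two patterns and the `classify` and `mask` functions are hypothetical and deliberately simple. This is plain substitution, not the persistent or dynamic masking a commercial platform would provide.

```python
import re

# Hypothetical patterns; real classifiers cover many more data types.
PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(text):
    """Return the set of sensitive-data labels whose pattern appears in text."""
    return {label for label, pattern in PATTERNS.items() if pattern.search(text)}

def mask(text):
    """Replace every match with a placeholder naming its class."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"<{label}>", text)
    return text

sample = "Contact jane.doe@example.com, SSN 123-45-6789."
print(classify(sample))
print(mask(sample))
```

Running classification first gives the visibility into where sensitive data lives, and masking then reduces what a breach could expose.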

As you can see, Big Data is about a lot more than just the so-called 3 Vs (volume, variety and velocity). It is time for Big Data to mature and come into the mainstream.

Article written by Myles Suer