How to Keep Your Data Lake From Becoming a Data Swamp

A data lake is a massive repository that stores structured and unstructured data at big data scale. Data from various streams flows into the data lake and is made available to data scientists across functions, who examine it for patterns to support predictive analytics and machine learning. On the surface, the idea sounds fantastic and full of possibilities. Many enterprises jumped on the bandwagon, created Hadoop-based repositories and started filling them with all kinds of data.

Whether organizations are actually finding business value in their data lakes, however, remains to be seen. In the hope of some future use, many companies are blindly putting all their data into their data lakes without any clear objective, governance or traceability.

Data Lakes or Data Swamps?

Without proper metadata and quality assurance, the data in a lake becomes unusable over time. Eventually, the lake turns into a so-called data swamp that neither provides operational value nor delivers business insights. Even if enterprises use sophisticated tools to analyze and visualize big data, without correlating that data back to accurate master profiles and operational records, there is no guarantee that the answers are reliable. Companies need to put data management principles and processes in place to improve data reliability.

What if you could blend master data and big data across all your internal and external systems, third-party data subscriptions and social media sources? What if you could quickly match, merge, clean and relate all these data entities to create a reliable data foundation? What if your business applications and big data analytics platforms had real-time access to this trustworthy information in a closed loop? If you could do all this, imagine the business challenges you could address with this information. A modern data management foundation like this is core to putting your big data to sound business use.
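To make "match, merge and clean" a little more concrete, here is a minimal sketch in Python. It assumes records from several hypothetical source systems ("crm", "support", "web") that share an email field. It cleans the email, matches records on it, and merges each group into a single golden record using an assumed survivorship rule: the most recently updated non-empty value wins. Real master data management tools use much richer fuzzy matching and configurable rules; the names and rules here are illustrative only.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class SourceRecord:
    source: str   # hypothetical source system label, e.g. "crm"
    email: str
    name: str
    phone: str
    updated: date


def normalize_email(email: str) -> str:
    """Clean step: trim whitespace and lowercase so equivalent addresses compare equal."""
    return email.strip().lower()


def match_and_merge(records):
    """Match records on normalized email, then merge each group into one golden record.

    Assumed survivorship rule: for each field, the most recently updated
    non-empty value wins.
    """
    groups = {}
    for rec in records:
        groups.setdefault(normalize_email(rec.email), []).append(rec)

    golden = {}
    for key, group in groups.items():
        # Newest first, so the first non-empty value found is the most recent one.
        newest_first = sorted(group, key=lambda r: r.updated, reverse=True)
        merged = {"email": key, "sources": sorted({r.source for r in group})}
        for field in ("name", "phone"):
            values = (getattr(r, field).strip() for r in newest_first)
            merged[field] = next((v for v in values if v), "")
        golden[key] = merged
    return golden


records = [
    SourceRecord("crm", "Ann@Example.com ", "Ann Lee", "", date(2023, 1, 5)),
    SourceRecord("support", "ann@example.com", "A. Lee", "555-0100", date(2023, 3, 1)),
    SourceRecord("web", "bob@example.com", "Bob Ray", "555-0199", date(2022, 11, 20)),
]
golden = match_and_merge(records)
print(golden["ann@example.com"])
```

In this sample, the two records for Ann collapse into one golden record that keeps the newer name and the only available phone number, while still recording which systems contributed to it.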

Data-Driven Applications

Any business endeavor needs to fulfill a business purpose. The initial goal of collecting such large amounts of data is to help the business make data-driven decisions, uncover new opportunities and mitigate financial and compliance risks. Data-driven applications help achieve that goal by combining cleansed data from all sources into a comprehensive picture of business entities such as customers, products, places, channels and activities, and by revealing the relationships across these entities. Understanding these complex relationships is important: by identifying and visually revealing the connections between the people, products, places and activities your business cares about, data-driven applications help users focus on the most valuable products, the biggest opportunities and the most influential customers.

With data-driven applications, business professionals work with industry-specific applications that bring together the data and insights relevant to the task at hand, so they can make better-informed decisions that have an immediate impact. Unlike analytics-only tools, such applications provide user-friendly visuals as well as guidance in the form of intelligent recommendations for improvement and the ability to act collaboratively, all within the context of day-to-day operations.

For instance, you can create a full 360-degree view of a customer by bringing together their profile data, historical interactions, past transactions and service tickets. You can add insights such as their business value and churn propensity. You can also surface recommended actions from predictive analytics and machine learning that prompt users with the next best action or offer for the customer. Now your big data is delivering real value. Another essential characteristic of data-driven applications is closed-loop feedback that prompts immediate action, such as alerts for a compliance risk, proposed steps to improve data quality or business suggestions to improve the customer experience.
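As a rough illustration of how such a 360-degree view might be assembled, the Python sketch below combines hypothetical extracts (profiles, interactions, transactions and tickets) that are assumed to share a customer_id, for example as the output of an upstream match-and-merge step. The next-best-action logic is a deliberately simple placeholder rule standing in for the predictive model the article describes; it is not how any particular product computes recommendations.

```python
# Hypothetical extracts from separate systems, already keyed by a shared
# customer_id (for example, the output of a match-and-merge step).
profiles = {"C1": {"name": "Ann Lee", "segment": "enterprise"}}
interactions = [
    {"customer_id": "C1", "channel": "email"},
    {"customer_id": "C1", "channel": "phone"},
]
transactions = [
    {"customer_id": "C1", "amount": 1200.0},
    {"customer_id": "C1", "amount": 800.0},
]
tickets = [
    {"customer_id": "C1", "status": "open"},
    {"customer_id": "C1", "status": "open"},
    {"customer_id": "C1", "status": "closed"},
]


def build_customer_360(customer_id):
    """Combine profile, interaction, transaction and ticket data into one view."""
    view = {"customer_id": customer_id, **profiles.get(customer_id, {})}
    view["interaction_channels"] = sorted(
        {i["channel"] for i in interactions if i["customer_id"] == customer_id}
    )
    view["total_spend"] = sum(
        t["amount"] for t in transactions if t["customer_id"] == customer_id
    )
    view["open_tickets"] = sum(
        1 for t in tickets if t["customer_id"] == customer_id and t["status"] == "open"
    )
    # Simple placeholder rule standing in for a predictive model's
    # next-best-action output; a real system would call a trained model here.
    if view["open_tickets"] >= 2:
        view["next_best_action"] = "Resolve open tickets before making a new offer"
    else:
        view["next_best_action"] = "Present a tailored offer"
    return view


view = build_customer_360("C1")
print(view)
```

Keeping the recommendation in the same view as the underlying data is the point of the design: the front-line user sees both the evidence (spend, open tickets) and the suggested action in one place.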

Putting Big Data to Use

Deriving business value from your big data initiatives depends on two key elements:

  1. Is your data of good quality and reliable?
  2. Does the business-facing application present that information in a form that supports decision management?
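The first question can be partly answered with simple, measurable checks. The Python sketch below scores a batch of records for completeness (required fields present), currency (updated within a threshold) and duplicate identifiers. The required fields and the 365-day freshness threshold are assumptions chosen for illustration; each organization would set its own rules.

```python
from datetime import date, timedelta

REQUIRED_FIELDS = ("customer_id", "email", "country")  # hypothetical required fields
MAX_AGE = timedelta(days=365)                          # assumed freshness threshold


def quality_report(records, today):
    """Return simple completeness, currency and duplicate-ID metrics for a batch."""
    total = len(records)
    complete = sum(1 for r in records if all(r.get(f) for f in REQUIRED_FIELDS))
    current = sum(
        1 for r in records if r.get("updated") and today - r["updated"] <= MAX_AGE
    )
    ids = [r["customer_id"] for r in records if r.get("customer_id")]
    return {
        "completeness": complete / total if total else 0.0,
        "currency": current / total if total else 0.0,
        "duplicate_ids": len(ids) - len(set(ids)),
    }


records = [
    {"customer_id": "C1", "email": "ann@example.com", "country": "US", "updated": date(2024, 1, 10)},
    {"customer_id": "C2", "email": "", "country": "DE", "updated": date(2021, 6, 1)},
    {"customer_id": "C1", "email": "ann@example.com", "country": "US", "updated": date(2024, 2, 1)},
]
report = quality_report(records, today=date(2024, 3, 1))
print(report)
```

Tracking metrics like these over time is one way to notice a lake drifting toward a swamp before users stop trusting the answers it produces.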

When you deliver big data insights in the context of business operations, personalized for the front-line user, you deliver demonstrable ROI, because users can take the right actions based on accurate information. Your big data, business applications and analytics are no longer disconnected: your operational applications and analytics get access to reliable information, and closed-loop feedback helps ensure that your data stays clean, current and complete.

Article written by Ajay Khanna
Image credit by Getty Images, Cultura, Monty Rakusen