The term ‘Data Lake’ has become synonymous with almost every enterprise data initiative. This makes sense given the inherent benefits it brings, such as lowering the total cost of storage and eliminating the need for separate archival systems. Data Lakes also provide a flexible schema and serve as a workbench for discovering new patterns and generating value from raw data. The ability to manage data fidelity takes a major burden off the Audit & Compliance departments of telecom, banking and insurance enterprises.
That being said, there are two sides to every coin. With Data Lakes crossing petabytes of storage, they raise some fundamental questions from a business user’s perspective.
There is an inherent need for a framework around a Data Lake that manages metadata (technical as well as business), lineage (N<->S), traceability and audit. Business leaders understand the importance of Data Lakes but also recognize the gaps in the concept, and they want an ecosystem that plugs these holes from the start.
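To make the idea concrete, here is a minimal sketch of such a catalog in Python. All names (`DatasetEntry`, `Catalog`, `trace_to_source`) are hypothetical illustrations, not part of any specific product: one record carries both technical and business metadata, and bidirectional lineage links support traceability for audit.

```python
from dataclasses import dataclass, field

@dataclass
class DatasetEntry:
    """One catalog record combining technical and business metadata."""
    name: str
    schema: dict                 # technical metadata: column -> type
    owner: str                   # business metadata: accountable steward
    description: str
    upstream: list = field(default_factory=list)    # lineage: sources it was derived from
    downstream: list = field(default_factory=list)  # lineage: known consumers

class Catalog:
    def __init__(self):
        self.entries = {}

    def register(self, entry: DatasetEntry):
        self.entries[entry.name] = entry
        # Maintain bidirectional lineage links for traceability.
        for src in entry.upstream:
            if src in self.entries:
                self.entries[src].downstream.append(entry.name)

    def trace_to_source(self, name: str) -> list:
        """Walk lineage upstream (toward raw data) for audit purposes."""
        path = [name]
        for src in self.entries[name].upstream:
            if src in self.entries:
                path.extend(self.trace_to_source(src))
        return path
```

An auditor could then call `trace_to_source` on any derived dataset and get the full chain back to the raw data it came from.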
It is expected that Data Lakes will evolve into Data Marketplaces (like Amazon or Flipkart) that not only give access to their products (data, in this case) but also act as recommendation platforms, guiding business users on which dataset to use, who should use which dataset, in what combination, and what the end product will be.
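One simple way such a recommendation platform could work is co-usage analysis: suggest the datasets most often consumed alongside the one a user has picked. The sketch below is a hypothetical illustration (the usage log and dataset names are invented), not a description of any shipping system.

```python
from collections import Counter

# Hypothetical usage log: sets of datasets each project consumed together.
USAGE_LOG = [
    {"billing", "customer_profile"},
    {"billing", "customer_profile", "churn_scores"},
    {"network_events"},
]

def recommend(dataset: str, top_n: int = 3) -> list:
    """Suggest datasets most often used alongside the given one."""
    co_use = Counter()
    for session in USAGE_LOG:
        if dataset in session:
            co_use.update(session - {dataset})
    return [name for name, _ in co_use.most_common(top_n)]
```

A real marketplace would of course draw on richer signals (user roles, data quality scores, business glossaries), but the pattern of mining past consumption is the same.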
The DCQI Framework enables the ‘write-back’ capability and hence lays the foundation for predictive and prescriptive analytics.
In other words, this will act as a ‘Data Marketplace’ governed by a ‘Subscriber and Rules Policy Engine’, through which any application can pull or push data inside the Data Lake!
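A ‘Subscriber and Rules Policy Engine’ could be sketched as follows, assuming hypothetical subscribers, roles and rules (none of these names come from the DCQI Framework itself): subscribers register for datasets, and rules decide whether a pull (read) or push (write-back) request is allowed.

```python
# Hypothetical rules: (dataset, action) -> roles permitted to perform it.
RULES = {
    ("customer_profile", "pull"): {"analyst", "marketing"},
    ("customer_profile", "push"): {"data_engineer"},
}

# Hypothetical subscriber registry: role plus the datasets subscribed to.
SUBSCRIBERS = {
    "alice": {"role": "analyst", "datasets": {"customer_profile"}},
    "bob": {"role": "marketing", "datasets": set()},
}

def authorize(user: str, dataset: str, action: str) -> bool:
    """Allow a pull/push only for subscribers whose role the rules permit."""
    sub = SUBSCRIBERS.get(user)
    if sub is None or dataset not in sub["datasets"]:
        return False  # not a subscriber of this dataset
    return sub["role"] in RULES.get((dataset, action), set())
```

Here `alice` may pull `customer_profile` but not push to it, and `bob` is denied outright because he never subscribed; gating every pull and push through one such checkpoint is what makes the marketplace governable.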