Delivering Data Lake Value Versus a Data Swamp

It is becoming evident that insights-driven companies make more money and develop more sustainable barriers to entry. Forrester has estimated that they “will steal $1.2 trillion a year by 2020” (“Insights-Driven Business”, Forrester Research, July 27, 2016).

The insights-driven do this by experimenting and continuously learning. A key focus for them is getting the right data and then taking effective actions. These companies increasingly are adding data lakes to their repertory at the same time “as they put in place CDOs with the advanced data capabilities that are critical to successful digital transformation” (“Data Centric Businesses need a Data Centric Leader”, Forrester Research, April 26, 2016).

Clearly, successful CDOs will lean on their CIOs to succeed at creating a valued data lake versus a worthless data swamp. In order to understand what recommendations to give to CDOs and business leaders, we have interviewed 12 leading-edge CIOs regarding their perspective on data lakes. Specifically, we asked them for their recommended list of the do’s and don’ts.

The value of a data lake

CIOs tell me that they see the potential to create value from new business intelligence options including self-service business intelligence and a data lake. They see these trends as mattering because they are fundamentally about increasing the availability and transparency of data and giving business users the ability to find answers without IT.

In fact, most CIOs see self-service BI as a requirement of the data lake. This means that effective data lakes are not just about data storage but also about end users and data scientists exploring for themselves the data contained within them.

Some CIOs even see the establishment of new business intelligence options as enabling a broadening of the community served by IT.

“Self-service BI allows our broader community to be engaged in relevant, up-to-date, analytics-based decision making,” Higher Educational CIO.

“This is a huge opportunity allowing end users to gain control of their data plus analyze quickly,” Another Higher Educational CIO.

“Informational in the power to provide value back to the company, more people have information can add business value back,” CIO Consultant.

Some CIOs see the value of taking this step occurring when it is used for everyday decisions, not only special requests. The value is the ability to benefit and impact everyday users.

Others focused more on how technology changes how business intelligence is done. They candidly assert that these approaches change how BI works and who is served.

“Important technology advances, like in-memory data bases, Hadoop, ‘disposable’ on-demand cloud business intelligence services are changing the game”, CIO Consultant.

This goes to the classic observation that you can't manage what you don't measure. Several CIOs felt that implementing self-service business intelligence and a data lake in fact improved business/IT alignment all by itself.

Make it purposeful

CIOs showed here that they have learned from the first wave of business intelligence. They say that value is generated only when the right questions are asked at the start. They are candid that even though the tools make data highly available, asking the right questions is still a challenging process for most business organizations.

Some even worry that if data lakes don’t move from the experimentation phase to generating business value, CEOs and CFOs will start complaining and heads may roll. For this reason, it wasn’t surprising when one CIO said that a data lake with no business goals or purposes is just taking up space.

CIOs say that even though it is not easy, IT shouldn’t always say yes to a project. The same conclusion emerged earlier this year from interviews with 210 business executives, which found that those who excel at big data use it to achieve digital business objectives (“The Big Data Payoff”, Capgemini IDG, 2016).

This also echoes Tom Davenport’s comment that “even the most analytically oriented company needs to target its analytical efforts where they will do the most good, because resources, especially talent, are always constrained.”

It is clear that CIOs need to help make sure their business customers start with an end in mind. CIOs are clear that it can't be “data first, questions later”. Fixing things can start with something as simple as CIOs asking their business counterparts or internal IT proponents what problems they are trying to solve.

To counter several industry slogans, CIOs say that while it is “about the data, it's also really about the intended purposes”. And they are frank that translating the data into answers can be even more challenging than many vendors suggest. They know from firsthand experience that understanding the data and what it is telling you is crucial, and CIOs claim this won't happen by magic.

Start simple but solve new problems

One CIO said that the notion of a data lake can feel difficult if you have huge systems and difficulty identifying data definitions. CIOs in general feel that a big bang approach is a loser.

A healthcare CIO felt that they need to stop trying to solve "world data hunger". Doing something about this was seen as involving people, process, governance and prioritization. And he was candid that one organization’s pilot could be another's phase 1 production rollout.

CIOs, therefore, suggest that you start small, relative to your organization’s size. You should find a problem and then focus only on the source data that could relate to solving it. This should be about IT and the business learning together. CIOs stress the notion of piloting and starting small to get big. Or put differently, go slow to get fast.

Meanwhile, CIOs said that you shouldn’t use new tools to “pave old reporting cow paths”. New tools need to be used to answer new questions or to enable better answers to existing questions. Interestingly, some CIOs questioned whether their IT organization should deliver these new approaches or whether it is better to deliver all through a public cloud vendor.

Govern the data going into the data lake

CIOs feel that it is important that there is transparency with how data is used and combined. This includes proper design and planning. CIOs feel that this involves identifying system “sources of truth” and allowing users access to extracted data. Without this, the data collected is seen once again as being a bunch of bits taking up storage space from other systems.

A hospital CIO put it this way: it “all comes down to the governance model”. He continued by making another relevant point: “I have seen organizations fail because IT tries to be the gatekeeper and control access”.

Fix your existing and new data problems

CIOs had varied opinions on this topic. They tended to stress the need for “data hygiene”. They feel that there are all sorts of quality, governance and accuracy issues. And when pushed on the topic, they claimed that the so-called data swamp is in reality a data lake filled with dirty data. CIOs said you get a data swamp when there is no reasonable data curation process. And this, of course, is seen as requiring both IT maturity and data governance.

If an appropriate "data curation" process is in place, a data swamp "shouldn't" happen. It was also claimed that data swamps can be avoided with proper analysis. One CIO summed things up by saying that in order to get value out of the data lake and big data, you need to continue to do “data management 101”.

Several CIOs suggested master data management and stewardship are also essential here. One IT leader countered, however, by saying that while some will try MDM to avoid data swamps, the most effective answer is likely official open data repositories grounded in truth.

Regardless of approach, CIOs felt that IT and the business need to understand that real-world data is "dirty" at the start. Openness about this is seen as the beginning of the "cleansing" process. CIOs candidly suggested that everyone should have their eyes wide open to the reality that it takes a lot of work to get proper results.

Manage the data access and data security

CIOs worried about the “putting all your eggs in one basket” effect. They stressed the importance of having data security and privacy from the start. This is an important point because CIOs often see most big data/data lake projects as still largely experiments. That comes as no surprise considering that only 27% of business executives say their big data projects have achieved profitability.

Regardless, data lakes will undoubtedly become targets for hackers or improper internal access. Governance needs to be done sooner rather than later. CIOs suggest a big challenge for data lakes is in providing necessary information while maintaining appropriate access control.

Train the team

Training is needed for each element described above. Business users and data scientists need to know that they must have a business purpose for a data lake and have specific questions in mind. And of course, they need to know that the above process can be circular.

Once they have meaningful data, they will want to answer new questions, and these could cause them to go back through the same process. Training needs to convey the notion that data lakes should start with small-data projects. And CIOs suggest that once real-world problems have been solved, they represent the fodder for future organizational learning and support.

Simultaneously, there is a need for business and IT to be educated on why governance of the data lake matters, and this includes understanding why safe data practices are necessary. Data needs to be fixed, access control established and security put in place.

4 steps to avoid the data swamp

More and more organizations will want and need to become insights driven in order to stay relevant to their business customers. A data lake can be a key element of doing this. But creating a data lake on its own carries business risk. Acquiring data insights out of a data lake doesn’t just happen. There are concrete and meaningful actions that CIOs recommend be taken.

As we have discussed, there are four critical steps to prevent a data swamp.

  1. Start with concrete business questions.

    Like the data staging of the past, simply dumping a bunch of data into a data lake with no concrete purposes in mind and at least some a priori guesses of possible causation is a big business risk.

  2. The role of data governance, data access and data security does not go away with the volume of data in the data lake.

    These things have to be working if you are to avoid the data swamp.

  3. More data points do not mean that data quality is no longer relevant.

    The need for what CIOs call “Data 101” persists. Looking at bigger data sets does not eliminate the fact that there are data risks. And for larger data sets, the data risks may in fact be even greater.

  4. It is essential that you train your team.

    CIOs believe that training needs to happen at all levels. Part of this is the people, process and technology to ensure that data is curated and protected. Part of this is to ensure business users and data scientists understand the need to start small, have concrete questions in mind and recognize that there is work to do with all data.

Article written by Myles Suer
Image credit by Getty Images, Photolibrary, Russ Widstrand