A Growing Gap Between Data Lake Expectations and Reality

In Gartner’s Hype Cycle for Data Management for 2018, data lakes are plummeting towards the trough of disillusionment. The expectations that implementers, end users, and executive sponsors alike hold for data lakes are widely going unmet. While that is not necessarily an indictment of the technology’s potential, it is a looming issue that will become more pressing over time if the gaps between expectations and reality are not addressed.

The Hype Cycle says data lakes are 5 to 10 years from mainstream productivity, but that’s of little use to companies that have already built data lakes and expect to get value out of them today. A more recent report from Gartner analysts Nick Heudecker and Adam Ronthal, “How to Avoid Data Lake Failures,” explains how companies can avoid common strategic blunders in implementing data lakes and make them a useful part of their data management strategy.

 

We are just getting to the point in the history of data lake adoption where the gaps between the goals imagined in implementation and the results are becoming apparent. As Heudecker and Ronthal say, “the lack of documented data lake failures, whether personally experienced or communicated in the popular press, has convinced many organizations that data lakes are a magical answer to their data and analytics requirements. Many of these organizations will likely fail. They just haven’t failed yet [emphasis added].”

The three common failure modes described in the Gartner report all relate back to a disconnect between the aspirations of those advocating for data lakes and the many complexities that must be overcome to turn fractured enterprise data into high-quality fuel for digital transformation initiatives.

“The popular view is that a data lake will be the one destination for all the data in their enterprise and the optimal platform for all their analytics. This view rests on three assumptions that have not proved correct:

  • The first is that everyone in the enterprise is data-literate enough to derive value from large amounts of raw or uncurated data. The reality is that only a handful of staff are skilled enough to cope with such data, and they are likely doing so already.
  • The second is that the enterprise will be able to define cohesive governance and security policies across all datasets residing on a single cluster of physical infrastructure. The same attempt was made with data warehouse implementations, but proves far less successful with data lakes because the data they contain isn’t modeled. Creating policies for data without context is impossible.
  • The third is that data lake implementation technologies perform far better than they actually do, which leads to wild overestimations of their benefits.”

Successful data lake programs need to begin with realistic expectations. And that should start with the realization that simply putting all of a company’s data in one place doesn’t create much business value by itself. In fact, making siloed, messy, uncurated data more broadly accessible can do more to expose the depth of enterprise data quality problems and undermine organizational trust than it does to deliver value.

What is needed is a shared understanding that data lakes are only one piece of the puzzle when it comes to modernizing enterprise data management. Large enterprises struggling to deal with decades of accumulated data debt can reach the so-called ‘plateau of productivity’ for their data lake investments if they think of them as necessary but not sufficient components of their plan. Without a holistic data strategy, data lakes run the risk of becoming data swamps. But when coupled with logical data unification (like the kind that Tamr Unify does for our customers) and proper governance, lineage, and privacy technologies, the whole can greatly exceed the sum of the parts.

Once you’ve read the Gartner report “How to Avoid Data Lake Failures,” if you’re interested in reading more about modernizing enterprise data management, check out the O’Reilly book Getting DataOps Right. DataOps covers not just the technologies, but also the people and processes (and associated transformations) needed to bring about a step-change in the value that enterprises get from their data.