Written by Tara Wildt
The market for data preparation tools is constantly evolving, yet it remains complex. As the tools mature, they are shifting from an initial focus on self-service toward supporting data integration, analytics and data science use cases. In the 2019 Market Guide for Data Preparation Tools, Gartner noted that “modern data preparation tools now enable data and analytics teams to build agile datasets at an enterprise scale, for a range of distributed content authors.” This is a significant development from where these tools were a few years ago.
As enterprises try to evaluate data preparation tools to develop or improve their data pipelines, what are the major factors that should be driving the decision-making process? We’ve outlined three important market trends that enterprises should consider.
1. The Increased Use of Machine Learning
As Gartner notes, AI and machine learning are increasingly being used to “improve and, in some cases, automate the data preparation process.” Machine learning works from the bottom up: algorithms do the heavy lifting of identifying relationships within the data, and humans are engaged only when needed for training or validation. This frees data scientists to spend time on more valuable work, such as analyzing data and generating insights. Over time, as the accuracy of the machine learning model increases, the time a human needs to spend on a given task decreases, which improves efficiency.
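To make the division of labor concrete, here is a minimal sketch of how a tool might decide when to engage a human. It is an illustration under stated assumptions, not how any particular product works: the `similarity` function uses plain string similarity from Python's standard library as a stand-in for a trained model's confidence score, and the `triage` function and its two thresholds are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def similarity(a: str, b: str) -> float:
    """Stand-in for a model's confidence that two records match (0.0 to 1.0).

    Assumption: a real tool would use a trained model here, not string similarity.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(pairs: List[Pair],
           auto_accept: float = 0.9,
           auto_reject: float = 0.5) -> Dict[str, List[Pair]]:
    """Route each candidate pair: decide automatically when the score is
    clear-cut, and queue only the uncertain cases for human review."""
    routed: Dict[str, List[Pair]] = {"accepted": [], "rejected": [], "needs_review": []}
    for a, b in pairs:
        score = similarity(a, b)
        if score >= auto_accept:
            routed["accepted"].append((a, b))
        elif score <= auto_reject:
            routed["rejected"].append((a, b))
        else:
            # Only these pairs cost a person's time; their answers could
            # later be used as training data to improve the scorer.
            routed["needs_review"].append((a, b))
    return routed


candidates = [
    ("Acme Corp", "acme corp"),
    ("Acme Corp", "Acme Corporation"),
    ("Acme Corp", "Zebra Ltd"),
]
result = triage(candidates)
```

In this sketch, only the pairs in `result["needs_review"]` reach a person. As the scoring model becomes more accurate, fewer pairs fall between the two thresholds, which is the efficiency gain described above.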
2. The Need for Flexibility
Enterprises need data preparation tools that are flexible — both in terms of their deployment capabilities and their interoperability. With the growing popularity of cloud service providers such as Microsoft Azure and AWS, enterprises are increasingly looking for data preparation tools that can be deployed in the cloud, or at least as part of a hybrid/multi-cloud approach.
The same can be said when it comes to interoperability of data preparation tools with other tools in the data pipeline. An ideal data engineering architecture should include technologies that are best-of-breed and open. Delivering clean, complete data to consumers when they want it and how they want it requires piecing together multiple technologies from different vendors, whether they be large tech companies or startups. Moreover, there can’t be an aversion to open source if it can best solve the problem. Technologies and processes need to interoperate and follow the basic premise that they should be able to easily accept a table in and produce a table out. This interoperability enables enterprises to build a true, best-of-breed technology stack instead of having to rely on one single vendor to meet all of their needs.
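The "table in, table out" premise can be sketched in a few lines. This is a minimal illustration, assuming a table is represented as a list of row dictionaries. The step names (`trim_whitespace`, `drop_duplicates`) and the `run_pipeline` helper are hypothetical and do not correspond to any vendor's API.

```python
from typing import Callable, Dict, List

# Assumption for illustration: a "table" is a list of row dictionaries.
Table = List[Dict[str, object]]
Step = Callable[[Table], Table]


def trim_whitespace(table: Table) -> Table:
    """Strip leading and trailing whitespace from every string value."""
    return [
        {key: value.strip() if isinstance(value, str) else value
         for key, value in row.items()}
        for row in table
    ]


def drop_duplicates(table: Table) -> Table:
    """Remove exact duplicate rows, keeping the first occurrence."""
    seen = set()
    unique: Table = []
    for row in table:
        fingerprint = tuple(sorted(row.items(), key=lambda item: item[0]))
        if fingerprint not in seen:
            seen.add(fingerprint)
            unique.append(row)
    return unique


def run_pipeline(table: Table, steps: List[Step]) -> Table:
    """Apply each step in order; every step accepts a table and returns a table."""
    for step in steps:
        table = step(table)
    return table


raw = [
    {"name": " Acme Corp ", "city": "Boston"},
    {"name": "Acme Corp", "city": "Boston"},
]
clean = run_pipeline(raw, [trim_whitespace, drop_duplicates])
```

Because every step shares the same contract, any one of them could be swapped for a tool from a different vendor or an open source project without changing the rest of the pipeline.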
3. The Importance of Data Engineering and DataOps
Several years ago, enterprises were focused on building out the data scientist function within their organizations. Now that this role has been established as a critical component of the enterprise, it’s becoming apparent that the focus needs to be on speeding analytical outcomes. According to Gartner, “data engineers and their teams are charged with building data pipelines to ingest, combine, prepare and deliver data for various use cases (both analytical and operational).”
At Tamr, we have talked a lot about how data engineering can be synonymous with DataOps. DataOps is an automated, process-oriented methodology used by analytic and data teams to improve the quality and reduce the cycle time of data analytics. Both DataOps and data engineering are about increasing analytic velocity — typically by building an open, best-of-breed ecosystem. This is also why the previous point about interoperability between tools is so important.
What This Means for Enterprises
As the landscape for data preparation tools continues to change, enterprise buyers have a number of considerations to take into account beyond the trends listed here. Moving forward, selecting data preparation tools will increasingly depend on factors such as whether machine learning is used, whether the tool can be deployed in the cloud, and how it integrates with other technology solutions in the organization’s data pipeline.
Gartner, Market Guide for Data Preparation Tools, 17 April 2019, Sharat Menon, Ehtisham Zaidi
Gartner does not endorse any vendor, product or service depicted in its research publications, and does not advise technology users to select only those vendors with the highest ratings or other designation. Gartner research publications consist of the opinions of Gartner’s research organization and should not be construed as statements of fact. Gartner disclaims all warranties, expressed or implied, with respect to this research, including any warranties of merchantability or fitness for a particular purpose.