Just a few short years ago, data lakes were all the rage. Every customer we talked to wanted to put all their data in one place, completely centralizing data curation, stewardship, and engineering. But as we worked with those customers to realize their goals of taking all that data and producing consumable outputs, we found one thing in common: the customers who engaged business users for data curation (in our case, the SMEs who helped build a machine learning model to master their data) were more successful than those who left data curation entirely to some centralized team. Why? Because it is impossible for any team outside of the business to replicate 20 years of experience working with business data. For data projects to be successful, the business problem needs to be front and center, either by involving business consumers in the project definition so they are ready to consume its output, or by making them key members in the execution of the project. When a project is not centered on the business problem, it’s unlikely that the business will use the output.
During this time, another idea started to proliferate: data mesh. Instead of centralizing everything, the organization would empower each distinct unit to create its own datasets. The thinking behind this approach is that if these units provided enough documentation about those datasets, different units could share data with one another and, in theory, everything would be perfect. Right?
Wrong. Here’s the problem. Building datasets that deliver value to another person or team and regularly delivering updates to those datasets is a complex data engineering task. So while the business teams may be in the driver’s seat in terms of defining and producing datasets, they are reliant on significant data engineering capabilities or extreme amounts of manual curation, often conducted in a spreadsheet, to produce anything of quality.
In the first case, customers lean too heavily on data engineering groups who can build fantastically complex pipelines to move data around, but struggle to provide datasets that deliver impact and value to the business. In the second case, business units know what the data needs to look like, but struggle to scale their efforts to multiple datasets, let alone to big data. Further, neither of these camps is successfully integrating its data with third-party providers. The former struggles because it lacks business acumen. The latter fails because it lacks technical acumen.
Which brings us to today. We are in a world where companies seesaw between data philosophy extremes while trying to answer a simple question: “How do I provide the best version of my data so that my business teams can make good decisions?”
Tamr Mastering’s off-the-shelf data product templates allow business teams to generate high-quality datasets that are easy to share across the organization. And because our templates are pre-built, those teams don’t have to worry about data engineering at all. Remember, the primary challenge for these customers isn’t knowing what’s in the data, it’s knowing how to put it all together in a scalable way. With Tamr Mastering’s data product templates, all you have to do is map your data to our industry-standard schema and press a button. The result is a massively scalable, machine learning-driven pipeline that organizes the data into key business entities like customers, suppliers, healthcare providers, and more.
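As a rough illustration of what mapping source data to a target schema involves, here is a minimal Python sketch. It is not Tamr’s API or actual schema: the target field names, the CRM_MAPPING dictionary, and the map_record helper are all hypothetical and chosen only to show the idea.

```python
# Hypothetical illustration only: these names are NOT Tamr's schema or API.
from typing import Dict, List, Optional

# Assumed target schema for a "customer" entity.
TARGET_FIELDS: List[str] = ["company_name", "address", "country"]

# One mapping per source system: source column -> target field.
CRM_MAPPING: Dict[str, str] = {
    "AccountName": "company_name",
    "Street": "address",
    "Ctry": "country",
}


def map_record(
    record: Dict[str, str], mapping: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Rename source columns to target fields.

    Target fields with no matching source column are set to None,
    and source columns absent from the mapping are dropped.
    """
    mapped: Dict[str, Optional[str]] = {field: None for field in TARGET_FIELDS}
    for source_col, target_field in mapping.items():
        if source_col in record:
            mapped[target_field] = record[source_col]
    return mapped


crm_row = {"AccountName": "Acme Corp", "Street": "1 Main St", "Extra": "x"}
mapped_row = map_record(crm_row, CRM_MAPPING)
```

The point of the sketch is that the business team only supplies the mapping, which encodes its knowledge of what each source column means; everything downstream of the common schema can then be shared across sources.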
With our data product templates, business teams can realize the dream of a data mesh because they are empowered to produce high-quality, scalable data products without the need for any data engineering expertise.
Data Product Templates with Third-Party Data
Going back to our customers with centralized data engineering teams, a common challenge we see is that these teams are often responsible for managing the acquisition of third-party datasets. But many times, they lack the ability to do anything other than copy those datasets into a new place where business users can access them. Our built-in enrichment of company data with third-party datasets eliminates that challenge for these customers. Suddenly, not only can business teams have third-party datasets incorporated into their own data products, but centralized data engineering teams can do so as well. Regardless of how our customers organize around building and providing data to the business, our data product templates fill in the gaps to enable their teams to provide high-quality, scalable output.
Weaving the Mesh
Taken together, Tamr Mastering’s data product templates allow our customers, irrespective of how their data and business teams are organized (centralized, federated, or somewhere in between), to begin the journey of implementing a data mesh. With Tamr Mastering’s data product templates, business teams can break off from the monolithic enterprise of centralized data engineering and produce their own data products, thereby directly contributing to the mesh. Further, because we build our data products on a cloud-native and scalable platform, data teams can feel confident that as business interest in a data product grows, they can assume maintenance and orchestration while scaling the number of sources feeding into it without risking data SLAs. Hence, Tamr Mastering’s data product templates allow for cohesive collaboration between data and business teams around how to build, organize, and scale an organization’s set of data products. Said differently, they allow the business to make a data mesh architecture a reality.