Data Products: Imperative vs. Declarative and a New Approach
Summary:
- DataOps and data products can be approached imperatively or declaratively, with Tamr's ML-based approach focusing on the latter.
- Imperative systems require users to develop logic, while declarative systems allow users to describe desired states.
- DataOps teams often struggle with declarative code due to their reliance on imperative tools, but Tamr's ML-based approach helps bridge the gap.
- Customers have found success with Tamr's declarative data products, improving data quality and streamlining processes.
- Tamr's declarative data products simplify data engineering and scale collaboration across teams.
When it comes to DataOps and data products, there are two distinct approaches: imperative data products and declarative data products. We’ll look at the differences between the two, the challenges DataOps teams face with declarative code, and how Tamr’s ML-based approach has helped customers find success with declarative data products.
Imperative vs. Declarative
The key differentiator between these systems is how you specify what needs to be done. In imperative systems, the user needs to develop the logic to implement the desired state. In declarative systems, the user simply needs to describe the desired state and the system infers the logic needed to create that state. A useful analogy is the difference between writing down a set of static directions versus asking Waze to take you to your destination – Waze will adjust as circumstances change, much like declarative systems.
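To make the distinction concrete, here is a minimal, hypothetical sketch in Python. The function and field names are illustrative, not any particular tool’s API: the imperative version spells out how to deduplicate records, while the declarative version only states the desired outcome and leaves the “how” to an engine.

```python
# Illustrative sketch only: the same goal expressed imperatively and declaratively.

# Imperative: the user spells out *how* to reach the desired state.
def dedupe_imperative(records):
    seen, unique = {}, []
    for rec in records:
        # Hand-written matching logic the team must maintain as the data changes.
        key = (rec["name"].strip().lower(), rec["zip"][:5])
        if key not in seen:
            seen[key] = True
            unique.append(rec)
    return unique

# Declarative: the user states *what* the desired state is and hands it to an
# engine that infers the logic. The spec keys below are invented for illustration.
desired_state = {
    "entity": "company",
    "resolve_duplicates": True,
    "match_on": ["name", "zip"],
}
# resolve(desired_state, records)  # hypothetical engine call, not a real API

records = [
    {"name": "Acme Corp ", "zip": "02139-1234"},
    {"name": "acme corp", "zip": "02139"},
]
print(dedupe_imperative(records))  # -> only one Acme record survives
```

The imperative version works until the data changes shape; the declarative version shifts that burden to whatever engine interprets the specification.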
The canonical example is how DevOps revolutionized workflows by switching to declarative, infrastructure-as-code scripts. The DataOps Manifesto applies similar principles, but teams often only manage to achieve “schema-as-code,” which is not much more advanced than traditional DDL plus GitHub.
Why DataOps Initiatives Struggle with Declarative Code
The major reason most DataOps initiatives struggle to realize the benefits of declarative code is their reliance on ‘standard’ data engineering tools such as DBT, Spark, and Python, which are inherently imperative. Even the most elegantly written imperative code is still an imperative solution. Data engineering teams are expected to implement the business’s declared needs, but current tools fall short in supporting a truly declarative approach.
Tamr takes a different approach. By leveraging machine learning for entity resolution, Tamr’s data products realize the ambition of declarative DataOps tooling. Tamr helps our customers’ data engineering teams partner with business users to state how they want their data to be organized, leaving the ML model to figure out how to achieve that.
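As a rough illustration of the idea (not Tamr’s implementation), the toy sketch below trains a simple classifier on labeled record pairs: users declare which pairs represent the same entity, and the model infers the matching logic an engineer would otherwise hand-code. It assumes scikit-learn is installed; the feature names and data are made up.

```python
# Toy sketch of ML-driven entity resolution: learn the matching logic from
# labeled examples instead of hand-coding rules. Illustrative only.
import numpy as np
from sklearn.linear_model import LogisticRegression

# Each row holds similarity features for a candidate pair of records,
# e.g. [name_similarity, address_similarity, phone_exact_match].
pair_features = np.array([
    [0.95, 0.90, 1.0],   # near-identical records
    [0.90, 0.40, 1.0],
    [0.20, 0.10, 0.0],   # clearly different records
    [0.30, 0.50, 0.0],
])
labels = np.array([1, 1, 0, 0])  # 1 = same entity, 0 = different entities

model = LogisticRegression().fit(pair_features, labels)

# For a new candidate pair, the model - not hand-written rules - decides the match.
print(model.predict([[0.85, 0.70, 1.0]]))  # -> likely [1], i.e. "same entity"
```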
Customer Success with Declarative Data Products
Many of our customers, such as a leading healthcare provider, faced significant data quality challenges before switching to our declarative approach. This customer dealt with mixed Healthcare Organization (HCO) and Healthcare Provider (HCP) data, and struggled with invalid values such as "UNKNOWN" or "MISSING" in their datasets. With Tamr's declarative system, they could filter out incorrect records (e.g., HCOs mixed in with HCPs) via a simple configuration, improving their data's accuracy without manually processing every issue. Further, they were able to create a rich, 360-degree experience for data consumers, matching to and incorporating third-party reference data to fill in attributes missing from their source systems.
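Here is a hypothetical sketch of what such a declarative filter might look like; the configuration keys and the small interpreter below are invented for illustration and are not Tamr’s actual syntax.

```python
# Illustrative only: a hypothetical declarative filter specification in the
# spirit of what the healthcare customer configured.
filters = {
    "record_type": "HCP",                                   # keep providers, drop organizations
    "drop_if_value_in": {"name": ["UNKNOWN", "MISSING"]},   # discard invalid placeholder values
}

records = [
    {"record_type": "HCP", "name": "Dr. Jane Smith"},
    {"record_type": "HCO", "name": "General Hospital"},
    {"record_type": "HCP", "name": "UNKNOWN"},
]

def apply_filters(recs, spec):
    """Tiny reference interpreter for the declarative spec above."""
    kept = []
    for rec in recs:
        if rec["record_type"] != spec["record_type"]:
            continue
        if any(rec[field] in bad for field, bad in spec["drop_if_value_in"].items()):
            continue
        kept.append(rec)
    return kept

print(apply_filters(records, filters))  # -> only Dr. Jane Smith survives
```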
Similarly, CHG Healthcare tackled data quality issues by moving their pre-processing operations to Snowflake, while still relying on Tamr for phone standardization and address clean-up. Our platform’s declarative interface allowed them to quickly address these challenges, streamlining data cleaning across their datasets and making their golden records available in real time across their enterprise.
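For a sense of what phone standardization involves, here is a small, generic Python sketch; it is illustrative only and is not the logic Tamr or CHG Healthcare actually runs.

```python
# Generic sketch of phone standardization: normalize common US formats.
import re

def standardize_us_phone(raw: str) -> str | None:
    """Normalize common US phone formats to +1XXXXXXXXXX, or None if invalid."""
    digits = re.sub(r"\D", "", raw)   # strip punctuation and spaces
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]           # drop a leading country code
    return f"+1{digits}" if len(digits) == 10 else None

for raw in ["(801) 555-0199", "1-801-555-0199", "555-0199"]:
    print(raw, "->", standardize_us_phone(raw))
```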
Tamr’s Declarative Data Products
Today, we’re extending the same declarative principles to our data products:
Stop worrying about the pipeline: Our configuration-based interface allows users to state their needs – for instance, “I want a company data product with information from GLEIF” – without concerning themselves with the implementation details (see the configuration sketch after this list).
Simple usage across all of your data: Instead of worrying about the typical pain of reusing an existing data pipeline on new sources, Tamr’s data products improve as you add additional sources. Larger data volumes deliver an even richer entity resolution experience. All the user needs to do is point the data product at the new source.
Scale collaboration: Our platform enables users to give feedback on how records are resolved and fields are completed. This feedback persists across refreshes, allowing the machine learning models to continuously improve based on user input. Tamr’s flexibility also allows schema standardization across data products, as demonstrated in cases like CHG Healthcare’s experience with healthcare provider data.
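To tie the points above together, here is a hypothetical configuration sketch. The keys and source names are invented, not Tamr’s actual product schema; it simply shows the declarative shape described: state what the data product should contain, list its sources, and let the platform work out the pipeline.

```python
# Hypothetical, illustrative data product configuration (not Tamr's schema).
company_data_product = {
    "name": "companies",
    "entity_type": "company",
    "enrich_with": ["GLEIF"],        # third-party reference data to fill gaps
    "sources": [
        "crm.accounts",
        "erp.vendors",
        # Adding a new source is a one-line change, not a new pipeline:
        # "billing.customers",
    ],
    "feedback": {"persist_across_refreshes": True},
}
```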
By focusing on simple, declarative interfaces, Tamr’s ML-powered data products bring the best of DataOps to the world of data products. Unlike the imperative interfaces of traditional MDM solutions, this approach lets data engineering teams focus on their core responsibilities while enabling data consumers to contribute meaningfully to the quality of their data in scalable, impactful ways.
Ready to experience how Tamr’s declarative data products can transform your data engineering processes? Request a demo today and see firsthand how our ML-powered solutions can streamline your workflows, improve data quality, and scale collaboration across your teams.