The Role of Rules in AI-Native MDM

Organizations have relied on rules-based master data management (MDM) solutions to master key business entities for decades. While effective for certain use cases, rules-based approaches have limits that make it difficult to deliver high-quality data at scale. In an attempt to master more records, organizations added more rules. But over time, overlapping logic, competing thresholds, and dependencies between rules made these solutions difficult to maintain and a fully rules-based approach unsustainable.
Today, AI-native MDM combines advanced AI/ML models, custom rules, and human/agentic curation to make data mastering simpler, more efficient, and easier to scale. In this article, we’ll explore the role of rules in an AI-native MDM environment.
What Are the Different Types of Rules in MDM?
In MDM, rules typically fall into several categories:
- Validation rules ensure that data meets certain quality and content standards before it’s ingested into the system.
- Standardization rules transform data into a common format so it can be used and compared across systems.
- Matching rules determine whether two or more records represent the same entity.
- Survivorship rules determine which values should be kept when multiple records for the same entity are merged.
- Governance rules define how people should manage and resolve data issues.
- Relationship rules identify how entities relate to each other.
Historically, these different types of rules worked in tandem to determine how organizations ingested, cleaned, matched, merged, and maintained data across systems. But rules-based MDM quickly reached a ceiling and could only take data quality so far. Each new source or use case introduced new formats, edge cases, and relationships that resulted in even more rules layered on top. And over time, these rules piled up, often contradicting one another.
Where Rules Still Matter in AI-Native MDM
While AI/ML models are essential for scaling entity resolution, rules still play an important role in a modern MDM architecture. The key difference is how they are used. Instead of relying on thousands of rigid rules to resolve entities, an AI-native MDM platform applies targeted rules in specific parts of the data mastering process where they provide the most value.
For example, many organizations rely on validation rules to ensure that new records meet governance standards before they are written into the master data hub. Fields may require specific formats, constrained values, or non-null requirements. These rules help prevent low-quality data from entering the system and ensure compliance with enterprise data governance policies.
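A validation rule of this kind can be sketched in a few lines of Python. This is only an illustration, not how any particular MDM product implements validation: the field names (`name`, `country`, `phone`), the allowed country codes, and the phone format are all assumptions made up for the example.

```python
import re

# Hypothetical rule settings; real policies would come from the governance team.
ALLOWED_COUNTRIES = {"US", "CA", "GB"}
PHONE_PATTERN = re.compile(r"\+?[0-9]{7,15}")


def validate_record(record: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the record passes."""
    errors = []
    # Non-null requirement: name must be present and non-empty.
    if not record.get("name"):
        errors.append("name is required")
    # Constrained values: country must come from an approved list.
    if record.get("country") not in ALLOWED_COUNTRIES:
        errors.append("country must be one of the allowed codes")
    # Specific format: phone must be 7-15 digits with an optional leading +.
    phone = record.get("phone") or ""
    if not PHONE_PATTERN.fullmatch(phone):
        errors.append("phone must be 7-15 digits with an optional leading +")
    return errors
```

A record that returns an empty list would be allowed into the hub, while any other record could be rejected or routed for correction.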
Rules are also important in survivorship and consolidation logic. Once records have been clustered together, organizations often need clear policies to determine which values should populate the golden record. For instance, a business may choose to treat its CRM as the most trusted source for phone numbers and its ERP system as the most trusted source for billing addresses. Attribute-level survivorship rules allow organizations to enforce these business policies while still benefiting from AI-driven matching.
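As a rough sketch, attribute-level survivorship can be expressed as a source-priority table that is consulted once per attribute. The source labels (`crm`, `erp`), the attribute names, and the `build_golden_record` helper below are hypothetical and only mirror the example in the paragraph above.

```python
# Hypothetical policy: for each attribute, sources listed from most to least trusted.
SOURCE_PRIORITY = {
    "phone": ["crm", "erp"],
    "billing_address": ["erp", "crm"],
}


def build_golden_record(cluster: list[dict]) -> dict:
    """Pick each attribute's value from the most trusted source that has one."""
    golden = {}
    for attribute, sources in SOURCE_PRIORITY.items():
        for source in sources:
            # First non-empty value for this attribute from the current source.
            value = next(
                (r[attribute] for r in cluster
                 if r["source"] == source and r.get(attribute)),
                None,
            )
            if value:
                golden[attribute] = value
                break  # stop at the most trusted source that has a value
    return golden
```

If the CRM record has no phone number, the sketch falls back to the next source in the list, which is one common way to keep the golden record populated.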
Entity Resolution at Scale: An n² Problem
AI-native MDM reduces the reliance on rules specifically in the matching layer, while other types of rules, such as validation, standardization, and survivorship, continue to play their role. It breaks the matching process into stages, combining machine learning with targeted rules at specific stages of the process to improve both performance and accuracy.
The challenge of a purely rules-based approach to entity resolution becomes clear when you consider the scale. At its core, matching requires comparing records to determine whether they represent the same entity. More specifically, this means comparing every record to every other record, which is known as an “n² problem” because the number of comparisons grows with the square of the number of records.
Consider this scenario: You have 10 entities that you want to resolve. You take the first record and compare it to the other nine to see if any are the same. Then you compare the second record to the remaining eight, the third to the remaining seven, and so on.
While this process may seem manageable for 10 records, it quickly becomes unmanageable as the number of records increases. Imagine the time it would take if you had a hundred, a thousand, or a million records. The number of unique pairs to compare is n x (n-1)/2, which results in:
100 x (100-1)/2 = 4,950 comparisons
1,000 x (1,000-1)/2 = 499,500 comparisons
1,000,000 x (1,000,000-1)/2 = 499,999,500,000 comparisons
If you assume that you can process 10,000 comparisons per second using rules, resolving a million records would take roughly 50 million seconds, or about 1.6 years. This is clearly not a viable method for handling data at scale.
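The arithmetic above can be reproduced with a short script. The 10,000 comparisons per second figure is the assumption from the text; the function and constant names are illustrative only.

```python
def pairwise_comparisons(n: int) -> int:
    """Number of unique record pairs among n records: n * (n - 1) / 2."""
    return n * (n - 1) // 2


RATE_PER_SECOND = 10_000  # throughput assumed in the text
SECONDS_PER_YEAR = 365 * 24 * 60 * 60

for n in (100, 1_000, 1_000_000):
    pairs = pairwise_comparisons(n)
    years = pairs / RATE_PER_SECOND / SECONDS_PER_YEAR
    print(f"{n:>9,} records -> {pairs:>15,} comparisons (~{years:.2f} years)")
```

For a million records, the script reports the same 499,999,500,000 comparisons shown above.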
As discussed in this white paper, AI-native MDM addresses this challenge by breaking the matching process into stages. Instead of comparing every record to every other record, the system narrows the problem step by step, reducing the number of comparisons before applying more advanced models.
The first step in this process is candidate retrieval, which identifies a smaller set of records that are likely to match. Rather than running an all-pairs comparison, the system uses fuzzy rules to generate high-recall candidate sets. This approach dramatically reduces the number of comparisons while keeping recall high, so that few true matches are missed.
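One simple way to picture candidate retrieval is blocking: records are grouped under coarse keys, and only records that share a key are compared. The sketch below is a simplified stand-in for the fuzzy rules mentioned above, not a description of Tamr's implementation. The blocking keys (a three-letter name prefix and a `zip` field) are assumptions chosen for illustration.

```python
from collections import defaultdict
from itertools import combinations


def blocking_keys(record: dict) -> set[str]:
    """Hypothetical blocking keys: a normalized name prefix and the postal code."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return {f"name:{name[:3]}", f"zip:{record['zip']}"}


def candidate_pairs(records: list[dict]) -> set[tuple[int, int]]:
    """Return index pairs of records that share at least one blocking key."""
    blocks = defaultdict(list)
    for idx, record in enumerate(records):
        for key in blocking_keys(record):
            blocks[key].append(idx)
    pairs = set()
    for members in blocks.values():
        # Only records inside the same block are ever compared.
        pairs.update(combinations(members, 2))
    return pairs
```

Using more than one key per record is a deliberate choice here: a pair becomes a candidate if it agrees on any key, which favors recall over precision and leaves the finer decision to later stages.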
In addition, simple grouping rules can be used to resolve obvious matches early in the process. For example, if key fields such as name, address, and identifier all match exactly, there is no need to apply more complex models. These rules allow the system to quickly handle straightforward cases, reserving AI/ML for more ambiguous scenarios, while also flagging the most complex edge cases for data stewards to address.
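A grouping rule like this can be approximated by keying records on the fields that must agree. In this sketch the field names (`name`, `address`, `identifier`) follow the example above, while the light normalization (lowercasing and collapsing whitespace) is an added assumption rather than something the rule requires.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not block a match."""
    return " ".join(value.lower().split())


def group_exact_matches(records: list[dict]) -> list[list[dict]]:
    """Group records whose name, address, and identifier all agree after normalization."""
    groups = defaultdict(list)
    for record in records:
        key = tuple(normalize(record[f]) for f in ("name", "address", "identifier"))
        groups[key].append(record)
    return list(groups.values())
```

Groups with more than one record are the obvious matches; single-record groups would move on to the model-based stages.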
Finally, organizations can also layer on business-specific matching rules to reflect their domain requirements. For example, identity field rules such as matching on a shared customer ID across systems can deterministically link records. A healthcare organization can add business rules to enforce constraints, such as preventing two records with different dates of birth from being merged. These rules complement AI-driven matching by ensuring critical business logic is always respected.
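The two examples above, a shared customer ID and a date-of-birth constraint, might be combined into a single rule check that runs alongside the model. The field names (`customer_id`, `dob`) and the three-way return value are assumptions for illustration, and checking the veto before the identity rule is a design choice of this sketch rather than a documented behavior.

```python
from typing import Optional


def apply_business_rules(a: dict, b: dict) -> Optional[bool]:
    """Return True to force a match, False to veto one, None to defer to the model."""
    # Constraint: never merge records with conflicting dates of birth.
    if a.get("dob") and b.get("dob") and a["dob"] != b["dob"]:
        return False
    # Identity rule: a shared customer ID links the records deterministically.
    if a.get("customer_id") and a.get("customer_id") == b.get("customer_id"):
        return True
    # No rule applies, so the AI/ML matching decision stands.
    return None
```

Returning `None` when no rule fires keeps the rules narrow: they override the model only in the cases the business has explicitly defined.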
Together, advanced AI/ML and select rules generally get organizations 90-95% of the way toward achieving fully clean, trustworthy data.
What About the Remaining 5-10% of Unresolved Data?
The “last mile,” the 5-10% of data that remains unresolved after the general mastering process, often requires a higher level of knowledge, precision, and data preparation. It’s a small portion of the data, but it typically consumes upwards of 80% of the data team’s time to resolve.
To address the last mile, Tamr’s AI-native MDM uses agentic data curation, an innovative concept that uses LLM-based AI agents to automate more of the data curation process by capturing and acting on the contextual insights needed to make confident decisions.
AI agents intelligently clean, curate, manage, and refine the idiosyncrasies and complex edge cases that are close to consumption and difficult to decipher, with minimal human intervention. By comparing outputs of entity matches and explaining the reasoning behind why records are (or are not) a match, AI agents can create a queue of records based on their preliminary analysis. Then, humans can review and decide if they trust the agent’s output or if they need to tune the model further.
How Does an AI-Native Approach Differ from Traditional, Rules-Based MDM?
Traditional, rules-based solutions like Reltio, Informatica, and Profisee are fundamentally different from AI-native MDM solutions like Tamr. Instead of applying select, custom business rules that make algorithms more efficient, traditional solutions require organizations to decide which rules they want to use and where they want to use them. For example, a rule may state, “If name is 70% similar, address is 90% similar, and phone number matches, then these records should be merged.” Rules of this type are onerous, error-prone, and time-consuming to maintain.
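To make the example rule concrete, here is a minimal sketch of what such a hand-tuned threshold rule could look like. It uses Python's standard-library `difflib.SequenceMatcher` as a stand-in similarity measure; the products named above may compute similarity very differently, and the field names are assumptions.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity ratio between 0.0 and 1.0."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def should_merge(a: dict, b: dict) -> bool:
    """Hand-tuned rule: name >= 70% similar, address >= 90% similar, phone identical."""
    return (
        similarity(a["name"], b["name"]) >= 0.70
        and similarity(a["address"], b["address"]) >= 0.90
        and a["phone"] == b["phone"]
    )
```

Every threshold here is a number that someone has to choose, test, and revisit as new sources arrive, which is where the maintenance burden comes from.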
AI-native MDM, on the other hand, uses a layered approach in which smart rules, advanced AI/ML, and agentic data curation work in tandem to handle 95%+ of the work required to master data at scale, without the hassle of managing large sets of rules. Even with all of that available in a packaged offering with a pre-trained model, an AI-native MDM solution like Tamr still gives organizations the flexibility to include and enforce select rules specific to their business processes, data landscape, or data requirements, further enhancing the quality of the results. This frees data team members to focus their energy on solving the most difficult edge cases, whereas rules-based MDM makes them spend most of their time managing rules.
The Future is AI-First
AI is the catalyst that is transforming MDM. It enables organizations to master large volumes of data faster, identify relationships more accurately, and adapt in real time as data environments evolve. But that doesn’t mean rules disappear altogether. A small, well-defined set of business rules still plays a role. And while the most successful MDM strategies prioritize AI, they also recognize that thoughtfully applied rules have their place, too. In this way, rules and AI work together, with AI handling the complexity of matching at scale, while rules enforce structure, policy, and business logic across the data lifecycle.


