Tamr Patents
Our commitment to innovation extends beyond just words. We set out to solve a difficult problem: how to clean and curate messy, heterogeneous data at enterprise scale. The approach we took was to apply machine learning (ML) to automatically integrate database schemas, match and cluster like entities, and connect those entities to produce “golden records”—while enabling and ensuring human oversight of the process. We built the core IP to automate this ML-driven, human-guided workflow, and our extensive portfolio of patents stands as proof of this commitment. Our portfolio isn’t just a list of patents but evidence that we’re deeply invested in the future of enterprise data mastering.
Insights
Answer the questions “What changed?” and “How accurate is my ML model?”
Review and Curation of Record Clustering (1 patent in family)
When you push a big update to your data product, the most basic, gut-check question is always, “What has changed since the last version?” This patent is our answer. It secures the method for managing clustering updates at large scale, enabling the automated creation of Tamr IDs and letting us reliably track and compare clusters between the current and proposed versions. This capability gives customers an auditable view for visualizing and reviewing changes driven by source data updates, machine learning models, or human curation. It enables scalable re-clustering and propagates cluster IDs through changes for efficient review and feedback. It’s change management, built right into the platform.
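To make the idea of propagating cluster IDs concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the patented method: the function name `propagate_ids`, the largest-overlap rule, and the ID format are all hypothetical. Each proposed cluster inherits the ID of the previous cluster it shares the most records with, and a cluster that cannot claim an existing ID receives a fresh one.

```python
from collections import Counter
from itertools import count

def propagate_ids(old_assignment, new_clusters, fresh_ids):
    """Carry cluster IDs from a previous version to a proposed one.

    old_assignment: dict mapping record ID -> previous cluster ID.
    new_clusters:   list of sets of record IDs (the proposed clustering).
    fresh_ids:      iterator that yields unused cluster IDs.
    Returns a list of cluster IDs aligned with new_clusters.
    Assumption: largest record overlap wins; this rule is illustrative only.
    """
    claims = []
    for idx, members in enumerate(new_clusters):
        overlap = Counter(old_assignment[r] for r in members if r in old_assignment)
        for old_id, shared in overlap.items():
            claims.append((shared, old_id, idx))

    # Strongest overlaps claim first; each old ID is reused at most once.
    assigned, used = {}, set()
    for shared, old_id, idx in sorted(claims, key=lambda c: (-c[0], c[1], c[2])):
        if idx not in assigned and old_id not in used:
            assigned[idx] = old_id
            used.add(old_id)

    return [assigned[i] if i in assigned else next(fresh_ids)
            for i in range(len(new_clusters))]

# Previous version: records a, b, c in cluster C1; record d in cluster C2.
old_assignment = {"a": "C1", "b": "C1", "c": "C1", "d": "C2"}
# Proposed version: C1 splits, and a new record e joins d.
new_clusters = [{"a", "b"}, {"c"}, {"d", "e"}]
new_ids = propagate_ids(old_assignment, new_clusters, (f"C{n}" for n in count(3)))
print(new_ids)  # ['C1', 'C3', 'C2']
```

In this toy example the larger half of the split keeps C1, the smaller half gets a new ID, and C2 survives the addition of a record, so a reviewer only needs to look at what actually changed.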
Unbiased Cluster Accuracy Metrics (1 patent in family)
A major challenge in adopting any AI-powered data unification solution is reliably assessing model accuracy. While existing methods often introduce biases or fall short in practical scenarios, Tamr tackles the problem of guesswork head-on. This patent protects a robust, record-based metric for measuring clustering accuracy, ensuring consistent evaluation and monitoring across both training and production workflows. It directly addresses the “black box” problem of AI, giving you a trustworthy, consistent measure of data quality while reducing manual effort.
Curation
Curate large, diverse data sets at scale
Large Scale Data Curation (3 patents in family)
When Tamr was founded at MIT, technologies existed to address two out of the three V’s of big data: Volume and Velocity. However, no solution existed for managing data Variety. Tamr pioneered a system that fixed this by throwing out the old idea of treating schema mapping and deduplication separately. Coupling machine learning with subject matter supervision, Tamr developed a scalable, cost-effective system for large-scale data curation. This foundational family of patents protects the entire iterative workflow, from building the initial linkage model to intelligently using stratified sampling to pick the most helpful pairs for your experts to review.
Curation with Version Control (1 patent in family)
Tamr is a pioneer in integrating manual data curation with version control, providing a principled approach to tracking how data, model, and curation changes come together to form a version of a data product. This patent protects the core mechanism for defining and storing parent and child “curation states,” ensuring transparency across teams. This system encapsulates all data, linkage facts, models, and expert answers at any point in time. Tamr’s approach supports high-level workflows composed of low-level curation components like tokenization, blocking, and candidate generation. It also offers advanced capabilities such as restarting and rolling back workflows, ensuring flexibility in managing data curation processes.
Reusing Transformations for Evolving Schema Mapping (2 patents in family)
Human input is the most costly aspect of data curation. Tamr patented a solution to make that investment go further. This method centers on the Transformation Graph, which allows the system to capture and reuse prior mapping work. When a source system changes its schema, this protected mechanism suggests the optimal (lowest cost) mapping paths by identifying reusable sequences of modifications. It allows Tamr to ensure schema mapping stays efficient and adaptive, saving you significant time and money.
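As a rough illustration of choosing the lowest-cost mapping path, the Python sketch below runs Dijkstra's shortest-path algorithm over a toy graph. Everything here is an assumption made for the example: the graph layout, the schema and transformation names, the cost values, and the choice of Dijkstra itself are hypothetical and are not taken from the patent.

```python
import heapq

def cheapest_mapping_path(graph, source, target):
    """Dijkstra's algorithm over a hypothetical transformation graph.

    graph: dict mapping schema -> list of (next_schema, transformation, cost).
    Returns (total_cost, [transformation names]), or None if no path exists.
    """
    queue = [(0, source, [])]
    best = {source: 0}
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == target:
            return cost, path
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry; a cheaper route was already found
        for nxt, name, step_cost in graph.get(node, []):
            new_cost = cost + step_cost
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(queue, (new_cost, nxt, path + [name]))
    return None

# Illustrative costs: reusing reviewed work is free, a small rename costs 1,
# and mapping from scratch costs 10 units of expert effort.
graph = {
    "source_v1": [("unified", "map_v1_reviewed", 0)],
    "source_v2": [
        ("source_v1", "rename_cust_nm_to_customer_name", 1),
        ("unified", "map_from_scratch", 10),
    ],
}
result = cheapest_mapping_path(graph, "source_v2", "unified")
print(result)  # (1, ['rename_cust_nm_to_customer_name', 'map_v1_reviewed'])
```

When the source moves from v1 to v2, the cheapest route reuses the already reviewed v1 mapping after a single rename, rather than asking experts to map the new schema from scratch.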
Feedback
Capture human feedback in context and use it to train the model
In-Situ Data Issue Reporting, Presentation, and Resolution (1 patent in family)
Tamr recognizes that users are most likely to give feedback on data when they can do so from within the application they are using. Tamr’s innovation makes that feedback matter. This patented system provides an interface that ensures that when a user points out an issue, it automatically captures the full context (version, filters, elements) right where they are. This approach not only makes it simpler for users to provide feedback, but it also makes it easier for curators to view that feedback in context and take corrective action. This transforms an isolated complaint into a structured, actionable training signal for the curation team, closing the loop between data consumption and data quality improvement.
Using Clusters to Train Supervised Entity Resolution (1 patent in family)
US Patent: US12242982B2
Tamr recognizes that scaling governance and expert oversight is impossible using individual record pairs. This patent family protects a crucial feedback loop that solves this by utilizing verified clusters as the preferred scalable unit of expert feedback. This feedback is then fed into Tamr’s learning loop, where verified clusters and steward input are transformed into training signals that continuously refine entity resolution accuracy, automate curation, and strengthen overall data quality and visibility.
AI/ML Mastering
Make meaningful connections and translate them into active learning
Scalable Binning for Big Data Deduplication (2 patents in family)
Matching a record against a corpus of millions, or even billions, of other records can lead to wasted time on irrelevant comparisons. Tamr’s patented innovations secure the technique of scalable binning, enabling machines to quickly and accurately focus on relevant records. The approach uses distributed systems to scale deduplication efforts, employing advanced binning, blocking techniques, and imperfect-rule clustering to ensure matching accuracy and efficiency at scale.
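The general idea behind binning can be shown in a few lines of Python. This is a single-machine sketch of generic blocking, not Tamr's distributed implementation, and the binning rule (first three letters of the name plus postal code) and record fields are hypothetical. Records are grouped into bins, and only records that share a bin are ever compared.

```python
from collections import defaultdict
from itertools import combinations

def bin_key(record):
    # Hypothetical binning rule: lowercased name prefix plus postal code.
    return (record["name"].lower()[:3], record["zip"])

def candidate_pairs(records):
    """Return the set of record-ID pairs that share a bin."""
    bins = defaultdict(list)
    for rec in records:
        bins[bin_key(rec)].append(rec["id"])
    pairs = set()
    for ids in bins.values():
        # Only records inside the same bin become comparison candidates.
        pairs.update(combinations(sorted(ids), 2))
    return pairs

records = [
    {"id": 1, "name": "Acme Corp", "zip": "02139"},
    {"id": 2, "name": "ACME Corporation", "zip": "02139"},
    {"id": 3, "name": "Globex", "zip": "10001"},
    {"id": 4, "name": "Acme Corp", "zip": "94105"},
]
pairs = candidate_pairs(records)
print(pairs)  # {(1, 2)}
```

Four records would require six pairwise comparisons without binning; here only one candidate pair remains. The trade-off is that a rule this crude can miss true matches (record 4 may well be the same company at another address), which is why production systems typically combine several binning strategies.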
System for Scalable Hierarchical Classification Using Blocking and Active Learning Method (3 patents in family)
Organizing your data into a giant, complex taxonomy is a classic MDM headache. This patent family secures a practical, multi-step pipeline for hierarchical classification. The protected method involves building the classification model using multiple binary classifiers (one for each node in the hierarchy), consolidating input records via deduplication, and using a smart search algorithm to find the most likely category path. It ensures we can handle massive taxonomies and record volumes without sparse training data undermining accuracy.
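A toy Python sketch can show how per-node binary classifiers and a path search fit together. The assumptions are significant: the taxonomy is invented, simple keyword checks stand in for trained binary classifiers, best-first search is one possible search strategy rather than the patented one, and the deduplication step is omitted.

```python
import heapq

# Hypothetical taxonomy: parent node -> child nodes.
TAXONOMY = {
    "root": ["hardware", "office"],
    "hardware": ["fasteners", "tools"],
    "office": ["paper"],
}

# Stand-ins for trained per-node binary classifiers (one per taxonomy node).
NODE_KEYWORDS = {
    "hardware": {"steel", "bolt", "drill"},
    "office": {"paper", "a4"},
    "fasteners": {"bolt", "screw"},
    "tools": {"drill", "hammer"},
    "paper": {"paper", "a4"},
}

def node_probability(node, text):
    """Pretend classifier: probability that the record belongs under node."""
    tokens = set(text.lower().split())
    return 0.9 if tokens & NODE_KEYWORDS[node] else 0.1

def most_likely_path(text):
    """Best-first search for the root-to-leaf path with the highest joint probability."""
    queue = [(-1.0, ["root"])]  # negated probability turns heapq into a max-heap
    while queue:
        neg_p, path = heapq.heappop(queue)
        children = TAXONOMY.get(path[-1], [])
        if not children:
            # Probabilities never grow with depth, so the first leaf popped is the best.
            return path[1:], -neg_p
        for child in children:
            p = -neg_p * node_probability(child, text)
            heapq.heappush(queue, (-p, path + [child]))
    return [], 0.0

path, probability = most_likely_path("M8 steel bolt")
print(path)  # ['hardware', 'fasteners']
```

Because each node only has to answer a yes/no question, a category with few training examples does not have to compete directly against every other category in the taxonomy, and the search only expands the branches that look promising.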
Geospatial Binning (1 patent in family)
Matching geospatial data (roads, building footprints, points of interest) is vital for many industries, but it often requires a trade-off between accuracy and scale. Tamr’s innovative approach computes similarity across different feature types, such as points of interest and building footprints, without relying on projection, avoiding the usual accuracy trade-offs. This approach supports large-scale deduplication, advancing geospatial binning by computing proximity without a central index, and scales effectively to extremely large datasets using distributed systems.
Active Learning When Using Clusters for Supervised ML (1 patent in family)
Active learning aims to improve the accuracy of machine learning by asking the right question at the right time to get the biggest boost in model accuracy. Tamr’s patent refines this approach by translating the technical needs of the active learning system into practical questions that data experts can answer. This method enables the system to train highly accurate models with minimal expert effort by automatically surfacing high-impact record clusters for review through an intuitive interface.
See for yourself
Get a free, no-obligation 30-minute demo of Tamr, and discover how our unique AI-native MDM solution can empower you to deliver data you can trust.