Nik Bates-Haus
Chief Architect
March 22, 2024

How Tamr Solves Real-world Entity Resolution at Scale (Five-part Video Series)

(Warning: Machine learning and mathematical modeling deep dive ahead!)

In this comprehensive video series from Tamr’s Chief Architect, Nik Bates-Haus, you can dig into a detailed overview of why solving the enterprise-wide entity resolution problem is so hard and how Tamr has successfully cracked the code so you don’t have to. In each video you’ll gain a clear understanding of the challenges of performing entity resolution successfully, how Tamr thinks about and addresses them, and how Tamr consistently delivers successful technical and business outcomes. Let’s get moving!

Video 1 - Why is Entity Resolution at Scale Hard

In this kick-off to the series, learn about the enormous scale of the entity resolution problem, a number of the challenges hiding inside it, and why non-machine-based approaches to solving it are insufficient. Also, find out how Tamr’s battle-tested approach can take 500 million records from unprocessed all the way to golden records in under three hours and at a reasonable cost.

Video 2 - Making the Easy Part Cheap

To address entity resolution at scale successfully and practically, you have to be efficient. That means making the easy parts of the process as effective and inexpensive as possible. In this video, learn what Tamr means by “easy” and by “cheap,” and how Tamr’s data processing workflow streamlines pre-grouping, feature extraction, blocking and pair generation. When it comes to measuring success, learn how Tamr reduces the remaining processing effort from 500 million records down to under one million pairs, so that machine learning can focus not on the easy cases but on the interesting ones that sit on the decision boundary.
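To make blocking and pair generation concrete, here is a minimal Python sketch. It is not Tamr’s implementation: the blocking key (first three letters of the normalized name plus postal code), the record fields and the sample data are all hypothetical, chosen only to show why grouping records before comparing them shrinks the number of candidate pairs.

```python
from collections import defaultdict
from itertools import combinations


def blocking_key(record):
    """Hypothetical blocking key: first 3 alphanumeric characters of the
    lowercased name, plus the postal code. Real systems use richer keys."""
    name = "".join(ch for ch in record["name"].lower() if ch.isalnum())
    return (name[:3], record["postal_code"])


def candidate_pairs(records):
    """Group records by blocking key, then generate pairs only within a block."""
    blocks = defaultdict(list)
    for rec in records:
        blocks[blocking_key(rec)].append(rec["id"])
    pairs = set()
    for ids in blocks.values():
        for a, b in combinations(sorted(ids), 2):
            pairs.add((a, b))
    return pairs


def naive_pair_count(n):
    """Number of pairs if every record is compared with every other record."""
    return n * (n - 1) // 2


# Illustrative records only; not real data.
records = [
    {"id": 1, "name": "Acme Corp.", "postal_code": "02139"},
    {"id": 2, "name": "ACME Corporation", "postal_code": "02139"},
    {"id": 3, "name": "Apex Industries", "postal_code": "10001"},
    {"id": 4, "name": "Acme Corp", "postal_code": "94105"},
]

print(naive_pair_count(len(records)))  # all-pairs comparisons for 4 records
print(candidate_pairs(records))        # pairs that survive blocking
print(naive_pair_count(500_000_000))   # all-pairs comparisons at 500M records
```

In this toy example, blocking leaves a single candidate pair out of six possible ones. It also drops record 4, which may well be the same company at a different address, so the choice of blocking key is a trade-off between cost and recall.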

Video 3 - Making the Hard Part of Entity Resolution Scale

In this video, Nik gets into the details of what the hard parts of entity resolution are and what being able to address them at scale really means. Looking at Tamr’s data processing workflow, you’ll learn specifics around signals, similarity, clusters and consolidation, all of which result in remarkable processing time and costs and near-perfect parallelism that scales with the compute environment.

Video 4 - Ensuring Effective Learning for Entity Resolution

Learn what we mean by learning and how to think about effectiveness when it comes to learning on very large datasets. Dig into the bootstrapping, refining and validation phases of the machine learning workflow, and learn about measuring success in terms of the time and number of labels it takes to converge on a consistently accurate model.

Video 5 - Coping with Change

When it comes to data, change is like taxes…inevitable. In the last video of the series, explore the types of change a production system must accommodate and what it means to effectively cope with that change. Learn about the change processing workflow and how success can be measured by the stability of IDs over time in the face of data changes, schema changes, model changes, manual edits and shifting requirements.

Ready to learn more? Schedule a demo.