Written by Tamr
Last week Reuters reported that the U.S. government had not yet notified the victims of the May hack of Office of Personnel Management (OPM) databases, which compromised the personal information of 21.5 million Americans, mostly government employees and contractors.
The reason behind the two-month delay, according to an unnamed OPM official?
The government’s attempt to build a centralized notification system has been hindered by “the complicated nature of the data and the fact that government employees and contractors often move among different agencies.” The OPM official estimated that it would take weeks to implement a centralized system, with OPM “expected to hire an outside contractor” to complete the work.
We’d be glad to take a crack at this for OPM. It’s exactly the kind of problem Tamr’s machine-driven, human-guided data unification platform was built for: dirty, even dark, data spread across many disparate silos that needs to be cleaned and unified quickly and simply.
A traditional rules-based approach to integrating the agencies’ diverse data would be to write custom code for each record structure, with rules that identify and de-duplicate exact matches on employee or contractor name, address, tax ID and other attributes. But rules-based matching rarely accounts for the messy data the government is actually dealing with. Tamr instead takes a bottom-up, probabilistic approach to matching attributes and entities. With this sort of “fuzzy” matching, Tamr can make educated guesses that multiple similar fields refer to the same entity, even when they describe it differently (e.g., IBM = International Business Machines).
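As a rough illustration of why exact-match rules break down, here is a minimal sketch that compares a rules-based check with a simple similarity-based one. It is not Tamr’s algorithm. The records, the abbreviation table and the 0.85 threshold are hypothetical assumptions. Python’s standard-library difflib.SequenceMatcher stands in for a real probabilistic matching model.

```python
from difflib import SequenceMatcher

# Hypothetical records: the same person as held by two agencies, plus a different person.
record_a = {"name": "Jonathan Q. Smith", "address": "123 Main Street, Arlington VA"}
record_b = {"name": "Jonathan Smith", "address": "123 Main St., Arlington, VA"}
record_c = {"name": "Maria Gonzalez", "address": "9 Elm Road, Reston VA"}

# Assumed abbreviation table used during normalization (illustrative only).
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(text: str) -> str:
    """Lowercase, replace punctuation with spaces, and expand common abbreviations."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.lower())
    return " ".join(ABBREVIATIONS.get(tok, tok) for tok in cleaned.split())


def rule_based_match(a: dict, b: dict) -> bool:
    """Rules-based approach: every field must match exactly."""
    return a["name"] == b["name"] and a["address"] == b["address"]


def similarity(x: str, y: str) -> float:
    """Similarity in [0, 1] between two normalized strings."""
    return SequenceMatcher(None, normalize(x), normalize(y)).ratio()


def fuzzy_match(a: dict, b: dict, threshold: float = 0.85) -> bool:
    """Fuzzy approach: average per-field similarity must reach a (hypothetical) threshold."""
    score = (similarity(a["name"], b["name"]) + similarity(a["address"], b["address"])) / 2
    return score >= threshold


print(rule_based_match(record_a, record_b))  # False: names and addresses differ textually
print(fuzzy_match(record_a, record_b))       # True: near-identical after normalization
print(fuzzy_match(record_a, record_c))       # False: genuinely different person
```

A production system would learn which fields and similarity signals matter rather than averaging two hand-picked scores. Cases like IBM = International Business Machines would also need an alias list or learned evidence, because character similarity alone would not link them.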
Our algorithms automatically match attributes and entities across the full range of data sources, often completing up to 90% of the task without human intervention. When human input is needed, Tamr generates questions for data experts, aggregates their responses and feeds them back into the system. This approach scales to thousands of potential data sources and lets Tamr continuously improve its accuracy and speed over time.
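The human-guided loop described above can be sketched as confidence-based triage. High-scoring candidate pairs are merged automatically and low-scoring pairs are rejected automatically. Only the uncertain middle band becomes questions for experts, whose answers are stored as labels for later retraining. This is a hypothetical illustration. The thresholds (0.9 and 0.5), the majority-vote aggregation and the MatchTriage class are assumptions, not Tamr’s actual API or method, and the retraining step itself is not shown.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class MatchTriage:
    """Route scored candidate pairs by model confidence (hypothetical thresholds)."""
    accept_above: float = 0.9   # assumed: auto-merge when score >= this
    reject_below: float = 0.5   # assumed: auto-separate when score < this
    labeled: list = field(default_factory=list)  # expert labels kept for retraining

    def triage(self, scored_pairs):
        """Split (pair_id, score) tuples into auto-matched, auto-rejected, and questions."""
        matched, rejected, questions = [], [], []
        for pair_id, score in scored_pairs:
            if score >= self.accept_above:
                matched.append(pair_id)
            elif score < self.reject_below:
                rejected.append(pair_id)
            else:
                questions.append(pair_id)  # uncertain: ask a data expert
        return matched, rejected, questions

    def record_answers(self, pair_id, answers):
        """Aggregate yes/no expert answers by majority vote (ties count as no match)."""
        votes = Counter(answers)
        is_match = votes[True] > votes[False]
        self.labeled.append((pair_id, is_match))  # feedback for the next model update
        return is_match


triage = MatchTriage()
scores = [("a-b", 0.97), ("a-c", 0.12), ("b-d", 0.71), ("c-e", 0.55)]
matched, rejected, questions = triage.triage(scores)
print(matched, rejected, questions)                          # ['a-b'] ['a-c'] ['b-d', 'c-e']
print(triage.record_answers("b-d", [True, True, False]))    # True
print(triage.record_answers("c-e", [False, True, False]))   # False
```

In this toy run, two of the four pairs are resolved without human input. The two questions sent to experts produce labeled examples that a model could learn from, which is the mechanism by which such a system narrows the uncertain band over time.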
Tamr’s bottom-up, probabilistic approach to cleaning and unification cuts the time and effort of connecting multiple data sources into a unified view of siloed data by as much as 90%. We’ve seen organizations using Tamr complete data unification projects in days or weeks rather than months or quarters. That dramatically shortens the time to analytics or, in OPM’s case, the time needed to notify 21.5 million Americans of potential identity theft.
To learn more about Tamr’s data unification platform, download our technical white paper.
To see how Tamr has radically reduced the cost, time and complexity of a customer’s data integration project, read our case study.