Written by Matt Holzapfel
Enterprise data and IT professionals often ask us: “Why do I need a solution like Tamr? We already have rules-based systems for data mastering.” It’s a valid question, and the answer is vitally important for enterprises that want to thrive based on data insights.
When Rules Are Meant to Be Broken
Most enterprises today use top-down, rules-based methods for managing data. Here are two of the major issues with this approach:
First, a top-down approach causes data disconnects. Let’s say you’re a market analyst trying to find out how a recent promotion performed. You saw good sales results in a presentation, but they aren’t reflected in the aggregate data reported back to you. In order to figure things out, you go to the database managers (the “gatekeepers”) to explain the problem you found, and ask them to look into it.
The obvious issue here is that the person who knows the most about the data in context—you—isn’t the one in control of it. This data quality management scenario happens constantly in real corporations, creating delays, false reporting, and many other obstacles to sound analytics.
Quite often, those with the knowledge to recognize a quality issue do not have the skillset or permissions required to fix it. This knowledge distribution itself isn’t the problem; it happens naturally as organizations grow and businesses specialize. The problem is the feedback loop: namely, the effort and time required to go from spotting an issue to fixing it. If only you could democratize data and still retain governance, the enterprise could take advantage of institutional knowledge and make smart decisions, much faster.
Second, rules do not scale. Rules make sense to most of us: if X, then do Y; else if A, then do B. Simple, right? Not anymore in enterprises that want to build scalable systems to handle terabytes of customer, supplier, research, and other data. Consider these situations:
- You have to change one of the rules somewhere in the middle of a list. You have to search through many rules to find the right one, and repeat that search every time a rule needs to change.
- You have to add another 1,000 rules. You start to add to the list, and after a while the list becomes too difficult to manage and has a lot of redundancies.
- If you have one rule out of place, it can cause a number of false positives, which can be a big loss to the business.
- The person who wrote the initial rules for you leaves, and you have to spend time and resources to catch up on the long list of rules.
- The rules are difficult to trace back in the event of any exceptions, and a person needs to know all the rules that are in place in order to do this.
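To make the "one rule out of place" problem concrete, here is a minimal sketch of an ordered rule list for deciding whether two supplier records refer to the same company. The rule names, record fields, and the `decide` helper are all hypothetical, invented for illustration; they are not taken from any particular product.

```python
# Hypothetical first-match-wins rule list: each rule is
# (name, condition, verdict). All names here are illustrative assumptions.

def same_name(a, b):
    return a["name"].lower() == b["name"].lower()

def different_country(a, b):
    return a["country"] != b["country"]

def decide(rules, a, b):
    """Return (verdict, rule name) from the first rule whose condition holds."""
    for name, condition, verdict in rules:
        if condition(a, b):
            return verdict, name
    return False, "default"

# Intended order: rule out cross-country pairs before comparing names.
correct_order = [
    ("different_country", different_country, False),
    ("same_name", same_name, True),
]
# The same two rules with their positions swapped.
swapped_order = list(reversed(correct_order))

a = {"name": "Acme Corp", "country": "US"}
b = {"name": "ACME CORP", "country": "DE"}

print(decide(correct_order, a, b))  # (False, 'different_country')
print(decide(swapped_order, a, b))  # (True, 'same_name'): a false positive
```

With two rules the mistake is easy to spot. With thousands, the same kind of ordering error can hide for a long time, and tracing a bad match back to the rule that caused it means reading the list from the top.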
You can already see how, in the era of big data, rules-based decision systems get out of hand. Research shows that humans can only remember about 500 rules. After that, the problems become too complex. In today’s world where big data is the key to success, it’s not feasible to stick with rules-based decision systems.
Why Machine Learning-Based Data Unification is the Way Forward
Tamr Unify is designed to help enterprises democratize their data by shortening feedback loops between end users and gatekeepers in ways such as:
- Integrating Expert Sourcing: Tamr creates a streamlined workflow for your organization’s data stewards to draw on knowledge from as many experts, with as varied a background, as necessary. A few simple “yes or no” questions give Tamr Unify’s supervised learning models the feedback they need to correct and refine their results.
- Leveraging All Data: Often a limited number of databases and tables are used in analytics because it would be too expensive and resource-intensive to integrate the long tail of sources, which are critical to getting a complete picture. Machine learning enables you to cost-effectively integrate this long tail, and begin leveraging all your data.
- Preserving Provenance: One key feature of Unify is the ability to track and preserve all the changes made throughout the data cleansing process, providing a level of tracking that can be difficult with rules-based systems.
- Gaining Feedback Throughout the Consumption Stage: Many people only work with data at the final stage: consumption. Enabling these users to provide feedback on data quality issues where they spot them has been critical in ensuring the most visible data quality issues get addressed.
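The expert-sourcing idea above can be sketched in a few lines. The following is a toy illustration only, not Tamr Unify's actual models or API: it assumes two hand-picked features (name-token overlap and city equality), a handful of made-up "yes or no" answers, and a plain logistic regression trained by gradient descent.

```python
import math

def features(a, b):
    """Two illustrative features: name-token Jaccard overlap, city equality."""
    ta, tb = set(a["name"].lower().split()), set(b["name"].lower().split())
    jaccard = len(ta & tb) / len(ta | tb)
    same_city = 1.0 if a["city"] == b["city"] else 0.0
    return [jaccard, same_city]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train(xs, ys, lr=1.0, epochs=5000):
    """Fit logistic-regression weights with batch gradient descent."""
    w, b, n = [0.0] * len(xs[0]), 0.0, len(xs)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) - y
            grad_w = [g + err * xi for g, xi in zip(grad_w, x)]
            grad_b += err
        w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def is_match(model, a, b):
    w, bias = model
    score = sigmoid(sum(wi * xi for wi, xi in zip(w, features(a, b))) + bias)
    return score > 0.5

def rec(name, city):
    return {"name": name, "city": city}

# Made-up expert answers to "Are these the same company?" (1 = yes, 0 = no).
answers = [
    (rec("Acme Corp", "Boston"), rec("ACME Corp Inc", "Boston"), 1),
    (rec("Acme Corp", "Boston"), rec("Apex Corp", "Denver"), 0),
    (rec("Globex Inc", "Austin"), rec("Globex Inc", "Austin"), 1),
    (rec("Globex Inc", "Austin"), rec("Initech LLC", "Austin"), 0),
    (rec("Initech LLC", "Dallas"), rec("Initech", "Dallas"), 1),
    (rec("Umbrella Co", "Miami"), rec("Stark Co", "Reno"), 0),
]

model = train([features(a, b) for a, b, _ in answers],
              [y for _, _, y in answers])
```

Nobody wrote a matching rule here. The experts only answered yes or no, and the decision boundary was learned from those answers. When an end user flags a bad match, the correction becomes one more labeled pair and the model is retrained, rather than someone hunting through a rule list.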
With Tamr, scalability and data reliability are no longer issues. Tamr’s enterprise data unification method combines machine learning and human expert guidance to unify data sources across an organization with unmatched speed, scalability, and accuracy. The platform’s core capabilities include “connecting” data sources across an organization to align relevant datasets to a unified schema, “cleaning” the unified dataset through entity deduplication and mastering, and “classifying” records within the clean, unified dataset to a client-provided taxonomy for more robust downstream analysis.
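As a rough picture of how the three stages fit together, here is a toy sketch of the data flow. Every name in it (`SCHEMA_MAP`, `TAXONOMY`, `connect`, `clean`, `classify`) is a hypothetical stand-in, and each stage is deliberately naive and hand-written; in the approach described above, these decisions would come from learned models guided by expert feedback rather than from fixed lookups.

```python
# Connect: map source-specific column names onto one unified schema.
# (Hypothetical mapping; a learned schema matcher would propose this.)
SCHEMA_MAP = {"company": "name", "org_name": "name", "town": "city"}

def connect(record):
    return {SCHEMA_MAP.get(key, key): value for key, value in record.items()}

# Clean: collapse records that share a normalized (name, city) key,
# keeping the first one seen. A stand-in for learned entity deduplication.
def clean(records):
    merged = {}
    for r in records:
        key = (r["name"].strip().lower(), r["city"].strip().lower())
        merged.setdefault(key, r)
    return list(merged.values())

# Classify: assign each record to a client-provided taxonomy.
# A keyword lookup stands in for a trained classifier.
TAXONOMY = {"steel": "Raw Materials", "software": "IT Services"}

def classify(record):
    text = record["name"].lower()
    for keyword, category in TAXONOMY.items():
        if keyword in text:
            return {**record, "category": category}
    return {**record, "category": "Unclassified"}

sources = [
    {"company": "Acme Steel", "town": "Boston"},      # source A
    {"org_name": "ACME STEEL ", "city": "boston"},    # source B, same entity
    {"name": "Globex Software", "city": "Austin"},    # source C
]

unified = [classify(r) for r in clean([connect(r) for r in sources])]
for row in unified:
    print(row)
```

Three records from three differently shaped sources come out as two classified records in one schema.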
The solution unifies data sets as they come in with heavy assistance from machine learning algorithms and continuous learning integration software. By constantly matching and connecting incoming data to other available data sets, all business units have broader access to the enterprise-wide data asset. This results in faster, more consistent, and scalable analytics.
I may have raised more questions here than I’ve answered, so feel free to contact us for a demo. We’ll clarify the differences between legacy and machine learning solutions and show you how you can shift your data analytics into overdrive.