Better Feedback for Better Data Quality
Here’s a basic scenario: You’re a marketing analyst looking for some numbers to gauge how well a recent promotion did. In checking the May sales reports, you find the numbers you need, but something seems off. You saw better May sales results in a presentation just last week, but they’re not reflected in your data. The reports are at an aggregate level, so you can’t look through every possibility to see where the discrepancy is coming from. In order to figure things out, you go to the database managers (the “gatekeepers”) to explain the problem you found, and ask them to look into it.
The obvious problem here is that the person who knows the most about the data in context (you) isn’t the one in control of it. This data quality management scenario happens constantly in real corporations, creating delays, false reporting and many other obstacles to sound analytics.
Worse, this kind of data disconnect instills distrust between teams. It’s a hard reality that institutional knowledge of data and processes is scattered across any organization. Quite often, those with the knowledge to recognize a quality issue do not have the knowledge required to fix it. This knowledge distribution itself isn’t the problem; it happens naturally as organizations grow and businesses specialize. The problem is the feedback loop — namely, the effort and time required to go from spotting an issue to fixing it.
But what if those loops could be shortened? Is it possible to empower your analysts, data scientists, and business developers with visibility and access to the data while maintaining data governance all along the way?
For the sake of empowering our marketing analyst above, and great data analytics everywhere, the answer must be yes. Integrating opportunities for data quality corrections throughout the pipeline, creating seamless workflows that take advantage of diverse institutional knowledge, and leveraging the bottom-up feedback of experts are the key steps in creating a data democracy where accessible, transparent processing and storage are considered integral to high-powered analytics.
Tamr is designed to help enterprises democratize their data by shortening feedback loops between end users and gatekeepers in the following ways (some already in production, some aspirational):
- Integrate Expert Sourcing
Tamr creates a streamlined workflow for your organization’s data stewards to leverage knowledge from as many and as varied experts as necessary. A few simple “yes or no” questions give Tamr’s supervised learning models the feedback they need to correct and refine their results. Tamr increases your ability to collaborate efficiently and effectively between teams, even when they work with the most disparate datasets.
- Leverage All Data
Often the same tables and databases are used across the enterprise for different purposes. Rather than sorting through requests and ranking them appropriately, Tamr gives your enterprise the ability to efficiently find and catalog all necessary data … and you the confidence that it’s actually what you’re looking for. Cataloging and connecting all your data within Tamr gives your analysts and data scientists the ability to focus on finding key insights to propel your business instead of focusing on data prep.
- Preserve Provenance
One key feature of Tamr is the ability to track and preserve all the changes made throughout the data cleansing process. Tamr uses intelligent sampling to ensure that even when data is altered, the keys and mapping remain consistent. Tamr delivers a “metadata map” that provides information about transformations, schema mapping, and deduplication.
- Streamline ETL with Predictive Transformations
Tamr can tell when the model needs to be altered or requires more feedback because of non-standard or otherwise difficult-to-process data. In these cases, the Tamr system will provide suggestions for transforming those columns or sources. Combined with Tamr’s expert workflow, this approach creates a proactive collaborative environment for your data processing.
- Gain Feedback Throughout the Consumption Stage
Many people only work with data at the final stage: consumption. Whether the data is displayed in a Tableau dashboard, an SQL query, or even a single number in an email, there are still opportunities to find quality issues. In fact, the farther downstream from your data sources you identify a quality issue, the harder it becomes to coordinate the communication and cooperation needed to isolate and resolve it. Tamr’s goal is to be able to work with all consumption media in order to provide feedback directly to the data preparation process. Here, a few clicks of the mouse would access the comments of those using the data to make decisions, enabling a much more empowered, democratic approach that embraces the fundamental data diversity within the enterprise.
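To make the idea of a shortened feedback loop concrete, here is a minimal Python sketch of how preserved provenance could let a comment from the consumption stage reach the people who own each upstream step. It is an illustration only and does not represent Tamr’s actual API or data model: the names Step, ProvenanceLog and route_feedback, the stage and team labels, and the record id are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Step:
    """One recorded change to a record (hypothetical structure)."""
    stage: str  # e.g. "schema_mapping" or "deduplication"
    owner: str  # team responsible for that stage
    note: str   # short description of what was changed


@dataclass
class ProvenanceLog:
    """Maps a record id to the ordered list of steps that touched it."""
    steps: Dict[str, List[Step]] = field(default_factory=dict)

    def record(self, record_id: str, step: Step) -> None:
        self.steps.setdefault(record_id, []).append(step)

    def history(self, record_id: str) -> List[Step]:
        return list(self.steps.get(record_id, []))


def route_feedback(log: ProvenanceLog, record_id: str,
                   comment: str) -> List[Tuple[str, str]]:
    """Return (owner, message) pairs for every stage that touched the
    record, most recent stage first, so the comment reaches the people
    able to fix the issue."""
    return [
        (step.owner, f"[{step.stage}] {record_id}: {comment}")
        for step in reversed(log.history(record_id))
    ]


# Example: the marketing analyst flags the May sales figure.
log = ProvenanceLog()
log.record("may_sales_total",
           Step("schema_mapping", "data-eng", "mapped region codes"))
log.record("may_sales_total",
           Step("deduplication", "data-stewards", "merged duplicate orders"))

tickets = route_feedback(
    log, "may_sales_total",
    "May total is lower than in last week's presentation")
for owner, message in tickets:
    print(owner, "<-", message)
```

The design choice worth noting is that the analyst never needs to know which team owns which step: the provenance history supplies the routing, which is the part of the loop that otherwise costs the most time.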