Written by Afsana Afzal
- Is it taking too long to onboard new customers due to risk, compliance and due-diligence screening?
- Are your sales reps not getting proper commissions for their sales?
- Are you missing important KPIs in customer service?
- Are you not getting the return you expected from that new Customer360 project?
- Do you really know how many customers you have, what each is actually spending, and what they’ve bought as you close out this fiscal year? What about as you plan for next year?
If your business is experiencing any of these symptoms, then you may be suffering from dirty, duplicate data. It’s a disease that afflicts growth-oriented companies of all sizes and industries. Any business that’s invested in Know-Your-Customer (KYC) programs, whether driven by fact (regulatory/compliance requirements) or fortune (growth strategies), is at risk.
And these aren’t the only symptoms, making dirty, duplicate data very much a universal Silent Killer.
A Common Problem
In working with companies across many industries to help them to unify, clean and classify their customer data, I’ve learned two things: (1) there’s no one-size-fits-all solution and (2) staying on top of customer data demands constant vigilance. You may have great processes or sophisticated data models in place, but if they’re not driven by continuously clean, updated and correctly classified data, you’re dead.
The challenge: Customer data (perhaps more so than most data) is extremely messy and volatile. Structured and semi-structured data is housed in multiple data sources (silos) throughout the enterprise, such as ERP systems and CRM applications like Salesforce.com. Multiple people are constantly adding data in different systems for different reasons, creating a natural drift toward chaos. Some businesses also want to enrich this data using external databases, introducing even more data variety into the mix. Routine “bad” customer data–similar, erroneous or incomplete corporate addresses, disparate customer IDs, unclear contacts, phone numbers missing digits–becomes a weapon of destruction once it is duplicated across systems.
All of this data is constantly changing, and not in a nice, neat, single-schema-driven way.
Consider: It’s the end of the quarter. A salesperson, unable to quickly locate his customer in the CRM system to input a new order, might create a new record (data duplication) or enter the order in an incorrect or incomplete format (creating null values)–errors that might live in perpetuity. The eventual result: lots of duplicate, “dirty” data. (And that possible missing commission.) This kind of data has likely been accumulating in the business since Day One, exacerbated by mergers and acquisitions, other corporate changes and application sprawl. Via a long, winding and not-so-obvious road, it eventually manifests itself in symptoms like those above.
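To make the failure mode concrete, here is a minimal Python sketch of why an exact-match lookup misses near-duplicate customer records. All records, names and fields below are invented for illustration:

```python
# Hypothetical illustration: the same customer entered three different ways.
crm_records = [
    {"id": "C-1001", "name": "Acme Corp",        "phone": "617-555-0101"},
    {"id": "C-2047", "name": "ACME Corporation", "phone": "6175550101"},
    {"id": "C-3112", "name": "Acme Corp.",       "phone": None},  # null value
]

def exact_lookup(records, name):
    """A naive exact-match search -- the kind a rushed salesperson relies on."""
    return [r for r in records if r["name"] == name]

# Searching for "Acme Corp" returns only one of the three records, so the
# other two look like different customers entirely -- and a fourth variant
# is one hurried keystroke away.
print(exact_lookup(crm_records, "Acme Corp"))  # finds just C-1001
```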
Data management technologies like Master Data Management (MDM) and ETL (Extract-Transform-Load) are useful. But both rely on largely hand-crafted, proprietary rules that require significant manual effort to update and maintain whenever data or requirements change. Rules may work fine when you have 10 or 25 of them, but they tend to become unmanageable around 50. Yet we’ve seen businesses with 500 rules or more–far beyond any team’s capacity to maintain.
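To see why rule counts balloon, consider this sketch of rule-based name normalization. Each rule below is invented for the example (not any vendor’s actual logic), but the pattern is typical: every new data quirk discovered in production demands another hand-written rule.

```python
import re

# A hand-crafted rule chain of the kind MDM/ETL pipelines accumulate.
RULES = [
    lambda s: s.lower(),                              # rule 1: case-fold
    lambda s: re.sub(r"[.,]", "", s),                 # rule 2: strip punctuation
    lambda s: re.sub(r"\bcorporation\b", "corp", s),  # rule 3: abbreviate
    lambda s: re.sub(r"\s+", " ", s).strip(),         # rule 4: collapse whitespace
    # ... rule 50, rule 51, ... each added after a new failure in production
]

def normalize(name: str) -> str:
    """Apply every rule in order; rule ordering itself becomes a maintenance burden."""
    for rule in RULES:
        name = rule(name)
    return name

print(normalize("ACME Corporation") == normalize("Acme Corp."))  # True
```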
Imagine a war-room-like scenario with dozens of analysts staring at multiple screens as they attempt to enrich their customer data with data from several external databases for risk management purposes (our first symptom above). I don’t have to imagine it: I’ve seen it. (And helped fix it.)
Getting to “ground truth” about customers–data that’s unified, clean and trustworthy enough to power transformational analytic outcomes and critical business processes–is increasingly impossible. Using rules alone, Golden Records remain elusive and “one schema to rule them all” a fantasy.
A Timely Prescription
The obvious solution: Use machines and automation instead of people.
More specifically, use machines AND people (Tamr’s formula). With Tamr’s machine-driven, human-in-the-loop approach, our AI/machine learning models handle the low-hanging-fruit data (~80%) by taking a probabilistic approach (scientific guessing) rather than a deterministic, coded approach (like MDM)–particularly useful in deduplication. The system involves human experts only when necessary–for example, to resolve non-obvious relationships between two data records or to fix other outliers–and their judgment is especially valuable in higher-order tasks like clustering, mastering and entity resolution. This human intelligence gets continually fed back into the models, which get smarter and more autonomous over time.
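As a rough illustration of probabilistic matching with a human-in-the-loop band–the similarity function and thresholds here are invented for the sketch, not Tamr’s actual model–the machine decides the easy pairs and routes only the ambiguous middle to an expert:

```python
from difflib import SequenceMatcher

AUTO_MATCH = 0.90     # above this: machine decides "same customer"
AUTO_DISTINCT = 0.50  # below this: machine decides "different customers"

def similarity(a: str, b: str) -> float:
    """A stand-in similarity score; real systems learn richer models."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def triage(a: str, b: str) -> str:
    score = similarity(a, b)
    if score >= AUTO_MATCH:
        return "match"          # the ~80% low-hanging fruit
    if score <= AUTO_DISTINCT:
        return "distinct"       # also decided automatically
    return "ask_expert"         # non-obvious pair: route to a human

print(triage("Acme Corp", "Acme Corp."))   # high similarity -> match
print(triage("Acme Corp", "Zenith Ltd"))   # low similarity  -> distinct
print(triage("Acme Corp", "Acme Group"))   # ambiguous      -> ask_expert
```

Each expert answer from the gray zone becomes a labeled example, which is exactly the feedback that makes the models smarter over time.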
This translates into agile data deduplication, clustering, schema matching, and entity resolution and mastering at scale. The Tamr system takes advantage of the computational power of the cloud (it runs on AWS, GCP and Azure), a common element of KYC strategies today, making it a natural fit technically. (In fact, one Tamr customer is currently using Tamr in moving its revenue-producing information products to the cloud.) Data unification becomes repeatable, MDM rules scalable, true Golden Records possible, and data more trustworthy and analytics-ready.
In working with Tamr customers on various KYC data projects, Tamr Data Ops engineers and data scientists typically take a three-step approach:
1. Identify the metrics that matter. Every business is different. Our data engineers and scientists work with you to pinpoint goals: which entities are critical, and what would constitute success for your particular business problem.
2. Develop a corresponding data plan. Once we have identified the metrics that are core to your business, we develop a plan for tackling your data. We work with you to identify datasets, tables, entities and rules, along with their flow and prioritization.
3. Build, train and deploy Tamr models. Once the data is ready to be ingested, we draw on your subject-matter expertise to train an initial Tamr model to address your data challenge. We then iteratively improve on those results by continuously integrating your feedback until we have achieved the defined business and data objectives.
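The third step culminates in an iterative feedback loop. This toy Python sketch shows the shape of that loop–the data, names and the single-threshold “model” are all invented; a real system learns a far richer model from labeled pairs:

```python
from difflib import SequenceMatcher

def score(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

threshold = 0.80  # the initial "model": pairs scoring above it are matches
labeled = []      # accumulated expert judgments

def review(pair, expert_says_match):
    """Record an expert judgment and nudge the model toward it."""
    global threshold
    labeled.append((pair, expert_says_match))
    s = score(*pair)
    # Move the decision boundary a small step whenever the model disagreed.
    if expert_says_match and s < threshold:
        threshold -= 0.05
    elif not expert_says_match and s >= threshold:
        threshold += 0.05

review(("Acme Corp", "Acme Group"), expert_says_match=False)
review(("GE", "General Electric"), expert_says_match=True)
print(round(threshold, 2))  # -> 0.75
```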
The Prognosis: Business-Transforming Analytic Outcomes
Once this upfront work is completed, Tamr models are quickly on the case, running in the background to help perform KYC “miracle cures” like these:
- An electrical components manufacturer needed to understand how many customers it had, across 212 data sources (tables, mostly in SAP). Tamr consolidated, cleaned and classified these data sources, yielding an accurate dataset of almost 125K customers (versus 226K customers previously). This new dataset has fed transformational business analytics, including the revelation that the company’s customer distribution was in fact dominated by mid-sized customers vs. low-end, lower-spend customers. Previously, data variety was naturally skewing analytic answers, creating misinformation instead of clarity.
- Thomson Reuters is one of the world’s most trusted providers of answers, helping professionals make confident decisions and run better businesses–including providing information services that speed customer due-diligence. Thomson Reuters has used Tamr’s machine-learning approach to overcome its data integration challenges, expediting data integration efforts by several months, reducing the manual effort needed to integrate datasets by over 40%, and achieving precision and recall rates of over 95%. Particularly given the scale at which TR operates, such numbers are business-transforming. TR now offers Tamr services to its customers as a Tamr partner.
- ScotiaBank used Tamr to create a “data factory” for its customer mastering/Know Your Customer project. In just over six months, the bank ingested and profiled 35 large data sources with 3.7 million rows of data to produce 325,000 clusters of customer records. The KYC team is now able to onboard a new system–from landing data to mastering–in just 5-7 days, and create a new Golden Record in a maximum of two days.
- Healthcare company Sunovion achieved a 75% reduction in the manual effort related to customer-data integration and made accurate, detailed analytics possible for their next-generation analytics platform. “End users don’t need to worry about what datasets are coming in anymore, or if they’re matched and merged properly. It’s all irrelevant–the right data is just there now,” says Naresh Murthy, Managed Markets Analytics Industry Leader.
- To seamlessly serve its customers across all channels (Web, retail, apps), a leading automotive company embarked on a project to understand and predict the needs of its customers and make each interaction feel cohesive, in turn enhancing customers’ experience with the brand. The resulting 40% reduction of duplicative customer records feeds better data for analytics and operations, increasing efficiency and business value.
While dirty, duplicate data may never completely go away (it’s a fact of enterprise life, like head colds for humans), it’s now more under control. The smarter Tamr models get, the easier it is to add and integrate new data sets and set up automated workflows with very little manual work.
We’re constantly coming up with new, advanced therapies, such as next-generation data stewards and low-latency matching. Both make data consumers (like our gone-rogue salesperson above) part of the data-unification solution instead of the problem.