datamaster summit 2020

Master Data Management (MDM) Demo



On-demand: Customer Data Mastering

A 15-minute demo of how Tamr tackles disparate, inconsistent data to empower organizations with a mastered, 360-degree view of their customers and drive business insights.

The demo covers:

• How mastered data impacts operational source systems such as Salesforce
• How to master data in Tamr using machine learning
• How mastered data drives analytics for business insights



This demo looks at Shield Inc, a B2B semiconductor company that is trying to capture a 360 view of its customers but is struggling to reconcile data sources and resolve data inconsistencies to gain a true customer view. Lenovo is one of Shield Inc’s most important customers. Anjali, Shield Inc’s relationship manager for the Lenovo US account, has an account review meeting with Lenovo in the US to negotiate contracts for the year ahead. She’s primarily been dealing with their manufacturing plant in Whitsett, where the relationship appears to be thriving, but now she wants to expand the account to other locations. She wants to look for opportunities to upsell in the account hierarchy by sales region and across the corporate parent, and to come into the discussion armed with information on product sales so that she can recommend relevant products across the account. As a sales rep, she naturally starts with the CRM tool to find the information she needs. In this case, it’s Salesforce. Here she’s come to the specific account page for Lenovo in Whitsett, but it’s a very limited view. There’s no revenue information flowing through, which means Lenovo Whitsett is listed as an inactive customer, even though she knows they’re a major customer.

Lenovo United States Inc is listed as the parent account, so Anjali can click into the hierarchy to see if there’s more information on the Lenovo account as a whole. We can see a long list of accounts associated with Lenovo United States in multiple locations, both US and international. Some of these appear to be duplicates, making it hard to make out what the distinct account locations are. Anjali’s original goal was to identify opportunities to cross-sell to other US Lenovo accounts, but she’s struggling to see opportunities amidst the mess. She’s now concerned about not having basic information for the account review at Whitsett and worried about jeopardizing the relationship. The issue here is, of course, messy data in our CRM. But not only that: Shield Inc is a company that has grown through acquisition, and it has multiple ERPs and CRMs, as well as data on indirect sales received from distributors, sometimes sent in spreadsheets and often extremely messy. So the information Anjali needs exists, but it’s scattered across various siloed data sources and business departments.

The solution to all of this is data mastering.

Let’s look at this same hierarchy account page after Tamr has mastered all of Shield Inc’s customer data. We can see straight away that it’s a much cleaner view: we have one instance of Lenovo per location and we’re including the parent hierarchy. It means that as a sales rep, Anjali can now see exactly where the account sits within the wider group and recognize that Fujitsu and Motorola Mobility are part of the same global Lenovo Group. Now, when she clicks into the specific Lenovo Whitsett account, she can see much richer information flowing through.

She now has visibility on the total run rate revenue of $8.4M for the account, the average discount rate of 2.7%, the direct sales and the different product lines purchased, all of which help Anjali enter the account negotiation better informed for upsell. All of this data comes from the output of Tamr customer mastering, and not only is Tamr providing the complete cleansed set of values, it’s also providing a Tamr ID, which is a new universal customer identifier for use across all of Shield Inc’s enterprise applications.

If I click on the Tamr ID, it will bring me through to Tamr’s user interface. Here in the Tamr UI, we can see the Golden Record for Lenovo in Whitsett. Golden Records are one of the outputs of a Tamr process called customer mastering. Golden Records datasets are trusted sources of clean, harmonized and enriched data that can feed any enterprise application, whether that be CRM systems such as Salesforce, or analytical dashboards, which I’ll show you an example of later. We can curate golden records in this UI if we like, but by default each golden record is populated automatically by Tamr with data originating from one or dozens of source records.

We can click on the golden record to see those original source records. Here we see 8 original records that make up what we call a Cluster. A Cluster is a group of records that all represent a single distinct customer. We can see here that these records come from a variety of different source systems, for example Marketo, Salesforce and SAP. We even have some external data such as Dun & Bradstreet. All of these data sources, which were previously siloed, have been brought together to create a unified dataset of golden records. And because Tamr gives you the cluster information as well as the golden record, you get the data lineage, which allows you to link the golden data back to the original sources. Shield Inc tried to do something like this in the past using a traditional master data management tool, but the issue they had is that they weren’t able to match these source records together as being the same customer, and so without being able to match them, they couldn’t unify them to get the complete golden record. This is because the traditional MDM tool was heavily based on rules, and this rules-based approach doesn’t scale when there is a lot of data variety.
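
To make the relationship between a cluster, its golden record and the lineage concrete, here is a minimal Python sketch. It is an illustration only, not Tamr's implementation: the field names, record IDs and values are made up, and the "most common non-empty value wins" rule is an assumed consolidation rule chosen for simplicity.

```python
from collections import Counter

# Hypothetical source records in one cluster. Field names, IDs and values
# are illustrative and are not taken from the demo data.
cluster = [
    {"source": "Salesforce", "id": "SF-001", "name": "Lenovo Co", "city": "Whitsett", "state": "NC"},
    {"source": "SAP", "id": "SAP-77", "name": "Lenovo PC", "city": "Whitsett", "state": None},
    {"source": "Marketo", "id": "MK-9", "name": "Lenovo Co", "city": None, "state": "NC"},
]

def build_golden_record(records, fields):
    """Pick the most common non-empty value per field and keep lineage."""
    golden = {}
    for field in fields:
        values = [r[field] for r in records if r.get(field)]
        # Assumed rule: most frequent value wins; ties go to the value seen first.
        golden[field] = Counter(values).most_common(1)[0][0] if values else None
    # Lineage: the (source system, source id) of every record in the cluster,
    # so the golden values can be traced back to where they came from.
    golden["lineage"] = [(r["source"], r["id"]) for r in records]
    return golden

golden = build_golden_record(cluster, ["name", "city", "state"])
print(golden)
```

The missing values in individual records are filled in from the other records in the cluster, and the lineage list is what lets you walk from the golden record back to each original source row.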

For example, you can see that none of the source system IDs are the same. The names have slight variations such as Lenovo Co and Lenovo PC, there are abbreviations and misspellings in the street address, and there are missing values in some fields as well. You might be able to solve this with rules if it was just 8 records, but we’re not creating a Golden Record only for Lenovo here, we’re doing this for every one of Shield Inc’s 270,000 customers. So the overall scale of data variety here is enormous. And with the traditional approach, you find yourself having to develop hundreds, sometimes thousands, of programmed rules to try and handle every possible type of variation. And not only do these rules become unmanageable, but they still don’t work.

So how does Tamr do this? Well, we know it’s tricky for rules, but as human beings, right now we can look at this data, and by simply glancing at the values we can confidently say that these are all the same customer. For humans it’s easy, we’re just not fast enough, and so on our own we don’t scale.

So that’s why Tamr uses an approach based on human-guided machine learning. Instead of developing lots of complex rules, you simply give Tamr a few examples and Tamr will learn to think like you do. So let me show you Tamr’s ML-based Customer Mastering workflow so you can understand how it works. The first step is to register your source datasets. Here we see 23 disparate datasets from a variety of different source systems. What they all have in common is that they contain information about customers, so we need to unify ALL of this data if we’re going to have the complete picture.

If you look at the tabs across the top, they represent the journey of a customer mastering project. First is the source datasets, as I mentioned. Now, since these datasets come from different source systems, they each have different structures: you can see they have different numbers of attributes, and these attributes are all named differently in each system. So the next step is Schema Mapping, where we align all our source data into a single Unified Schema, so that everything is lined up and we can start doing comparisons. In the Unified Dataset tab we can preview our unified data, and then we have the Pairs tab, which is where we start training the machine learning. So let’s jump to the Pairs tab to see how this is done.
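
As a rough picture of what Schema Mapping achieves, the sketch below renames differently named source attributes into one unified schema. Everything in it is an assumption made for illustration: the source names, the column names and the mapping table are hypothetical, and in Tamr this step is done in the UI rather than in hand-written code like this.

```python
# Hypothetical mappings: source column name -> unified attribute name.
SCHEMA_MAPPINGS = {
    "crm_a": {"AccountName": "name", "Street": "address", "Town": "city"},
    "erp_b": {"cust_nm": "name", "addr_line_1": "address", "city_nm": "city"},
}
UNIFIED_ATTRIBUTES = ["name", "address", "city"]

def to_unified(source, record):
    """Project one source record onto the unified schema."""
    mapping = SCHEMA_MAPPINGS[source]
    # Start with every unified attribute present but empty, so records from
    # all sources end up with exactly the same shape.
    unified = {attr: None for attr in UNIFIED_ATTRIBUTES}
    for column, value in record.items():
        attr = mapping.get(column)
        if attr is not None:  # unmapped columns are simply left out
            unified[attr] = value
    unified["source"] = source
    return unified

row_a = to_unified("crm_a", {"AccountName": "Lenovo Co", "Town": "Whitsett"})
row_b = to_unified("erp_b", {"cust_nm": "Lenovo PC", "addr_line_1": "100 Example St", "fax": "n/a"})
print(row_a)
print(row_b)
```

Once both rows share the same attribute names, they can be compared field by field, which is what the pairs step relies on.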

I’m going to filter to my assignments, and then I’m going to select a pair. This is what we call a pair: it’s two records side by side, and we are teaching Tamr how to automatically identify matching pairs and non-matching pairs. Since Tamr has already received some training, Tamr’s able to give an answer, and we see here that Tamr thinks that this is a Match. But we can give feedback to Tamr: we can say Yes, you’re right, this is a match, or No, this is actually not a match. And then Tamr learns from this input, and so the ML model becomes more accurate.
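
The feedback loop can be sketched with a deliberately tiny stand-in model. This is not Tamr's ML: it assumes a single token-overlap (Jaccard) score on the name and "learns" only one number, the score threshold that best separates the labeled matches from the non-matches. All pairs and labels are invented for illustration.

```python
def jaccard(a, b):
    """Token overlap between two names, from 0.0 (disjoint) to 1.0 (same tokens)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    return len(ta & tb) / len(ta | tb) if ta | tb else 0.0

def fit_threshold(labeled_pairs):
    """Choose the score cut-off that classifies the labeled pairs best."""
    scored = [(jaccard(a, b), is_match) for a, b, is_match in labeled_pairs]
    scores = sorted({s for s, _ in scored})
    # Candidate cut-offs sit halfway between neighbouring observed scores.
    candidates = [(lo + hi) / 2 for lo, hi in zip(scores, scores[1:])]
    if not candidates:
        return 0.5  # arbitrary default when there is nothing to separate
    def accuracy(t):
        return sum((s >= t) == y for s, y in scored) / len(scored)
    return max(candidates, key=accuracy)

def predict(a, b, threshold):
    return jaccard(a, b) >= threshold

# Initial training examples (hypothetical): (record A, record B, is_match).
labeled = [
    ("Lenovo United States Inc", "Lenovo United States", True),
    ("Lenovo Co", "Lenovo Co Ltd", True),
    ("Lenovo Co", "Motorola Mobility", False),
    ("Lenovo United States", "United Parcel Service", False),
]
threshold = fit_threshold(labeled)

# The model suggests Match for a new pair; the expert clicks No-Match.
new_pair = ("Lenovo US Whitsett", "Lenovo US Morrisville")
before = predict(*new_pair, threshold)
labeled.append((*new_pair, False))   # feedback becomes a training example
threshold = fit_threshold(labeled)   # retrain
after = predict(*new_pair, threshold)
print(before, after)
```

In this toy setup the pair is suggested as a match before the feedback and as a non-match after retraining, which is the same yes/no correction loop described above, just with a far simpler model.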

So who gives this feedback? You can see it’s a simple matter of looking at the data and clicking either Match or No-Match, so you don’t have to be a data scientist or a developer to train Tamr’s ML. The best person to give this kind of feedback to Tamr is what we call the Subject Matter Expert, or SME: someone who knows the data domain and how this data will be used. In essence, someone who knows about Shield Inc’s customers. You might ask, why do Shield Inc SMEs need to be involved at all? Why can’t Tamr just provide a black box that does everything? The reason is that customer mastering can be subjective. What Shield Inc calls a single distinct customer, another company might say is two different customers. It depends on how they run their business. So although Tamr can provide pre-trained ML models to kickstart the process, SMEs will always have the option to review what Tamr is doing and give feedback to Tamr. This gives Shield Inc the confidence that Tamr is behaving as they would like it to. And this is very flexible. Shield Inc can, if they like, run multiple customer mastering models in order to have different views of their customers.

You can see why this might be tricky to solve with rules. A lot of the fields don’t match, but they have various degrees of similarity; some of them are very similar, some of them are blank. You can imagine the number of possible combinations of similarity here is infinite. Trying to write rules that handle all the different cases and don’t contradict each other is a massive design effort in itself.
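
To show what "various degrees of similarity" looks like as data, here is a small sketch that compares two records field by field. It is an assumption-laden illustration: it uses Python's standard difflib.SequenceMatcher as the similarity measure, which is not necessarily what Tamr uses, and the two records are invented.

```python
from difflib import SequenceMatcher

def field_similarity(a, b):
    """Return a similarity between 0 and 1, or None when either value is blank."""
    if not a or not b:
        return None
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def compare(record_a, record_b, fields):
    """Build one similarity value per field for a pair of records."""
    return {f: field_similarity(record_a.get(f), record_b.get(f)) for f in fields}

# Hypothetical pair: similar name, abbreviated street, one blank city.
a = {"name": "Lenovo Co", "street": "100 Example Street", "city": "Whitsett"}
b = {"name": "Lenovo PC", "street": "100 Example St", "city": None}

features = compare(a, b, ["name", "street", "city"])
print(features)
```

Each field ends up as a number somewhere between 0 and 1, or as a blank. A rule has to pick hard cut-offs for every field and every combination of fields, whereas a trained model can weigh the whole vector at once.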

Tamr’s approach is much more agile than the rules-based approach. Instead of having to spend months designing everything up front, you can just send the data straight through Tamr, immediately start looking at the output, and then tune the model by giving feedback to Tamr with simple yes/no examples.

So coming back to the workflow, Tamr has learnt to identify matches and non-matches, and from this it can group the records into what we call clusters. This brings us to the final tab, where we can review these clusters. You can see Tamr has taken 672,624 records from those 23 raw data sources and has grouped them into 270,439 clusters, where each cluster is a single distinct customer.
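
One simple way to picture the step from matched pairs to clusters is to treat every match as a link and group all records that are linked, directly or indirectly. The sketch below does this with a union-find structure. It is an assumed, simplified technique for illustration and not a description of how Tamr forms clusters; the record IDs are hypothetical.

```python
def cluster_records(record_ids, matched_pairs):
    """Group records so that any two linked by a chain of matches share a cluster."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a  # merge the two groups

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), []).append(r)
    return list(clusters.values())

# Hypothetical record ids and the pairs a model has labeled as matches.
records = ["sf-1", "sap-7", "mk-3", "sf-2", "dnb-9"]
matches = [("sf-1", "sap-7"), ("sap-7", "mk-3"), ("sf-2", "dnb-9")]

clusters = cluster_records(records, matches)
print(clusters)
```

Here five records collapse into two clusters. At the scale in the demo, 672,624 records in 270,439 clusters works out to roughly 2.5 source records per customer on average.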

And if I filter to my assignments, here we have the Lenovo Whitsett cluster again, which is the same cluster we were looking at earlier on the Golden Records screen. The mastered data not only feeds operational applications like Salesforce or SAP, but it also feeds analytics that empower key decision making on the accounts. Anjali is not the only set of eyes on Lenovo: they’re a key account, so the CRO and senior management want to be able to review the account performance. Decent account-level analytics were virtually impossible before we had Tamr mastered data: nobody trusted the data because of duplicates and poor data quality, and important fields on sales metrics were missing.

If Anjali looks at the dashboard now for Lenovo Group, she’s armed with all the key information to enable cross-selling. She can drill down into the account hierarchy to view all accounts within the Lenovo Group by parent and ultimate site location. She can now see that as well as having the Morrisville account for cross-sell, there’s also the opportunity to expand to Motorola Mobility Chicago. She can also make informed recommendations for what to sell. She now has visibility on the products purchased by the Whitsett site, as well as by the entire Lenovo corporate family. She knows which product ranges are most popular: three products stand out across the Lenovo Group, the ava 500, the ava 938 and the hal 329, so she can use these as a starting point for recommendations with the Morrisville and Motorola Chicago sites. Anjali now has a plan for how to drive account cross-sell and ultimately grow the revenue for the account.

If we scroll down further, we get a sense of the breadth of analytics that’s been unlocked by this mastered data. There are lots of great analytics tools out there, but they all share the same problem: without the correct data they are useless. Tamr ensures that across the enterprise, all analytical reports, dashboards, and operational systems can be fed with clean, up-to-date, harmonized data. Data that can be used immediately and effortlessly and, more importantly, data that can be trusted.

We looked at the positive impact that Tamr Customer Mastering can have on day-to-day operations using CRM tools such as Salesforce.

We saw HOW customer mastering is done in Tamr using a unique agile approach based on human-guided machine learning.

We then focused on the analytics that mastering enables, and most importantly, how the data can inform key strategic decisions and ultimately drive significant return for the business. 

Thank you again for your time, and if you’re ready to modernize MDM and drive growth for your business, talk to one of our data experts today.