DataMasters Summit 2020

Tamr Customer Mastering for Financial Services


Guillermo Gomez

Mid Market Sales Manager

Gain unparalleled insights into your data to power Customer 360. Learn how you can break down data silos to achieve high-quality, consistent customer data in order to improve regulatory operations, lower risk, and enhance customer experience.

Master customer data to support up-sells and cross-sells by creating a single view of each customer, including transactions, interactions, and products.


Hello, everyone, and welcome to Tamr Demo Day's Financial Services Edition. My name is Guillermo Gomez, and I'm a sales engineer here at Tamr, specializing in the financial services vertical. What I'm going to show you today is a high-level overview of who Tamr is as a company, and then I'll go into a demo of one of our solutions for the financial services vertical. So, to provide you some background on Tamr.

00:27 – 00:59 Guillermo Gomez

Right, we view ourselves as a modern alternative to master data management, where we provide data unification, curation, enrichment, and mastering services for the modern enterprise, at scale, on the cloud. We are cloud native across all three major cloud providers, which allows us to integrate very easily with customer ecosystems and achieve high scale with regard to both data volume and data variety. And as I've included here on the slide, you can see a few of our high-profile customers within the financial services space.

01:00 – 01:52 Guillermo Gomez

As far as the background of Tamr and where we came from: the core algorithms that underlie our machine learning-guided data mastering workflow actually came from the MIT Computer Science and Artificial Intelligence Laboratory, headed there by Turing Award winner Michael Stonebraker. Back in 2012, he and Andy Palmer, our CEO, got together and noticed an opportunity in the market. What we've been hearing for the past several decades is that big data is the answer to our analytics and operational problems: if we collect a lot of data, we'll be able to make better decisions and have better business outcomes. And so there was this push, what is now the legacy push, for these on-premise data lakes and data warehouses and these complicated master data management systems.

01:53 – 02:26 Guillermo Gomez

What we've seen recently, the next evolution of that development in the market, is this push to the cloud, where we want to avail ourselves of elastic compute and storage capabilities and the rest of the ecosystem that the cloud provides. The problem we're still faced with, even in this cloud-based world, is that the siloed nature of our source systems prevents us from having a single view of our customers and from having complete and impactful analytics or operational processes.

02:27 – 03:08 Guillermo Gomez

And so the take that Tamr has on addressing this problem is to embrace a best-of-breed, modern DataOps ecosystem. The idea here is to determine: what is the best tool for each part of my data management workflow? How can I best accomplish this modern approach where I get a unified view of my data, whether that's my legal entity data, my reference data, or my customer data, and how can I make it easily accessible, clean, and curated? Where Tamr fits into this picture is around the unification, curation, and mastering pieces.

03:10 – 03:36 Guillermo Gomez

So, to give you a high-level overview of our capabilities: the idea is that you're going to have lots of data coming from several different sources. These could be third-party reference data sources; these could be internal upstream sources. And the first step is to figure out how we get all of these together: how do we align them to a common schema and get a single view of what we're looking at, which in this case is the Lenovo Group? These records could be coming from several different CRMs.

Transcripts are created by using machine-learning software.

03:37 – 04:15 Guillermo Gomez

And the key here, the idea for how Tamr tries to solve this problem, is to leverage human expertise and combine it with the automation provided by sophisticated machine learning models. We bake into this process an enrichment capability, where you're able to level up your data quality throughout the process, and then control the publishing of that data to your various analytical and operational downstream consumption endpoints. What exactly I mean by human-guided machine learning, and how that factors into Tamr's approach to mastering data, will become apparent during the software demonstration in a few moments.

04:17 – 05:04 Guillermo Gomez

Before diving into that, however, I want to give a quick overview of the solutions we provide within the financial services space. The one I'll be discussing today is around customer data mastering, but we also have several solutions around reference data mastering, automating that onboarding and enrichment using third-party data sources; householding, being able to build several different models that master at different levels of granularity to support the relevant business unit and use case; and many more. The idea here is that Tamr provides an agile, high-performance way of quickly bringing your data together, such that when you do make this move to the cloud, you're going to be consuming high-quality mastered data easily and widely throughout the enterprise.

05:06 – 05:45 Guillermo Gomez

So, to set the stage for the demonstration I will give, we're going to put ourselves in the position of Anjali. Anjali is an account manager at Golden Horizon Bank, a multinational financial services firm, and they're facing a lot of the same problems that large financial services enterprises face today. Their data is siloed across various products, divisions, and source systems, and getting access to a singular, unified, clean view of this customer data is very difficult because of the siloed, federated nature of the enterprise.

05:46 – 06:04 Guillermo Gomez

And so what Tamr is going to do is take all the data from these upstream systems, in this case Salesforce, Microsoft Dynamics, and Marketo, master those sources to create that curated single version of the truth, and then help publish those curated views to your downstream operational and analytical systems.

06:07 – 06:53 Guillermo Gomez

And so, to give you an idea of what Anjali's day-to-day might look like prior to Tamr: if she's trying to prepare for a meeting with, let's say, Lenovo, one of her key accounts, she's going to go to her CRM system, in this case Salesforce. When she goes to look at this account page, she's noticing not a lot of data, and this is inhibiting her ability to accurately prepare for this meeting. She figures, let me go check out the hierarchy; maybe there are some other entities related to this account that I can get information from. At this point, however, she's noticing the hierarchy isn't in any sensible order. It's not accurate, and she's now starting to lose faith in the data she's consuming.

06:55 – 07:42 Guillermo Gomez

This is where Tamr enters the picture. Once Tamr has entered the ecosystem and is able to unify data from these various source systems, this view becomes a lot different. All of a sudden, this hierarchy view is now very informative. She can see exactly where that Lenovo Whitsett account she's preparing a meeting for sits within this larger hierarchy. And when she goes back to that account page, she can see all of the information that's going to be very helpful for preparing for that meeting: what products they own, how much business they do with Golden Horizon Bank, et cetera. And if you look here, you'll also notice a Tamr ID. Now, this is the output of that Tamr mastering process. It's the unique identifier signaling the single entity for this Lenovo Whitsett account.

07:42 – 08:34 Guillermo Gomez

Right. So if I go ahead and click on this cluster, it will take me to a Tamr user interface where we can see that golden record. And if we look here, we can see that we have populated high-fidelity information that is actually coming from several sources. If we dig into it, we can see that there are actually eight records comprising that single version of the truth she was consuming in that Salesforce view. Tamr was able to take those eight records and identify that they're actually all representing the same entity, despite high levels of variety across, for example, address; we have missing data here; we have different iterations on name here. Despite that, Tamr was able to use its human-guided machine learning approach to identify these records as all representing Lenovo Whitsett.

08:36 – 09:32 Guillermo Gomez

So how did Tamr get here? The first step is data ingestion. We're looking here at a datasets page, where we can see we've got twenty-three different datasets all fueling this data mastering process. What happens after this is Tamr aligns all of these myriad upstream sources to a common unified schema, figuring out which attributes actually map to which other attributes. These systems, as I'm sure you're familiar, do not have consistent naming conventions, so figuring out something as simple as which columns contain the same names is often a tricky step. That's what Tamr tackles first, in the schema mapping module. After this unified dataset has been created, that's where the machine learning for the matching model enters the picture. And what I mean by that is, this is where a user is going to come in, look at pairs of records that Tamr is surfacing as potential matches, and provide feedback on them.
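To make the schema mapping step concrete, here is a minimal illustrative sketch, not Tamr's actual algorithm: it proposes a mapping from each source column to the closest unified-schema attribute by normalized name similarity. The column names and the 0.6 cutoff are assumptions for demonstration only.

```python
from difflib import SequenceMatcher

def normalize(col: str) -> str:
    """Lower-case and strip separators so e.g. 'Acct_Name' ~ 'accountname'."""
    return "".join(ch for ch in col.lower() if ch.isalnum())

def map_columns(source_cols, unified_cols, threshold=0.6):
    """Propose a mapping from each source column to the most similar
    unified-schema attribute; leave low-similarity columns unmapped."""
    mapping = {}
    for src in source_cols:
        best, best_score = None, 0.0
        for uni in unified_cols:
            score = SequenceMatcher(None, normalize(src), normalize(uni)).ratio()
            if score > best_score:
                best, best_score = uni, score
        if best_score >= threshold:
            mapping[src] = best
    return mapping

# A Salesforce-style source mapped onto a hypothetical unified schema;
# 'BillingStreet' falls below the cutoff and is left for manual review.
print(map_columns(["Acct_Name", "BillingStreet", "Phone_Num"],
                  ["account_name", "street_address", "phone_number"]))
```

In practice a production system would combine name similarity with value-distribution signals and learned models, but the shape of the problem, many inconsistent source columns converging onto one schema, is the same.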

09:37 – 10:17 Guillermo Gomez

So, for example, this will be a familiar record for us, given the Salesforce view we just had. We're looking here at Lenovo Whitsett and Lenovo Co. We can see there are differences on address here: 6540 Franz Warner Parkway versus 6540 FW Parkway. This is exactly the point where, during model development, Tamr is going to solicit a user for feedback and say: hey, is this record the same as this record, or is it not the same? With that feedback, Tamr is going to learn from the data patterns hidden in these records and establish a model for automating the matching of future pairs of records.
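The idea of learning a matching model from yes/no labels can be sketched in miniature. This toy example, again an illustration rather than Tamr's implementation, scores record pairs by average field similarity and learns a decision cutoff from labeled examples; the records and field names are made up.

```python
from difflib import SequenceMatcher

def pair_similarity(rec_a, rec_b):
    """Average string similarity across the fields two records share."""
    fields = set(rec_a) & set(rec_b)
    if not fields:
        return 0.0
    scores = [SequenceMatcher(None, str(rec_a[f]).lower(),
                              str(rec_b[f]).lower()).ratio()
              for f in fields]
    return sum(scores) / len(scores)

def learn_threshold(labeled_pairs):
    """Pick a cutoff separating the yes labels from the no labels:
    the midpoint between the weakest match and the strongest non-match."""
    match_scores = [pair_similarity(a, b) for a, b, y in labeled_pairs if y]
    nonmatch_scores = [pair_similarity(a, b) for a, b, y in labeled_pairs if not y]
    return (min(match_scores) + max(nonmatch_scores)) / 2

# Two labels a subject matter expert might provide (yes / no)
labeled = [
    ({"name": "Lenovo Whitsett", "city": "Whitsett"},
     {"name": "Lenovo Co.", "city": "Whitsett"}, True),
    ({"name": "Lenovo Whitsett", "city": "Whitsett"},
     {"name": "Golden Horizon Bank", "city": "Boston"}, False),
]
cutoff = learn_threshold(labeled)
```

A real matching model would use many engineered similarity features and a trained classifier, but the workflow is the one described above: experts label pairs, and future pairs are scored automatically.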

10:18 – 10:45 Guillermo Gomez

And so the idea here is that instead of taking the legacy approach of articulating rules to determine how to harmonize these records, you're just providing examples. That has a few benefits. One, it's much faster: you don't have to spend months throwing developers at the problem; you can just have some subject matter experts, who know the data best, come in and provide these yes-or-no answers in a straightforward user interface. Two, it scales.

10:45 – 11:25 Guillermo Gomez

Right. The more data Tamr sees, the better it gets. With the traditional approach, where you have to articulate several rules to harmonize the data, when you bring on a new source, a lot of times those rules become inconsistent and you're back to square one, having to throw more developers at the problem to solve the match conditions with more rules. With the machine learning-based approach, you don't have to worry about developing rules; all you have to do is provide this feedback. So it lets you go faster, it lets you scale, and it also gives you higher performance: here at Tamr, we see much higher match rates with our machine learning approach than with the traditional legacy rules-based approach.

11:29 – 12:01 Guillermo Gomez

So what's the output of this process? Once you've trained this machine learning model through this simple, straightforward user interface of yes-or-no labels, Tamr is going to create clusters. And so if we look at this Lenovo cluster, we can see those very same eight records that we saw on that golden record page. Tamr has identified them all as a match and provided a unique identifier, this being the key that we saw on that Salesforce page. And this is going to be the critical unified entity view that fuels those downstream systems.
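Turning pairwise match decisions into clusters with a single identifier can be sketched as connected components over the predicted matches. This is an illustrative union-find toy, not Tamr's clustering; the record IDs (Salesforce, Dynamics, Marketo style) and the "entity-N" surrogate key are invented for demonstration.

```python
def cluster_records(record_ids, matched_pairs):
    """Group records into entity clusters by treating each predicted
    match as an edge and taking connected components (union-find)."""
    parent = {r: r for r in record_ids}

    def find(r):
        while parent[r] != r:
            parent[r] = parent[parent[r]]  # path compression
            r = parent[r]
        return r

    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), []).append(r)
    # Assign a surrogate key per cluster, analogous in spirit to a Tamr ID
    return {f"entity-{i}": sorted(members)
            for i, (_, members) in enumerate(sorted(clusters.items()))}

# Records from three CRMs; two predicted matches chain them into one entity
ids = ["sf-001", "dyn-017", "mkt-042", "sf-002"]
pairs = [("sf-001", "dyn-017"), ("dyn-017", "mkt-042")]
print(cluster_records(ids, pairs))
```

Note how transitivity falls out naturally: sf-001 and mkt-042 were never directly compared, yet they land in the same cluster because both match dyn-017.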

12:02 – 12:44 Guillermo Gomez

What we have here, as the output of that model training step, are the clusters of records that Tamr has identified as matching. The idea is that you have a mature machine learning model that has now accurately identified these eight records we saw on that golden record page as all representing that unique entity, Lenovo Whitsett. The way this development flow occurs is that users will come in, provide those pair labels, and then review only the low-confidence clusters. Tamr is going to assign a confidence score to each of the clusters you see on this clusters page and solicit users for feedback on those lower-confidence matches.

12:45 – 13:27 Guillermo Gomez

Now, reviewing low-confidence matches using humans is a very typical part of any data management workflow. The difference with how Tamr tackles this problem is that any human task done in the software is actually captured, so that you can eliminate these repetitious tasks moving forward. What I mean by that is, if a reviewer comes in here and validates these records as being a match, or decides to provide a review action and correct one of Tamr's matches, Tamr will continue learning from those actions. So over time, as a reviewer takes actions within Tamr, Tamr keeps getting better, thus diminishing the tail of review required.
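The confidence-driven review routing described here can be sketched in a few lines. This is a simplified illustration with an assumed 0.8 threshold and made-up cluster IDs and scores; it shows the mechanic of auto-accepting confident clusters and queuing the rest for human review, lowest confidence first.

```python
def review_queue(cluster_confidences, threshold=0.8):
    """Split clusters into auto-accepted vs. human-review sets, with the
    review set ordered lowest-confidence first so experts see the hardest
    cases at the top of the queue."""
    review = sorted((c for c in cluster_confidences
                     if cluster_confidences[c] < threshold),
                    key=cluster_confidences.get)
    accepted = [c for c in cluster_confidences
                if cluster_confidences[c] >= threshold]
    return review, accepted

confidences = {"entity-0": 0.97, "entity-1": 0.62, "entity-2": 0.55}
queue, accepted = review_queue(confidences)
# queue: ["entity-2", "entity-1"]; accepted: ["entity-0"]
```

As reviewer decisions feed back into model training, more clusters land above the threshold and the review queue shrinks, which is the "diminishing tail of review" the transcript describes.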

13:29 – 13:50 Guillermo Gomez

And so the key takeaway here is that Tamr's machine learning approach allows you to leverage the expertise of humans without the high-touch manual process typically necessary to fuel your data mastering operations. And so, what can you do with this mastered data for your various entities?

13:50 – 14:48 Guillermo Gomez

Right. So we already looked at the impact this has on Anjali when she's preparing for that meeting. But this data can also fuel analytical downstream consumption systems. What we mean by that here is we've built a Looker dashboard, where we can now see a fleshed-out hierarchy for this Lenovo Group. If I'm Anjali and I'm preparing for this meeting now, I can see we actually do quite well with Lenovo Whitsett. But Lenovo Morrisville, down the road, we haven't sold nearly as much to. So maybe that's telling me, as an account manager: hey, that's a good opportunity for upsell. And the idea there is that Tamr's mastered data is making all of this possible. Without that unified view, you're not able to deliver these operational or analytical business-critical outcomes. Thanks so much for your time today. If you have any follow-up questions, please feel free to reach out to someone on the Tamr team.