datamaster summit 2020

3 Common Challenges Associated with Customer Mastering and How to Overcome Them

 

Scott DeMers

Head of Global Pre-Sales at Tamr

Gaining a holistic view of your organization’s customers across touchpoints is essential to maintain a comprehensive, up-to-date view of consumers. But many organizations struggle when consolidating internal data sources (e.g. CRM systems) and connecting structured and unstructured external data sources. This session will demonstrate how organizations like Toyota, Johnson & Johnson and Staples master their customer data at scale to improve sales and marketing efforts, which drives increased customer satisfaction and retention.

Transcript

Announcer:
Data Masters Summit 2020 presented by Tamr.

Scott DeMers:
Hello everyone. We’re here to talk about three common challenges associated with customer mastering and how to overcome them. My name is Scott DeMers. I’m the head of global pre-sales here at Tamr. I’ve spent most of my career in the visualization space, about 25 years actually. Most of that with Tableau. With me here today is Mingo Sanchez.

Mingo Sanchez:
Hi, everyone. As Scott said, my name is Mingo. I’m a sales engineer at Tamr and during my time here I’ve worked on numerous customer mastering projects. It’s the most common use case we see at Tamr. So really looking forward to sharing some of the challenges we’ve seen in the field and how to overcome them.

Scott DeMers:
So why is accurate customer mastering important? It’s a pretty straightforward problem, really. The first reason is avoiding churn. When you have high value customers, they literally are your most important asset. And if they’re not handled with care, there’s a solid chance that they can churn. Typically, your highest value customers demand a level of service that’s different than say your everyday customers. And often they make up as much as 80% of the revenue of your organization. Understanding who those high value customers are is really important to managing the relationship with them appropriately.

Scott DeMers:
In addition, there’s legal compliance. And this problem is not as straightforward as you would initially think. There are regulations like GDPR, CCPA, et cetera. But there’s also the idea that data can’t go across national lines or can’t leave a region or even a state or a province. And those types of problems are often difficult to identify as well. Thirdly is resource management. When you’re doing marketing spend, don’t you want that targeted at your most appropriate places to get a return on your investment? If you don’t understand who your customers are, what makes them buy, that’s a very difficult problem to solve. And so we often see customers who haven’t gotten the return that they expect to get from their investment strictly because they’re spending it in the wrong place.

Scott DeMers:
And so these three reasons are why accurate customer mastering is very important and it does take many forms. So for example, churn often looks like a customer retention problem, or maybe stagnating revenue growth. That’s a great leading indicator that you may have a problem with churn. Are your high value customers actually purchasing elsewhere and you just don’t know it? Or maybe you have declining customer satisfaction and this can even be customer satisfaction that’s leveling off. If you are surveying your customers for customer satisfaction and you’re seeing that these scores are actually starting to go lower, typically that means they’re already buying somewhere else.

Scott DeMers:
And then there are other indicators that can really tell you something. Things like, hey this customer has been ordering from us every week and now they’re starting to order from us only once a month. If they’re not ordering with regularity, there may indeed be a problem. And finally, another one, another form that this takes is in declining order size. They’re buying on a regular basis from me, but maybe they’re buying in smaller scale than they were before. My guess is these customers aren’t declining in the amount of overall supplies or services or whatever it is that you sell that they need, but they may be getting that somewhere else.

Scott DeMers:
On the legal compliance side, I think the national laws and the laws of regions like the EU and whatnot are the low-hanging fruit. That’s pretty straightforward. Do this or you may face some type of penalty. But there are also data privacy rules. Maybe only certain people within an organization can see certain sets of information. And if you violate those rules, you can get yourself in a lot of trouble and face heavy fines and penalties. And that same logic is true for cross-boundary limitations on data. There is often a case where this information needs to stay within a certain location: all processing must be done in the United States or all processing must be done in China or whatever the case may be. And if it doesn’t, again, you can incur fines and penalties and these can be literally world shaking fines.

Scott DeMers:
On the resource management side, inefficient growth initiatives are one of the biggest challenges that companies face when it comes to capturing new customers. Are you spending your money in the appropriate place? Often that comes from a lack of customer visibility or a lack of prospect visibility. If you don’t know your customer, how can it be that you can spend your marketing dollars to find the next level of customer?

Scott DeMers:
And so that targeted spend, you really want to make sure that that is being spent on the customers that you actually can retain and that you can actually capture. And certainly we want to do this with the least cost. If you’re spending more to acquire a customer than you’re receiving in benefits from said customer, that’s not a great application of your resources. And so making sure that you’re keeping your costs targeted at the customers that you can capture is really important. And the same logic applies in retaining customers. So it’s not just your high value customers, but your regular customers matter as well. So we thought the best way to show you this was by way of an example.

Scott DeMers:
We’ve created a fictitious company called Cerebral that is a commercial business doing business with other businesses. And boy, it looks like things are pretty healthy. They have almost 200,000 customers with an average lifetime spend of $78,000 per customer. And notice that they have 22 customers that are labeled high value customers who have over a million dollars in lifetime spend. You can also see that the customer spend over time really has a nice trajectory to it. Although it’s flattened out over the last little bit, and maybe that’s something worth investigating as we discussed earlier. If you’re starting to see things level off, that could be an indicator of churn. And then in the lower left, we have a geographic distribution. And it seems that most of our high value customers are in the South and in the steel belt. And so it’s clear that where we’re doing business is probably in an industrial area.

Scott DeMers:
And finally we have customer lifetime values of our high value customers. And so we can see that companies like Leidos and Science Applications International are important to our business. And just by clicking on those, I quickly understand that we’re actually seeing some declining revenues here. And so as we discussed earlier, this could very well spell that Leidos, our most important customer, is perhaps going to churn for us. That’s a problem. Science Apps International is also potentially a problem in that their spend is highly erratic and erratic spend could also be an indicator of problems for us.

Scott DeMers:
So although at the surface level things seem pretty great, just by a couple of clicks we’re identifying that, you know what? We have some problems here, perhaps with how we understand our customers, which may mean customer mastering is a challenge for us. You’ll notice that our most important state is California and well, that’s interesting. So our second most important customer is Science Applications International, but I can just eyeball that we have two of them listed as customers. And so clearly we don’t properly understand our data. This is something pretty simple that machines have trouble with: telling that Science Apps International and Science Applications International are one and the same.

Scott DeMers:
This data is actually coming from 21 different data sets we’ve grown through acquisition, like a lot of organizations have. And it’s possible that we missed something using rules to try and pull this information together so that I could have my unified view of the data. And so that’s probably worth us investigating and to do that, we’ve brought in our crack investigator Mingo Sanchez.

Mingo Sanchez:
All right, thanks Scott. So diving into that example that Scott just walked us through. We’ve loaded all those records that we were showing powering that dashboard into Tamr. And immediately you can see the data on the right hand side of this page. These are all the records from those 21 source systems that Scott was mentioning. And as I scroll through here, two things jump out that I want to draw everyone’s attention to. One is that sometimes we don’t have all the information in one source that we do in another source. So look at the street address here or the city, for example. That information is present in some of these tables, but not in others. So that alone presents a pretty big challenge if we have those rules that are relying on several key fields for doing that matching.

Mingo Sanchez:
The other thing that you might notice is that sometimes we have different formats within the same types of columns. Street address is a great example of that, where sometimes we have the fully spelled out street names and other times we have abbreviations like Ave versus Avenue. Now these are really easy data variety problems for people to solve, to figure out that, say, 123 Main Street and 123 Main St are the same thing. But it gets really complicated when you’re trying to program a system to handle all of those edge cases. And we’ve talked with customers who sometimes have thousands, tens of thousands, or even hundreds of thousands of these different rules. So very quickly, it becomes difficult to manage and maintain those rule sets.
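To make the maintenance problem concrete, here is a minimal Python sketch of the rules-based approach described above. The abbreviation table and the function names `normalize_address` and `same_address` are hypothetical, invented for illustration; they are not taken from Tamr or from any customer system.

```python
import re

# Hypothetical rule table: each entry maps one abbreviation to its long form.
ABBREVIATIONS = {
    "st": "street",
    "ave": "avenue",
    "rd": "road",
}

def normalize_address(address: str) -> str:
    """Lowercase, strip punctuation, and expand known abbreviations."""
    tokens = re.sub(r"[^\w\s]", "", address.lower()).split()
    return " ".join(ABBREVIATIONS.get(token, token) for token in tokens)

def same_address(a: str, b: str) -> bool:
    """Rule: two addresses match only if their normalized forms are equal."""
    return normalize_address(a) == normalize_address(b)

print(same_address("123 Main Street", "123 Main St."))  # True
print(same_address("123 Main Street", "123 Main Str"))  # False: no rule covers "Str"
```

Every new variant needs its own entry, and some entries are ambiguous ("St" can also mean "Saint"), which is one way a rule set can grow into the thousands.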

Mingo Sanchez:
Now using an approach where instead of codifying all those rules, we can simply teach a machine to think like a person does, we’re able to get much more accurate views on our customers. So turning to that SAIC example that we saw in that previous dashboard, you can see here on the left-hand side that using this machine learning based approach, we’re able to much more accurately group together these records. So, whereas before we weren’t able to get that accurate 360 view of those differences in say the names and the addresses.

Mingo Sanchez:
Here, we’re able to teach a machine learning model that these records are all actually associated with the same customer. And we as people can eyeball this and say, “Yeah, that looks pretty right. All these company names, even though they’re slight variations, appear to represent the same company as one another, these addresses all appear to be the same”. And even when we have missing information like the city or the zip code, a trained machine learning model like this is robust enough to pick up on those differences between these different data sources and ultimately put together those records in the way that they should be put together.
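The talk does not say how pairwise "same customer" decisions are turned into groups. One common technique, shown here purely as an assumption, is to treat each matched pair as an edge and collect connected records with a union-find structure. The record IDs and the function name `cluster_records` are hypothetical.

```python
def cluster_records(record_ids, matched_pairs):
    """Group records so that any two linked by a chain of matches share a cluster."""
    parent = {r: r for r in record_ids}

    def find(x):
        # Walk up to the root, shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        parent[find(a)] = find(b)

    clusters = {}
    for r in record_ids:
        clusters.setdefault(find(r), []).append(r)
    return sorted(clusters.values())

ids = ["r1", "r2", "r3", "r4", "r5"]
matches = [("r1", "r2"), ("r2", "r3"), ("r4", "r5")]
print(cluster_records(ids, matches))  # [['r1', 'r2', 'r3'], ['r4', 'r5']]
```

Note that r1 and r3 end up together even though they were never compared directly, which is why a single wrong match can merge two unrelated customers.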

Mingo Sanchez:
Now that’s where we end up after you’ve gone through this grouping process, but as for how we actually train up those models it really is a sea change in how we’re implementing these systems. No longer do you have to spend months or years having your data experts, your data engineers and your data scientists spending hours and hours each day codifying these rules. Instead, you can just really simply give access to the data experts, the subject matter experts who know what you want to do, and ultimately what should be considered the same customer as one another. And those people can provide that feedback on the records that should go together.

Mingo Sanchez:
So the way that we got those results that we saw on that previous page, the way that we were able to get those accurate customer views, it’s really pretty simple. All we had to do was surface a couple hundred really simple, yes or no questions to users. And when they answer those questions, they’re able to just very simply get these views of are these two records the same as one another? Or are they different? Yes or no? And every time they’re providing one of those points of feedback, they are in essence training that machine learning model to pick up on the similarities that a person would be using to do that same process.

Mingo Sanchez:
So you don’t need to worry about all those edge cases of abbreviations, misspellings, difference in punctuation, or even presence of different columns altogether. As long as the person is able to figure out whether or not two records are the same, you’re going to be able to train a machine learning model to do pretty much the same thing provided that you just spend a couple of minutes each day at the start providing this training to the system.
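As a rough illustration of learning from yes/no answers instead of hand-written rules, the sketch below fits a single similarity cutoff to a handful of labeled pairs. This is a deliberately tiny stand-in for a real machine learning model, not Tamr's method. The field names, the sample records, and the functions `pair_score` and `fit_threshold` are all assumptions made for the example.

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def pair_score(rec_a: dict, rec_b: dict) -> float:
    """Average string similarity over the fields present in both records."""
    shared = [f for f in ("name", "address") if rec_a.get(f) and rec_b.get(f)]
    if not shared:
        return 0.0
    return sum(similarity(rec_a[f], rec_b[f]) for f in shared) / len(shared)

def fit_threshold(labeled_pairs):
    """Pick the cutoff that agrees with the most yes/no answers."""
    scores = [(pair_score(a, b), is_match) for a, b, is_match in labeled_pairs]

    def agreement(cutoff):
        return sum((s >= cutoff) == y for s, y in scores)

    return max(sorted({s for s, _ in scores}), key=agreement)

# Each tuple is one answered question: (record A, record B, "same customer?").
labeled = [
    ({"name": "Science Applications International", "address": "123 Main Street"},
     {"name": "Science Apps International", "address": "123 Main St"}, True),
    ({"name": "Science Applications International", "address": "123 Main Street"},
     {"name": "Leidos", "address": "900 Harbor Road"}, False),
    ({"name": "Leidos Inc", "address": "900 Harbor Road"},
     {"name": "Leidos", "address": None}, True),
    ({"name": "Alutiiq LLC", "address": "77 Pine Avenue"},
     {"name": "Leidos Inc", "address": "900 Harbor Road"}, False),
]
threshold = fit_threshold(labeled)

# Apply the learned cutoff to a pair nobody has labeled yet.
new_a = {"name": "Alutiiq", "address": "77 Pine Ave"}
new_b = {"name": "Alutiiq LLC", "address": "77 Pine Avenue"}
print(pair_score(new_a, new_b) >= threshold)
```

Nobody wrote a rule for "Ave" versus "Avenue" or for the missing address; the cutoff was derived only from the answers, which is the point being made in the talk.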

Mingo Sanchez:
And what’s even better is that as you provide more and more of that feedback, using a machine learning based approach like Tamr’s, you’re able to get much more intelligent with how you want to target that feedback going forward. So you don’t need to spend tons of time analyzing the data, figuring out where do you need to go next to find those edge cases that you haven’t yet addressed. Systems like Tamr’s are designed to provide that targeted feedback. So after you answer say a couple dozen questions, in that next round of training, Tamr is going to be really good at handling those cases that you’ve seen before. And it won’t need to ask those sorts of questions going forward. Instead it can figuratively raise its hand and say, “Hey, I haven’t seen cases like this before, provide me a little bit of guidance so that I know what to do in examples like this going forward”.
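The talk does not describe how the next questions are chosen. A common way to get this kind of targeted feedback is uncertainty sampling from the active learning literature: ask about the pairs the current model is least sure of. The sketch below assumes that technique; the pair IDs, scores, and cutoff are made-up illustrative values, and `most_uncertain` is a hypothetical helper.

```python
def most_uncertain(scored_pairs, threshold, k=2):
    """Return the k pairs whose scores sit closest to the decision cutoff."""
    return sorted(scored_pairs, key=lambda item: abs(item[1] - threshold))[:k]

# (pair id, current model score); values are illustrative only.
candidates = [
    ("pair-A", 0.97),  # confidently the same customer
    ("pair-B", 0.52),
    ("pair-C", 0.08),  # confidently different customers
    ("pair-D", 0.61),
    ("pair-E", 0.30),
]

for pair_id, score in most_uncertain(candidates, threshold=0.55):
    print(pair_id, score)
# pair-B 0.52
# pair-D 0.61
```

Pairs A and C are skipped because an answer there would teach the model little it does not already know.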

Mingo Sanchez:
So it really is a very iterative approach that gets better and better the more data you provide and the more training that you provide to the system. Now, after we’ve gone through that process of creating those accurate customer views, again, we get to this stage where we’re able to see all these records that should belong together that we previously weren’t able to accurately group. Now, that’s great being able to identify that these records are all the same as one another, but oftentimes that step of grouping together these records is a necessary step, but not the final step.

Scott DeMers:
Quick comment. This is a key secret sauce item of Tamr. We are not using this interface to create rules in an old school way. We are actually using this interface to suck the subject matter experts’ knowledge out of their heads and use it to build a machine learning model that can be run at machine speed on your data. That means across massive numbers of data sources and across massive amounts of data. Imagine the ability to say, “I can look at individual records and compare them the way a human would” and say, “Well, clearly those are two of the same things. We have a typo”. That’s something that humans do well, but machines don’t do a great job of. But imagine being able to do that at machine speed, that is a game changing technology.

Mingo Sanchez:
Yeah. Great point, Scott. Thanks for adding that. All right. So once we’ve grouped together these records and really had that machine learn the same logic that people would apply as Scott was mentioning, we’re able to get these much more accurate views and group together these records. And again, that’s an important step, but wouldn’t it be great if we could take these 82 different representations of SAIC and create that single canonical view to use for those downstream purposes? So that if you’re entering accounts into a system like Salesforce, or if you’re creating those analytics dashboards, you don’t have to worry about all those duplicate entries that might show up.

Mingo Sanchez:
Well, that’s where creating a golden record can be really valuable. And that’s something that Tamr is designed to do really well. So again, turning to that SAIC example, rather than sticking to 82 separate records and having to manage those individually, we’re able to take all of those different constituent records and collapse them into a single master record that you can use for your downstream purposes. So as those underlying data are changing, as you’re bringing on additional sources that might have better information for certain fields, you have the full power to define how you want to create that master record to use for those downstream purposes. So in this way, you’re able to get that much more accurate view so that you’re able to treat those customers appropriately.
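A minimal sketch of collapsing a cluster into one master record follows. The consolidation rule used here (take the most common non-empty value per field) is only one possible choice and is an assumption, as are the `golden_record` function and the made-up city and zip values.

```python
from collections import Counter

def golden_record(records, fields):
    """Collapse a cluster of records into one master record.

    Hypothetical rule: for each field, keep the most common non-empty value.
    """
    master = {}
    for field in fields:
        values = [r[field] for r in records if r.get(field)]
        master[field] = Counter(values).most_common(1)[0][0] if values else None
    return master

# Three of the constituent records for one customer; values are illustrative.
cluster = [
    {"name": "Science Applications International", "city": "Springfield", "zip": None},
    {"name": "Science Apps International", "city": "", "zip": "12345"},
    {"name": "Science Applications International", "city": "Springfield", "zip": "12345"},
]

print(golden_record(cluster, ["name", "city", "zip"]))
# {'name': 'Science Applications International', 'city': 'Springfield', 'zip': '12345'}
```

A different rule per field (most recent value, longest value, or a preferred source system) would slot into the same loop, which is the kind of control over the master record the speaker describes.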

Mingo Sanchez:
Now that’s the under-the-hood view of how we’re able to clean up the data. Let’s walk through an example of how that can make our analytics downstream much more effective. So you’ll notice here that we have a very similar dashboard to the one that we saw before. But notice here at the top that these numbers are different. Not only do we have fewer customers than we thought we did before, probably because of a lot of those duplicates that we were walking through previously, we also have a higher average lifetime value per customer because we’re able to bring together records that we previously weren’t able to. And perhaps most importantly, we have way more high value customers than we previously thought. That’s obviously going to change our business strategy because whereas before we thought that most of our customers were in the South and the rust belt along with California, now we can see that we have some pretty high value customers all around the US.

Mingo Sanchez:
So we’re going to need to rethink how we execute those marketing efforts to target customers appropriately and find similar new customers who might have similar footprints to the ones that we’re already serving today. And once again, if we go to our Science Applications International Corp example, you can see that we have a much more comprehensive view of that customer, many more records that we’re able to group together, and a higher lifetime spend overall. And perhaps most importantly, whereas before we had that really erratic view and that was of concern that made us think that we might have a data quality problem, here we can see that now, when we group together all of these records, we’re able to have a much more accurate lifetime view of this customer. And we can see that they’re starting to spend less and less with us as time goes on. So hopefully now that we’ve accurately assessed the value of this customer, we’re going to be able to treat them as appropriately as they should be treated.

Scott DeMers:
I’m just as concerned Mingo, I’m just concerned over different issues now.

Mingo Sanchez:
And that’s a great point. We’ve walked through this example with companies that we’re doing business with and wanting to make sure that they are treated appropriately and don’t churn. But as we mentioned earlier, this is just one problem in the customer mastering space. Can you imagine how much we’re spending on marketing to SAIC in multiple places? If we can just consolidate those views so that we’re able to assign one or two key reps to manage that account appropriately, we’re going to be able to save a lot of money on our end by not sending out ticket materials and executing multiple campaigns in order to try to win their business and retain their business. Now, this example that we’ve walked through deals with companies, but just as commonly, we’re dealing with companies who serve individuals or people. And a lot of those problems that you see on the company front are also applicable there.

Mingo Sanchez:
And perhaps most importantly, when you’re dealing with people, you often have much stricter regulatory concerns that you need to comply with. So to use a really common example, let’s say that you’re in the healthcare or insurance industry and you have really sensitive information that you’re dealing with. You want to ensure that when you’re showing people records, not only are you showing them all of their records and not leaving anything out, you also want to make sure that you’re not showing them someone else’s records. Because if you do that, you’re going to end up with a huge fine.

Mingo Sanchez:
Just speaking anecdotally from an account that I worked on, we had this customer who previously was using a rules-based system and they were under the impression that by having these rules in place they were going to be extra careful and avoid any of those fines. Well lo and behold, one of those rules said, if you have the same social security number for two different records, you should group those records together. And as you might imagine, when you have someone switching around the digits in a social security number accidentally, that’s going to end up creating a lot of problems on the backend where you’re accidentally grouping together these records that you thought should be grouped together on this canonical key. But actually because of data quality issues they shouldn’t ever be put together, and a person would be able to tell that even if a machine can’t, using those rules.
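The failure mode in this anecdote can be reproduced in a few lines. In the sketch below, `rule_match` is the single-key rule as described, while `cautious_match` is a hypothetical guard (the 0.8 name-similarity cutoff is an arbitrary assumption) that stands in for the second look a person would take. The names and numbers are fabricated test values; the 000 prefix is not a valid social security number area.

```python
from difflib import SequenceMatcher

def rule_match(a: dict, b: dict) -> bool:
    """The rule from the anecdote: same SSN means same person."""
    return a["ssn"] == b["ssn"]

def cautious_match(a: dict, b: dict, min_name_similarity: float = 0.8) -> bool:
    """Hypothetical guard: the SSNs must agree AND the names must look alike."""
    name_sim = SequenceMatcher(None, a["name"].lower(), b["name"].lower()).ratio()
    return a["ssn"] == b["ssn"] and name_sim >= min_name_similarity

alice = {"name": "Alice Moreno", "ssn": "000-12-3456"}
alice_variant = {"name": "Alice M. Moreno", "ssn": "000-12-3456"}
# Robert's real number ends in 3465; two digits were swapped at data entry.
robert = {"name": "Robert Chen", "ssn": "000-12-3456"}

print(rule_match(alice, robert))             # True: two different people merged
print(cautious_match(alice, robert))         # False: the names disagree
print(cautious_match(alice, alice_variant))  # True: same person, name variant
```

A fixed guard like this is still a rule and will have edge cases of its own; it is shown only to make the data quality problem visible.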

Scott DeMers:
So you’ll also notice that Alutiiq is actually our top customer. Was Alutiiq on the before dashboard? It was not. And so we may have been ignoring our literal best customer. And so that’s another example of where churning your top customers can be a problem or your resource allocation is incorrect. Can you imagine if you were the CRO of this organization and that before dashboard was showing that the South and the steel/rust belt was where you should be focusing your efforts, and then you come to find out that this is actually your geographic distribution? How do you go back and say, “You know where I’ve been hiring all those sales reps? Turns out it’s actually a broader problem”. And so again, this is showing just some very simple examples of how resource allocation with an incorrect customer master can result in a ton of wasted money.

Scott DeMers:
So to close out the presentation, we’ve shown you why keeping accurate and up-to-date views of your customer is important, and why customer mastering can help you reduce churn, ensure regulatory compliance, and increase your sales and marketing effectiveness. It’s important to note that these challenges exist whether you’re working with businesses or working with customers, and making sure that you manage those relationships appropriately will help you avoid churn. In addition, the regulatory compliance side is more than just national laws that you have to follow. It also includes things down to the individual level, especially if you’re in say the pharmaceutical or insurance space where you don’t want to just limit information to the correct people, you also want to make sure that the incorrect information is not shared inappropriately.

Scott DeMers:
And finally, making sure that your resources are applied appropriately to capture and retain your customers is really a critical result that comes from accurate customer mastering. This cannot be achieved with rules-based engines at scale. That’s just a fact. It hasn’t been able to work for the past 30 years. And so this machine learning approach where we, again, suck the subject matter experts’ knowledge out of their heads and run it at machine speed using an AI and machine learning model is something that’s really and truly special. And if you want to see that on your data, schedule a demo at Tamr.com.

Mingo Sanchez:
Thank you everyone and enjoy the rest of the summit.

Scott DeMers:
Thank you. Have a great day.