datamaster summit 2020

Leading with Data Mastering: Getting the Most from Your Cloud Migration

 

Evren Eryurek, Anthony Deighton & Mike Meriton

Evren Eryurek, Director of Product Management, Google Cloud
Anthony Deighton, Chief Product Officer, Tamr
Mike Meriton, Moderator, Co-Founder & COO, EDM Council

Successful cloud migrations improve the quality of your data as well as move it to a modern cloud infrastructure. Leading migrations with data mastering generates clean, curated and enriched records that set your organization up for analytic success.

Watch this webinar to understand how to leverage data mastering when moving to the cloud. You’ll learn:
– How cloud-native data mastering delivers higher quality, enriched data as a key part of a cloud migration
– Why organizations are using machine learning to handle the heavy lifting around mastering data
– How improving data quality helps align cloud migrations with business goals

Transcript

Carol:

I’d like to welcome Mike Meriton, co-founder and COO for the EDM Council. Mike joined the EDM Council back in 2015 to lead industry engagement, having been the co-founder and the first chairman of the EDM Council. Mike, over to you.

Mike Meriton:

Thanks Carol and welcome everyone. So why are we here today? It’s no doubt that virtually every company is in some stage of implementing cloud for their organization. In fact, the EDM Council last year started the Cloud Data Management Capabilities work group, and today upwards of 70 companies and 200 participants have joined this endeavor to come up with a common framework for accelerating cloud adoption and examining the key controls. So this clearly has a lot of industry attention.

Mike Meriton:

In fact, within the CDMC framework, the centerpiece is all around focusing on your business strategy and aligning your cloud migration to achieving those business objectives. Moreover, a migration that cleans and curates data to address business needs will generate higher quality customer data for better business decisions, increase sales, and improve ROI for cloud migration strategies. So addressing data quality helps businesses overcome the data challenges that can slow down cloud adoption objectives. Issues like siloed data, duplicate data, and inaccurate or incomplete records will compromise your objectives in moving to the cloud.

Mike Meriton:

So this webinar is all about leading with data mastering and getting the most from your cloud migration. So with no further ado, let me introduce today’s speakers. First let me introduce Evren Eryurek. Evren leads the data analytics and data management portfolio of Google Cloud, covering streaming analytics, Dataflow, Beam, messaging, data governance, data catalog and discovery, and data marketplace, and is the director of product management.

Mike Meriton:

Evren joined Google Cloud as the technical director in the CTO office of Google Cloud and has led the efforts towards industrial enterprise solutions. In fact, Evren joined Google as the first external member to take a leadership role as a technical director within the CTO office of Google Cloud. Prior to joining Google, Evren was the senior vice president and software chief technology officer for GE Healthcare. He’s also a graduate of the University of Tennessee and holds master’s and doctorate degrees in nuclear engineering. Evren holds over 60 US patents. So welcome Evren. Good to have you on this webinar today.

Evren Eryurek:

Thank you. Great to be here.

Mike Meriton:

Awesome. Let me also introduce your fellow panelist and speaker, Anthony Deighton. Anthony is the Chief Product Officer at Tamr, where he oversees Tamr’s growing portfolio of data mastering solutions. Anthony was most recently the CMO at Celonis and senior vice president of products at Qlik, and has over 20 years of experience building and scaling enterprise software companies. Anthony helped found the employee relationship management business unit at Siebel Systems and grew it to over 300 customers and $20 million in license revenue. So, welcome Anthony. Good to have you here as well.

Anthony Deighton:

Pleasure, great to be here.

Mike Meriton:

Awesome. So gentlemen, the way we’ve organized today’s discussion is in four major sections. And what we’re going to do is invite the audience, as we talk about these sections, to add in their questions, so we can make this as engaging and dynamic as possible. So as Carol mentioned, as you’re thinking of a question, it’s best to type it into the question box along the way, which you’ll find on the panel for this webinar. So let’s hit our first topic. The first section is aligning your cloud strategy with your business and digital strategy. So what is this all about? Successful migrations help the business solve problems by better leveraging data. It’s not new data, it’s better data.

Mike Meriton:

And as organizations migrate to the cloud and use cloud-based tools and services, they’ll need clean, curated data for discovering insights and capabilities that the organizations didn’t previously have. So here’s my first question, and I’m going to put this over to Evren: how does aligning your cloud and business strategy help improve your data quality?

Evren Eryurek:

Well, it’s a good way to start the day, but I want to start with this: what happens when you don’t align your strategy properly?

Mike Meriton:

Exactly.

Evren Eryurek:

That’s the crux, because we all know we’ve been talking about data being the next currency, or whatever term we want to use, but the key is what does data do for our businesses? If you don’t have it aligned, you’ll have misguided decision-making, which is detrimental to entire business units. If you’re in marketing, you will have ineffective marketing, you will be targeting the wrong people, and you won’t have the right forecasting in your businesses, whatever the outcome is you’re trying to drive. You don’t even have the right sort of insight into your competitors if you don’t have the correct data in your business, aligned with your strategy. Not to mention lost revenues and sales, and all kinds of extra costs that you’ll probably be incurring.

Evren Eryurek:

These are the points that I can think of right away, but then what we do to align across the business is key. Achieving alignment means business leaders gain understanding of, and agreement on, the top challenges for their company. Because understanding that, with a good data strategy, will be first and foremost in helping solve those. It’s the level of clarity that they need, and it leads to better business outcomes. Of course there are some nice side effects of the alignment; we will see it across the leadership. A new data strategy won’t happen overnight, that is key, and it’s not free either. There are a lot of folks that I talk to who go, “Oh, how can we make this thing happen fast, and without really putting in the effort?”

Evren Eryurek:

It’s the process, the people, the tools. Everybody has to buy into this thing. This is not easy, it’s complex, but it brings wonderful outcomes and a very rich return on investment for companies.

Mike Meriton:

Thanks Evren. Anthony, any comments on that or would you like me to move to the next question that’s related?

Anthony Deighton:

So I mean, I agree with Evren. I think there’s a tremendous amount of value to be unlocked. So I would sort of underscore that point that he makes: when you align these strategies, the business value is quite large.

Mike Meriton:

Awesome. So Anthony, a follow-up question for you. Companies have a substantial amount of data, both within their legacy applications and in the universe of data sources that are public or otherwise available in the open market. And you hear terms like petabytes, exabytes, and ultimately brontobytes and beyond. So there’s just an enormous amount of data. So what data would help companies best align their digital strategy and their business strategy? Should they focus on certain types of data? What are your thoughts?

Anthony Deighton:

So you talk about the volume of data, brontobytes, and I agree with a lot of that. And also the variety of different kinds of data. I think there are also different velocities of data, streaming or coming in batch, and those are points that Evren is a real leader on. But I’m going to stake out a controversial point of view here, which is my view that the big data problem is solved. And that’s a very controversial thing to say, because I think a lot of organizations may not feel that way. But let me tell you that from a purely technical perspective, that problem is a solved problem. And the answer is: put your data in the cloud. Preferably partner with Google and do it there. They have great infrastructure for doing that.

Anthony Deighton:

You don’t need to invent this from first principles. There are perfect solutions out there available to you, pay as you go. And that problem is solved. And then I’ll also add, in the introduction you mentioned that I’d worked at Qlik; I spent a lot of time in my career working on the front end of the analytics challenge. How do we get dashboards and analysis in front of users? And again, I think that problem is solved. If you want to create pretty dashboards, I mean I have a bias, you should use Qlik of course, but you could use Looker, that’s a great product as well. There’s a bunch of other ones in the market, but the point is that that problem is solved. You can create that.

Anthony Deighton:

So we’ve solved the two sort of systemic problems: how do I create good analysis, align that, and achieve my business objectives; and how do I solve the big data problem? Put it all up in Google. That leaves, I think, one of the more interesting problems, which is that behind every dashboard is crappy data. We’re not actually able to achieve these business objectives even though we’ve solved the big data problem. We’re just analyzing enormous amounts of data with these incredible front-end tools, and we can’t trust the answers because the data is junk. And that really is the central problem: aligning your business strategy with your cloud migration strategy is really a question of whether the data behind that dashboard is something you can trust.

Evren Eryurek:

Spot on. If I may.

Mike Meriton:

Yeah, go ahead.

Evren Eryurek:

It is spot on. Don’t be afraid of the size of the data. Are we able to get the insights from the data? Are we asking the right high-value questions so the data works for you? That is key.

Mike Meriton:

I think Anthony, the point you’re getting to is you can look at cloud as unlimited compute power, and you can look at all the analytic dashboards, there are many choices in the marketplace, but we still have many companies struggling with this challenge, which is how do you bring the right data, have trust in that data, and make sure it’s fit for purpose. And I think that’s the point you’re driving at. Right?

Anthony Deighton:

Exactly. And I think the prevailing wisdom is that what the cloud brings to the party is infinite storage, and that’s true, but I think that’s only half the story, and frankly the least interesting half. The more interesting half is what you mentioned, Mike: it brings effectively infinite compute to the story. So all of a sudden this idea of making meaning of a large volume of data is actually achievable. Pairing that with your choice of visualization tools is how you bridge the gap between what’s my business challenge and, to Evren’s point, what’s the insight I need to drive that. But again, at the core of that are often silos of data that are not connected, that don’t have meaning from a user perspective.

Anthony Deighton:

And then when I do the analysis, it doesn’t answer the question I had. How many people on the call have had the challenge where you build some beautiful analysis only to find out that division A’s data is not included? You’ve got to go back to the drawing board and figure out how to loop in division A’s data because it wasn’t included in the original source. Or you answer the question and then realize you want some detailed data that’s not included, and you’re off to find that detailed data and figure out how to join it into your sales data. It’s great, I understand my customer’s sales history, but I don’t know what products they bought; I’ve got to go back and figure out the product data. These are the core challenges that ultimately are blocking getting business value.

Mike Meriton:

And I know we’re going to be peeling back the onion with more details about how and where to attack this problem, and what the right approach is, because I think every company in the world has faced this. Evren, you’ve had a unique opportunity, which is that this cloud data management work group has been put together. And it includes multiple cloud companies and many different companies in large vertical markets, all working on a common set of requirements. How is that helping, and what’s your take on that project? We’re hopeful to have a usable framework for companies to download under a free license right around the end of Q2. I welcome your quick take since you had yourself and Google involved with the project.

Evren Eryurek:

I’m glad you asked that. I’m excited about it because I equate it to some of the things that I did and saw be really impactful in the early part of my career when I was with Emerson Process, which were really centered around communications, protocols, control procedures, and how to deal with and handle the data and so forth systematically across the industry. What we are doing there is really putting together the best practices, guardrails, and how-tos, not just for regulated industries, but for anyone to take advantage of.

Mike Meriton:

Exactly.

Evren Eryurek:

And it gives us an opportunity as cloud technology providers; we work in there together with all our peers from the competition because we all want to solve the same problem. And we do it with the leading companies. I mean, you’re leading that organization. It’s such a great place to really understand the problem at hand that they have, and they’re willing to put it on the table for us and with us as we frame it up. And that gives us an opportunity to shape our short-term and long-term strategies around data management to really better serve their needs. In the end, we’re trying to solve their problem. And what better way to do it than collectively.

Evren Eryurek:

This is sort of agile product development at a very massive scale, with multiple companies and multiple industry-leading partners. So I’m really thrilled about it, and we’re very excited to be part of it.

Mike Meriton:

Thanks Evren.

Evren Eryurek:

More to come for sure.

Mike Meriton:

Exactly. We’ve been thrilled by the size of the group. I mean, it’s a good indicator, and I think Anthony, you would say, “Hey, all these companies wouldn’t be spending this time if everyone had fully figured it out.” And certainly every individual company facing off with every cloud offering in the marketplace is a highly inefficient way of doing it. So it creates a rational approach to speed up knowing what the capability requirements are. And then every company can make a decision on what innovation and what stack they use, but at least there’s less debate about what the requirements are. So if anyone in the audience wants to learn more, we’ll cover that towards the end of this call. But let’s go to the next major section of questions, which is: why should the ideal cloud migration lead with this concept of data mastering?

Mike Meriton:

And what does that really mean? Let me give you a little bit of context and we’ll go right into some of the questions. So migration is a perfect opportunity to clean, improve, and enrich your data. In other words, you can move and improve. Better data leads to better insights and new capabilities. So if you’re not mastering your data, you’re really losing an opportunity, or maybe kicking the can down the road onto the cloud platform. So this is really a chance to do it better. So let’s start with the first major question, which is for you, Evren: why should improving data quality be part of your cloud migration strategy?

Evren Eryurek:

All right. So I think we’ve already alluded this morning to the fact that in order for data to live up to its true potential, we really have to think about improving data quality as we’re moving to the cloud. Because if you’re going to move a bunch of data whose value isn’t really understood, what it does, where it is and so forth, and we haven’t validated and verified it, we haven’t gone through what we call data quality best practices, then we’re going to end up having the same problem that we have in our existing on-prem worlds: lots of disconnected data that leads to inaccurate insights.

Evren Eryurek:

So we want to have a strategy. First of all, let’s discover where all your data is, so that we can actually understand what is in the data. That is a really important part of the discovery of the data, so that we can align it to the goal of the company that you’re in. Whether that goal is short-term or a five- or 10-year long-term one, it doesn’t matter, but you have some business outcomes that you want to achieve.

Evren Eryurek:

In the end, frankly, you want to get to a place where you provide real-time insights to your customers, to your companies, to your business units. That’s the ultimate goal. So if we can figure out a way to align our objectives with the data that we have in hand, and start this cloud migration with data quality in mind and what we want to get out of the data in mind, it will lend itself to a faster return on investment for all of you. The key is really understanding what our objectives are long-term. And there are tools today, many tools in all of our hands, that can really make this faster, easier, and actually a delightful experience for you, for all your users, your data analysts, your business analysts, and what have you.

Evren Eryurek:

But the objective of this exercise is to identify the gaps in the current data strategy and map to the desired state, because once you know everything that you have in hand, you’re able to actually map this. There are tools out there to help you with that too. This will inform the changes that need to be made as you’re moving towards the cloud with your data strategy. The strategy around data should have the power and the breadth of insight to help drive business goals forward. That is key. And remember, we’re not going to just be looking at dashboards of data that happened sometime in the past.

Evren Eryurek:

You want to get to a place where it’s real-time, and you’re trying to get to the insights as the events are happening, because the world’s datasphere is moving very fast towards being real-time in nature. And that’s why I believe it’s very important to have a data quality strategy as part of your cloud migration strategy. Like Anthony said, we have solved the storage of the data, we have the processing capabilities, and putting the strategy in place first and foremost will really accelerate your migration journey.

Mike Meriton:

Excellent Evren. Anthony, do you want to add to anything that Evren just covered and before I move onto the next question for you?

Anthony Deighton:

No, no, no. I think that was perfect. I have no disagreement.

Mike Meriton:

Anthony, there’s been a sort of mantra of lift and shift. That phrase has been in the marketplace for many years and it gets a mixed reaction, but sometimes it’s been the default approach to migrations. Is this something that an organization should follow as a mantra? Or is it a beware-of-this-mantra situation that might get you into murky water?

Anthony Deighton:

So again, I’ll maybe take a little bit of a controversial point of view here. Maybe I’ll share this point of view through an analogy, and a very personal one I might add. I once had to move. I was living in an apartment in Chicago and moving to San Francisco, and through a bunch of sort of weird reasons I wasn’t paying for the mover; the company I was moving to was paying for the mover. I also couldn’t be there in Chicago when the movers came. So I basically sent the movers to my apartment in Chicago and said, pack the apartment, deliver it to San Francisco, I’ll meet you in San Francisco. So then I arrive in San Francisco and the movers unload all these boxes, which I of course did not pack, and I start unpacking. And what I discovered is that they had done things like pack the trashcan with the trash in it-

Evren Eryurek:

Exactly.

Anthony Deighton:

They emptied my apartment in Chicago and just moved everything, in that case, to San Francisco, and then I was left with, ha, all right. I don’t think the full trashcan was necessary to move, but fair enough, I guess. So I threw the trash out in San Francisco. And the point is that is a very common strategy that you see in cloud migration, which is: we’ll just take the junk we have here and move it up into the cloud. That’s not a great strategy. The better strategy for me would have been, especially if I had to pay for the move, which was done by weight, to go through my stuff in my apartment in Chicago and ask the question: what of this do I really need when I move to San Francisco? By the way, we have dumb things like moving your full trashcan.

Anthony Deighton:

I mean, that I think I could have figured out, or you would have thought the movers could have, but there were also less obvious examples. Like they moved old books that I had that I probably wouldn’t have moved. They’re very expensive to move because they’re heavy, and I probably didn’t need or want them. I would have been better off donating them in Chicago than moving them to San Francisco. And again, we see that in cloud migration, as people take stuff and move it up to the cloud when they don’t even actually know what’s in it. And then, that’s sort of level two; level three is, if I looked carefully at the stuff I had in Chicago, the climate in Chicago is very different from San Francisco. I might have actually made some pretty optimal decisions, like do I want my coat? Do I need a heavy winter coat in San Francisco? And we all know the answer.

Anthony Deighton:

So if I had aligned my business objectives, or in this case my living objectives, I suppose, in San Francisco, I might have made different decisions about what to bring. So as you move data to the cloud, if you know your business strategy and what you’re trying to achieve, you may make very different decisions about what data you migrate to the cloud. And so-

Evren Eryurek:

I love that analogy. Oh my gosh, that was … It brought [crosstalk 00:24:51].

Anthony Deighton:

It also happens to be a very true story.

Evren Eryurek:

I’ve got to tell you my side of the story, because I moved my family so many times while at GE. And yes, in one of the first ones, we did have several trashcans full of trash moved, but then the chief data officer in my family, my wife, figured this out. She was on top of it. She was always sorting things out, what to move, what not to move and so forth. And I had all kinds of crap that I loved keeping. Like my patents, they all came in these wood frames and stuff; they were huge, and they weren’t residing anywhere, not even seeing the light of day. After I finished one move, I asked, where are my patents? Oh, we left them behind in the first move. I didn’t even know they weren’t with us, and since then we had moved two more times. So leaving things behind is okay. Just have the right data officers in charge who understand why we’re doing what we’re doing.

Mike Meriton:

So we’ll be sure to update the bio to say that Evren has 60 patents, but they’re in different houses in different locations.

Evren Eryurek:

Exactly. Somebody has access to all my plaques.

Mike Meriton:

The new homeowner is now enjoying them up on the wall. So that’s great. Well, a follow-up: I’m almost hearing, Anthony and Evren, an overarching theme that says if you take your time to align your migration to your business objectives, it then sets the lens on what quality objectives you want to achieve. And that should be built into your plans before you actually commence your migration.

Evren Eryurek:

Exactly.

Mike Meriton:

Is that the-

Evren Eryurek:

Exactly.

Anthony Deighton:

I think that’s a-

Evren Eryurek:

A very good summary.

Mike Meriton:

And now there’s a bit of a corollary question, which is: this is a chance to do it right and to clean-sheet that environment, as compared to the many legacy environments companies have dealt with. So what if we’ve already moved to the cloud? Is it still possible to keep that orientation of keeping your data in good quality shape? Would either of you like to jump on that because-

Evren Eryurek:

Absolutely, I’ll jump on it right away. A lot of folks have done it, so I don’t want people to think, darn, I moved my data already and I missed the opportunity. Absolutely not. These tools are there to serve us. We learned a lot in the past five, 10 years as we dealt with data in different places. As we see more and more data migrations to the cloud, we are building our own best practices. This is why we’re working with the EDM Council to really establish some of these guidelines, if you will, best practices to share with folks. And know that if your data is already in the cloud, congratulations. Awesome.

Evren Eryurek:

Now let’s figure out what the data strategy is and what tools we need to use to really cleanse it. There are a lot of digital officers that I deal with day in, day out who come to me and say, “Hey, look, I need to figure out a way to bring insight from the data. I don’t even know where my data is or what I have in my data. And all my data is either in this cloud or in a hybrid mode.” We can begin that right away, and there are a lot of tools that can accelerate the work they have to do. It’s never too late.

Anthony Deighton:

If I may dive in, I would go further and say that the cloud becomes the perfect mechanism for resolving the question of what’s high quality data, et cetera. And we talked about this before, but the cloud gives you large amounts of storage, yes, but it also gives you large amounts of compute. So it also gives you the opportunity to process through that data. We can talk a little bit about how to do that in a bit, but look at that data and then figure out what of that data is actually valuable. So maybe to overextend my analogy, it’s a little bit like combining a cleaning service and a moving company in one. As you make the move, you also clean the house. Maybe that’s a big business opportunity. Someone on the call can take that and pay me a royalty.

Mike Meriton:

I’m also hearing, Anthony and Evren, that you should then keep your cleaning service employed, so that you’re continuously monitoring the data that’s in the cloud and any new additions, so that this idea of continuous data quality monitoring and improvement is part of your fundamental strategy to do it right.

Anthony Deighton:

There’s also this: I think we often falsely believe that taking advantage of the cloud is a one-time move. So it’s like, okay, I’m going to move my data to the cloud. No, no, no. More likely what’s happening is you’re making acquisitions and divestitures, you’re starting new product lines, you’re onboarding new operational systems, et cetera. More data is showing up all the time, and that data needs a home.

Evren Eryurek:

And also, habits don’t die fast. Even if you’re on cloud, you will continue to operate the way you did for the past 30 years. That might actually lend itself to putting some really unnecessary stuff in places and not really figuring out how you should label your data and how you should treat your data. Some of that happens. That’s why it should be a continuous process as part of your data strategy.

Mike Meriton:

So gentlemen, this raises and brings us to the third major topic, which is: how do you really improve data quality? Some would argue that, with the amount of data that’s involved on-prem and your plans for what you want to carry and operate in the cloud, which could be an even larger set of data resources, machine learning would be an essential way to improve data quality, and that manual rules-based approaches may not scale in this new operating environment. So let me move on to the next set of questions. Evren, how can the cloud companies help with data quality problems? In other words, what’s the set of tools and processes that are going to be available from cloud companies to help attack these issues and opportunities?

Evren Eryurek:

Well, look, it’s one of the reasons why I’m always very excited to work with Anthony; the team at Tamr does a very nice job here. And you will remember, it was one of my very first questions when we started talking about it: you know these rules-based systems won’t scale, so what are we doing to bring smartness into the processes that we’re dealing with? Now, the beauty is there are a lot of great capabilities on the cloud platform. Anthony mentioned it: we can deal with massive sets of data with very powerful compute engines underneath. And you can actually enable your users to set up data quality tests with almost no effort. That gives you the framework, the tools that actually detect and learn anomalies, these self-learning algorithms. We’ve really improved.

Evren Eryurek:

Going back to my school days, the neural networks, AI, machine learning, it’s not what it was 30 years ago. It’s wonderfully capable today and very simply and nicely integrated into the tools that we’re using. And more importantly, you have transparency within the cloud-based solutions, and it brings visibility at all levels to all your stakeholders: from your executives, to team owners, to your data quality owners, whatever. You have all of this as sort of a toolbox. So with the richness of those capabilities, if I were a data engineer, this is the era to be a good data engineer, because so much is available for us to do our jobs better. So I really love how we do things with Tamr and how they help our customers, because they bring very smart capabilities to what we already have on GCP.
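[Editor’s note: The low-effort data quality tests Evren describes can be illustrated with a plain-Python sketch. The records, field names, and the 10% null-rate threshold below are hypothetical; real cloud tooling would run checks like these automatically at scale.]

```python
# Minimal sketch of a data quality check over customer records.
# Records, fields, and the threshold are illustrative only.

def null_rate(records, field):
    """Fraction of records where `field` is missing or empty."""
    missing = sum(1 for r in records if not r.get(field))
    return missing / len(records)

def check_quality(records, max_null_rate=0.1):
    """Return (field, rate) pairs whose null rate exceeds the threshold."""
    fields = {f for r in records for f in r}
    return [(f, null_rate(records, f))
            for f in sorted(fields)
            if null_rate(records, f) > max_null_rate]

customers = [
    {"name": "Acme Corp", "email": "ops@acme.example", "country": "US"},
    {"name": "Acme Corporation", "email": "", "country": "US"},
    {"name": "Globex", "email": "", "country": ""},
]

# Flags "country" and "email" as failing; "name" is fully populated.
print(check_quality(customers))
```

A migration plan could run a report like this against each source before moving it, so the decision about what to carry to the cloud is informed by measured quality rather than guesswork.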

Mike Meriton:

So Anthony, let me carry this topic a little bit further. Everyone’s heard of MDM, or Master Data Management, solutions. They’ve been around for some time, along with this idea that I do all this work with my data for certain core data domains, like customer data or product data, and then people can use that as an authoritative source and pull it into whatever application they’re looking to run. So does that really work, and do we need a different approach to master data management in light of both the opportunities and issues of cloud migrations?

Anthony Deighton:

Sure. So traditional MDM was designed in a time before the cloud. It was designed in a time when most data was on premise, there wasn’t a large amount of it generally speaking, and compute resources were relatively scarce. And so a rules-based approach made sense 10, 15 years ago when these systems were invented. The opportunity, and I would frame it as an opportunity, that the cloud provides, as we combine what we were talking about a few minutes ago, infinite amounts of data with infinite amounts of compute, means we have the chance to take a different approach. And that approach is a machine learning based approach.

Anthony Deighton:

So rather than … So how does traditional MDM work? Well, it works through its rules-based metaphor. Each time I add a new source, or for that matter add a new column into a given source, I need to literally hand-code rules that define how these sources come together. It scales by the number of people you have building rules, which is not very many; it's incredibly time-consuming and difficult, and it's fraught with errors. And it's also what I call brittle, which means that if anything in the data changes, or the schema changes, or anything changes, it doesn't break a little, it breaks entirely. And then you're back to starting from scratch again. So what's powerful about a machine learning based approach is that it scales to an infinite amount of data.

Anthony Deighton:

It takes advantage of this large amount of compute, and it also scales gracefully. So as data changes, the model adapts and learns from that new data. So there are a couple of key components, I think, that a machine learning based data mastering solution ought to have. The first is schema mapping. With these two sources of data, or two or more, or these 10 sources, are they talking about the same data? Are these columns aligned? The second is record matching. Within that now-combined corpus of data, are there duplicates? Say I have two divisions and they both serve the same customer; I should be addressing that customer as the same customer.

Anthony Deighton:

Then there's record mastering. So now I have these clusters, in my example clusters of customer records that are the same customer. How do I create a golden record, a single view of that customer, a 360 view of that customer? And then last but not least is categorization: being able to take a record in a data set and ask, what's the meaning of this? Is the meaning of this record in this data set the same as the meaning of this record in that other data set?
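The two middle steps Anthony describes, record matching and record mastering, can be illustrated with a toy sketch. This is not Tamr's algorithm (Tamr's matching is model-driven, not hand-coded similarity); the string-similarity threshold, the field names, and the survivorship rule (longest non-empty value wins) are all illustrative assumptions:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Fuzzy string similarity in [0, 1] (a stand-in for a learned model)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_records(records, threshold=0.7):
    """Record matching: greedily cluster records whose names look alike."""
    clusters = []
    for rec in records:
        for cluster in clusters:
            if similarity(rec["name"], cluster[0]["name"]) >= threshold:
                cluster.append(rec)
                break
        else:
            clusters.append([rec])
    return clusters

def golden_record(cluster):
    """Record mastering: one survivor record, longest non-empty value per field."""
    fields = {k for rec in cluster for k in rec}
    return {
        f: max((rec[f] for rec in cluster if rec.get(f)), key=len, default=None)
        for f in fields
    }

customers = [
    {"name": "Acme Corp", "city": "Boston", "phone": ""},
    {"name": "ACME Corporation", "city": "", "phone": "555-0100"},
    {"name": "Globex Inc", "city": "Chicago", "phone": "555-0199"},
]
clusters = match_records(customers)            # two clusters: the Acme pair, Globex
masters = [golden_record(c) for c in clusters]
```

A real system replaces the string heuristic with a trained model and learns the survivorship rules, but the shape of the problem, cluster then consolidate, is the same.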

Anthony Deighton:

So if you're looking at, for example, a part that you're purchasing, asking whether this screw that's defined here is the same screw that we define over there, that's the categorization problem. And you can train a model to do those things, or train multiple models to do each of those things, and then you can effectively turn this hard work of creating high quality data over to the computer. And the last thing I'll say here is, I wish I could say this is something completely unique to Tamr, but I think it's also fair to say that we see the machine learning disruption occurring in many, many different industries. So this idea that I would own a car and drive it, that's the rules-based approach to automobiles. The self-driving car is the machine learning based approach to driving.

Anthony Deighton:

So we're seeing machine learning disrupting all kinds of industries. Tamr is focused on what I think is one of the hardest problems in the enterprise, which is getting a clean view of your data across your customers, suppliers, parts, products, et cetera. That's a really hard problem, but it's not the only place we're going to see machine learning changing the way we work.

Mike Meriton:

So, Anthony, a quick follow up, back to the root question. Does machine learning eliminate master data management or augment it? And then the second question that everyone usually thinks about in this autonomous future: does this eliminate people's engagement, or change it as well?

Anthony Deighton:

Sure.

Mike Meriton:

I mean, do you mind just commenting on those topics? And Evren, it'd be good to hear your take on those two topics as well.

Anthony Deighton:

So I think that the machine learning based approach to data mastering is a next generation approach to master data management, which is to say that it solves the same problem in a dramatically lower cost, higher value, higher fidelity way than MDM did. And then the second part of your question was-

Mike Meriton:

People.

Anthony Deighton:

So the downside to a machine-based approach is: how do you train the machine? And that's where people come in. It turns out people are really excellent at, for example, looking at two customers and saying, yep, these are the same, or no, these are different. And so at its core, the approach Tamr takes is to use human feedback to train its machine learning. And again, that's not a new idea. That's at the core of the machine learning based approach: you need training data in order to guide the machine. The machine won't learn without that feedback. And so that's the approach we've taken. What's important about that, however, goes back to where I started, which is that in the rules-based approach, the human has to intervene in every case.

Anthony Deighton:

It's like an artisan, by-hand approach to data mastering. With the machine, the human intervenes only in the cases where the machine is having trouble. In fact, the core technology for Tamr came out of research at MIT. The actual research was in how we find the smallest number of questions we can ask a human to get the highest uplift in the machine learning. That's the PhD thesis that provided the foundation for Tamr's core technology. And it's a really good approach, because it allows you to focus human intervention in the place where it's most valuable, where it has the biggest impact on the quality of the data, and leave the machine to do the basic blocking and tackling.
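The idea Anthony describes, asking the fewest questions for the highest uplift, is in the spirit of active learning via uncertainty sampling. Here is a minimal sketch; the pair scores and the 0.5 decision boundary are illustrative assumptions, not Tamr's actual implementation:

```python
def most_uncertain_pairs(scored_pairs, k):
    """Uncertainty sampling: surface the candidate pairs whose match
    probability is closest to 0.5, where a human answer teaches the model most."""
    return sorted(scored_pairs, key=lambda p: abs(p[1] - 0.5))[:k]

# (pair of record ids, model's match probability) -- illustrative scores
scored = [
    (("r1", "r2"), 0.98),  # confident match: no human needed
    (("r3", "r4"), 0.51),  # borderline: worth a question
    (("r5", "r6"), 0.03),  # confident non-match: no human needed
    (("r7", "r8"), 0.45),  # borderline: worth a question
]
questions = most_uncertain_pairs(scored, k=2)  # the two borderline pairs
```

The confident pairs are left to the machine; only the borderline pairs cost human time, which is what makes the approach scale.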

Mike Meriton:

Cool.

Evren Eryurek:

Well, I couldn't agree more. Look, there is a lot to do; I know there's a lot more for us humans to do. So if a machine can do it, let it do it, because there is much more for us to do to make things perform better, what have you. The smartness will still come from us. I don't want folks to think that if it's machine learning, there's no job left for me; none of that. Actually, you have much more to do in today's world, because machines are able to do a lot more crunching than we could do, but that's all they can do. So our roles are really, really crucial as we interact with these machine learning algorithms. And it's great to see where we're headed, but what we bring to the equation is priceless, it seems.

Mike Meriton:

Awesome. So gentlemen, let's move to that fourth section, which is what to look for in a data mastering solution. Should it be cloud native? And what I'm hearing is that part of what you should be thinking about is machine learning as part of that solution set, and keeping humans involved. So let me start with you, Evren: what should companies look for in a solution that helps improve data quality?

Evren Eryurek:

Well, first, our book was published today, and I got a sneak peek of it; they sent it to me over the weekend. We spent quite a bit of time discussing the guidelines around data governance and what it means, and data quality is an essential part of that; we have a full chapter on it. If the end game is curated data that serves the business needs and the customer needs that we have, achieving that seamlessly is the key. And there are things that we need to be thinking about and doing at the data integration step. Whether your data is coming in streaming or in batch, it doesn't matter; that's the integration layer. You need to start thinking about data profiling as part of your integration and as part of your pipeline design, and there are tools here to help you think about data quality. And you have to have integrity enforcement of your data, and the governance aspects of your data, from the beginning.
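Evren's point about profiling at the integration layer can be sketched as a simple quality gate inside a pipeline step. The metrics chosen and the 10% null-rate threshold are illustrative assumptions, not a specific Google Cloud API:

```python
def profile_column(values):
    """Profile a column at the integration step: row count, null rate,
    distinct count. Treats None and the empty string as nulls."""
    non_null = [v for v in values if v not in (None, "")]
    return {
        "count": len(values),
        "null_rate": 1 - len(non_null) / len(values) if values else 0.0,
        "distinct": len(set(non_null)),
    }

def enforce(profile, max_null_rate=0.1):
    """Quality gate: fail the pipeline step early instead of letting
    degraded data land downstream."""
    if profile["null_rate"] > max_null_rate:
        raise ValueError(f"null rate {profile['null_rate']:.0%} exceeds limit")

emails = ["a@x.com", "b@y.com", None, "a@x.com"]
profile = profile_column(emails)   # null_rate 0.25, distinct 2
# enforce(profile) would raise here and stop the pipeline step
```

Designing the profile and the gate into the pipeline, rather than auditing after the fact, is the point: bad data fails fast at the integration layer.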

Evren Eryurek:

This is why we spent the time to really share our best practices and what we have learned. I come from the very highly regulated industry of healthcare, and I think they mastered it there, and there's more the rest of the industries can do. And then the other point that I want to highlight is one everyone talks about: data lineage is really important. End-to-end data lineage across the data pipelines that you're building is an essential aspect of it.

Evren Eryurek:

Otherwise, you're going to end up with only a very narrow window in which you may have some lineage understanding, when you want lineage as part of your end-to-end strategy. Now, on the Google Cloud smart analytics platform you're seeing here, Data Fusion, Dataflow, BigQuery, Data Catalog, and Cloud Data Loss Prevention, which is a machine learning based system that automatically detects and tracks anomalies and so forth, along with Dataplex. We are building these, and Tamr plays a significant role in how we provide these solutions to our users: easy-to-use tools, seamlessly integrated capabilities. We would really love to hear from all of you. You can find us, tell us what you're trying to achieve, and together with my partners here we can help you solve your needs, because your needs are not unique.

Evren Eryurek:

They're really sometimes very common to the industry that you're in. Somebody may have solved that problem already. We have the patterns, we have the lessons learned and best practices that we can share with all of you. And we can try to make this journey as easy and as painless as possible for you.

Mike Meriton:

Awesome. So Evren, we've had some good audience questions come in, and we definitely want to get them out on the table. Let me start with the first one. It's from JT, and it goes: how do you see regulated industries being able to overcome natural resistance and sensitive data issues in moving from predictable, reproducible rules-based quality to instead having trust in a probabilistic model? Which of you would like to jump in on that topic?

Evren Eryurek:

I'll let Anthony go first; that's his world, and I'll share my experience too.

Anthony Deighton:

Happy to. So first of all, I think it's a great question, and I'm not sure I would say it's just regulated industries; I think the challenge is broader than that. When you move from a predictable, rules-based approach to a probabilistic approach, that's a challenge for everyone. And the first answer to the question is simply the quality of the results. What we find is that these rules-based systems simply don't scale. You can maybe handle one or two sources, but when you have 10, 20, 30, 40 sources and it's running on the cloud, it falls apart. So it becomes a question of possible versus impossible. But I think the root of what JT is asking is: well, in a rules-based system, I can backtrack and say, how did I get to this result?

Anthony Deighton:

Is that true in a probabilistic system? And the answer is that it is. We can still take the results of the model and explain why the machine made the choice that it made. But actually, the real intervention point is having the human validate the results. So a common pattern, especially in regulated industries, although I would say this occurs across many, is for the machine to process enormous amounts of data, hundreds of sources, and come up with its view of how to bring this stuff together, but then a human goes and checks and says, "Yep, I agree. Nope, I disagree."

Anthony Deighton:

And the key there is that the feedback goes into the model. So not only do you get a reproducible answer, being able to say, yes, I validate this result, but you actually use that as training for the model itself. We see this with quite a few of Tamr's customers. SocGen, as an example, is in the banking industry, and SocGen, which is using Tamr in conjunction with Google, is in a regulated industry where this is really important: they're looking to avoid regulatory fines, et cetera. Risk is a big driver for banks to get a handle on their data; they absolutely have to have this. So it's not that the probabilistic approach doesn't allow it; they just come at it from a different perspective. [crosstalk 00:49:01].
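Anthony's claim that a probabilistic match can still be explained can be sketched as per-field score contributions a reviewer could inspect before validating a result. The field weights and the exact-equality check are illustrative stand-ins for a real model's learned features, not Tamr's method:

```python
def explain_match(rec_a, rec_b, weights):
    """Decompose a match score into per-field contributions, so a reviewer
    can see which fields drove the machine's decision. Exact equality is a
    stand-in for a learned per-field similarity."""
    contributions = {
        field: (w if rec_a.get(field) == rec_b.get(field) else 0.0)
        for field, w in weights.items()
    }
    return sum(contributions.values()), contributions

weights = {"name": 0.5, "email": 0.3, "city": 0.2}  # illustrative weights
a = {"name": "Acme Corp", "email": "x@acme.com", "city": "Boston"}
b = {"name": "Acme Corp", "email": "info@acme.com", "city": "Boston"}
score, why = explain_match(a, b, weights)  # 0.7; email is the only mismatch
```

The decomposition gives an auditor the "why" behind a probabilistic decision, and a human's agree/disagree on such pairs is exactly the feedback that flows back in as training.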

Evren Eryurek:

Just two seconds to add to what Anthony said. I come from the healthcare industry, and I remember the first day, some of you may not even remember, when mobile became the thing. Everybody had smartphones and these pads and so forth. Well, healthcare providers wanted to use these tools, and we had to make them work in that environment, and everybody was up in arms: oh no, the regulators won't let us use them, and this and that and so forth. And I remember having this conversation with my team. I said, guys, did we even talk to the regulators? Are they actually objecting to this? We didn't. So let's go. And guess what, people? They understood the technology was moving, and no, they were not objecting. We just needed to do the validation. That's the key: V and V, verification and validation. Those are really the two major pieces of it, and the validation is the part I'm trying to explain very carefully.

Evren Eryurek:

The other aspect was that we were putting more and more algorithms into the decision-making part of healthcare. Now, how do you make a decision? Is it always repeatable? How do we know it's not a black-box approach? Again, you go with the data that you're generating. You show that it's repeatable, you show that it's validated by humans, you show that it produces a consistent outcome, and in the end there's still an expert looking at it and making the final call. So we were able to overcome much of the mythical resistance from regulators by working with them. And that's exactly what we're doing here these days as well.

Mike Meriton:

So Evren, great point. In the CDMC global work group, part of the process was actually checking back in at intervals with some of the major regulators, many of whom are actually members of the EDM Council-

Evren Eryurek:

Exactly.

Mike Meriton:

Because in the end, the marketplace will operate better if there's common trust in these controls and these approaches. If every company and every regulator in the world has to develop a one-off understanding of the requirements and the controls, it becomes very chaotic to assure compliance. So I think your idea of including the regulator in a proactive discussion is a critical one for companies to gain comfort that doing this is good for their business. So we have one last audience question, and then, Anthony and Evren, since we've covered a lot of topics, I'm going to ask the two of you for a final takeaway for the audience as they go back to their offices and their desks and their next calls of the day.

Mike Meriton:

We'll also cover some housekeeping resources for the audience that we'll make available after this webinar, so please stay tuned so we can cover those, and you'll know how to get more information beyond what you've heard in this session today. So this question comes from Robert, and it says: when you talk about machine learning for data quality, are you talking about building the solution in-house or leveraging a commercial solution? I'm not finding that many data quality vendors are in a position to use machine learning in their products. So, either of you want to jump in on that final audience question, and then we'll do a-

Evren Eryurek:

Anthony lead the way.

Anthony Deighton:

Sure. So Tamr is a commercial product that is built from the ground up with a machine learning based approach specifically to solve this problem. I could imagine building a solution here from first principles. I suspect it would be quite difficult; in fact, I know it'd be quite difficult, because it's difficult to build a software company doing the same thing. Again, I'm biased, so I think you should look to Tamr as a solution to that challenge.

Mike Meriton:

Awesome.

Evren Eryurek:

I'm biased too, but having been on the other side, without these tools and having to deal with rules-based systems and so forth: look on our platform, we make it relatively easy and free. Come and try us. Tamr is a very nicely integrated, managed service; come and try it, and there will be free opportunities for you to really learn and improve what you can do and what you can achieve very quickly.

Mike Meriton:

Cool. So guys, to wrap up now, just final remarks for the audience. If you're thinking about these topics within your company, what do you go and do next? You leave this call; what's some practical guidance, one or two thoughts, you'd give this audience as takeaways? Evren, would you like to go first?

Evren Eryurek:

Sure.

Mike Meriton:

Your final thoughts?

Evren Eryurek:

Look, the journey might have different twists and turns, but begin the journey; don't be afraid. The first thing is, don't be afraid. That is where the pack is moving. Begin the journey of bringing your data to cloud infrastructure to really help you get better insights. And also remember the journey is toward real-time insights; don't lose sight of that. Look at what happened during COVID: everything is real-time, e-commerce is in software. It's grown tremendously, and that's not going to change; it's going to get even bigger. So be aware of where things are headed, and there are lots of tools out there available for you.

Evren Eryurek:

Don't be afraid to get your hands dirty and start trying it. There's really no wrong way of starting. It's a journey; it starts from one angle. Don't think it's going to happen overnight. We will make mistakes, but we are here to share all the lessons learned. We've made a lot of mistakes in our careers, and over the past five, ten years of this journey, so we're ready to share what not to do as well as what to do.

Mike Meriton:

Thank you, Evren. Anthony, your final takeaway thoughts from the audience.

Anthony Deighton:

Sure. So, a couple of key takeaways. Number one is: clean your house before you move it. The move to the cloud is a fantastic opportunity to think about what you want to take with you, what you want to leave behind, and to get a view of that. The second is aligning that to your business strategy: having a clear picture of why, and of what data aligns to the key business questions that you have. And I would also add that it's very likely the customer is at the center of that business strategy.

Anthony Deighton:

So 99 times out of 100, starting with customer data as a key driver connects your business strategy to what data really matters. Third is: absolutely cloud native, only and always. Starting anything today that isn't built natively on the cloud and doesn't take advantage of the managed services that exist on the cloud is a fool's errand. It's absolutely the case that you want to start on the cloud. And then the last point I'll make, which in fairness is incredibly biased, but there you go, it's my opportunity: absolutely start with a machine learning first approach. Trying to do a rules-based data quality or MDM initiative is simply not going to work, and building from a machine learning based approach is absolutely the right approach.

Mike Meriton:

Now, Anthony, I thought for a second you were going to say, don't pack your trash and bring it into the cloud. So with that, I want to thank both of you. This has been a great discussion with really good insights. So thank you, Evren; thank you, Anthony.

Evren Eryurek:

Thank you.

Mike Meriton:

And also thank you to the Tamr team for all the preparations. Everyone's going to receive two emails. One, within a few hours, you'll get a recording of this entire session. Please share it in your company; you can even pass it on to friends and colleagues outside of your company. This is an educational service, so we welcome it. You're absolutely free to share it as you see fit.

Mike Meriton:

Number two, in a few days we're going to take the Q&A and put together a formal Q&A response, along with a copy of the slides you saw today. And finally, if you have any follow-up, please visit the Tamr website; great team. Also visit with the Google team and Evren. And you can also visit the EDM Council website, where there's a gallery of webinars on these types of topics.

Mike Meriton:

There are also training courses on a lot of these topics as well. And finally, if you'd like to be involved with the cloud data management work group, just go to our homepage; right at the top menu, click on the word "cloud" and you can learn more about the activity. So we're at the top of the hour, or excuse me, the bottom of the hour. Everyone have a great day, and thank you for joining us today. Take care.