DataMasters Summit 2020

Accelerating business outcomes with Tamr and Amazon Web Services

 

Shashi Raina

Partner Solution Architect, Amazon Web Services

Learn why customers are turning to Tamr and Amazon Web Services to master data at scale. Topics this session will cover include the cost savings generated by leveraging Tamr's cloud-native capabilities, how Tamr works with the various data services of Amazon Web Services, and what customers should know about migrating to Amazon Web Services from on-premises systems.

Transcript

Automated Intro:
DataMasters Summit 2020, presented by Tamr.

Louise Baldwin:
Hi, everyone, and thanks for joining us. I'm Louise Baldwin, solutions director at Tamr. I'll be moderating this session, Accelerating Business Outcomes with Tamr and Amazon Web Services. Today, you'll be hearing from AWS' Shashi Raina. Shashi is a partner solution architect and helps organizations leverage the AWS platform to better manage their data. Shashi has extensive experience in the field: prior to joining AWS, he held cloud engineering leadership roles at Toys R Us, Knowledgent, and Equinix. Also joining this discussion is Suki Dhuphar, Tamr's EMEA lead. He'll share why companies are running Tamr on AWS and the benefits they're seeing by choosing to deploy in the cloud instead of on-prem. There's lots to cover, so we should probably kick things off. First of all, welcome, Shashi and Suki.
There are three areas that we’re hoping to go deeper on during the session. First, why companies should use AWS for data mastering. We’ll then have a chat about how Tamr and AWS work together. And finally, we’ll discuss the benefits of running Tamr on AWS. So Shashi, I’d love to start things off by chatting about the continued momentum that we’re all seeing for AWS. Can you talk about why companies are switching from on-prem to AWS?

Shashi Raina:
Yeah. Before I answer the question, I want to thank Louise and the Tamr team for giving me the opportunity to be part of this event. Now, back to the answer. There are five reasons companies are switching to AWS from on-premises. The first one is agility. AWS lets customers quickly spin up resources as needed, deploying hundreds or even thousands of servers in minutes. This means customers can very quickly develop and roll out new applications, and it also means teams can experiment and innovate more quickly and frequently. If an experiment fails, you can always deprovision those resources without any risk.
The second reason is cost savings. If you look at how people end up moving to the cloud, the conversation starter almost always ends up being cost. AWS allows customers to trade capital expense for variable expense and only pay for IT as they consume it. And this variable expense is much lower than what customers can achieve on their own because of AWS' economies of scale. For example, [inaudible 00:02:50] has estimated that migrating its data centers to AWS will contribute to global savings of around a hundred million dollars in infrastructure costs.
The third reason is elasticity. Customers used to over-provision to ensure they had enough capacity to handle their business operations at peak levels of activity. Now they can provision the amount of resources they actually need, knowing they can instantly scale up or down with the needs of the business, which also reduces cost and improves customers' ability to meet their users' demands.
The fourth reason is that the cloud allows customers to innovate faster, because they can focus their highly valuable IT resources on developing applications that differentiate their business and transform customer experiences, instead of on the undifferentiated heavy lifting of managing infrastructure and data centers.
The fifth reason is that AWS enables customers to deploy globally in minutes. AWS has the concept of a Region, which is a physical location around the world where we cluster data centers. We call each group of logical data centers an Availability Zone. Using AWS, customers can leverage 76 Availability Zones across 24 Regions worldwide, and we don't plan to stop there. Over to you.

Louise Baldwin:
Wow. Incredibly impressive, from the cost savings to the agility to the global deployments. Suki, when we take it back and look at data quality within that, how does the cloud change things from a data ops and a master data management perspective?

Suki Dhuphar:
Yeah. Thanks, Louise, and thanks for having me here. I think a lot of it goes back to the points Shashi was raising. The key thing we're looking at now is the volume and the velocity of data that people are seeing. It's ubiquitous; it's all over. And the organizations we're working with tend to be really large enterprises that need to think both regionally, in terms of how they deploy these solutions, and globally. So cost saving is obviously critical to all organizations, but so is the speed and agility with which they can actually get these solutions out. MDM is a part of… and data ops is a methodology, and it lends itself well to the way people were developing apps before in the DevOps world. So we're mirroring what was done before, while also taking the lessons we've learned from developing apps for end users and applying them to developing data apps for end users. Looked at that way, a lot of the points Shashi was raising play right into our customers' hands in terms of getting things out quicker, faster, and of higher quality.

Louise Baldwin:
I really like that phrasing of data apps for end users. And I think sometimes, when we look at the transition to cloud, we see a lift-and-shift approach, which we know doesn't tackle the underlying data challenges, or the data quality challenges within that. How can data mastering better enable enterprises within the cloud?

Suki Dhuphar:
Oh yeah, absolutely. Some people will describe lift and shift as moving a problem from one place to just another place, but we have to realize the arena is completely different, right? It's really, really important to understand the challenges that come with on-prem virtualized environments compared to what you get with companies like AWS, with all of the extra bells and whistles you need for a production system. Yes, you're shifting the problem from one place to another, but now you're putting it in a place where the challenge can actually be managed, right? The elasticity and flexibility that cloud vendors and products like AWS provide, coupled with what we're doing in the MDM world, really come together very, very well. When you're looking at technologies like ours, which use machine learning, we need cloud vendors to be there to provide the flexibility required to deliver the results customers are actually looking for. So yes, it's a different arena, but it's a better arena. It's a better place to be tackling this problem.

Louise Baldwin:
Could you talk about the portfolio of AWS services and if there are any in particular that really benefit companies that are looking to master their data?

Shashi Raina:
Yeah, absolutely. AWS has significantly more services, and more features within those services, than any other cloud provider. AWS has been continually expanding its services to support virtually any cloud workload, and we now have around 175 fully featured services, including storage, databases, machine learning, and artificial intelligence. In addition to having the greatest breadth of services, we also have a lot of depth of functionality. Coming back to your question on mastering, I believe the ML services stack on AWS can benefit companies in such use cases. We already talked about the AWS Lake Formation service, right? In addition to that, AWS supports all the major frameworks, like TensorFlow, MXNet, PyTorch, Caffe [inaudible 00:08:30], et cetera, which suit expert machine learning practitioners, including advanced developers and data scientists who are comfortable building, fine-tuning, training, and deploying models, that kind of stuff, right?
So we basically have everything for everyone. But we also understand that if you want machine learning to be more widely adopted and used in an expansive way, we really need to make it accessible to a lot of people who are not machine learning practitioners. For that, we have built Amazon SageMaker, a fully managed service that removes the heavy lifting, complexity, and guesswork from each step of the machine learning process and empowers developers and data scientists to successfully use machine learning. So that's basically the power of the ML stack, which I love to share. This is a perfect opportunity for me to do that, so thank you for the question.

Louise Baldwin:
For sure. And, Suki, you touched on the machine learning aspect there, and obviously machine learning and AI are often mentioned when discussing approaches to data mastering. We're naturally big advocates for a machine learning approach at Tamr, but, Shashi, I'd love to hear your take on how these technologies can help.

Shashi Raina:
Yeah. This is a really interesting topic you brought up, machine learning and AI. We are seeing a lot of traction, and obviously you're seeing the same thing. In the fullness of time, I would say that virtually every application will be infused with ML and AI. Most customers we work with right now are very interested in machine learning; tens of thousands of customers are running machine learning on AWS. One of the reasons is that the adoption of SageMaker, Amazon's machine learning tool, has spurred this adoption across a very broad segment. We have companies like Slack, Capital One, [inaudible 00:10:30], and NBC who are actually using this.
While an incredible amount of progress has been made in organizations with ML and AI, we are actually still at the beginning of this process, or journey, I would say. And I can say that we have about twice as much ML being run on AWS as you would find anywhere else. And this is still very early for most organizations. You mentioned the data mastering use case, right? As an example, AWS has the Lake Formation service, which contains FindMatches, an ML transform that enables you to match records across different datasets, and it can also identify duplicates with little or no human intervention.

Louise Baldwin:
Yeah, absolutely. Really interesting to hear, both in terms of the reduction in human intervention and how early we are in the adoption of AI and ML tools as a whole to approach mastering, which is definitely something we believe in at Tamr. Suki, we are cloud native. When companies are looking for MDM solutions that talk about their cloud capabilities, what should they look for? Are there differences between a virtual machine (VM) deployment and a cloud-native one, for example?

Suki Dhuphar:
Yeah, good question. One of the things to remember is that a lot of what I call the newer companies out there have been built on a cloud backbone, right? We weren't just built with cloud in mind; we were built on that backbone. So it's not just in mind, it's front and center of the way we think. Being cloud native, working with the technology provided, and interacting with all of the tools that companies like AWS provide is part of our DNA. So when people talk to us about being cloud native, we're fluent in that conversation, because we've developed our product using cloud as a backbone. I think that's an important factor when we look at organizations that have been around for quite some time. They're buying technologies to fit the mold of cloud, whereas we've been designed with cloud in mind from day one.
And I think that makes a big difference: that interoperability, the ability to drop in, do what we do well, and still let organizations like AWS enhance our capabilities with the tools they've got, especially, as I said, when we're going from pure experimentation and trial and error through to full production pipelines. We're there all the way, from the beginning to the end, and so are organizations such as AWS. That's where we work very well together.
And to address the second part of the question, I think Shashi's earlier points cover the difference between cloud native and virtualization. This information is available online for people to see the difference in terms of cost, speed, agility, elasticity, and the ability to run. As Shashi and you were just saying, we're at the infancy of using machine learning and AI, and it's only going to grow, right? That means demand is going to grow, organizations are going to become more data mature and data fluent, and they're going to want more sophisticated pipelines.
And to build those sophisticated pipelines, they want resources on tap. They want to be able to switch something on very quickly, try something, see it working, and then move on to the next bit. That's why it's really important for organizations to think about platforms that are inherently cloud native, rather than ones whose vendors are thinking, "I've adapted it to cloud." That's just a personal opinion from what I've seen across our clients, and that's how we've helped our clients as well.

Louise Baldwin:
Really interesting to hear, in terms of whether or not cloud is truly at the foundation of the deployment. We've discussed a lot of the benefits of moving to cloud, and they are vast. I'd love to go a little deeper on either the operational or the analytical benefits, Suki, drawing on your experience, because I know you've worked very, very closely with many of our customers. I'd love for you to share any stories of the benefits you've seen.

Suki Dhuphar:
Yeah, absolutely. As I say, the proof is in the pudding, and when you're actually delivering something and working closely with customers, you get to see exactly what's going on, warts and all. The biggest difference I've seen, and this is not just with Tamr but throughout my history of working with organizations and with data, is the speed. No longer do you have the three-, six-, or nine-month lead time to spin up a virtual machine and get your data onto it. The accessibility now… We've worked with customers where we ran a mastering project from start to finish within eight days, right? And we're talking about product mastering. We're working very closely with AWS on Toyota Motor Europe, where we're training different countries to master their customer data within two weeks. And this is multilingual data as well.
So for us now, two weeks has become too long, right? We want to do this in eight days, and now we're going to try to beat the eight days. But the reality is, that's the speed we're talking about. Just a quick anecdote: I remember working on a data quality project for a large German bank. It was a four-year program just to master data. Four years. Now, with products like AWS and Tamr, where we can spin things up very quickly, those projects don't take four years. In fact, customers want them in six or nine months, across their enterprise. So yes, we're seeing huge differences because of cloud computing. As I said, when we talk about our data ops pipeline, we always say cloud first. We tell every customer: if you want to do this well, select a strong cloud vendor and be cloud first, right? And now we're starting to see that momentum shift toward the cloud.

Louise Baldwin:
Wow. That really shows it's a game changer. When the unit changes from years to months to days, you can really see the significance of the impact. Shashi, I'd also love your take on the same question in terms of the impact you're seeing.

Shashi Raina:
So, the impact in the analytics space? Is that…

Louise Baldwin:
Yeah. I think that’d be fantastic. Either some of the operational benefits or the analytical benefits that you’re seeing from moving from on-prem to AWS.

Shashi Raina:
Sure. As Suki was saying, our expectations have changed from multi-year projects to essentially months now, right? One of the things making this possible is the number of services and the amount of innovation happening in the AWS cloud. It's amazing how much analytics has changed and what is available in the cloud today. [inaudible 00:18:05] it's easier to collect, store, analyze, and share data, and we can do it right in the cloud. That's not only because it's cost-effective in the cloud, but also because there's a whole range of services at our disposal that makes it possible. Now, think about that, right?
The foundation of the whole thing is storage, right? Most customers in the cloud use Amazon S3 as their storage and data lake solution because it has more functionality and unmatched availability, reliability, and scalability. Not only that, but it's very secure too. This is the only service that allows you to block public access at the bucket and account level, and you can get inventory reports of all the objects you have there. Over the years, customers have accumulated so much data, and a lot of it lives in different silos. We all know that, right? And that makes it really hard to do anything with the data, including analytics. So customers pull the data together in a data lake; that's the most popular choice. And S3 is the choice for the basis of that data lake.
AWS hosts tens of thousands of data lakes on S3 today, and it's the object store and data lake that gives you the most ways to get data into it. There are so many services you can use to get data in, right? You can go over the internet, for example, or over the wire, as we call it. You can use the AWS Direct Connect service, for example, over the AWS backbone. You can use Kinesis for streaming data. You can have IoT devices, and you can use IoT and the Storage Gateway, for example. Then there are the appliances we have, like the Snowball appliance to move data. If you have a lot of data, you can use Snowball, but I don't know if you've seen the 45-foot container, which is like something out of a futuristic movie: a big truck with a 45-foot container.
You put all the data in it, and it moves your data from your premises to the data center. So that's also a possibility, right? If you think about it, we have provided customers with almost every imaginable way to move data into S3, significantly more than you will find anywhere else. And with a new service like Lake Formation, customers can build a secure data lake in days instead of weeks or months. We don't stop there, right? Beyond S3, we also offer more analytics capabilities than anyone else out there. Customers can process [inaudible 00:20:50] vast amounts of data with EMR, which supports around 21 open source projects, like Hadoop, Spark, [inaudible 00:20:59], and more. And you can even run real-time analytics with Amazon Kinesis, right?
Then there's Elasticsearch, which is really for operational dashboarding. There's QuickSight, which can be used for BI visualizations and embedding machine learning. And there's Athena, which you can use for interactive queries using standard SQL. And then there's Redshift, a fast, scalable data warehouse service that allows customers to run complex queries at massive scale. Redshift delivers 10 times faster performance than any other data warehouse by using machine learning, and machine learning has been a theme today; it keeps coming up. It uses a massively parallel query execution engine and columnar storage on very high-performance disks. And now we also have Redshift Spectrum, which allows customers to run queries on unstructured data in S3.
So in addition to the data you have in Redshift, you can now query data in S3 without moving, transforming, or loading it, which is huge. And we also have Redshift Federated Query, which lets you run queries across Redshift, across S3, and also across your [inaudible 00:22:23] databases, like, I don't know, Postgres. So now we are covering a multitude of sources that you can combine with something like Redshift Federated Query.
And we also recently introduced AQUA, the Advanced Query Accelerator for Redshift. It's coming in 2020, I think. It's a very innovative approach that uses a hardware-accelerated cache and can provide 10 times better query performance than other cloud data warehouse solutions. And with Amazon Redshift RA3 instances, we can actually separate storage and compute. So you can see how much evolution is happening around the AWS platform, and how rich and useful the analytics ecosystem is for customers to do all the things they want to do right now.

Louise Baldwin:
Wow. That's really interesting to hear, and honestly, I feel like we'd need hours to do justice to everything you just covered, but it's always fascinating to hear about the breadth of services and capabilities, and also some of the pipeline of what's coming up from AWS, which is fantastic. Suki, to wrap things up: we're presenting this as a great partnership between Tamr and AWS. Could you comment on why AWS and Tamr are better together?

Suki Dhuphar:
Well, yes, absolutely. You've just heard about all of these services that AWS provides, and that's probably just a small section of them. But ultimately, all of these services being built, including Tamr, are there to serve customers and their pipelines, to make sure they get the best of breed when they come to build those pipelines. We've all been in this space for a long time, and we realize no two pipelines are the same, right? Everybody uses different infrastructure and technology to get the best of what they can. So we work well together. Shashi mentioned S3 bucket storage and data lakes; ultimately, customers want democratized data across their organization, mainly because they're facing outside factors. Outside factors include regulatory factors, competitiveness, and challenges like the ones we've recently had.
To cover them, you need to be agile, and you need to be able to work in a way that accelerates your development, with master data management as part of your data pipeline, or data ops pipeline, as we call it. That's why we work really well together, because the pipelines aren't always running at the same speed. Sometimes you really need to turn it on to get the best out of it, and sometimes it's a bit of a trickle. Either way, we work well together depending on the customer's needs. We flex together, and when the technologies are working together, as we've spent most of our time doing, customers get the best they can out of all the knowledge from Amazon and from Tamr.

Louise Baldwin:
I like that, flex together. To me, it really speaks to putting the customer first, and to that agility and adaptability in deployment, which is really…
Suki, wow, you really put it in context when you talk about that agility going from years to months to days. Shashi, same question to you: what's your view on some of the analytical and operational benefits of moving from on-prem to AWS?

Shashi Raina:
I think we've talked about this. This is good. We have to go to the last one, the summary question, which I mixed up with the first one.

Louise Baldwin:
Ah, so the very final, the to close-out one.

Shashi Raina:
Yes. So this is good. We’ve talked about [crosstalk 00:26:27]

Louise Baldwin:
Perfect. Perfect. Okay. All right. So, Shashi, to close out, the final words are yours. Could you summarize how the Tamr partnership benefits AWS customers?

Shashi Raina:
Absolutely. I think Suki made really great points in his summary of the partnership. As you've seen, we talked about how AWS makes it easy for customers to collect, store, analyze, and share data. But there are challenges customers face around siloed and disparate systems. We talked about how the Lake Formation service helps customers build a secure data lake in days instead of weeks or months. We also talked about the whole ML stack of services, which can help customers solve some of these challenges. As for what Tamr does: as Suki was saying, Tamr is built cloud native, so the cloud wasn't an afterthought. Tamr helps AWS customers overcome data mastering challenges while leveraging the benefits of cloud computing. Designed on AWS, Tamr's cloud-native data mastering solution combines machine learning models, rules, and human feedback to automate, which is a keyword for me, to automate and quickly publish accurate enterprise data. I think that is the real value Tamr brings to AWS customers, honestly.

Louise Baldwin:
Fantastic. That was a perfect note to finish on. Thank you very much, Shashi and Suki, for providing insight on how Tamr and AWS are coming together to really drive those end business outcomes. And thank you very much, everyone, for joining. We hope you enjoy the rest of the DataMasters Summit.