DataMasters Summit 2020

Scaling in the Cloud: Azure

 

Mike Flasko

Partner Director of Product Management, Microsoft

Learn why customers are turning to Tamr and Microsoft Azure to master data at scale. Topics this session will cover include the cost savings generated by leveraging Tamr’s cloud-native capabilities, how Tamr works with Azure’s data services, and what customers should know about migrating to Microsoft Azure from on-premises systems.

Transcript

Speaker 1:
DataMasters Summit 2020, presented by Tamr.

Andy Palmer:
Mike, it’s great to have you here with us at DataMasters, really appreciate you making the time and really thrilled. This is a really interesting time in data management, and really excited to get your thoughts and hear what you have to say as somebody really driving all the data products at Microsoft. And thanks for being with us today.

Mike Flasko:
Yeah. Thanks, Andy. First off, thanks for having me. Always enjoy a chance to talk about data management and what’s happening in the landscape. So looking forward to having a discussion with you.

Andy Palmer:
Cool. So as you guys have been through the process of reconsidering everything that is data management, as you built out Azure and all of the native solutions in Azure, what are your thoughts? What’s different or the same from the previous versions of data management into what you guys are doing today and what you see your customers doing?

Mike Flasko:
Yeah. Good question. I think probably like a lot of technology trends, there’s a lot of stuff that’s the same, in terms of the types of challenges people face and some of the key things they want to be able to do. But I’d say some of the things that are notably different, at least over the last number of years, and probably comes as no surprise, is, at least for the customers that we’re working with, the number of systems that they’re using is just more than they traditionally had. Everybody is now in a hybrid environment. Whereas traditionally, we weren’t thinking about locations of data quite as strongly. We weren’t thinking about the scale that a cloud might offer quite as strongly. And it’s a little bit more localized of a problem, not to say it wasn’t a difficult one, but a little more localized.
And I think the big differences we’re seeing right now are, one, it’s hybrid. Two, it’s sizes, shapes, kinds, expectations of scale, and expectations of elasticity are just far higher as people are just expecting a lot more out of their data, out of their data platform. And those are probably the two biggest pieces of it. And then I think over the last little while as well, number of years, if you mix in some of the responsibility we all have as data custodians, as data owners, as data processors, and you look at the landscape of privacy or regulation, or even breach or anything else that goes on, I think there’s just a heightened sense of responsibility as a data custodian as well, in the new world of data.

Andy Palmer:
That’s great. Well, so we’re really excited at Tamr to be running natively on Azure. And we had originally, at Tamr, a data cataloging project that we had started. This was maybe four years ago. And we intentionally stopped because we saw lots of other people building data catalogs, and were really excited to have seen what you guys have built with Azure Data Catalog. Can you talk about Azure Data Catalog and its adoption and what’s going on there? I know we talk to lots of our customers that are getting behind it and really excited about it.

Mike Flasko:
Yeah. Thanks for asking. I think it’s one of the hottest topics actually right now. And I know the concept of a data catalog has been around for forever. But I think speaking to the last question that you were asking around just volumes of data, types of data exploding, not having a central understanding of that data and its description and who owns it and everything else, I think is just a non-starter these days. And so from a data catalog perspective it’s probably one of the most asked for things that we hear from a customer perspective. It’s considered now, I think, just to be critical, and to get everything out of your data, and expose that data in a way that all users can understand what’s there, what’s valuable to them, how to get access, and so forth. And so Data Catalog and the whole area of cataloging is something that we’ll continue to invest in pretty significantly, just given, I think, both customer demand, but when you look at where it’s coming from, it’s just becoming a core fixture in the platform in almost every customer account.

Andy Palmer:
Well, it’s really, when I was a CIO, it was always that question of, “Well, what data do you have,” was always a hard one because we didn’t have catalogs back then. So it feels good now. When I see Azure Data Catalog, I look at it, and it really makes me feel confident that Microsoft’s customers are going to be able to answer that question definitively about what data they have. And then, when it comes to actually moving data from source after discovering in catalog and moving it around, the Azure Data Factory, you guys put a lot of work into that. And these data movement functions are a bit different in a cloud native environment than they were in the on-prem world with things like Informatica and Talend. How do you think of Azure Data Factory now, and the adoption of Data Factory, and how people are moving data or not?

Mike Flasko:
Yeah, good question. I think it’s another one of those things that have taken off just from these trends that we’ve been seeing, which is people are always going to have to go reach out to valuable data wherever it is, bring it in, allow people to combine that data. And so that’s where it started with Data Factory for us was, there was a lot of interest in leveraging scalability of the cloud, economics of the cloud. But, obviously when the cloud started, it wasn’t where all the data was born or even the majority of the data was born. And so for us, we looked at Data Factory as a data highway system. Move the data when you have to, bring the data together when it’s advantageous to do so.
And so, we’ve been investing quite a bit in that, just helping our customers connect to data wherever it is, bring data into the right tools at the right time to be able to work with it. And so that was one of the biggest enabling capabilities. And then since then, as the collection of capabilities grew out in the cloud, the innovation that you guys have brought with Tamr natively on Azure. I look at some of our native capabilities and data warehousing and big data and whatnot. Then we evolved into saying, “Oh, it’s not just about enabling people to bring data together of all sizes, shapes, and whatnot. It’s helping them create end-to-end workflows, and understanding the end-to-end workflows, and orchestrating the whole thing.” And doing that across systems and across boundaries.
So that if you’ve got data coming from on-prem up to the cloud, bringing it together, getting it mastered or cleaned with a lot of the things that you guys are up to, putting it in a warehouse, seeing it through to a BI report. That was just the natural evolution of what we’ve been after. But that’s one where at least adoption has just been awesome to see, I’ll be honest. It’s one of the projects I’ve been with for quite a while. And I think it’s just so inherent to analytics that it’s one where a lot of people have the need.

Andy Palmer:
Yeah. Well, like I said, everywhere we turn, not only are people picking up Catalog, but also Data Factory. And then we also… And it’s been amazing. I’m a database guy from way back, my partner Mike Stonebraker and I, and it’s always interesting to us where people end up putting this data. And so it looks like Synapse has really taken off. And we’re really excited about that. We’ve had a couple of customers deploy where they’re taking master data that they’ve worked with with Tamr and deploying into Synapse. Can you tell us a little bit more about Synapse and what the plans are?

Mike Flasko:
Yeah, sure. And I think it follows the story that you were talking about, is I think our goals with Synapse and our plans with Synapse were around really making it easy to analyze and get the most out of your data. To your point, everything was a relational database some time ago. That was the jumping ground for everything that we’re doing. And when I started at Microsoft, that was certainly true, too. But I think over time, as we saw some of the innovation in big data, in data lakes, in data warehousing, it’s been very enabling for customers, but at the same time, there was a lot of innovation, a lot of systems.
And so I think our goal with Synapse was to say, “Let’s take the best of big data processing. Let’s take the best of analytical capabilities.” And instead of offering them, if you will, a bunch of systems that they had to, at least from us, bring together, configure, et cetera. We said, “Wouldn’t it be nice if you could have at least the beginning of your analytical platform from Microsoft that says, ‘help us manage data as it transcends the needs of a system in big data and the needs of a system in data warehousing, and just make it as simple as possible for people to manage that progression of data.'” And that’s been our goal with Synapse, is to say, “Look, if you’ve got an analytics problem, whether it’s big data or warehousing or integration, we’ll give you our core platform capabilities pre-integrated together so that you can just get on with the job to be done, and then integrate with any other systems that you may need, whether it’s for mastering data or cleaning data or visualizing in Power BI or that kind of stuff.”
And so really our goal with Synapse was to make it as simple as possible for people to go from raw data, if you will, to insights and then help them take away this feeling of, “Oh my goodness, the data space has exploded over the past five years. What do I use when? How do I stitch it together? This thing’s incompatible with that thing,” and all that kind of stuff. And so…
You smiled there and I kind of joke that it’s never been a better time to be in data because there’s more tools, there’s more systems, et cetera. But at the same time for customers, there’s more tools, there’s more systems, et cetera. And so, trying to help them understand how these things come together into a solution. And so I think being able to show, here’s Synapse capability, here’s how it cleanly integrates with other capabilities, and just kind of removing that complexity out of the conversation.

Andy Palmer:
It’s really remarkable. And we, at Tamr, we use Azure Databricks for compute. And one of the things we’ve been really impressed by is the open nature of Azure as a platform and the ability for partners like ourselves and Databricks to work and function effectively in the environment. And from our perspective, it’s ideal, both in terms of elasticity and the ephemeral nature of it really does lend itself well to our very compute-intense workloads. And it’s remarkably easy to deploy Tamr with Azure Databricks.
And talk to me about the ecosystem a little bit more. I mean, we’re having a great experience. It seems like a huge departure from traditional Microsoft. And you’ve been there for a while, you’ve seen all this stuff play out. Does it feel different from the inside? Because it certainly feels different and more open from the outside.

Mike Flasko:
Yeah. Great question. I think you kind of hit on it, which is there are so many needs from customers right now that enabling them with the best options at the right time I think are critical, and just going backwards from there.

Andy Palmer:
Best of breed. Best [crosstalk 00:13:27].

Mike Flasko:
Yeah. Just enabling people to solve the problem with the tool that does it best for them, yet at the same time, marrying that with other stack that they’ve chosen. And I think that’s the balance we always want to strike.
And I think that in terms of, does it feel different? I don’t know. I think it’s one of the things probably that I see is just really going customer backwards all the time. I think that has been something that’s been quite transformational going… It sounds a little cliche, but it had been kind of going through Microsoft, which is to say, “Let’s make sure the customer’s happy, their workload is solved, we give them access to the best tools at the right time.” That’s almost always going to be us. It’s going to be partners like yourself, and collectively we can give them the best experiences and solutions possible.
And I think enabling these types of clear ways and simple ways by which people can understand how to get their data, how to bring it to the system, how data flows. I know we’ve been talking for a while, how do we make sure that it’s simple for customers to do this versus feeling like they’ve got two pieces of Lego that weren’t intended to ever rendezvous on data? I think that’s the biggest thing, is making sure that we’re going backwards there and we’re collectively making it as easy as possible, that rarely can steer us wrong when we’ve got that kind of point of view, I think.

Andy Palmer:
Well, it’s amazing. A lot of people, when they start, it seems like they start with migration and figuring out how to get some of these workloads over. What do you see customers doing in terms of how much heavy lifting? At Tamr, obviously we’re interested in helping them master their data, and sometimes people want to master their data before they move it or while they move it or after they move it. How do you see those patterns playing out in terms of where people want to clean up their data and/or do other housecleaning kind of activities?

Mike Flasko:
That’s a great question. Personally, I feel like the cloud offers a great opportunity for that type of work, simply because, at least for a lot of the scenarios I see is… Excuse me. Given the scalability nature, given the cost profile nature, some of the pay-per-use nature of the cloud, it tends to be a great place for combining data that hadn’t been combined before, or getting more holistic views of data and bringing it together. So, which tends to lend itself very well to things like integration workloads, cleaning workloads, mastering workloads. And so I find that for a lot of customers, they look to the cloud as part of migration projects or broader ambitions with data, oftentimes combining data in new ways or leveraging data in new ways. And they often look into the cloud for that.
And so I feel like oftentimes when I’m having just a, have to say, a pure migration conversation, there’s almost always some other business goals that they have in mind over time. And I think some of the things that you guys have been looking at is, how do you master data at scale, how do you clean data at scale, how do you offer that elasticity? I think fits really well with the types of, I think, ambitions people have for the cloud becoming that opportunity to work across data silos that they’ve maybe been unable to do before, either because of scale or because of cost or other challenges.

Andy Palmer:
Oh, that’s great. Well, it’s great to hear you say that. So, maybe if you don’t mind, maybe go up the stack a little bit and talk about Power BI. I’ve been a huge fan in the last five years as Power BI seems to have taken off and the feature and function set. How do you guys view, as you think about data management infrastructure relative to vis tools and the more consumptive tooling, how are you thinking about Power BI, and how does it fit, and also how desktop-centric versus cloud-centric do you think Power BI is going to be in the near future?

Mike Flasko:
Got it. Yeah. I think it’s interesting because, at least for the types of areas that I’m engaged in most, information management tiers, data governance tiers, data privacy tiers, analytics, that kind of stuff, you’re increasingly seeing a blend of who’s a data worker, if you will. To me, it’s quickly becoming, what’s the type of questions you’re trying to answer and what is your comfort or desire for different types of tools? And what I mean by that is, I see a lot of discussion with customers. It’ll be about, they want to hear from us the data platform capabilities in Azure, the processing capabilities, the cleaning capabilities. Then they’ll want to hear about the experiences that are available on top of those platforms for their technical audience, their data engineers, or data scientists. And then in the same breath, they want to hear about the experiences and capabilities that will be exposed up through to the business worker, if you will, or the information worker, up into Power BI or other areas.
And so I find that it’s increasingly becoming, if you will, part of the data stack. It’s not just visualization, et cetera, but there’s this expectation that your business workers are part of self-service access to data and self-service cleaning to data. They’re a data consumer, if you will, in a very real way. And so for, at least for me, we’ve been spending a lot of time thinking about, how do we just make it absolutely seamless to think about our cloud platform and how Power BI consumption is just a very native, integrated piece of the experience for business workers? Going from there. I think the web versus desktop, I think everybody has different takes on this kind of stuff. I think for a while, you’ll see things in both. I think it becomes a little bit of a personal preference thing, personally.
But at least for us, we’re spending a lot of time… If I were to connect the conversations as well, back to your conversation earlier with data catalogs, we see you can’t have a catalog conversation, you can’t have a data platform consumption conversation without explaining the path to consumption by the business worker, as well as the path to consumption via your technical worker and how they can easily stay in coordination with each other. And so, maybe touching on it from those few angles shows that it’s… I think it’s just going to become an integral part of the data stack. And those types of users are no longer, how to say, just the consumers over there. And we have a conversation within IT, I think it’s a very blended conversation now between all different types of consumers of data.

Andy Palmer:
Hmm. Cool. Well, it’s hard for me, we’re running out of time, but it’s hard for me to spend the time without talking about Windows a little bit. And I know way back you were in the Windows Kernel Team, I think [crosstalk 00:21:24].

Mike Flasko:
I was in Windows Networking Group for a number of years. Yeah.

Andy Palmer:
So tell me, what are your thoughts on the state of Windows today, and what do you think is good and what scares you?

Mike Flasko:
You know, good question. I think it really becomes a conversation personally, personal experience, personal preference, around… There’s my consumer desktop life, and then there’s the server side. I think on the server side, you’re seeing just a very native blending into the cloud. Because what used to be thought of as a server is now this extension of a cloud.

Andy Palmer:
It’s a bunch of resources.

Mike Flasko:
Yeah. It’s a bunch of resources. And so I think that the state there, if I look at a lot of the innovation that people are doing is, it’s becoming a natural extension and integration of the cloud. So the whole definition of what is a server has really blended as being an edge of the cloud and starts sharing a lot of traits there.
On the desktop side, I’ll be honest, I’m now in the fully remote school situation with my children during this extra special year of all kinds of challenges for us all. So I’ve been pretty excited on the future there. We quickly had to get some devices set up for our kids to go through the education site and everything else. And I think some of the work they’ve done on Windows and the experiences on some of the new hardware has got me pretty excited.

Andy Palmer:
Wow. That’s cool. Well, it’s really great to see you, and thanks for taking the time to talk to us, and thanks for all the great work you’re doing at Azure. We at Tamr are really, really excited when our customers tell us how they’re going to deploy on Azure, because it’s always really easy, really fast, and we’re just thrilled with the new capabilities in Catalog and Data Factory and Synapse. So thanks for all the great work you’re doing. And again, really appreciate you being with us today.

Mike Flasko:
Thanks, Andy. I appreciate you having me. And I’d just like to echo the sentiment, and say it’s been great working with you guys. We think about how do we solve our customers’ data challenges, from ingestion through cleaning, mastering, and everything in between. So it’s been a pleasure. Thank you.

Andy Palmer:
Thanks. Well, it takes a data village. We’ll get there together. Thanks again, Mike. Cheers.

Mike Flasko:
Thank you, Andy.