datamaster summit 2020

Master to Migrate with Google

Andrew Psaltis

APAC Technology Practice Lead Data Analytics & AI at Google Cloud

Most companies have come to the conclusion that the future is cloud. Although it is no longer a question of if, but when, to move – there is a lot to consider. Join this session to get advice from Google on where to start, and the importance of feeding other technologies with good data.

Transcript

00:00 – 00:05

Speaker 1

Data Master Summit 2021 presented to you by Tamr.

00:07 – 00:31

Ravi Hulasi

Hi, my name is Ravi Hulasi and I’m Tamr’s Chief Cloud Evangelist. Welcome to this Data Master session, where we’re going to talk about the ins and outs of migrating to the cloud. I’m thrilled to have Andrew Psaltis joining me today from Singapore. For those who haven’t met Andrew, Andrew leads the data analytics and AI technology practice across the APAC region for Google. Andrew, welcome, and thank you so much for joining us from across the world.

00:32 – 00:35

Andrew Psaltis

Thank you, Ravi. It’s my pleasure to be here and looking forward to this conversation.

00:36 – 01:07

Ravi Hulasi

Fantastic. Let’s dig into cloud migrations then. So most companies have now come to the conclusion that the cloud’s here and here to stay. So it’s no longer a case of if they need to move to the cloud; they just need to acknowledge it’s going to happen. So there’s a lot to consider in moving to the cloud, moving data and applications. And just wondering, could you tell us a bit about how Google advises its customers to get started, and some of the considerations that you’ve had as you’ve worked with so many Fortune 500 companies to do this?

01:08 – 01:53

Andrew Psaltis

Yeah, it’s a great question. I think one of the things that we see is that everyone is moving. They’re on some phase of that journey, whether they’re already there all the way or, usually, some way along that path. And we see it unfolding in multiple different ways. But the key thing that we always try to start with is: what is the business outcome you’re after? So instead of chasing shiny objects and technology, really start with a business problem and then work back from there. But then as you do that and you migrate, it’s really not this “lift everything and lift all the data and just put it there,” even once you identify the use case. It really does provide that opportunity to take a look into what you’re doing in that process and to re-evaluate.

01:54 – 02:19

Andrew Psaltis

Now, a lot of times customers have all sorts of legacy technologies that they’ve been working on, and they’re quite different from what you see in the cloud. So even though there are sometimes these migration patterns of lift and shift, and then modernize to be completely cloud native, more and more customers look at that as a long migration process, and they don’t want to do it, right, for very good reasons, because it’s already complex enough.

02:19 – 02:54

Andrew Psaltis

So you look to go from what you’re currently doing to what would be the optimal solution in the cloud. And that solution in the cloud is often not the same way you’d build things on-prem. The scale is different. The capabilities are different. So I’d say, don’t look at it as the traditional approach to upgrading a version of a relational database or a big data platform, where you’re stuck with kind of that same architecture. Really step back, look at the business process, and look at the best patterns for how to achieve that goal.

02:55 – 03:27

Andrew Psaltis

And then as you start to move the data, I think that’s where Tamr plays just an absolutely critical role: how do you start to move the data intelligently, and how do you make sure that what lands there is actually clean and golden, right? And really go through and get all the data there? I’ve heard it referred to before, and I think it’s classic, as being like the washing machine, right? So get that data through and out the other side. And then, at the same time as achieving that business goal, leverage the best patterns on how to do that.

03:56 – 04:44

Andrew Psaltis

Start with something small, first off, and oftentimes something that you can’t do on-prem, right? Prove the value of what you can do. Look for a use case that’s going to add to the bottom line. If you have use cases where it’s really just about costs and operational processes, the reward for going there isn’t that great. So, one, I think it comes down to training people and incentivizing people that want to do it, and finding a use case that adds to the bottom line. And often it comes down to use cases that are really hard to achieve with traditional technologies. And so, regardless of what the business is, what’s a use case that’s kind of forward thinking in that business? Drive towards that outcome.

04:44 – 05:22

Andrew Psaltis

It’s going to get everyone in the organization fairly excited about it, because it’s something that’s going to help move the business forward, something that helps them grow, something that retains talent, which is always a critical thing, that you have them engaged. And it’s not just “take this 20-year-old program and now go make it run in the cloud,” right, where people start to fall asleep thinking about it. So we look at, you know, not doing resume-driven development and chasing shiny objects, but starting with the business use case, oftentimes helping IT be a business partner, and really looking towards something that helps solve a business need that you can really benefit from.

05:22 – 06:00

Andrew Psaltis

But starting small is critical. Ideally, you can have small successes. You look at software over the years, and everyone adopted these agile methodologies some 20 years ago. But then sometimes people don’t apply that to migration. And so it’s really a matter of going, how do I pick something that’s a critical business use case that I can’t do today, and go do that on the cloud, and kind of show value, show wins, and then keep iterating on that process? And it’s oftentimes a transformational experience. Businesses are kind of transforming as they do that, and reinventing what they do and how they perform.

06:39 – 07:35

Andrew Psaltis

Yes, that’s a great question. I think that’s right, and we realize that even though, of course, we’d like to say that, you know, every customer is just using Google, that’s not the case. That’ll never be the case; customers do run in multiple clouds. And in some cases, you see in some industries that it starts to become a regulatory requirement, when you think about this risk concentration of running the same, or all, workloads on one. Take a small country that may have, say, four major banks. If they’re all running the same workload on one cloud, well, we’d all like to say that it would never go down, but there are service interruptions that happen. So what if that primary cloud goes down that’s running the same workload for four major banks in the country? You put the stability of that country at risk. So you see this, you know, this risk concentration concern that regulators across the world are starting to adopt, forcing different entities to run on multiple clouds.

07:35 – 07:56

Andrew Psaltis

So I think when you do that, there are lots of different ways to handle it. You know, we see some that run infrastructure on one and analytics and AI on another; we see some that run multiple workloads. There’s data gravity. There are different ways to solve it and different ways to look at it. And that, again, oftentimes comes down to adopting best practices.

07:57 – 08:32

Andrew Psaltis

But it starts with, you know, when you look at it, I think where Tamr fits in there beautifully is: how do you have that clean data to go across? How do you get data going to multiple places? Right. So in some cases, it’s not “I want to run the same exact workload on multiple clouds.” Oftentimes, we see customers that don’t want duplication. They want the flexibility to move. And we offer technologies like BigQuery Omni to allow them to query data wherever it is across them. But oftentimes it’s “I’m going to do AI on this cloud, or analytics on this particular cloud, and then I may run our infrastructure on a different cloud.”

08:33 – 08:57

Andrew Psaltis

So we see them taking that approach, and it really is about understanding the journey. And I’d say, before jumping into more than one cloud, be successful on one and understand it, understand the limitations, and understand where it goes. And then there’s the idea of “I’m going to create just a generic platform so I can move everything from one to the other.”

08:57 – 09:20

Andrew Psaltis

You know, that’s back to the same argument that we had in the industry years ago: “I need to write everything to be database agnostic, because I’m going to change from Microsoft to Oracle, possibly next month.” And the reality is you don’t just make those drastic changes overnight. So I’d say optimize for where it makes sense, use the tools that make sense, and then place workloads where they make sense on different clouds.

09:21 – 10:14

Andrew Psaltis

Then you look there, and oftentimes it’s also the same thing. You have data residency. Sometimes data is born in one cloud and it just doesn’t make sense to move it, so operate on it there and share results where possible. So there are different strategies as customers adopt this multi-cloud approach. Sometimes it’s multi-cloud with different workloads, and we often also see hybrid cloud, where now it is “I want to run on-prem and then I want to do things in the cloud.” We see that across machine learning pipelines. We see it across modernization of Spark applications that run on-prem and are migrated to the cloud. So we see different approaches that customers take, and there really isn’t a one-size-fits-all. And then we’ve seen some that are dedicated. They’re just all in on one cloud, and that works for their business.

11:16 – 12:01

Andrew Psaltis

Yeah. I think that’s really good, and I think you’re spot on. Customers leap to “I don’t have governance on-prem, or it’s been weak, so this is great, now I get to have data governance in the cloud,” and they jump to it. And one way that we try to help people start to think about it: everyone’s moved homes at some point in time, and you can think about it the same way. If you’ve lived in a house for 20 years and you go to move across the country, or, you know, domestically or internationally, it’d be the same thing. You’d say, “I’m just going to pack up everything I’ve had for 20 years, put it in a truck, and everything will be great in this house that I move into.” But the reality is that’s a complete waste. You end up with, you know, garbage in, garbage out, as opposed to “let me go do some spring cleaning.”

12:02 – 12:26

Andrew Psaltis

Let me kind of get out all the things that I haven’t used in 15 years and then put the rest on a truck. Then when you get there, right, you end up with a clean inventory list and a clean way to unpack on the other end. And it really is trying to help people get out of the mindset of “well, we’ve always kept this data around, but we’re not quite sure what we may need it for.”

12:26 – 12:52

Andrew Psaltis

Fine, archive it. Right? You know, you haven’t used it in, you know, a dozen years. You don’t have a regulatory reason. And really, I think it also then comes back to going through that business use case. Governance has its place, but having a catalog of data that’s irrelevant or dirty doesn’t help. I think that’s exactly it when you talk it through: if I go to run an algorithm over this data and the data is garbage,

12:54 – 13:21

Andrew Psaltis

now what do you do? Right, so it’s thinking through and helping customers see that data quality is of paramount importance and has to happen first, so that you can solve the use cases. You can add the catalog into it after the fact. But cataloging something that is dirty, if you will, or irrelevant doesn’t get you there. People now can find something that’s no longer usable anyway.

13:22 – 13:52

Andrew Psaltis

I think we try to work with customers and work through that, you know, there is a process for how to actually achieve the business outcome. But you can’t really skip some of the steps along the way. First, be clear on the use case. Figure out something that’s compelling. Figure out the data that’s needed for that use case. Find the sources. Is it clean? Is it not? Massage it. Put it together. And then achieve these goals, and then you can catalog it.

13:53 – 14:24

Andrew Psaltis

But I think people try to boil the ocean and sometimes start with “I have to have governance because I never had it before and I need it.” And they fail to think about the other steps. And I think oftentimes they’re not used to having quality data in their legacy systems to begin with. For everyone that’s built any sort of data pipeline over the years, there’s so much of it that’s involved in just triaging and massaging data and having all sorts of ways to work around things that should just be cleaned before being migrated.

14:47 – 15:08

Andrew Psaltis

Yeah. So I think it fits perfectly well, and I think we’re on the same page on how it fits together. When you look at it, there are customers that run, you know, data processing, that do machine learning. I think if you break it down, Tamr shows a great reference architecture of how to build an application like that that’s modern in the cloud.

15:08 – 15:54

Andrew Psaltis

It shows how to take something and say, “Look, we’re going to follow these best practices”: it runs using Dataproc, and then it’s using machine learning on it, with training models and patterns, and basically continuing to integrate those. So when you start to look at it, and you look at your documentation and the architectures and start to break it down, it shows a great use case for customers of “this is how you can think about a modern application.” It does leverage and run in a customer’s environment. It uses our technology and runs on top of it, you know, natively and very well. So it really does kind of show that you can get data coming in, you can get it into BigQuery, you can populate a catalog, and, you know, really kind of integrate with everything all around it.

15:54 – 16:27

Andrew Psaltis

All right. So it follows best practices on the platform and integrates with things that are on the platform, which makes it seamless for customers. And it’s part of the marketplace, so it’s very easy, we see, for customers to use. It naturally fits together and naturally integrates with the surrounding technologies. And then as you step back, you also have an opportunity to use it as a teaching tool for customers, to explain that this type of architecture is how you should think about building things, not just lifting something that’s 20 years old and plugging it into the cloud.

17:37 – 18:01

Andrew Psaltis

I think you hit it spot on when you said that it’s not just the migration, but it’s continuous. And I think you look at it as: businesses today are no longer static, and everything changes. And I think this pandemic has shown how people need to have agility, how businesses need to be able to adjust immediately, how they change, how they process information, how they need to look at new sources of information.

18:01 – 18:49

Andrew Psaltis

So I think the way that this fits in really well is that it’s not just that initial migration, that initial workload. Great, you need to have clean data to make that really successful and really shine. But then after that, there’s going to be, yes, continuous data that’s coming in for that use case. But undoubtedly, there’s going to be another source of data, and there’s going to be another stream of data, and they’re just going to keep going. And you just look across even the simplest use cases, where customers start to try and figure out: how do I transform my business? How do I think about things differently? Everyone is now a data company, or they’re changing to be an AI company. That all takes data, and often many different sources of data, to produce a meaningful result. So I think that’s exactly right. It’s not just the migration, but we look at it as well.

18:49 – 19:44

Andrew Psaltis

Great. You have one data source today. You’re going to have 10 tomorrow. How then do you figure that out? You’re going to have customer information that’s coming from a multitude of sources. How are you going to understand it’s the same customer every time you see a piece of data coming in? Yeah. So I think that’s right on. And I think that chart shows it correctly, that it’s there in that middle, because every time data is flowing, it should go through this process. It’s not a one-time “I am migrating, I’m cleaning, shut it off.” I guess if your business doesn’t change, you know, for the next decade, and there’s no new data on that source or new sources, maybe that works. But there are not a lot of businesses that are staying in that state. Everyone has transactions that happen. Everyone has new customers, to be a thriving business. And as things continue to change, there are many more data sources constantly coming online.

19:45 – 20:22

Andrew Psaltis

So being able to stitch those together, being able to identify the same entity, whether it’s a person, whether it’s a piece of machinery, whether it’s something you’re building, all the different things that you want to do to have a master record for whatever that entity is in a business, it becomes essential. Because you want to keep building on these use cases, you want to build more and more advanced types of things. Data is the key. Everyone talks about how data is the new oil, and it’s like, clean data is the oil. Otherwise, you just have sludge.
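
Editor’s note: the idea of stitching several sources into one master record per entity can be illustrated with a minimal Python sketch. It is not Tamr’s method or anything shown in the session. It assumes exact matching on a normalized email address, whereas a real mastering product would typically rely on richer, learned matching. The field names (`source`, `name`, `email`), the helper functions, and the sample records are all hypothetical.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivially different spellings match."""
    return " ".join(value.lower().split())


def build_master_records(records):
    """Group records that share a normalized email and merge each group into one master record."""
    groups = defaultdict(list)
    for record in records:
        groups[normalize(record["email"])].append(record)

    masters = []
    for key, group in groups.items():
        masters.append({
            "email": key,
            # Remember which systems contributed to this entity.
            "sources": sorted({r["source"] for r in group}),
            # Simple, assumed survivorship rule: keep the longest name seen.
            "name": max((r.get("name", "").strip() for r in group), key=len),
        })
    return masters


# Hypothetical records for the same customer arriving from different systems.
records = [
    {"source": "crm", "name": "Ana Lee", "email": "Ana.Lee@example.com"},
    {"source": "billing", "name": "Ana M. Lee", "email": " ana.lee@example.com "},
    {"source": "support", "name": "Bo Chan", "email": "bo@example.com"},
]
masters = build_master_records(records)
```

Because the grouping key is recomputed for every incoming record, the same function can be rerun whenever a new source comes online, which matches the point that cleaning is continuous rather than a one-time migration step.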

21:08 – 22:00

Andrew Psaltis

First, find that business problem that’s digestible, that you can have a quick win with, that will show results that will help you get to that next level. You get further buy-in from the business. You show results doing it. So I’d say start with that problem, identifying that first. Then the data sources: have a way to clean that data coming in, using Tamr to get there, and then look at what are the best practices for how you actually solve that business problem. So I would say break that out into identifying it, so you have tangible results first. And then from that, figure out what data you need to solve that use case, what the source or most likely sources are, how you get that there, and then follow best practices for how to achieve that goal. And, you know, I think it’s a classic thing: there’s always a next version.

22:00 – 22:57

Andrew Psaltis

All right. Not boiling the ocean, but figuring out how to do exactly what you’re talking about, right? If it’s successful, you’ll get more investment to have more sources of data. So how do you show success if there are mergers that happen or, you know, divestitures, where now you get to leave as a new company that often gets the gift of legacy systems to take with you to start off a new company? How do you start to master stuff together? So I would say business problem always first, so you’re not just chasing shiny objects and you keep people honest. And then from that, start to work back and go, “What do we need? What does it look like when it’s clean?” Now, after that, what does it look like to build the applications, whether it’s analytics or AI, on top of it, to then solve that problem? And as you grow, governance makes sense. But first, I think you have to be able to, you know, pour water through the whole thing, have the whole thing working, and understand it. Show results.

23:34 – 23:38

Andrew Psaltis

Thank you so much for having me. Have a wonderful day and appreciate this.