Data Masters Podcast
September 23, 2020

DataOps and the state of digital transformation

Andy Palmer
CEO, Tamr

"Andy Palmer is a serial entrepreneur who’s helped found or fund more than 50 companies, including Vertica, a pioneer in the database management industry. He’s currently CEO of Tamr, a data mastering platform that uses machine learning to do the heavy lifting around consolidating, cleansing, and categorizing data. Andy’s also served as CIO of Infinity Pharmaceuticals, a biotech startup, and ran data and software engineering operations at the research group of pharmaceutical company Novartis.

Andy’s approach to data mastering involves DataOps. This concept has emerged in recent years and shares traits with DevOps. He discusses what DataOps is and how it helps companies with their digital transformation projects. He also talks about Tamr’s upcoming DataMasters Summit and how the event will provide people with practical advice on where to start their DataOps journeys."

To find more information about DataMasters Summit 2020, including the complete list of speakers, and to register to attend, please visit: http://tamr.com/summit2020.

Anthony Deighton: Welcome to Data Masters. I'm Anthony Deighton, chief product officer at Tamr. Today I'm joined by Andy Palmer. Andy is a serial entrepreneur who has helped found or fund more than 50 companies, including Vertica, a pioneer in database management. He's currently serving as CEO of Tamr, a data mastering platform that uses machine learning to do the heavy lifting around consolidating, cleansing, and categorizing data.

Anthony Deighton: Andy's approach to data mastering involves the concept of DataOps, a concept that's emerged in recent years and shares traits with DevOps. We're going to talk about DataOps and why it was developed, what its connection is to DevOps, and how DataOps benefits organizations. Andy, thank you for joining us today and welcome to Data Masters.

Andy Palmer: Thanks, Anthony. Great to be here.

Anthony Deighton: So Andy, maybe we could start with you sharing a bit about your professional experience and the connection to data and how organizations make value out of data.

Andy Palmer: Yeah. I started working in AI back in the 1980s, and as a good friend, Marvin Minsky, used to say, there are two things that really matter in AI systems. One is having enough great data to make an algorithm useful. And the other one is the human and the machine working closely together. And so for me, data has always been front and center in all of the systems that I built, and the separation between data and software, I think, is kind of an artificial separation. As we all experience on the modern web every single day, the software that makes up the modern World Wide Web would be mostly meaningless without the data that's integrally woven into all the applications that we use as consumers on the modern internet.

Andy Palmer: Data is part of the lifeblood, part of the fuel that serves the modern consumer internet, and increasingly large companies are starting to realize that data is the fuel for the next few decades. It's a precious strategic asset that, for the most part, has been underappreciated and under-managed.

Anthony Deighton: You've often talked about this idea of being not only the CEO of a company that's building DataOps software, but also a customer of that kind of software in your professional experience. Do you want to share a little bit about that?

Andy Palmer: Yeah. I was chief information officer at a biotech startup called Infinity Pharmaceuticals and then went on to run data and software engineering over at the Novartis research group. In both of those experiences, working in modern biopharmaceutical drug development and discovery, it was clear to me that data was the core strategic asset driving the ability of those organizations to discover new drugs that made a difference in the lives of patients. Managing that data strategically became an imperative, both at Infinity and at Novartis, and Vas, who's the CEO of Novartis now, talks about this on a regular basis: that data and analytics is a core capability of the next gen modern enterprise, not only in biopharmaceuticals, but really across the board for all industries across the Global 2000.

Anthony Deighton: So would it be fair to say, and I think this is a theme that's come up in the Data Masters podcast, that in a way every business is a data business? You're no longer building your competitive advantage solely on your manufacturing facilities or on your scientific accomplishments or on some piece of infrastructure that you have. At its core, every business needs to build a data capability?

Andy Palmer: Yeah. I don't think this is any different than the industrial revolution, where back then every business had to integrate power and energy into the fabric of who they were as a company. We're at sort of a transition now where businesses across the planet are integrating data as a part of their core business. Many of them put it in the context of their "digital transformation", but oftentimes those digital transformations really do start with the management of their data as a core strategic asset and a primary driver of how they're going to serve their customers. Whether those are B to B or B to C customers over the coming decades, those digital experiences are all based on that company's ability to manage data as a strategic asset and apply it every single day, in how they operate as well as in the strategic decisions that they make.

Anthony Deighton: Got it. So really thinking about data as a core asset. So you have your buildings and your goodwill and your brand and your customer relationships and your data. And then building from that, if we think about data as a core asset, organizations have invested a lot of thought in how they manage their people, how they manage their physical assets, et cetera. And they haven't invested as much energy in thinking about how to manage these data assets. And that's where you've come up with this concept of DataOps. Maybe you could share a little bit about the concept of DataOps at a high level, and then we can drill into a bit of detail.

Andy Palmer: You bet. This is no different than the last 30 years, where large companies had to figure out how they were going to integrate software into the core of what they do, through the evolution of their IT organizations and the emergence of the chief information officer as a key executive role. Now we're going to the next level. The emergence of the chief data officer is a reflection of this importance of data as a strategic asset in the company, and DataOps is really not only a set of software, but also a set of business processes and a collection of expertise that enables organizations and empowers chief data officers to deliver more analytic and operational velocity using data across their entire business.

Andy Palmer: The reference in DataOps is very specifically to DevOps. So DevOps evolved in the software industry over the last 20 plus years, as a primary method for the big internet companies to increase feature velocity so that they could compete on the open internet, consumer internet. DataOps is really doing a very similar thing for the large enterprise, but the goal is to increase this operational and analytics velocity and decision-making velocity for large enterprise. And it really all does start with managing data as a strategic asset and building out the data pipelines required in order to serve that data to many different consumers inside of a large company with tremendous confidence that the quality and the veracity of that data is very, very high and reflects who their customers are, what their customers buy from them every day, who their suppliers are, what they buy from their suppliers, who their employees are and what their employees are working on. Some of these very basic questions are almost unanswerable in many modern enterprises because of the complexity of their organizational structures and the idiosyncrasy of all the systems that we've automated over the last 20 or 30 years.

Andy Palmer: And so a huge part of DataOps is bringing all that data together and aligning it, using machine-driven, human-guided approaches, so that you can answer some of these very simple, basic questions that serve as the foundation for much more complex analytics downstream as businesses get more and more data driven.

Anthony Deighton: I think it's very interesting, this connection between feature velocity and how you compete at internet speed as a core driver for the introduction of DevOps, really tightening that loop between what you develop from a software perspective and what you deploy to your production systems. DataOps applies that same idea to the connection between your data and how that data is actually consumed, through the analytical or operational outcomes that you're hoping to achieve. Is that a fair articulation of that connection?

Andy Palmer: Yeah, very well said. And this idea that your data, just like your software with DevOps, is continuously built, tested, and released. It's not a static thing. People used to think of software as a static product. It's really not like that. Software is ideally implemented and delivered as a service, and DevOps was kind of all about that.

Andy Palmer: Same thing with data and DataOps: the best way to think about data as a strategic asset is to view it as a continuously changing thing. It's constantly prone to entropy. It's constantly prone to degrade; my partner Mike Stonebraker likes to use the term "database decay." The infrastructure that you put in place has to recognize the physics of data and the reality that it does decay over time. You have to constantly be working to shape the data into the best form you can possibly deliver for end users. And the quality of that data really is what determines the quality of the decisions that a lot of people are making on the front lines of your business every day.

Anthony Deighton: I think those are some important connections between this idea of DevOps and DataOps. There's also an important organizational impact. When we think about DevOps from a software engineering perspective, we also think about the agile development methodology, really thinking about using feedback in a very tight loop between what feedback you get from customers, what features you're delivering, and how you guide and shift your development processes. Is there a similar sort of linkage or analogy between DevOps and Agile and DataOps and Agile?

Andy Palmer: Absolutely, and modern DevOps... What they really need is a closed-loop system where you're collecting feedback directly from the consumers of data on a regular basis about what's good and what's bad. We have a component of our system at Tamr called Data Steward, which provides the function that lets you aggregate that feedback. I like to call it Jira for data. But in many organizations, this function doesn't exist. There's no place for people to go to say that this data is good or bad. And it's very similar to the feedback function that people are used to using in modern mobile apps, where if they don't like something, they have a way to review and register and provide input as to what's wrong and what's not working. We need the same function inside the enterprise for data, where you have this bi-directional flow of feedback that's constantly coming in from users and helps guide and shape the data as it's being systematically prepared for consumers, so that it improves over time.

Andy Palmer: Again, I'm an old AI guy, and so it's an active learning system at its best, where you automate a tremendous amount of that feedback and the implementation of that feedback into changes in how the data is prepared for people over time. But it's hard to do. Like you said, it's an organizational challenge.

Andy Palmer: A lot of modern companies just are not set up organizationally to do this, but this is how we spend a lot of our time at Tamr, partnering with the biggest companies in the world, to help them manage their data as a strategic asset and implement these bi-directional flows of feedback throughout the organization.

Anthony Deighton: Let's pull on that thread for a second. In the regular economy we talk about the digital native organization, built from the ground up, a modern organization. As we think about DataOps, is this something that's only relevant for startups that are just starting out and can build data infrastructure from scratch and be a data-driven organization, or is this relevant even for large enterprises? And how do we think about that distinction?

Andy Palmer: Yeah, I think it's even more relevant for large enterprises, because oftentimes they have more valuable data than anyone else. The average startup is just getting off the ground and probably hasn't had the time to build up some core asset of data. One of the big ironies of DataOps is that many of these large companies are sitting on treasure troves of unbelievable data, but it's not being managed or prosecuted very proactively. One of the human behaviors that we struggle with a lot when we're working with our customers is this set of core behaviors around data hoarding, where the people that own the data or run the databases have sort of got their arms around the data. They view it as a method to promote their own careers, or they're just worried that people are going to get a look under the covers and not think the data's clean enough.

Andy Palmer: On the modern internet, data wants to be free, and sooner or later it almost always is. Inside the enterprise, a lot of these behaviors around drawing boundaries around data have resulted in what a lot of people today call data silos, and have really limited the opportunity that large enterprises have had to prosecute their data as a strategic asset.

Anthony Deighton: I love this idea that the silos of data coming out of an organization reflect the organizational silos, this kind of data hoarding, like, if I can keep my data private to me. There's a principle in software engineering called Conway's law that says that the structure of the software reflects the organization that built it. And I wonder if we actually see a similar point here, which is that by providing a mechanism for breaking down these silos, we're also, in a sense, lubricating the organization, allowing the organization to operate more efficiently.

Andy Palmer: I think it's dead on. I think we should coin the term right now, Deighton's law.

Anthony Deighton: That's very fair. I'll take it. So beyond just making the organization work more effectively and efficiently, from your perspective, as we help organizations build a DataOps culture, what are some of the other benefits that organizations could expect to see with that investment?

Andy Palmer: If you have really good clean data, there are core behaviors that start to change inside of an enterprise. We have a lot of customers where, when we first started working with them, they had misconceptions about how big or small their customer base was, or what the distribution of customers looked like in terms of their size. Some of them were really massively out of sync with reality. And so when you become a data-driven, agile organization, you seek to understand the truth and you seek to understand the validated view of the truth. We see those kinds of companies getting better and better in terms of their strategic decision making. It's almost like, for many of the functions that large companies have outsourced to consulting firms like McKinsey or Bain or BCG for years, all of a sudden now, if they get their data feet underneath them and have some new next gen analytic capability, they're like, "Well, why can't we own and drive a lot of those things that we used to rely on Bain and BCG and McKinsey to do?"

Andy Palmer: And I think it's a very healthy thing for these big companies, because it gives them ownership and control of the data itself and of how the decisions are made, and minimizes the amount of frivolous consulting expenses that they have to pay. But more importantly, it probably empowers the people running those businesses to control their own destiny a lot more, and to not spend a lot of time with outsiders who are recent liberal arts grads trying to tell them how to run their business, just because they can't get their hands on the data.

Anthony Deighton: I think that's a really interesting idea. We might think that the case for DataOps is really about adding more revenue streams and decreasing costs for mastering data and those sorts of things. But your point is no, there's a direct connection between DataOps and corporate strategy. If you can actually create a DataOps culture in your organization, in a way what you're creating is a responsive organization, one that can sense and understand signal from the market, from your customers, from your suppliers, from all of that inbound data, and then react and respond appropriately. So you're actually creating strategic leverage in the business.

Andy Palmer: Yeah, exactly, big time. Oftentimes our customers get a little frustrated or confused, because the way you implement your first DataOps projects is in the context of something that's going to save you money, oftentimes by doing spend optimization; help you grow faster, by doing cross-selling or upselling; or reduce risk in your business, oftentimes by doing compliance projects. And so your first projects may be focused on one of those three things. But if you do these kinds of projects over and over and over again, eventually you achieve this state where your entire organization is much more data aware. And once they're aware that this better data exists and it's available for them to use, they become more data driven and inherently more analytical, and better strategic decisions start to happen that are based on real immutable data. I think it's a pattern that we've seen over and over again in some of the biggest companies in the world.

Anthony Deighton: Yeah. So from data awareness, to data-driven decisions, to a responsive organization that's built on a data strategy.

Andy Palmer: That's right. Before long, it even infects the boardroom, where people start asking questions about where the data came from and whether you're sure it's clean. The former CEO of P&G was very analytic and kind of integrated analytics into the core of P&G. And that sort of behavior change still exists today in the culture of the organization.

Andy Palmer: Many organizations will come to this reality that the infrastructure that's available for them now to do next gen data management and DataOps is relatively inexpensive compared to what it was like to try and do this stuff even 10 or 15 years ago, when you had vendors like Oracle or Teradata or Informatica that tried to charge you outrageous sums of money to do even the simplest of things.

Andy Palmer: Now with cloud native solutions, like the one we provide from Tamr or products like DataRobot or DataKitchen, you can actually do a tremendous amount of amazing work at a relatively low cost. And I think this was the ultimate result of what my friend Christian Chabot used to call the democratization of analytics in the enterprise: the more democratized analytics became, the more hungry people became for data. And now we have to democratize the access to high quality data in the enterprise. And again, this is at the core of Tamr's mission.

Anthony Deighton: Yeah. Following on from your prior comment about Deighton's rules or laws, I've had this perspective for many years that there's a rule in enterprise software: whatever's happening in enterprise software is what happened in consumer software five years ago. It makes it very easy to work in enterprise software, because you just need to go and look at what happened in consumer software five years ago and copy it.

Anthony Deighton: And very much to your point, there's the idea of cloud computing and machine learning algorithms that can operate at scale, against a highly elastic compute infrastructure, on data that's sitting right there in the cloud. Well, those are things that have been true for consumer software in B to C advertising or in consumer websites for around five years. All we need to do is copy that approach and apply it to the challenge of DataOps.

Andy Palmer: Very much so. And I think this is exactly what's going on. When we were first starting up at Tamr, we started as an academic project at MIT. And one of the organizations that we connected with as we were doing the academic research was the folks over at Google and their Knowledge Graph team. It was remarkable, the similarities between a lot of what we were building in Tamr and what they had built for the Google Knowledge Graph to deliver those simple info boxes that you get in the upper right-hand corner of your Google search results now.

Andy Palmer: These highly curated forms of information. A lot of what we're doing at Tamr is really aligned with building out the curation infrastructure, so that every end user inside of an enterprise can get access to data as simply and easily inside of their enterprise, as they do when they go home and they use Google at night.

Andy Palmer: But it's a nontrivial problem, primarily based on what you referred to before, all of these behavioral challenges, and in some ways on the expectations that enterprise consumers have, which are much lower than the ones they have when they go home and use the modern internet. There are a lot of people that can probably relate to the idea that oftentimes when they go into work, they're not expecting to have the same kind of information experience that they're used to having on the modern web. And there's really no reason for that disconnect.

Andy Palmer: Quite the opposite, when you look at the amount of money that large enterprise spends on information and information technology, the experience that enterprise consumers have when they come into the office, should be as good, if not way better than what they get on the modern internet. But we've got a big gap to fill and a long way to go. We really believe that starting with great high quality data is the right beginning of that journey to digital transformation.

Anthony Deighton: Yeah. This is a really nice idea, this idea that if we can do what Google did for the consumer web, but for enterprise data, so structured data, and give people full access to it, clean it, normalize it, cleanse it, categorize it, and make it available for decision making, that would be of tremendous value. People would have an experience at work which was as good as or better than their experience with the consumer web.

Anthony Deighton: I'm sure anyone listening to this is wondering to themselves, "Sounds good, but how do I get there?" And I know you're going to be speaking about DataOps at the Data Masters Summit in October. Maybe give us a quick preview of what people should expect to hear when they attend the Data Masters Summit in October.

Andy Palmer: You bet. We're going to spend a lot of time talking not only about technology and the seven major components in any modern, open DataOps ecosystem, but also about the personas associated with the next gen data organization: the consumers, the folks that prepare the data, as well as the data suppliers. And finally, we'll talk about the processes and the methods that you need to put in place in order to do DataOps at large scale in a modern enterprise. All of these are necessary in order to do DataOps really well. And we have some great folks that are real practitioners in the industry, who are practicing many of these things and have been for years. Maybe they didn't call it DataOps initially, but now we're calling it DataOps.

Andy Palmer: It's kind of a name now for what they've been doing, because it just made sense to them. And we've really tried to focus with Data Masters on delivering a platform for practitioners to talk about what they're doing with real data every day in the modern enterprise. As you said earlier, I've been a consumer and an implementer of these kinds of technologies inside of the large enterprise more often than a seller of these things for the last 30 years. I'm really excited to listen to all of our trusted colleagues and good friends talk about what they're doing to implement modern DataOps in these large companies. It's really an exciting time.

Anthony Deighton: So what you're saying is, it's not only going to be an inspirational event and summit, where people can understand the vision and the strategy, both of Tamr and of DataOps in general, but also relentlessly practical, with a real practitioner's view. Is that fair?

Andy Palmer: Absolutely. I think that's exactly right. And the balance in the enterprise, in order to get stuff done, is that you have to be pragmatic. You can't really afford these big boil-the-ocean IT projects anymore. The world just doesn't work that way. History has shown that if you scope out projects that take years to complete, they're likely to fail. Before data mastering at scale, there was master data management in the enterprise. I think that big multi-year projects characterized traditional MDM, and almost all those projects failed. I know the ones that I did failed miserably, as we were trying to build the one schema to rule them all.

Andy Palmer: The important way to embrace DataOps and to start to make the strategic change inside of your company, to manage data as an asset is to start with these very pragmatic methods and very, very straightforward projects that are probably measured in weeks and months, not quarters and years.

Anthony Deighton: So, I really love this idea of a practical approach, stealing from the concept of DevOps to create these short cycle times, where we start showing people value in days and weeks, not months, quarters, and years. What's one key takeaway that you would expect an attendee at the Data Masters Summit to take away?

Andy Palmer: A lot of the people that we're bringing together are there to help all of their colleagues from other companies get started on the DataOps journey, and really begin to manage their data as a strategic asset as the first step in their digital transformation as a company.

Andy Palmer: And so, you should walk away from Data Masters with very pragmatic, very practical advice on where to get started and on the things to do. And also probably a few warnings about pitfalls and things to avoid doing, mistakes that other people have made and are going to share so that you don't have to make those same mistakes.

Anthony Deighton: Well, Andy, thank you for joining us on Data Masters. Thank you for a fantastic conversation, and I look forward to hearing you at the Data Masters Summit.

Andy Palmer: Thanks, Anthony. I look forward to the academic paper on Deighton’s law.
