DataMastersPodcast

Episode 7 — released July 15, 2020 • Runtime: 28m57s

An Introduction to Dataops

Chris Bergh

Head Chef, Founder, CEO at DataKitchen

Data ops has emerged as a way for organizations to better manage their data pipelines. You’ll hear more about this concept from Chris Bergh, head chef, founder and CEO of DataKitchen, a software company that aims to help data professionals regain control of their data pipelines. He’ll talk about how his experiences in the software industry led him down the data ops path, define what data ops is, and explain why companies should give data ops a try.

Transcript

Chris Bergh:

If I’m in a stick mood, I would be worried if I’m a data and analytics professional. In the carrot mood, I actually think that we have a real opportunity to fundamentally change how teams work.

Nate Nelson:

Hey, everyone, and welcome to the Data Masters podcast. My name is Nate Nelson. I am sitting with Mark Marinelli from Tamr, who’s going to introduce the interview guest and the subject of today’s show. Mark, how are you doing?

Mark Marinelli:

I’m doing great, Nate. Good to talk to you again. Let’s get on with the show here. Data ops is something we’ve talked about an awful lot. It’s emerged in the last few years as a way for organizations to better manage their data pipelines. In this episode, we’re going to hear from Chris Bergh. He’s the head chef, founder and CEO of DataKitchen, a software company that aims to help data professionals regain control of their data pipelines. He’s going to talk about his experiences across the software industry and how those experiences led him down the data ops path. He’s going to define what data ops is and why companies should give data ops a try.

Nate Nelson:

Okay. Here is my conversation with Chris Bergh.

Nate Nelson:

The subject we’re going to be talking about here, Chris, is data ops, but before diving in, I want to provide some context. So let’s talk about where data was a decade, even half a decade, ago. What didn’t we understand then that provided an impetus for a new approach to data management?

Chris Bergh:

Yeah. So why don’t I tell that in terms of the story of my career? I’ve spent many years working in software, both as an individual contributor in research labs and managing teams. Then, around 2005, I switched over to the world of data and analytics. And I actually had to explain to people what it was. I had to use a shorthand: “Hey, it’s just charts and graphs.” So it wasn’t like now, where analytics are in sports, or you see advertisements for big data or AI on TV shows. The world’s changed a bit in the last 15 years. My experience running those data and analytic teams for years, with people doing what we now call data engineering, data science, or data visualization working for me, was that my life was a bit frustrating. I often found that things weren’t going right: the data that was provided to me was broken, a system went down, or someone put something new into production that broke.

Chris Bergh:

So dealing with the inevitable phone call from pissed-off customers that something went wrong was really frustrating for me. The pace at which I could change something once it was in production was also a real challenge. It wasn’t about having a new tool; it was about the speed at which my team or I could create something in whatever tool and get it into the hands of our customers to get feedback. And it was taking weeks or months to deploy things. And then, finally, there was the pace at which I wanted to innovate and the team wanted to innovate. So this feeling of frustration at the slowness, the challenge of the errors, and an overall desire to do better led me down this path of looking at factory methods to improve quality, at how software had fixed its own deployment and waterfall problems, and at how teams collaborate in data and analytics.

Chris Bergh:

And then my co-founders and I started this company, DataKitchen, about six or seven years ago. We had this idea, but we didn’t know what to call it. We called it Agile Analytic Operations. We called it Dev Ops for Data Science for a while. One of our guys wanted to call it “Agile-ytic Ops.” Then I read an article from Andy, who’s the CEO of Tamr, and he had this term, data ops. We finally settled on that as the right term because it was short. That’s when we started to talk more about it. We wrote a manifesto, we kept writing, we finally wrote a book, and now people are talking more and more about data ops as something they should do.

Nate Nelson:

For folks not familiar with the concept, could you provide a definition of data ops?

Chris Bergh:

Well, I think it’s a set of technical practices, architecture patterns, and cultural norms that focus on four things. The first is really the speed at which you can get things from your brain as a data scientist or an engineer into the hands of your customers, from your fingertips to your customer’s monitor, and making that fast so you can innovate quickly. The second is reducing the number of errors, both data quality errors and production errors. The third is being able to think about how you work together as a data and analytic function, central or distributed, [inaudible 00:05:03], and collaborate. And the fourth is really focused on measurement: how do you measure your process so you can improve it? I think the North Star in all of this is that you want to make your customer successful. So the idea of these agile methods is being able to strike a balance between the chaos of everyone running everywhere and the lockdown of people going slow, that happy medium between rigidity and freedom. It’s prevalent in software, it’s prevalent in industrial manufacturing, and I think we’re trying to bring that same customer focus and that same balance between lockdown and chaos.

Nate Nelson:

And what exactly is required to make data ops work?

Chris Bergh:

Well, I think there are some assumptions. The first is that if you’re going to do something with data as an individual contributor, you have a wide variety of tools you can choose from. If you’re more technical, you can write Python or SQL. You can use tools that have very nice graphical UIs. There’s a whole market for ETL or ELT tools, for data science tools, for tools that actually build charts and graphs, and all those things are great. So it’s a different perspective: it’s not about replacing those tools with some super tool or changing how you work, because people love their tools and use the ones they’re best able to express themselves in. I’m a SQL and Python guy, but other people like to use graphical UIs or write MapReduce jobs, and that’s fine. What it’s about is being able to think differently about how you use all those tools, to see them as a system, and to look at that system from the perspective of “How can I iterate faster? How can I deploy faster? How can I get more feedback from my customers while reducing errors?”

Nate Nelson:

So that covers more of the technical side of it. What are the more intangible skills and approaches required to do data ops right?

Chris Bergh:

Well, I think it starts with a change in perspective and a realization that you can reclaim some control over where you are in the organization. A lot of the data and analytic teams we talk to are a bit downtrodden, to be honest, because they’re caught between a rock and a hard place: “My data providers gave me crappy data,” and “My boss expects me to deliver a new insight the next day, like Amazon.” And then, “My organization has taken the data and analytic functions and sprinkled them around the business lines and IT, so how do I make this all work?” So the first realization has three parts: that you can actually go fast and deliver new insight; that you can do that with very, very low errors, so your customers trust the data; and that you actually can work across this big distributed set of tools and people. You don’t have to live with the crummy chaos that you have now.

Chris Bergh:

And there’s a method and a way to get out of that. So I think the first step is realizing that you can reclaim some control of your analytic projects, stop them from failing, and stop feeling bad about your work. The second is figuring out exactly what you should do to get there. A lot of times we talk with customers and say, “Okay, you’ve got all your data and all your tools, fine, but how long does it take you to deploy something from your data scientists into production?” And we hear months. Then we ask them, “Well, how often do your customers trust the data? How often are you getting errors in that data, or something being late?” And oftentimes they say it’s dozens or hundreds of times a month.
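The two diagnostic questions above (how long a change takes to reach production, and how often errors or late data reach customers) can be tracked as simple process metrics. A minimal Python sketch, assuming a hypothetical log of deployments with start and production dates and a hypothetical log of customer-facing incidents; the record types and function names are illustrative, not part of any DataOps product.

```python
from dataclasses import dataclass
from datetime import date
from statistics import median


@dataclass
class Deployment:
    # Hypothetical record: when work on a change started and when it reached production.
    started: date
    deployed: date


@dataclass
class Incident:
    # Hypothetical record: a data error or late delivery noticed by a customer.
    day: date


def median_cycle_time_days(deployments: list[Deployment]) -> float:
    """Median number of days from starting a change to having it in production."""
    return median((d.deployed - d.started).days for d in deployments)


def incidents_per_month(incidents: list[Incident]) -> dict[tuple[int, int], int]:
    """Count customer-facing incidents per (year, month)."""
    counts: dict[tuple[int, int], int] = {}
    for inc in incidents:
        key = (inc.day.year, inc.day.month)
        counts[key] = counts.get(key, 0) + 1
    return counts


if __name__ == "__main__":
    deploys = [
        Deployment(date(2020, 1, 6), date(2020, 3, 30)),  # 84 days
        Deployment(date(2020, 2, 3), date(2020, 4, 20)),  # 77 days
        Deployment(date(2020, 3, 2), date(2020, 6, 1)),   # 91 days
    ]
    incidents = [Incident(date(2020, 5, d)) for d in (1, 4, 4, 12, 19)]
    print(median_cycle_time_days(deploys))  # 84
    print(incidents_per_month(incidents))   # {(2020, 5): 5}
```

The median is used rather than the mean so that one unusually slow release does not dominate the picture; tracking both numbers over time is one way to make the "measurement" pillar concrete.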

Chris Bergh:

And it’s crazy to me that people live that way. Sometimes we also talk about the Hatfield-and-McCoy feud between the people who do self-service analytics and the central IT team. So you have to find where to start, which piece you want to focus on. Do you want to focus on improving customer trust in the data? Do you want to focus on getting more feedback from your customers and iterating quicker? Or do you want to focus on helping people collaborate across the whole supply chain? That’s where people should start. So it’s a very different perspective. It’s not “I need a new tool” or “I need more data.” You need to focus on the factory where the work gets done. By focusing on the factory that creates analytics, on the system by which you do data science and data engineering, you can reclaim control of your destiny as an analytic team.

Nate Nelson:

Could you speak to the consequences from a business standpoint when your customers don’t trust your data?

Chris Bergh:

Yeah. I think data trust is big, right? Because what are we trying to do in our data work? We’re trying to influence people to make decisions based on data. We’re trying to help them become a data-driven organization. And the biggest challenge is that when you don’t trust the data, or more specifically, when you don’t trust the data and the team that creates it, business users and customers are just going to follow their intuitions, and you get suboptimal results. So if you believe in a data-driven world, and that data can make a difference and help people in their daily lives and their business lives, then helping people to be data-driven is important. And making sure that they trust your team and the data your team produces is essential, because there are a lot of reasons why people want to trust their gut over the facts in front of them.

Chris Bergh:

There’s a whole area of inquiry into how to get people to be more data-driven and adopt insights, and there are whole literacy programs around data. But assuming that people are literate, one of the most common ways people avoid acting on insights from data is to attack the messenger, the team who delivers it, or attack the message, the data. So to counter that, you’ve got to be able to build a system that produces data of very, very high quality with low errors. You also have to be able to respond at the speed of business: when you present a great insight and your customer asks you 10 follow-up questions, that is not a reason to walk out of the meeting with your shoulders down; it’s a reason to feel successful. Or when you produce a great insight and your customer says, “Fantastic, can we put that into production for my 5,000 sales reps next week?” In both cases I’ve witnessed people and teams who’ve done great work and had success walk out of meetings thinking, “Oh crap. Now I’ve got to industrialize this. I’ve got to put it in a factory,” or “Now I’ve got to try and do 10 more things.” Those are really marks of success. Instead they become challenges that compound the powerlessness a lot of people in data and analytic teams feel.

Nate Nelson:

Do you find, typically, that people already know what data ops is, or do you generally have to educate them to some level?

Chris Bergh:

Well, the engineer in me likes… We’ve spent a lot of time trying to come up with a very precise definition of data ops, right? We wrote a manifesto and a book and a talk on it, because the engineer in me likes good, precise definitions. And the reality is that a lot of people don’t know what data ops is, or they’re using it in a different way. There are some vendors who use it as a renaming of ETL or as a halo term on top of their existing tools. But I think the most important part is that people are talking about ops, and they’re seeing the changes and benefits that have happened in software development from adopting agile and dev ops principles. In some of the companies we work with, one side of the organization has already started to do dev ops and already runs in an agile way.

Chris Bergh:

And they want their data and analytic teams to work in the same way and achieve the same benefits. If you can deploy new website code in a day, why is it taking you three months to deploy 20 lines of SQL from your dev environment to your production data warehouse? Why do you have 10 predictive models backed up and only one has gotten into production? Well, there are lots of different meanings of the term, and perhaps that’s what the tech industry always does: it mutates terms in different ways. But I’m really happy that people are talking about it, because three or four years ago, when we were talking about these ideas, we would go to conferences, give out wooden spoons, and wear chef jackets, and people looked at us like we were aliens from another planet. Having some name recognition around data ops is helpful, even though we don’t go to conferences anymore because they’re all canceled. Having more and more people talk about it is good, because I really do think the operational side of analytics has a lot of value and isn’t just for lesser beings. It’s actually really important, and it can liberate people’s creativity and help them reclaim control of their analytic destiny.

Nate Nelson:

So, Mark, Chris has given us some idea of it, but what does the modern data ops ecosystem look like in broad, big-picture terms?

Mark Marinelli:

Yeah. The big picture is a big conversation, some of which you just had, and it could be its own thing. But if I distill it down to a nutshell, it’s a collaborative group of data specialists leveraging loosely coupled, best-of-breed tools to rapidly deliver comprehensive, accurate data to those who are going to use it best. That’s in contrast to a monolithic platform used by an insular technology team that is always trying to keep up with the demands of less technical data citizens who just want their data. That’s kind of how we’ve been doing things before. Think of the contrast between a NASCAR pit crew and a locomotive repair yard. A NASCAR pit crew is all about nimble, fast delivery, backed by a rapid-response team of dedicated specialists, each with their own task, all optimized around getting moving in whatever direction we need as quickly as possible. And I think there’s a big breakthrough here: with the technologies, with the skills we can bring to bear, and with the ways we’ve decided we need to work together, when you pull it all together, the people, processes, and systems involved, there’s your data ops ecosystem.

Nate Nelson:

Would you say data ops is more geared toward helping the IT side of data or the business side of data?

Chris Bergh:

Well, what I think is that the process of doing work and getting insight from data is being broken up across different parts of the organization. You have IT, your more technical people, and they have a set of technical skills and tools that they’re using. They’re building things like data enablement teams, doing data unification and master data management, building data lakes and data warehouses. Then some organizations have large self-service teams that are embedded, and they’re trying to help give insight to business people. Maybe they’re using Tableau or Trifacta or Looker, or a data science tool like Alteryx. So they’re all doing similar things with data: taking data, integrating it, visualizing it, putting it together. The customer is at the end of this value chain, right? A centralized IT group provides data, a distributed group does data visualization, and maybe a third group in some other part of the company does data science. And then the business user looks at the data and goes, “This is weird. This isn’t right.”

Chris Bergh:

Then a red alert fires up across these multiple organizations. Did somebody configure Tableau wrong? Is the model off? Did somebody put the data in the data warehouse wrong? Did the MDM team get it wrong? And you have these firefights that happen across different parts of the organization. That leads to distrust. It leads to business people wanting to hire outside consultants instead of using their internal teams. And it leads to the feeling these teams have when they come in in the morning, like the sense of dread I had: “Oh no, what happened today? What went wrong? Is it my fault? Is it our team’s fault? Some other team’s fault?” Finger-pointing and blame. I think the ideas in data ops allow you to get away from that: to identify a problem before it gets to your customer, or, if it does reach your customer, to find where it happened. They let you fix it and deploy the fix into production quicker, so that you can respond and learn from it, and also put in tests and monitors to make sure it doesn’t happen again.
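The idea of putting tests and monitors at each step, so a problem is caught before it reaches the customer and the failing step is named when it is, can be sketched in a few lines. This is a minimal Python sketch, assuming batches of rows are plain lists of dicts; the stage names, check helpers, and `DataTestFailure` exception are hypothetical illustrations, not DataKitchen’s product or any particular tool’s API.

```python
from typing import Any, Callable

Row = dict[str, Any]
Check = Callable[[list[Row]], bool]


def not_empty(rows: list[Row]) -> bool:
    """The batch should contain at least one row."""
    return len(rows) > 0


def no_null(column: str) -> Check:
    """Build a check that fails if any row is missing a value in `column`."""
    def check(rows: list[Row]) -> bool:
        return all(row.get(column) is not None for row in rows)
    check.__name__ = f"no_null({column})"
    return check


def in_range(column: str, low: float, high: float) -> Check:
    """Build a check that fails if any present value in `column` is outside [low, high]."""
    def check(rows: list[Row]) -> bool:
        return all(low <= row[column] <= high
                   for row in rows if row.get(column) is not None)
    check.__name__ = f"in_range({column})"
    return check


class DataTestFailure(Exception):
    """Raised when a stage's output fails one or more checks."""


def run_stage(stage: str, rows: list[Row], checks: list[Check]) -> list[Row]:
    """Run every check on a stage's output; stop the pipeline and name the stage on failure."""
    failed = [c.__name__ for c in checks if not c(rows)]
    if failed:
        raise DataTestFailure(f"{stage}: failed {', '.join(failed)}")
    return rows


if __name__ == "__main__":
    raw = [{"region": "East", "sales": 120.0}, {"region": None, "sales": 95.0}]
    try:
        ingested = run_stage("ingest", raw, [not_empty, in_range("sales", 0, 1_000_000)])
        run_stage("warehouse", ingested, [no_null("region")])
    except DataTestFailure as err:
        print(err)  # warehouse: failed no_null(region)
```

Raising an error that names the stage is one way to answer the “where did it happen?” question; the same checks could run both before a change is deployed and as monitors on every production run.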

Chris Bergh:

So whether it’s a self-service team, a data science team, or an IT-type team, they all need to follow these principles, because at the end of the day they have a joint, shared process that’s distributed across the organization. And it’s a process that’s actually governed by code they write. Maybe their code is Python, or maybe it’s embedded in a Tableau workbook, but they’re all dealing with the complexity of running a factory while also doing a software engineering-type role.

Nate Nelson:

All right, Chris, I have a challenge for you. Plenty of companies, plenty of people out there are quite content with their data operations and how they’ve been going for years now. How would you pitch somebody who’s already pretty comfortable with their data setup on why this data ops approach is necessary or would at least help them?

Chris Bergh:

That’s a great question, and it depends on my mood that day, whether I’m in a carrot or a stick mood. If I’m in a stick mood, I would be worried if I’m a data and analytics professional. We’ve gone through this big boom period, and we’re entering a recession. A lot of data and analytic projects, in fact 60 to 80%, have failed in some way. Over half of them are perceived as either not giving value or having some challenge. So when budgets get cut and you’re not delivering value, you may be in trouble. Our data and analytics field has gone through such growth that we may be in for a period of retrenchment. So focusing on value for your customer, on building products instead of just working on arbitrary projects, is essential. And the way to focus on value for your customer is to follow an iterative development process, backed by a technical environment that does data ops.

Chris Bergh:

In the carrot mood, I look at American Motors versus Toyota in the eighties, and the success of applying lean and total quality management techniques in industrial manufacturing; at the success of Silicon Valley companies and how they’re able to iterate; at the success of Amazon with its two-pizza teams and these ideas of agility and dev ops in software; and at the success of some teams doing data and analytics. And I think we have a real opportunity to fundamentally change how teams work, so they don’t have to live with chaos, where they’re dreading coming into work or people are quitting and moving on, and don’t have to live with a molasses-like slow process to change or react. I think both of those things are true. It’s the carrot and the stick. I also think it’s interesting how much things have changed in the last two months with COVID: a lot of people have been able to react, change, and be very agile. The essential idea here is that organizational agility is really important, and data isn’t different. Data is just another artifact that can benefit from these rules: being agile, changing quickly, and not killing your team in the process.

Nate Nelson:

Mark, to your mind, why is data ops an effective approach for data management?

Mark Marinelli:

Well, data ops is all about speed of delivery without compromising either the coverage of the source data or the quality of the output data. And that’s new and different. Historically, doing data management at scale required painstakingly slow projects, often limited to a subset of the most accessible data, because it was very labor-intensive and very expensive to do much else. So that slowed you down if you wanted quality data. Or you could run fast and loose, knowing that your data were not going to be accurate enough for anything but really rough analytics: qualitative stuff, not key quantitative decisions in your business. That was the trade-off we had to make. But through a combination of technology-enabled automation, specialized skill sets, and specialized tools throughout this data supply chain, we can effectively process 10 times as much data and do it in a tenth of the time with a tenth of the people. So there’s really no going back to year-long projects, which might someday produce value, when every month or every couple of weeks I can get new, trustworthy data that everybody needs.

Nate Nelson:

Chris, you’ve already hinted a couple of times that being in the data space can be a bit disheartening, especially if your work isn’t seen as up to par. What do you hear from people in this space? What challenges are they facing?

Chris Bergh:

Well, I can tell you my own life experience, right? Take a day in the life of Chris Bergh circa 2008. I worked for a company that did analytics for health care, and we had 10,000 to 20,000 users of our analytics, from sales and marketing people to CEOs. We were constantly trying to deliver new insight, and as the COO of the company, I kind of had to make the trains run on time. My boss was a doctor who was actually very good at creating ideas and insight. He would go off and talk to somebody, a big SVP, and come back with “Here’s an idea for an analytic.” I’d go off with a data scientist and a data engineer, we’d whiteboard it up, and I’d come back to David and say, “David, wow, we can do this. It’s totally going to take two weeks.” And he’d look at me with his Harvard MD eyes and say, “Chris, that’s going to take two weeks? I thought that should take two hours.” I’d walk out with my tail between my legs.

Chris Bergh:

Or I’d walk into my office and get a phone call from one of our customers saying, “Chris, the data’s wrong. If you don’t fix it, you’re out.” And I’d be very depressed. Then we hired a whole bunch of smart people, and they wanted to use their new tools: “Chris, I want to use this tool. I want to try this open source library. I want to innovate in this way.” So for many years, I was just beaten over the head with “Go faster, but don’t do anything wrong. And let me try some things.” And as I tell these stories, I hear that different people have their own versions of it.

Chris Bergh:

One of my favorite stories is about the manager of an analytic team who, on a Saturday morning during his daughter’s birthday party, was sitting on the edge of the tub in the bathroom trying to fix a bug in a data pipeline, or maybe a report, because he had just gotten an email from the CEO. I just find that depressing. I find it depressing that we assume that’s the way we have to work as data and analytic professionals, that we’ve got to be in the bathroom during our daughter’s birthday party. I just think there’s a better way. I think you can build systems where you know, before you put something into production, that it’s going to work, and you can live a less chaotic and more innovative life.

Nate Nelson:

All right. That was my interview with Chris Bergh. I’m back here with Mark Marinelli. Mark, Chris gave his arguments. Why do you believe it’s time for organizations to embrace data ops?

Mark Marinelli:

So I think, Nate, now is the time because of this nexus: a huge upsurge in the demand for good-quality data, which has been going on for a while, and, more recently, breakthrough technologies like AI and ML married with the infinite resources of cloud computing. We’ve got all this [inaudible 00:34:13] there, and we’ve got tons of work to do. All of this is going to go untapped unless we collectively adopt new ways of working with these technologies to broker these data. Traditional waterfall methodologies, the traditional software development lifecycle approach to this stuff, have served us poorly for the last couple of decades. Now there are attributes of this new tooling that allow us to work differently. Just as SaaS companies, software-as-a-service companies, changed the way they built software so that they could deliver compelling new functionality rapidly, we can adopt these new technologies and methods of working to deliver compelling new data at the same speed and scale that we’ve seen with the big SaaS companies. But only if we move beyond traditional tooling and process to more modern frameworks like data ops.

Mark Marinelli:

Data ops is also about choosing from among the best-of-breed technologies to suit your needs. That can be pretty daunting, with new companies being funded every week to solve specific, specialized portions of the problem. But even though it may be hard to pick the right one, the optionality you get from being able to take these best-of-breed, loosely coupled technologies, put them together, and rip one out as soon as a better one comes along is a really core principle of data ops. It’s something that really should be leveraged, and it’s new and novel compared with the monolithic, single-platform approaches we’ve seen in the past.

Nate Nelson:

Okay. Well, on that note, thanks to Chris Bergh for speaking with me, and thank you, Mark, for your insights.

Mark Marinelli:

Thanks, Nate. Great talking to you.

Nate Nelson:

This has been the Data Masters podcast from Tamr. Our next episode will be released in two weeks’ time. We hope you’ll tune in then. Bye for now.