Data Masters Summit 2020

Driving Transformative Data Projects with AI and ML

Jeremy Achin

Data Scientist & CEO, DataRobot

AI and ML are enabling a new paradigm for deploying truly transformational data projects. By leveraging voluminous real-time information and new algorithms, there is a promise of better and more efficient decision making and processes. Join this session, led by two data titans, DataRobot’s CEO, Jeremy Achin and Tamr’s CEO, Andy Palmer. Together they will discuss why AI and ML should have a front row seat in your next data project and what are the most critical best practices to establish the right technology, people and processes to drive project and mission success.

Transcript

Speaker 1:
Data Masters Summit 2020 presented by Tamr.

Andy Palmer:
Jeremy, it’s great to have you with us at Data Masters. We’ve been data brothers for so long that it’s natural for us to get together and chat. And really appreciate you being here, and deep respect for everything that you’ve built at DataRobot. And would love to kick off, just get your thoughts on the state of data science and where you think people are in terms of their ability to build and deploy models into production in the enterprise and in the government side. And maybe we could talk about some examples in the process.

Jeremy Achin:
Sure. I think there’s… First of all, thank you for having me. It’s good to virtually see you. And thanks for being a mentor throughout the years. I always get a lot of, when we get together I always leave with a bunch of takeaways that create a lot of work for my team. So thank you very much for that. And so I think the state of data science right now is it’s such that there’s still a tremendous amount of potential. And I’m not happy with the amount of progress yet though, in terms of putting it to use for real life impactful use cases.
In my opinion, I think both private sector and public sector are moving far too slowly. And I would’ve thought, if you would’ve asked me years ago where data science would be, I think in terms of excitement about AI and data and machine learning, I think that’s even higher than I thought it would be. In terms of real progress though, I think it’s a lot less than I would have thought there would be.

Andy Palmer:
So it’s like the promise is still alive and maybe even as big or bigger than we thought, but it’s taken us longer to get there.

Jeremy Achin:
Yeah. And there are plenty of examples where companies are able to, either for a single use case or even transform their whole organization and create almost a factory for turning out use cases. But it’s just rare. And I see too many companies get started out, organizations, not just companies but public sector as well, that get started out and choose the wrong use case, or they don’t have the right people involved and so they end up failing. And then that really sets them back, because there’s oftentimes skeptical people in the organization who think it’s too early to go down this path. And then so when the first few use cases fail, then they’re like, “Oh, maybe we should just wait a couple more years.” And so we’ve gotten really good at qualifying the use cases and the people who we work with.
You need both of them. You got to choose the right use case, and it’s not just about building a model. It’s not just about having the data. It’s all the way, end to end from the raw data all the way to putting it, maybe even building an application that the machine learning is feeding into. And then the operational change or change management, oftentimes you put a model in production and it changes some process or some job, somebody’s job, and you have to go and you have to manage that. Otherwise, there’s so many ways that these things could fail.

Andy Palmer:
Yeah, it’s amazing. I couldn’t agree more. It’s like you have to have this holistic view of the data and all of the analytics and the AI from end to end. And at the same time it’s like we’re all putting together these best of breed ecosystems with our customers, and it’s really challenging. Can you talk for a second about the… We’ve always talked about Tamr and DataRobot as being complementary. Tamr sort of more on the data engineering side and preparing the data broadly, and DataRobot on the modeling side and creating amazing models that use data. And the more data there is to use, the better the models can be. Can you talk a little bit about the difference between data science and data engineering from your perspective, and where you see things have gone right, in terms of the two functions? And maybe a little bit about organizationally how you see people setting up.

Jeremy Achin:
Yeah. I actually have trouble… I started the company more than eight years ago, and when you’re CEO for a long time you start just mostly doing emails. But recently, with the COVID, the pandemic, I’ve actually come out of data science retirement, and actually been doing some real work. And so what I’m about to say, I can speak it from real experience. I’d say data science without the data engineering, or machine learning without the data engineering, or vice versa, it doesn’t have a lot of value. And so I think you can’t just sit down and do machine learning without deeply understanding the data and working with the data engineers. Either being a data engineer yourself, you have to be willing to engineer the data yourself as a data scientist, or you need to work very closely with data engineers. And data engineers all the way down to the very raw collection of the data.
If you picture hospitals around the country, or in laboratories around the country, sending in data regarding the pandemic, all of those individual places might be sending it a slightly different way. And just trying to wrangle that down and turn it into something that’s even useful for data science, even that’s a lot of work. And so I think clearly it’s garbage in, garbage out. So on the data science of it, it’s just not really… You can’t really do much if you don’t have somewhat clean data. You don’t need perfect data, but it has to be somewhat.
And I think when it comes to data engineering on its own, there always needs to be a purpose. And it’s either something predictive or machine learning, or it could be business intelligence, but there has to be something there too. So I think data engineering, I think that it by itself can also [inaudible 00:07:09]. So it’s a perfect… And that’s why there’s very few companies out there that actually can work together in a complementary way. And it seems like everybody wants to be everything these days.

Andy Palmer:
Yeah. Right.

Jeremy Achin:
I think we remain like a pure complementary play that works really nicely.

Andy Palmer:
Yeah. Well it’s amazing, too, how much work there is to do, both on the data engineering side as people had to figure out how to take their data and process it and get it ready for consumption and use in all kinds of ways. And then they’re figuring out how to get through the process of creating, operating, and publishing models that consume this data. So there’s like this incredibly complex set of skills that are required in order to make this work. And it seems, I know we’ve run into this, I think you guys have too, where some of these organizations that are more traditional IT shops are unprepared, maybe, to deploy this kind of infrastructure and do this kind of work, or at least need to be completely retooled. Do you see that? Do you see these organizational challenges coming up a lot?

Jeremy Achin:
Yeah, I see actually it’s primarily organizational challenges. And so it’s no longer the technology that’s preventing progress here. Actually, there’s a common thing that you hear now, a rallying cry around the public sector, especially with new ideas, I think they’ve figured out, everybody’s bought into the fact that the primary difficulty is around culture, changing the culture. It’s not really the technology. And I think the other major problem, when we talk about data engineering and data science, for example, but there are others. There’s software engineers, there’s DevOps. There’s the kind of like product people. I see too much where everybody’s in their silo, and so no one’s taking responsibility for the entire project. So you’ve got data scientists like, “Yeah, throw me, clean the data for me, throw it to me. Here’s my model, go figure out how to do it.”
And so the data engineers are like, “Here, I gave you the data, what do you mean you can’t build a model or you can’t deploy it?” So I think you’ve got to break down those silos. And what we’ve been trying to do is pull some of the data engineers and software engineers into the data science arc so they don’t have to replace the data scientists, but the more they can go deeper into the analysis, instead of just staying in their lane and tossing stuff over the wall, the more there can be crossing over in collaboration throughout the life cycle, the better. I think that’s where probably one of the low hanging fruits are in organizations, is breaking down some of those silos. And we’ve definitely seen some success with that.

Andy Palmer:
Yeah. Well, I know we’ve worked together a bit on the Air Force SEEK EAGLE project between Tamr and DataRobot. And as we’ve gone through the process of working with the Air Force, they definitely had this commitment to this very cross-functional kind of teams. And that organizational dynamic seems to have made that project relatively successful. And I know they’ve got a lot of really complex modeling challenges ahead now that they’ve started to clean up their data and are really looking at DataRobot really seriously as one of the methods they can use to go to the next level.
So what are the other key things you’re seeing [inaudible 00:10:57]. I’m anxious also to kind of… I’m trying hard not to dive into the COVID stuff too quickly, because I know you’ve been focused on that quite a bit. But what are the other things that you’re seeing in terms of success that your customers, that you feel like… Are there use cases that you think are the most impactful and the things you guys are most proud of right now?

Jeremy Achin:
Yeah, I think so. Can’t say what part of the government this is, but there’s several places where the primary reason for engaging with us is to up-level the entire organization. And I mean the entire organization in terms of their understanding of data science and AI and machine learning. And of course through that process, the best way to teach people about this stuff is to find use cases that are meaningful to them. So of course there are use cases involved. But I’d say we’re seeing more and more where, and this is a positive sign, by the way, more and more we see customers come to us, and it’s not like, “Hey, I have this mission I need you to solve.” We do get some of that too, we love those. We like the biggest, most important missions. But sometimes they don’t have a specific mission. It’s like, “Hey, we need to up level our full organization.”
And we’ve created a bit of a reputation for ourselves around that. And we have a whole DataRobot university with special courses designed for all the different people in the organization, whether it be data engineers or business analysts or software engineers, executives. So we even have like one to two hour courses for executives or the equivalent on the government side. Because we know they’re really busy and have a smaller attention span, no offense there but yeah, I think that that’s been great to see, and that’s some of the more successful follow on projects come out of that as well.
So you’re going to have some subset of the people going through these educational programs get really excited and want to go just roll up their sleeves and go tackle a use case, which is… And as I said before, it’s not the technology, it’s more organizational. So you really need some champions and some people who are passionate and who aren’t going to give up at the first sign of trouble. Because there will be trouble with these use cases, whether it be data or technical or legal or regulatory, or just organizational. The larger the organization, the more it’s like Game of Thrones, where everybody goes with it [crosstalk 00:13:45].
Yeah, when we… Actually, this is a real thing I’m about to say, this isn’t a joke. When we engage with an organization, public sector or private sector, and we really want to make them successful, and we think it’s going to be a tough situation, we create like a war room where we put the photos of the people involved up on the wall, with all the notes about them. It’s almost like taking down a drug cartel, or a Mafia.

Andy Palmer:
Yeah.

Jeremy Achin:
This person does not want this to succeed. What do we do about it? Let’s surround him with good people that are champions and so on. But literally, that’s what you have to do. And we do that very transparently with the leader, because sometimes even if a leader in an organization wants it to happen, it’s not automatic. I have relationships with some executives in some of the biggest companies in the US, where they use me as almost like a crazy person they send in.

Andy Palmer:
You look at changing, right?

Jeremy Achin:
Yeah. The first thing I’ll say to their people is like, “Listen, your boss’s boss’s boss wants change. He wants this. And you know what? I don’t care about anything. All I care about is making it successful. So you can come along with this and benefit from it, or, I’m not going to hesitate to go with you. If you cause trouble, I’m not going to hesitate to.” Which is a really interesting dynamic. Those are the funnest engagements, actually, when we have permission to be the force for change agents. I love that. But sometimes you have to do it more quietly and subtly.

Andy Palmer:
Yeah. But it’s amazing to say that. We have a customer at Tamr who, first time I met him, it was he and his CEO. This is a Global 2000 company, and I asked him what his job was. And he said, “Well, I’m a buccaneer.” And I said, “Well, what does that mean?” He’s like, “Well, a buccaneer is like a pirate who’s got a mandate from the king. And my job is to go out and break all this stuff related to data and analytics, in the interest of change and improving things. And any one of the business unit leaders can bring me back. And they’ll probably yell at me a bit, but they can’t fire me.” And so I was like, this idea of a data buccaneer was like a really cool metaphor.

Jeremy Achin:
Love it. Yeah. That’s exactly right. You have to find some of those people. Yeah.

Andy Palmer:
Yeah. Well, and it’s amazing during, everything’s kind of up for grabs now with COVID and all this change. And it seems we’ve had a couple of examples where the dynamics related to COVID, it actually created sort of positive environment for change and just doing things differently and better because so many companies feel like they’re just under some sort of existential threat. They have to-

Jeremy Achin:
Yep, you see people who… Like I talked to someone the other day from a large company. I’m not going to say what industry, I don’t want it to be tracked down through this, but like a big industry in the US, one of the biggest companies. And they said to me, they’d been introduced to me for the first time, by someone at the top. And they said, “For the first like six months that I came here, I was like excited and really trying to, and then I just got beat down and I almost just kind of sat back and relaxed for a while.” And so that happens in quite a few organizations. It’s like dogmatic thinking and bureaucracy is very present.
And what I’ve seen with COVID is when there is more urgency, whether it’s a government response to the pandemic, or if it’s a company trying to survive through the pandemic, right, you start to throw out these rules that you thought were there and the dogma goes out the window. It’s like, “Wait, doesn’t matter. We can’t have that for an excuse anymore. We’ve got to move.” So I’d say I have seen that almost without exception, companies moving faster because of COVID.
Let me actually, let me be very specific. Companies being willing to do different things, and faster than they would have before. Now, one challenge, and I think that the people that are attendees of this event should be mindful of this, is that even if you have permission to do something, and you have new routes to get things done, you need people to get it done. And I think that people are not all responding the same to the pandemic. So we’ve seen it all over the place. Some people just checked out, and so you got to make sure, if you want to get something done during the pandemic, you got to be honest with yourself about the people you put on that team that’s going to go get it done, that they’re all still here, with it, and ready to go get something done. So …

Andy Palmer:
Well, that’s great. So I know that there’s a lot of companies out there in the world that use SAS and we’ve talked about this from the beginning that I always thought of it as a bit of a plague on my world and all my users. Where do you think SAS is? I’m sure you’re replacing SAS quite a bit. What do you think the state of SAS is? S-A-S, the company not-

Jeremy Achin:
Yeah, no. Got it. Yeah, one A, not two As.

Andy Palmer:
Yeah, right.

Jeremy Achin:
So I was a SAS programmer for eight years before I started the company. That was my primary tool. So I’m intimately familiar with SAS and still have nightmares from it. No, it was a good tool. It was kind of the only game in town back then. That’s how they did what they did. And I would say one other thing that made it successful was it was seen as industrial strength and could be trusted. And I’d say, for Tamr and DataRobot, we need to do that as well. If we’re going to be the ubiquitous choice, we need to be trusted in the same way SAS was.
So I do want to give them credit for that. It’s like they definitely got themselves into the IBM category. Nobody gets fired for using IBM. Nobody gets fired for using SAS. Right. So definitely they had that for a while. I think they just didn’t pay attention to like open-source coming. You know, I think they did some minor things with open-source that weren’t enough. I think that they were kind of overconfident about their grip on things. This is from a company perspective. But from an on-the-ground perspective, I’ve seen very few companies that are even thinking about keeping SAS around. Most companies have a mandate to move off of SAS. We’ve moved quite a few companies off of SAS. And again, it ends up being organizational challenges, or rather human challenges, rather than technical challenges.
Because when you have people… We have a large bank that we’ve been working with for a while. And they have a center of excellence in there with about 40 people. This is for one department. And they’re building many models, it’s almost like a modeling factory. And they had been doing it for years, following this step-by-step, building these logistic regression models in SAS, and getting them off of that was very, very hard. So the management realized that that couldn’t continue. They said, “We can’t scale. We can’t build models fast enough. The models are not good enough. Help us.” And we went in there and helped and they’re trying to sabotage… I don’t know if the SAS sales reps were behind this or whatever, but literally the employees that were the SAS users, like trying to sabotage the work and everything. So we had to win some of them over and then we had to expand from there.
But today, I mean, this is for the last two and a half years, there haven’t been any SAS models there at all. All the models, hundreds of models, have been switched over to DataRobot. We’ve done the sigma teleco. So I just see everywhere and often done, let me tell you from our perspective, oftentimes they’re like, “Oh, well the machine learning part, yeah, DataRobot can do that. But what about the data part?” There’s a thing in SAS called the DATA step and so on. So I think that’s where there’s a good opportunity for Tamr and DataRobot to go in and completely replace SAS.

Andy Palmer:
And it’s so exciting to hear you say that because we really… The full life cycle of data all the way from wherever it’s created, all the way through to where it’s consumed, like it really does require more than just one set of tools in order to do that. And so many of these SAS folks, I mean it’s like a mafia. We have the same challenge with some of the ETL tools, and George, from one of the other great companies that we work with that does data movement, was sort of saying how a lot of the traditional ETL vendors now are trying to convince people to continue to use their tools despite the inability of those tools to scale, especially in a cloud-friendly environment, and the dogma associated with some of these tools.
And I guess it’s fair because a lot of the people in those companies have built their careers around those products, but the sands are shifting and we see all the time that the cloud is kind of like redrawing the lines. Tell me, can you tell me more about what you guys are doing with cloud and what you see in terms of your customers with regards to cloud adoption and who’s running cloud native, who’s running DataRobot as a service, like what’s the situation?

Jeremy Achin:
So we’ve had DataRobot as software as a service since 2013. So actually our first product was like software as a service cloud. It’s been running since 2013. We have hundreds of customers in this. We actually have one cloud in Europe and then one in the US. We’ll probably launch one in Asia soon. But we also have on-prem, and on-prem could mean truly on-prem. It could mean in the customer’s virtual private cloud in Amazon or Microsoft or Google. So, because we are in this transition phase, we’ve had to be very flexible for the customer. And I think that’s a big advantage, being able to handle the hybrid because I think customers are still trying to figure this out.

Andy Palmer:
They’re still kind of schizophrenic about it.

Jeremy Achin:
But I think where they’re landing is… I think where they’re landing is they realize they’re going to have to move to the cloud. The last things that go to the cloud are going to be the most important mission critical, most private applications that use the most private data. Some companies are taking that risk now, but I’m still seeing a lot of companies hold back. And a lot of the stuff that’s going to the cloud is still either experimental or these kind of secondary use cases. They’re just kind of dipping their toes in. There’s a lot of companies that will publicly say that they’re fully moving to the cloud and have a mandate internally. I saw a large bank do this. I won’t say which bank or which provider, because it might be obvious who it is, but absolutely everything needs to go to the cloud, a single cloud vendor, and that didn’t go so well, honestly.
They ended up in a state of paralysis and they now have added another vendor alongside this other cloud vendor. And so I think everyone will land to where they’re going to have multiple cloud vendors and they’re going to keep trying to move as much as possible to the cloud. But I think the cloud’s going to grow faster than expected. But I think that some of the most sensitive and important use cases may take longer than expected to make their way to the cloud. But that’s what I’ve seen.

Andy Palmer:
But you think multi-cloud is kind of like a default, like everybody’s going to have a primary and a secondary or something. And?

Jeremy Achin:
I think so. I think so. And so I think that that means that the native tools in the ecosystem of Google, Microsoft and Amazon, those are not going to be the preferred choice in the long term because they’ll make it harder to switch from one to another. I think anything that makes it easy so just like… So for DataRobot, we say, “Hey, we want it to be the easiest way to take your AI, your machine learning models and take them from on-prem and put them into the cloud, whichever cloud you want or move it from one cloud to another.” We were able to just unplug the whole thing from Microsoft, put it on Amazon, or be running things on multiple clouds and all of it registered through your data. That’s the way that we think about it.

Andy Palmer:
Well, you’ve been ahead of me on this one for a long time and at Tamr now, we’re announcing this week support for all three major cloud providers. And we’re also gearing up to do Tamr as a SaaS offering and appreciate you kind of showing us the way on that one.

Jeremy Achin:
Great, I’m happy to help however I can there. We still have a long way to go and because we did it early, there’s still… We’re constantly evolving our architecture because a lot of the thing … Kubernetes and all the things like this were not around back when we built this stuff, right. We were really early with Docker. So we actually invested super heavily in Docker early on to the point where it hurt us because it was still going through all the growing pains. And even once we finished it and we got to places like some of the largest banks and companies in the US and they’re like, “Hey, can you make it Docker-less? Like, we’re not allowing Docker on our watch.” Like, “What? We just invested so much? Like really, come on [inaudible 00:00:29:04],” and then Kubernetes is here to help to solve all the problems with Docker.
We’ll see what comes next. Right. But you keep moving up these like layers of abstraction. It’s the, I know this is, you’ll probably cut this one out here, Andy, but I think there are major decisions to be made. This isn’t for your viewers. This is more for you guys. There’s major decisions to be made about architecture, the choice of what level of abstraction. And I think it’s like a company. It’s like the success of your company depends on making the right decision on that. At what level of abstraction should your team be building? How do you invest your R&D resources? You know, because even I feel very constrained with our R&D resources and I’m sure you do too. Do you want to build the world? But yeah.

Andy Palmer:
Yeah, you got to be focused and like I say, you have to pick those points. The old metaphor of the software, it’s like a ladder in quicksand and you got to pick the right rungs. And the right ladder, even. We bet on Mesos early on and it was like a total disaster, like a complete waste of our time. But we sort of feel like, at least with Kubernetes, things are kind of stabilizing a little bit.

Jeremy Achin:
That’s definite.

Andy Palmer:
So tell me, you guys have grown so fast. So how many people now?

Jeremy Achin:
I don’t even know. Over a thousand.

Andy Palmer:
Yeah. So I mean, what an amazing story.

Jeremy Achin:
I mean it’s like 1,100, something like that, growing to… I think we have a few hundred positions open.

Andy Palmer:
Yeah. So tell me more about, as you look back over the last eight years since starting, and we both are practitioners that kind of turned into vendors, right. And what are your lessons as a founder/CEO, what are the most important things you’ve learned and your biggest takeaways over the last eight years?

Jeremy Achin:
That’s a deep question. And someone would write the book of my life or something. So yeah, I think one thing I believed in very early on and I still believe it is in working hard. And so I think it’s not really something I’ve learned, but maybe something that was confirmed is that I just always value hard work over anything else. Like raw talent or like credentials. I just put more value than most people do on just somebody who’s just going to work hard. And so we’ve worked very hard and like this past year, once the pandemic hit, I’ve been working here seven days a week and over a hundred hours a week. And I really appreciate the people who have come along on that journey with me, and not everybody can. I do understand that, but I’d say like if there’s advice or something, I wouldn’t start a company if you’re not prepared to do that. I’ve learned probably too much to put in a concise answer for you.

Andy Palmer:
That’s sacrifice, right? I mean, there’s huge sacrifices involved to build something as big and as powerful as you’ve built, like really remarkable.

Jeremy Achin:
Yeah. Definitely. You got to find the right people to do it with you. You can’t do it alone. So I think, yeah. I mean, I have a whole list of things I need to improve about how I run the company and like how I manage my time and all that stuff. I’ve learned a bunch of stuff that I haven’t yet put into practice. I learned a bunch of stuff like watching, kind of taking a step out of my, by watching myself over the last eight years and being like, “Okay, here’s the things you got to change.” And then it’s really hard to change, when opportunities and threats are flying at you a thousand miles an hour and you’re trying to grab all of them. And so I’d say, yeah, that’s probably, if I could go and tell myself something, I would be like, “Hey, no matter what, just stop catching all the opportunities and threats and trying to deal with everything yourself and go take a step back and really plan your time.” And I’m actually saying that to my present self, as well as my past.

Andy Palmer:
Well, Jeremy, it’s amazing to catch up and thanks so much for being with us at Data Masters, and really looking forward to working together at the Air Force and many other customers going forward and congrats on all the success, really. At Tamr, we’re all huge fans of DataRobot, and we’re sort of in awe of what you guys have built and really, really excited to work together more. And thanks again for being with us today.

Jeremy Achin:
Likewise, big fans of Tamr and you and thank you again for the mentorship over the years. I’m here in Washington, DC, if you can see in the background.

Andy Palmer:
Oh yeah.

Jeremy Achin:
We are going to do some big things together down here. So looking forward to it. Thank you very much.

Andy Palmer:
Thanks Jeremy. See you.