Episode 6, Season 2
Trifacta CEO, Adam Wilson: How To Help End-Users Truly Be Data-Driven
DataMasters Podcast
We are in the midst of, I think, a generational shift that’s happening right now around all things data. Data and analytics have now become the frontier for competition, which I think is different from where it was 20 to 25 years ago.
Hello and welcome to DataMasters. I’m Anthony Deighton. Today we’re joined by Adam Wilson, CEO of Trifacta. Adam has more than 20 years of experience in leadership roles focused on data, integration, and analytics. Since Adam joined Trifacta in 2014, Trifacta has become the global leader in data wrangling, serving thousands of customers worldwide with data transformation solutions. Adam has an impressive history leading successful and innovative technology companies. Prior to Trifacta, Adam was co-founder and chief operating officer at Zimba. Adam and his co-founder sold Zimba to Informatica, where he continued to play several influential roles, including VP of products, SVP of product management and marketing, and SVP and general manager of the Application Information Lifecycle Management business unit. Adam holds an MBA from the Kellogg School of Management and a bachelor of science in engineering from Northwestern University, where I also happened to be one of his classmates. Adam, great to talk to you again; thrilled to have you on the podcast.
Hey, thanks Anthony. It’s great to be here and it’s great to reconnect after all these years.
Exactly. Go Northwestern. So maybe we could start with a bit about your background and how you found your way into a career in data and data management. You’ve clearly been a major force in the industry, both at startups and at major companies like Informatica, and now again at Trifacta, where you lead the ship. So how did that come to pass?
Yeah. I mean, luck and timing, I guess, are the big movers and shakers in the universe, and I’ll actually take us back to campus for a minute. I describe this as the 30 seconds that changes your life forever. I was sitting on campus doing a first-round screener interview with Andersen Consulting, which is now Accenture, and there was a form to fill out, back when you used to fill out forms by hand. At the very bottom of the form, the very last question was, where do you want to live? And I just wrote San Francisco, Chicago, Boston, not a stack rank, just three cool towns. I had no idea if they were going to offer me a job, I had no idea if I even wanted to work in consulting, and next thing, four rounds of interviews later, I get a phone call and they said, “Hey, we’re sending you to San Francisco,” and I said, “Why? You’re headquartered here in Chicago, why would you send me to San Francisco?” They said, “Well, on the form that you filled out you said San Francisco is your first choice.”
I was from Illinois and they were headquartered there, and I just figured if I decided to work there I’d work in Chicago. Next thing I know, they offered me a job, I’m on a plane to San Francisco, and I moved to California. And that was the beginning of very much a love affair with tech and Silicon Valley, and it was really my entrée into working with a lot of companies. It seems crazy to think about it now, but at the time we were going into really large financial services, healthcare, and insurance companies and explaining to them that the internet was going to be a huge deal and they should probably have webpages. In building out all that infrastructure for them, one of my first projects was taking the patient handbook for Kaiser and making sure it was available on their website, which we were also building out. So it’s amazing how far things have come.
But I did that for a brief period, about a year and a half or so, and then I got the itch, because I felt like I was recommending all these amazing companies and all this amazing technology, and maybe I should actually try my hand at creating one of these companies. So I got together with a couple of co-founders and we decided to start a company called Zimba. Sometimes you purposefully get in over your head and you figure it out as you go, and being a couple of years out of school, it just felt like the time to roll the dice and go have an adventure. So we started the company, and the first idea, which didn’t work out and kind of failed miserably, was connected address books. We were thinking as consultants.
We spent all this time bombing around on projects, and the first thing you do is share information with your fellow consultants, because a lot of times virtual teams come together and you don’t even know who you’re working with. You also have to get in touch with your client and understand all the people on their teams. And we were like, this is crazy, we spend all our time managing other people’s information; shouldn’t they manage it, and we just get updates? We were thinking it would be really cool if this could get delivered to PalmPilots and Windows CE devices and all these mobile technologies that were at the fore. So we built a technology to do this, but we couldn’t figure out a business model behind it, because at that point everything was ad driven, everything was very consumer, and it was all about eyeballs.
And so at some point we decided that wasn’t going to work, and we wanted to do something that was more B2B. That’s when we thought, well, we could take this mobilization technology and this idea of delivering updates to different devices and maybe do that with analytic data; maybe we could provide mobile dashboards that are form-factor specific for this Cambrian explosion of devices. So we did that. And actually that’s what caught Informatica’s eye, because at the time they were trying to vertically integrate the BI and data warehousing stack, and they thought, hey, this could be a front-end analytic delivery infrastructure that we could use to deliver analytic data across different devices as well as web interfaces. So we sold the company to Informatica in August of 2000, and then the huge market crash happened about five months later. In retrospect there was no alternative other than to sell at that moment, but we didn’t know it; we debated it furiously.
Timing is everything.
Yeah, that’s right. I figured I’d go work there for a year or so, and 13 years later I was still there, and I saw Informatica go from IPO to a billion dollars in revenue, which was an amazing adventure. As an enterprise software company, getting to that billion-dollar mark is a tough thing to do; very few ever get there, and it was fun to really be part of the company building. Then Trifacta became a chance to jump out and be part of what’s new and what’s next in the space. With all the years of experience in data, in data integration, in data transformation, in cleansing, if you could rethink this from the ground up with a new set of technologies, thinking a lot about how this is going to be democratized, wouldn’t it be great to go out and apply a lot of those hard lessons learned? So that’s really what led me to Trifacta.
I mean, there’s a theme here that I hear. Early in your career it’s really about web and internet and content technologies, then it shifts to mobile, and as a proud owner of many versions of the PalmPilot myself, I can certainly attest to that transformative technology, and then to data. It feels like we are sitting today at another one of these transition points, really around the cloud. The cloud is pushing and changing the data industry: data storage costs being effectively zero, compute being not free but certainly infinitely available and easy to access. These are core assumptions that have changed significantly recently, and yet the core problem that a lot of customers face, which is that their data is a disaster and they just can’t make heads or tails of it, still feels like a core problem. So, do you agree? And also, why? Why is it that we’ve figured out how to push mobile applications out and how to update webpages, we’ve got all that nailed, but data is still a disaster?
Yeah, I totally agree. We are in the midst of, I think, a generational shift that’s happening right now around all things data. Data and analytics have now become the frontier for competition, which I think is different from where it was 20 to 25 years ago, when a lot of the rage was, let’s talk about business process automation and digitizing transactions, and then you saw the rise of the database world. And then it was very interesting, because right after that people said, well, now we’ve got stuff inside of databases, that’s great, it’s digital transformation, but it’s really hard to look at the data that’s in there. Next thing you know, you saw the rise of business intelligence and reporting, and a whole slew of companies became public companies as part of that because they were helping people to actually look at the data.
And then pretty quickly, once people could see the data, they were like, man, the data is a mess, and so you saw the rise of the data integration, ETL, and data quality markets as the third leg of the stool. I think we’re going through that again now, 25 years later, but obviously, as you pointed out, there are some meaningful changes, these secular megatrends, that are driving it, right? All of a sudden there is this infinite compute. Gone are the days when only sophisticated companies could spin up big, sophisticated environments; now anybody can do it with a click of a button. The data is now not just transaction data, it’s interaction and behavior data, which is much more semi-structured and in many cases unstructured. And things are moving away from being managed almost entirely centrally, so now you’re starting to see much more collaboration, much more self-service, much more democratization creep in.
And I think when you look at all of those trends taken together, it’s no wonder that people are stepping back and saying, okay, maybe this is an opportunity for me to really rethink some of the foundational approaches I’m taking. Maybe I can solve for how to go faster at stitching together data products that are going to drive decision making, and as part of that welcome uncertainty into my business or cater better to long-tail segments in my market, because now I can grab onto disparate data in new and interesting ways and get to new and interesting insights faster than the next guy, and that becomes a massive advantage for me. So I think that part of it is really, really exciting. Unfortunately, I think there are probably more losers than winners right now in terms of actually realizing that, but the potential is certainly there.
Yeah. And an interesting element of that is this idea of competitive advantage by being a first mover, taking advantage of that data in, as you call it, the long tail. I feel like that’s very similar to the early internet. You made this point about Kaiser putting their documents online: being a first mover and having your content available on the internet early was clearly a way that companies could gain competitive advantage. Although many were ham-fisted about it and just put a brochure on the internet as a strategy, as opposed to really rethinking their business model in the context of the internet. And I feel like the same thing is potentially true around data. As you said, just looking at the data and visualizing it is not enough; finding the nuggets of insight that uniquely move your business is the key piece. Is that fair?
Yeah. No. And I think it’s been interesting to see how investment and innovation and skills have gravitated to where the hard problems are, right? So it’s not as much about the containers we’re putting the data in now; there are lots of places, lots of specialized databases and engines for transformation and for compute. There’s, again, just an explosion of BI, analytics, algorithmic, and data science workbenches, and ways to deliver the data or provide the last-mile analysis, but really the hardest problem in data right now is actually creating the data products themselves. Once you have nice clean rows and columns, great, there are lots of ways to make use of that, but getting to nice clean rows and columns is incredibly challenging.
And I think most organizations have woken up to the fact that if their data quality is bad, then their analytics is probably worthless. So they really have to start thinking much more foundationally about, how am I going to take data of all shapes and sizes and bring it together? How am I going to standardize that data? How am I going to do that as efficiently and effectively as possible? And how am I going to be ready for change, at all times? I think that’s actually where you’re seeing a lot of the innovation in the market gravitate, because it is the majority of the cost, the time, and the pain, but it’s also, I think, the area that is the most ripe for new ideas and new approaches that can unlock just crazy amounts of value.
And oftentimes the foundation of those innovations, those new ideas, is academia, meaning there’s some fundamental research that creates an underpinning for what ends up being a set of commercial ideas, and that’s particularly true as it relates to Trifacta and Tamr. I think many listeners are probably unaware of the origin story of Trifacta, and by extension the origin story of Tamr; they both came out of the same academic research at MIT. So it might be fun if you share your perspective and your version of that story.
Yeah. I was going to say, I’ll share what I’ve heard and then you can tell me what you’ve heard from Mike, and maybe the stories will resemble each other, or at least fact-check each other a little bit. So, for the audience: Mike Stonebraker and Joe Hellerstein have known each other a long time and worked together for a long time, dating back to PhD research and academic circles. I think early on in their collaboration they contemplated going after this problem together, and so there were a lot of discussions about starting a company that would combine the research that Mike and Joe were both doing. And I think both of them really felt there was an opportunity here to do something that would unlock a lot of value, because, as I said, everyone talks about this often-quoted statistic that 80% of the effort in any analytic project is wrangling or preparing the data. The joke, obviously, is that the other 20% is complaining about wrangling, preparing, or cleaning up the data.
But these guys had approaches that they were debating. For Mike, as I understand it, it was very much like, hey, we want to centralize this; the idea is that we really want to use algorithms as much as possible to automate this at scale. One of the challenges is that it’s really hard to do this on a one-off basis, because the number of systems and the scale and complexity of the data necessitated a more algorithmic, more centralized approach. And I think Joe came at it a little bit differently, but in a complementary way, where he said, well, human in the loop is really important and context matters, therefore we really want to think about democratizing, and we really want to think about, how can we let the people who know the data best do this work?
And so they were both scratching the same itch but coming at it from different perspectives: one that was more of a centralized, algorithmic approach, and one that was much more about how this will eventually get to a point where we enable end users to get eyes on data early in the process and do the structuring, the shaping, and the cleansing on their own, so this doesn’t have to be the exclusive purview of the highly technical. And what’s been interesting to see is that with the passage of time, not only have the narratives of both companies become increasingly complementary, in some cases embracing ideas from both sides, but we started to get used in a lot of projects together, where companies were saying, yeah, there’s an opportunity to use technology like Tamr to do a lot of the entity resolution at scale, to create canonical models, and to get the data to a point where it’s coherent enough that end users can really self-serve in a more meaningful way.
And then they bring Trifacta in to do a lot of the last-mile data wrangling, to actually put the finished goods together, leveraging the expertise of end users who are living in data, who are data-driven professionals but either aren’t necessarily programmers or would just prefer a more interactive, immersive experience in creating this stuff, where it’s metadata driven, self-documenting, reusable, and shareable, so that everybody can get leverage from the work they’re doing. So it’s been an interesting journey from where this started. I don’t know, at least that’s the story that I’ve heard, or something close to that.
And I think that’s very consistent with the story I’ve heard. In some ways it makes a ton of sense, because both Trifacta and Tamr are attacking the so-called dirty data problem, but we’re coming at it from opposite ends of the continuum. And it’s not a small problem, right? As we pointed out a few minutes ago, this is in a way the central problem of most enterprises. As you said, they’ve got so many different tools for using, analyzing, modeling, and taking action on their data, and yet the data is a problem, and it’s the central problem. I often make the point that every business by its nature is a data business, and that’s true, but what I don’t say is that most businesses don’t have a handle on their data, so it’s a train wreck. So we’re both solving that problem.
Maybe stepping back a little bit from Trifacta and Tamr for a second and thinking about the overall data landscape: we often talk about this idea of the modern data ecosystem, and my sense is a lot of our listeners struggle with what that means. At a practical level, what is a modern data ecosystem? Both from a technical perspective: well, what tools should I use? Which cloud should I run on? Should I run on the cloud at all? Is that secure? But also from a business perspective: what goal am I trying to achieve? What does it mean to empower my employees with data? Do I just give everyone a spreadsheet? Is that sufficient, or how do I think about it? So I’m sure you have a view on this.
Yeah. Well, I mean, it’s fascinating to me. In recent memory everyone keeps talking about how it’s important to be data driven, and the end users look back at you and say, that’s great, but if you tell me I have to write a spec, throw it over the wall, and then wait six months for somebody to add a table to a data warehouse, how are you helping me to be truly data driven? And half the time, once I see the data, I look at it and go, that’s not really what I wanted, or, now that I actually see the data, my questions have changed. You need to be able to do in clicks what sometimes takes 6 to 9 to 12 months to get done these days. So for me, a lot of the focus on modernization has been: how can I take advantage of some of these new technologies and new approaches in order to go much faster and to get eyes on data much, much earlier in the process myself?
And I think that, coming out of that end goal around agility, you start to see a whole number of changes occurring that line up behind it. The diversity of data you’re contending with is higher than ever before, the need to go fast is higher than ever before, and so is the need to avoid an impedance mismatch between the end users and the central functions that support them. If you’ve got hundreds or thousands of end users who are, again, being told to be more data driven, and then a handful of people responsible for supporting them, well, guess what’s going to happen? Right? You’re going to end up with a lot of pain and a lot of angst around, I can’t get what I want, how I want, when I want, and that just makes me super frustrated and causes everything to slow down.
So with that as a general problem statement, or opportunity statement, we’ve seen a lot of interesting examples. We were working with Bank of America, and they had a quant group of about 236 quants, split roughly equally between fixed income and equities, with a small team of people supporting that group. That’s a game where if you’re a little faster you don’t just win a little more, you win all of it for some period of time, if your algorithms are seeing things in the market and in the data that others are not. And they have a voracious appetite for different cuts of data, shaped and structured in different ways, because they’re trying to create training data sets that will, again, birth new algorithms that can give them competitive advantage. What was interesting is that the central group said, listen, we cannot hire our way out of this problem; we don’t have the budget, and even if we had the budget, finding the talent is tough.
So what we’re going to do is open up raw zones on these data lakes and let the quants in to create their training sets and explore the data interactively on their own. Then when they get to aha and they want to actually make multimillion-dollar trading decisions, we’re going to get involved and provide all the scaffolding that’s necessary as a highly regulated financial services institution, to make sure that we understand the chain of custody of the data, how the training data was created, how it was transformed, all of the show-your-work things that are critical for data governance. And to me that was a really powerful example of what happens when you start thinking about, again, how do I embrace more data diversity? How do I embrace velocity? How do I empower end users more to create advantage?
And I think the interesting thing was that the central function running the data lake, which was in the IT organization, said, what’s interesting is we’re giving more work to the users and they’re thanking us; they said, this is great. Because frankly a lot of it’s janitorial work, and I say that very affectionately, as a very proud data janitor, but it’s janitorial work with the data, and we’re handing it over to them and they’re thanking us. That frees them up now to think more strategically about, how do I crowdsource the best stuff? How do I make sure it’s shared and reused? How do I go off and find other interesting data, alternative data, dark data, government data, third-party stuff that I could bring in and expose to those individuals, which might create new algorithms again? So it becomes this very virtuous cycle. And I think everybody tends to want to talk about the modern stack and modernization as in, okay, so then what’s the tool chain?
But really it’s this higher-order question: how are we changing how we think about data, how we work with data, and the outcomes we can expect with data, and how are we applying a lot of these principles that are more agile, in some cases DevOps or broader engineering principles, to creating these pipelines and creating more efficacy with everything you’re dealing with? I think that is the really exciting stuff, and the companies that get that are going to see tremendous benefit. You’re seeing some signs of that, as I said before, though unfortunately there are a lot of companies that are behind, and a lot that are still struggling even to make use of their data under management, let alone all the third-party data and everything else out in the wide world that they might want to grab onto. So.
It feels like one of the core ideas there is cycle time, or velocity. You have an MBA as well, and one of the things we study in an MBA is the process and its cycle time, often, obviously, in the context of a manufacturing facility, which is kind of old school. But in a way it’s the same underlying idea: what’s the cycle time between data, insight, and meaningful change in your business? That aligns with the old-school idea of the cycle time for manufacturing something on your production line. And, again, if you believe my hypothesis that every business at its core is a data business, then the path to competitive advantage is cycle time.
Yeah. Well, and I mean, to pick an example where you and I have had direct experience: when we both went into GSK, the goal was to take the process from inception of a new drug to FDA approval from roughly 10 years down to five. Now, obviously, with COVID it’s gone even faster, but at that time the ambition was, how do we cut it in half, and how do we do that with data? A lot of it was about shrinking cycle time, and also about getting higher quality data and getting that data into the hands of the people who are understanding the efficacy of clinical trials. So you started seeing them put sensors into inhalers as part of their respiratory business, and now they’re looking at real-world evidence, which is behavior data.
So rather than asking people, hey, did you use your inhaler three times a day? Did you use two pumps each time? Did you do it at 9:00, noon, and 5:00? People forget, or they don’t know that they’re not using it correctly; there’s all this noise. But now you can actually see the behavior data. That’s an example where Tamr is coming in and helping with a lot of the work on mastering, where it’s like, hey, there are canonical models for clinical trials, there’s experiment data, there’s assay data, there’s medical record data, there’s now this behavior data; how do we put all that together? And then Trifacta comes in and we say, okay, now we want to help the scientists, not necessarily data scientists, but literally the chemists who are doing the final analysis, preparing the reports on contraindications and efficacy, and working with the FDA and others.
We want to help them go in and grab onto a lot of that and do the analysis they need to do in order to understand and accelerate the process, and that is very much a cycle time thing. Not only is that a massive top-line benefit, because now you can get more and better drugs to market faster, but it’s also a huge benefit to the health of the population, as we’re living proof of right now. So that to me is really exciting, and it all comes back to the fundamental concept you were describing: this is a manufacturing process, and cycle time does matter. It’s the manufacturing of data products, and how do we go from raw materials to finished goods? And, I mean, all of those analogies hold. So.
It all comes back to the 1960s and the pandemic, it’s unbelievable.
So I want to wrap up where we began. We started the conversation with your career trajectory, coming out of Northwestern, into consulting, into the Valley, et cetera. Many of the listeners of this podcast may be in college or have just graduated; they’re early in their careers and casting an eye forward. And I’m sure coming out of Northwestern you had mapped out your entire career right up to this point, it was all completely mapped out, much as I did, I assure you. But with the benefit of hindsight and your experience through to today, maybe share a little about what listeners should be thinking about today, how they should think about a career in data, and where you would have them invest and spend time.
Yeah. No. I mean, for me, as I guess for many of us, the journey was one of circumstances that present themselves, that you just lean into to see where they lead, so I think you always have to be open to that. And in part it’s as much about making decisions about the broad problems you want to be around, the people you want to be around, industries you think are interesting, even parts of the world where people are working on problems that are interesting to you, because there is an element of immersing yourself in an ecosystem. That will naturally pull you in directions and present opportunities to you, and you’ll bump into things in your day-to-day life.
So sometimes I think people tend to get very prescriptive: I want to get this job, and then I want to go get this job, which is going to help me get that job. I guess I’m a little less deterministic about it; I’m a little more like, well, if I put myself in this orbit, doing these kinds of things, surrounded by these kinds of people, in this place, then goodness will ensue. I don’t know what that is exactly, but I do feel that you’re in some sense hanging around the hoop, and eventually you get a chance to toss in a few shots. And that’s frankly a little bit of what led me to Silicon Valley and San Francisco, a little bit of what led me into startups, and a little bit of what led me into data and analytics. And at least on the data and analytics side, there’s never been a better time to be doing this work.
I mean, when I first joined Informatica many, many years ago, this was the plumbing, the stuff that was not sexy, where people were kind of like, yeah, I mean, we’ve got to do some of that too. All of the app integration, the transaction stuff, the database stuff, all of those things that were more business application and business process re-engineering, that was where all the heat was at the time, and analytics and data were a little bit of an afterthought.
Well, that has now completely flipped. People who have skills in data, who are able to come in and work with data, but almost more importantly who can figure out how to be not just problem solvers but really problem finders and then opportunity finders: I think that blend of business acumen combined with skills rooted in working with data, and a technical appreciation for how this work gets done, is an amazing combination that is incredibly valuable, and you’re seeing companies bend over backwards to try to find and engage that talent.
And given what you’ve seen with cloud, it’s been interesting, because the data sets themselves have gotten bigger and more diverse, but that also means there are more interesting problems you can go after than ever before, because there is stuff to grab onto. Before, sometimes you just couldn’t find the data you needed, or you couldn’t get it to a place where you could process it, manage it, and contend with it. A lot of that has changed, and even more is continuing to evolve. So to me, because it is so foundational to everything organizations are doing, this is an amazing, amazing time to be in data. I’m very bullish, obviously, and optimistic about what that holds for the future, and also for the people who are going to create that future.
Yeah, I couldn’t agree more. I mean, if the late ’80s and early ’90s were the early innings of the internet revolution, I think we’re still in the early innings of the data revolution. And your point about cloud as an enabling technology, and about being at the intersection of business and data, that’s going to put any young person in the right place for their career. So, hey Adam, great to have you on the podcast, wonderful to connect again, and thank you for the brilliant insights and for your time.
Awesome. Anthony, thank you so much, super fun, always a pleasure and go Cats.