DataMasters Summit 2020

It’s Time to Manage Enterprise Data as an Asset

 

Andy Palmer

CEO @ Tamr

Digital transformation is an urgent competitive necessity. What does it take to get there? The first step is to manage data as an asset.

Tamr’s CEO, Andy Palmer, kicks off DataMasters Summit with a vision: a world where people in large organizations are empowered to deliver business value with mastered enterprise data at scale, across silos.

Transcript

Andy Palmer:
Welcome to Fenway Park. I’m Andy Palmer, and I’m thrilled to welcome you here to DataMasters. We’ve got some great customers, prospects, and thought leaders from across the Global 2000 here to talk about data and how to manage it as an asset. I’m really excited to get going, so let’s go inside and get things kicked off.

Andy Palmer:
All right. Excited to get things kicked off here at DataMasters. I’d really like to spend some time talking about how it’s become time to manage enterprise data as an asset. There’s going to be a lot of discussion during the course of the day about the how, the what, and the when of managing data as an asset; I’d like to talk a little bit more about the why. The journey that we’re all on in the enterprise now is digital transformation. It’s become an urgent competitive necessity. It’s no longer just a nice-to-have or a bunch of IT projects; every large enterprise is now in a position where it absolutely has to transform into a digital native as quickly as possible.

Andy Palmer:
It’s really the investment that we’ve made over the last 10 or so years in data science that’s created this urgency to start to clean up our data and manage it as an asset. Most data scientists out in the enterprise spend at least 60% of their time trying to prepare data to be analyzed rather than doing the analysis itself. We think this is a huge problem. Managing data as an asset is a fundamental requirement for doing data science well. There are lots and lots of folks out there implementing AI programs at very large scale, and almost all of them are coming to the conclusion that Marvin Minsky, one of the founding fathers of AI, was famous for pointing to: you really have to put the data horse before the AI cart. No algorithm is useful without enough great data.

Andy Palmer:
So as large companies begin their digital transformation journey, it’s very important for them to begin to manage their data as an asset and to think first and foremost about how to organize that data and how to prepare it, so that all of the folks in the enterprise who are going to use and consume that data are working from a common understanding. Over the last 40 years, we spent a tremendous amount of time in the enterprise automating business processes. Companies like Oracle, SAP, and IBM were all founded on the belief that automating business processes was a good thing. And essentially that created a whole bunch of data-generating machines in the enterprise. Every single day, every ERP system, every procurement system, every CRM system is kicking out data that is a potential asset, although more often than not, it gets treated like exhaust.

Andy Palmer:
Then in the last 20 years, a lot of great companies, such as Tableau, Qlik, and Domo, have led the democratization of analytics, preparing essentially all the consumers of data in the enterprise to take that data and do something useful with it. And in the last 15 years, there’s been this influx of big data infrastructure: companies like Cloudera, Vertica, and Pivotal, and Informatica and Talend, have started the process of preparing the data and putting the infrastructure in place. But now, with the advent of the cloud, as large enterprises move the center of gravity for their data out to the cloud, is the time for us to start to take advantage of data and manage it as an asset in the enterprise. The table is set to realize dramatic value from all of this data coming out of these data-generating machines and empowering all of the consumers of data in the enterprise who are using modern analytics platforms.

Andy Palmer:
So when you do this, it’s tempting sometimes to revert to big waterfall methodologies and big boil-the-ocean projects. We work really hard at Tamr, when we’re engaging with our customers, to make sure that they focus on monetizing their data productively at scale from the very beginning, doing cradle-to-grave projects that take data from wherever it comes from all the way through to benefiting the organization, whether by saving money, growing faster, or reducing risk in some way, shape, or form. And if you do this over and over again at scale, then you also end up with a tremendous common, curated set of core data that can be leveraged for all the coolest AI projects out there in the world. And as everyone in your organization starts to take advantage of modern data infrastructure and modern data science, such as you might get from a company like DataRobot, you can leverage this core data asset across all of those programs in a very compelling way.

Andy Palmer:
The time is right to do this now. It hasn’t always been easy to do these kinds of projects, but now, between the impetus for digital transformation in the enterprise, the opportunity to apply great state-of-the-art machine learning, and the center of gravity being in the cloud, it’s never been faster, easier, or more affordable to manage your data as an asset and deliver it to all the consumers in your enterprise. Now, as you begin, as you start to dig in and want to do this at large scale, there are some really key principles to keep in mind. First and foremost, you always have to retain agile analytics as the context. I’m going to talk a little bit more about each one of these principles. You’ll also want to make sure that you implement an open, best-of-breed technology stack. You don’t want to go with a single vendor on a single platform.

Andy Palmer:
The third is that it’s best to do this cloud-native. There’s no reason to spin up a lot of on-prem physical infrastructure. Cloud-based infrastructure is cheaper, it’s faster, and it’s going to skill up all the people in your organization to be more competitive in the marketplace. Another key principle: in data, people sometimes like to argue between aggregation of data and federation of data. I really believe that a core principle of next-generation data management in the enterprise is embracing both aggregation and federation. Think of both Snowflake and Presto. Finally, curation is really a core capability. Data quality and the management and curation of your data, every single day, as a continuous process, is really a key thing. This, again, is what we focus on at Tamr. It’s a core part, and it’s often left to the very, very end of a project or a series of projects.

Andy Palmer:
The most compelling thing about doing curation at large scale in the enterprise is that it becomes an opportunity to use these curated, structured entities to link your structured data and your unstructured data. It’s a very powerful opportunity. Let’s talk about each one of these a little bit more. When you’re trying to build out your next-gen data infrastructure, it’s really critical to work backwards from your analytic outcomes. Start with the questions that are the most important in your enterprise, then figure out what data is required from key business entities (customers, suppliers, contacts, products, parts, employees, and such), and then implement your DataOps solutions to support those key business entities and the data that is required to deliver the outcomes. Do this in a very iterative way, continuously over time.
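As a minimal sketch of that “work backwards” idea in Python (all of the names, entities, and source systems below are illustrative assumptions, not Tamr’s actual tooling):

```python
# Hypothetical sketch: start from an analytic question, derive the key
# business entities it needs, and plan the DataOps steps for each entity.
from dataclasses import dataclass


@dataclass
class AnalyticOutcome:
    question: str                    # the business question to answer
    entities: list[str]              # key business entities required
    sources: dict[str, list[str]]    # entity -> raw systems that hold it


# Example outcome: spend optimization, worked backwards to its entities.
spend_optimization = AnalyticOutcome(
    question="How much do we spend with each supplier across divisions?",
    entities=["supplier", "part"],
    sources={
        "supplier": ["erp_eu.vendors", "erp_us.vendors", "procurement.suppliers"],
        "part": ["erp_eu.materials", "plm.parts"],
    },
)


def plan_pipeline(outcome: AnalyticOutcome) -> list[str]:
    """Emit the DataOps steps implied by the outcome, one per entity."""
    steps = []
    for entity in outcome.entities:
        raw = ", ".join(outcome.sources.get(entity, []))
        steps.append(f"ingest {raw} -> master '{entity}' -> publish curated table")
    return steps


for step in plan_pipeline(spend_optimization):
    print(step)
```

Each iteration adds or refines one entity and its sources, rather than designing the whole schema up front.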

Andy Palmer:
The second key point is to make sure that you focus on the people involved and their behaviors. Technology is really no longer the key bottleneck; there’s plenty of technology out there, especially if you’ve embraced cloud-native deployment modalities. Start with the data consumers: the average data citizen who is just trying to look up information about a customer or a supplier; the average data analyst who’s using Qlik or Tableau or SAS; the average data scientist who’s out there building models using state-of-the-art tools such as DataRobot; or the developer who’s building a data-driven application. Start with these people, their questions, their challenges, and the things that they’re trying to do, and work back into the data engineering and the supply of data required to answer those questions.

Andy Palmer:
When you implement these projects, it’s really critical to use an agile approach. Don’t spin up huge multi-quarter or, God forbid, multi-year projects. You want to be able to mobilize with speed, deploy small multi-skilled teams, iterate collaboratively with the users, and leverage all the existing tools that are out there in your infrastructure. It’s a very heterogeneous environment; don’t lock into just one set of tools. So what does the modern open data engineering ecosystem look like? At Tamr, after working on this for the last 10 years, and from my experience both as a CIO and as someone who ran software and data engineering at a large pharmaceutical company, we believe there are seven core components to the modern open data engineering ecosystem that are absolutely essential. And at some level, it doesn’t really matter which vendor you get these from; as long as you have these seven components and well-defined endpoints between them, you’ve got a great infrastructure to deploy at very large scale.

Andy Palmer:
And of course, it’s really best if you deploy this natively on the cloud. GCP, AWS, Microsoft Azure, Snowflake, Databricks: all of these vendors provide adequate infrastructure to host this kind of next-gen data management, are ready and willing to help you figure this out, and oftentimes have templates that are pre-built for your industry. It’s very important to deploy this cloud-native. As I mentioned before, it’s essential to consider both aggregated data sources and federated data sources; neither one alone is adequate to solve the problem. You need resources like Snowflake, and you also need capabilities like Presto, in order to truly deliver on the promise of analytics in the next-gen enterprise.
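To make the aggregation-plus-federation point concrete, here is a minimal Python sketch that reads mastered entities from a warehouse (Snowflake) and joins them against data queried in place through Presto. It assumes the snowflake-connector-python and presto-python-client packages; every connection parameter, table, and column name below is a placeholder.

```python
# Aggregated + federated access in one small script (illustrative only).
import snowflake.connector
import prestodb

# Aggregated: curated, mastered tables that live in the warehouse.
sf = snowflake.connector.connect(
    account="my_account", user="my_user", password="my_password",
    warehouse="analytics_wh", database="curated", schema="master",
)
sf_cur = sf.cursor()
sf_cur.execute("SELECT customer_id, customer_name FROM customers")
mastered_customers = sf_cur.fetchall()

# Federated: query source data where it sits (e.g., a Hive catalog),
# without moving it into the warehouse first.
presto = prestodb.dbapi.connect(
    host="presto.internal", port=8080, user="analyst",
    catalog="hive", schema="raw_events",
)
p_cur = presto.cursor()
p_cur.execute("SELECT customer_id, count(*) FROM web_events GROUP BY customer_id")
event_counts = dict(p_cur.fetchall())

# Combine the two: mastered entities enriched with in-place activity data.
for cust_id, name in mastered_customers:
    print(name, event_counts.get(cust_id, 0))
```

The design point is that the mastered entity list is the join key between the aggregated and federated worlds; without it, the two query paths stay siloed.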

Andy Palmer:
The next key principle is that curation is at the core: you can’t ignore the data variety problem. It’s the 800-pound gorilla in the corner. And oftentimes, curation gets left to the very end of these projects. This happened with data warehouses, then data marts, then data lakes. Now that we have a chance, on a modern cloud, to implement next-gen data management as an asset, we have the opportunity to get curation right. Data curation really starts with tackling this data variety problem: mastering your data using machine-driven, human-guided techniques, like the ones that Tamr provides, and making curation a core part of your data management profile. One of the benefits of organizing your structured data really well is that it creates an opportunity to take all of those curated, structured entities and use them to tag your unstructured data very efficiently and effectively.

Andy Palmer:
Every enterprise has masses of unstructured data, documents that are just sitting out there. Oftentimes, it’s difficult to find what you need in those documents because they’re not tagged accurately with clean, curated entities. It’s essential, once you clean up your structured data and you have these masters, to leverage them to tag all of your unstructured content very efficiently and make it more findable by everyone in your enterprise. My favorite example of this is contracts. Once you’ve got a clean customer master, it’s relatively easy to go in and tag all of your contracts for those customers with a clean, curated entity, so you can find every contract you need for any given customer whenever it’s required. So if you do a great job of mastering your data at very large scale, managing your data as an asset, and prosecuting these key analytic use cases, the next thing to do is to document the returns very specifically and then promote them aggressively within your organization.
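As a toy illustration of that contract-tagging idea, here is a short Python sketch that matches a curated customer master (with its known name variants) against document text. The master data and the naive normalized matching are assumptions for illustration; a real pipeline would use the mastered entity’s full alias set and fuzzy matching.

```python
# Tag unstructured documents with IDs from a curated customer master.
import re

# Hypothetical mastered entities: canonical ID -> known name variants.
customer_master = {
    "C001": ["Acme Corporation", "Acme Corp", "ACME"],
    "C002": ["Globex Industries", "Globex"],
}


def normalize(text: str) -> str:
    """Lowercase and strip punctuation so name variants compare cleanly."""
    return re.sub(r"[^a-z0-9 ]", " ", text.lower())


def tag_document(text: str) -> set[str]:
    """Return the customer IDs whose known names appear in the document."""
    haystack = f" {normalize(text)} "
    tags = set()
    for cust_id, names in customer_master.items():
        if any(f" {normalize(name)} " in haystack for name in names):
            tags.add(cust_id)
    return tags


contract = "Master services agreement between Globex Industries and ACME..."
print(tag_document(contract))  # prints both matched IDs: C001 and C002
```

Once every contract carries tags like these, “find every contract for this customer” becomes a simple lookup against the mastered ID instead of a fuzzy text search.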

Andy Palmer:
One of the things that’s most important to promote is that you’ve done this in a very short period of time. These are not big IT projects that take quarters or years to roll out. You want to promote the idea that in a small number of days, or weeks at most, or, God forbid, a month, you’ve delivered from cradle to grave something that’s created significant value, ideally measured in millions or tens of millions of dollars, and promote that aggressively in the enterprise as the way to accomplish these objectives. You want to hold up the people who have done this as the heroes of data inside your enterprise. At Tamr, we’ve done a lot of this work over the last 10 years, and we’ve worked with all kinds of large customers across many different industries to deliver real, tangible business outcomes.

Andy Palmer:
One of our favorite early examples was the work we did at GE, where in a very short time, three months, we were able to save them more than $80 million a year in spend optimization. We’ve done similar kinds of work across life sciences, healthcare, energy, all kinds of manufacturing, entertainment, and of course the public sector, where we work really closely with a lot of agencies. The opportunity exists because we’ve actually been able to accomplish these significant business outcomes by managing data as an asset, and we know that everyone attending DataMasters has the same opportunity to realize significant value really quickly.

Andy Palmer:
Some of our favorite customers are the ones that start with a tangible analytic outcome. A great example of this is our friends over at Thermo Fisher, who were looking to do spend optimization aggressively inside their organization. By adopting Tamr to master their data and implementing agile analytics, they were able to deliver a significant amount of benefit, measured in millions, even tens of millions of dollars, very quickly. We’re really proud of the work that we’ve done with Ignacio and the entire team at Thermo. We’ve seen similar kinds of benefits at Johnson & Johnson, where we’ve mastered lots of data, starting with product master data, and where we worked with fantastically talented people such as Elena, who was leading all of the analytics for product information in J&J’s consumer business. Truly exceptional results in a very short period of time.

Andy Palmer:
And in the case of J&J, they were running natively on AWS. So at Tamr, we really envision a world where people in large organizations are empowered to deliver business value with mastered enterprise data at scale, across all the different silos of their organization. It’s our mission to enable these organizations to quickly and easily master their data as an asset and deliver really tangible business value. I’m really excited that over the next two days, you’re going to hear lots of stories from practitioners about how to accomplish these kinds of objectives: how to manage your data as an asset, how to get short-term results, and how to begin building your next-gen greenfield infrastructure to manage data as an asset at scale in the enterprise. I’m really looking forward to sharing the next two days with you.