datamaster summit 2020

Don’t Boil the Data Lake


Marc Alvarez

Vice President of Data Management and Operations @ Thomson Reuters

As a business leader, setting objectives to improve the quality of data in your organization is critical. But being too ambitious could hinder your ability to achieve quick wins and gain support from key stakeholders. To overcome this, data leaders need to avoid ‘boiling the data lake’, a phrase that describes taking on the impossible task of transforming your organization’s data management processes overnight. Instead, leaders are encouraged to focus on a specific area and follow the DataOps methodology to incrementally change the processes, technology and people’s behavior within the organization. Marc Alvarez has applied this ‘entrepreneurial’ approach to data management as Vice President of Data Management and Operations at Thomson Reuters. This session will cover why business leaders should treat data less like an IT project and more like a business asset.


Speaker 1:
Data Masters Summit 2020 presented by Tamr.

Mark Marinelli:
Hello everyone. Welcome to the session. Thank you for joining. Today’s session is titled Don’t Boil the Data Lake. We’re going to talk about breaking down large, monolithic, big-bang projects into smaller projects and steps to deliver immediate and incremental value and thus make sure we’re actually tracking against success.
Today, after I introduce our speakers and the topic itself, we’re going to talk a bit about why enterprise projects fail, a little bit about the incremental approach that we would present as the right way to keep these projects on the rails and then spend the bulk of our time speaking with our guest, Marc Alvarez, who will talk in detail about his experiences in large-scale enterprise data management projects.
So that said, Marc, welcome to the session. Could you provide your own bio so people have context for the conversation please?

Marc Alvarez:
I’m Marc Alvarez. I run Thomson Reuters’ corporate data organization. We’re responsible for pretty much anything to do with the production, distribution, and integration of what we call corporate data. The principal domains are customer data, products that we sell, pricing that’s associated with entitlements, and contact information across the organization for things like identity management, and we’re breaking out into more and more areas as we really start to get some of this under our belt, branching into things like product usage and supporting market initiatives for a variety of projects.
I think, by and large, the single biggest challenge… I’ve been here for just coming up to three years. Our biggest challenge in that period was in 2018. The company announced it was going to cut itself in half and divest itself of its Financial & Risk business, in which I have deep history prior to Thomson Reuters. We took a 12 billion dollar company and turned it into two six billion dollar companies, and that’s really been the catalyst for a lot of our efforts. It’s really forced us to take a look across our whole business landscape, all our systems and, almost immediately, you center in on the fact that data is an essential input to the organization, and it’s a value-added input.
The immediate observation everyone would make is, we weren’t taking it seriously, and I’m sure many firms come to the same observation. So, we’ve spent the last couple of years putting together a strategy to go through our landscape of data and really move to a data as a service model, much less sort of the [inaudible 00:03:05] custom deliveries, file transfers, batch operations, and that’s highlighted a number of, what should we call it, major pain points and inefficiencies in the organization, so we’re very much using data to drive those forward.
Parallel to that, the company, of course, like everybody else in the 21st century, has launched a digital transformation effort, which I think in our business, which is largely driven by online information services, is really going to have a profound effect. I mean, I don’t think… Starting in 2021, this won’t be the same company it was three years ago. It’s going to be changing very, very significantly. So, good to be in the sandbox and dealing with all the same issues as everyone else, very interesting area of business processes, data content, and technology, all of which you have to make work together.

Mark Marinelli:
All right, thanks. We’ll jump into a whole bunch of different facets of your role in the business and your experience coming up. Before we do that, who’s talking from the Tamr side? You’ve got myself, Mark Marinelli. I run our Customer and Partner Enablement group. This is a team whose specific remit is to make sure that our customers are very quickly and wildly productive and are building that continuous value on top of a solid foundation in whatever context they are using the Tamr software and its people. Clint, do you want to introduce yourself as cohost?

Clint Richardson:
I am Clint Richardson, Technical Lead for Data Science/DataOps at Tamr. I attempt to work on sort of the deeper technical problems in projects that our customers bring to us, which, again, seem to revolve around the space of how to actually get value out of data.

Mark Marinelli:
All right. So, what do we mean by boiling the ocean and why it’s a bad idea? Most of our customers, Marc among them, are in the midst of some form of digital transformation program. Oftentimes, it’s the second or third attempt at a digital transformation program, and the reason that the first couple may have failed is just taking on too much at once, taking on too broad a landscape of input data, too broad a set of projects, too many constituents in the hope that this can all be done at once.
The thinking goes: it may take some time, but we’re going to come out the other side of this digital transformation program with everyone getting the best data that they could possibly use to solve whatever analytical or operational problem they’re trying to solve. We can build all the infrastructure and it’s just going to hum along. At enterprise scale, we find that keeping everyone engaged in these programs when they take months, quarters, or years, and not weeks to months, is a difficult exercise. Because of that, these programs will atrophy and, ultimately, not reach the ROI that was expected when they began.
Why does this happen? You find the groups that are undertaking these projects, between IT and the line of business, a few months or quarters in, saying, “Where are all the data?” We don’t actually know all of the data that could be leveraged for a program, and this is the wrong thinking: until we know a priori what all of those data are and how to contort them into formats that we can use, we’re not going to do anything. We don’t want to leave anything out, so we need to get our arms around all of the data, catalog all of the data, put it somewhere where we can get at it.
Then, we’re going to start trying to analyze it, so that monolithic treatment of the data is really beholden to the availability and accessibility of those data, which is oftentimes pretty compromised in an enterprise business that’s been around for a while. A few quarters in, where are the results? We started off this project. We have line of business, operations, and analytic groups. We started off this project expecting that it was going to produce some value. That’s why we spent all the time and resources on it.
One group believes that theirs is the most difficult or valuable problem to solve. Another group believes the same. Neither one of them has seen any results, and then we begin the sort of political battle for prioritization of products, projects, resources, data, etc. That can cause a lot of friction and loggerheads that means that no one ends up getting what they want. Then, really, the big one is where’s the value?
We can have transformed and hewed these data to a common data model, put them all in an enterprise data warehouse, gotten them into much better shape than they started, but what are we actually getting out of that if we are now quarters into the project or a year into the project because it’s taken us all this time to assemble the right people and data and projects, and we still haven’t seen any value? That’s where people are going to start falling off, saying, “I’ve got a lot of other things to do.” The business may have changed in that time, and that project that we started at the beginning that we thought was really important is no longer all that important.
So, essentially, time just kills these things. To keep everybody in the boat is a difficult task. So, an alternative way to solve this problem is to take those big projects that have a lot of different outcomes and constituents and inputs and break them down, taking the traditional sort of waterfall approach that we’ve seen a lot of people try and fail at and turning it into a more agile approach.
What that means is small initial wins, subsets of the data, subsets of the problem, maybe one or two constituents and not all of them, and delivering a little bit of value, testing, course correcting, delivering a bit more value, etc. Breaking the problem down allows us to checkpoint ourselves. We don’t wait quarters for the value to come out the other end and only then discover that we were missing a large set of data that would’ve been useful or that we made a lot of bad assumptions about the rule set that we built for analytics.
Instead, in two-week or month-long sprints, we can get a little bit of value, useful data, not just a proof of concept, but useful data, get some value out of that, ask some questions about it, realize we got some stuff wrong, fix it, rinse, repeat. We’ll, at the end, get to that full-blown ROI, but we will do so in a way that is a lot more efficient and a lot more effective in maintaining the mind share and collaboration that’s necessary to make all of this work. So, that’s really the problem statement and a picture for us to have in mind as we go into the conversation with Marc.
Marc, before we start peppering you with questions about specifics, could you describe the ocean? You talked a bit about the remit of your group, but could you describe the sort of overarching goals and aspirations for your digital transformation initiative?

Marc Alvarez:
Sure. I mean, I think the first thing is, you can’t talk digital transformation without talking about your data supply, and it’s that simple. If you’re not supplying data in a timely and frictionless manner, you can have as much digital technology as you want and you’re not going to get meaningful business insights out of it.
I came into an organization that was really geared for 12 billion dollars of revenue spread across 40,000 employees across the world. Quite frankly, after a month in the company, it was, “Okay, tell us what we’re going to do now,” and then we decided to sell off half the company, so we had a real major catalyst sitting there to take a close look at what we were doing. The first thing we did was to start to develop a business plan.
Having done this several times before, I would not do this again without sitting down and putting forward that business plan and really taking a view on how data is being consumed in the organization, and for what purposes. When we started, overwhelmingly, the biggest area of activity was our finance groups, simply to report company results and maintain books and records of the company. That was, by far, the biggest use of the information. As we’ve gone down this path and as we’ve started to simplify a lot of our technology landscape, we’re starting to see a lot broader usage for the data and a lot broader demands emerging.
It’s quite interesting that TR had put together a customer master database years ago, sort of five or six years into this experience, but nothing else. What we’re finding is a lot of the demands were, one, to improve the quality of that data simply for management reporting and business planning, who are you selling to at what price point, just demographics, basic sort of stuff. As soon as you open that up, you’re now talking very much front office sort of activities where you’re describing the customer in all its complexities to the sales exec who’s responsible for covering the account.
So, suddenly, just by virtue of this very organic approach to things, the whole landscape started to make itself known, so we sat down and started to enumerate the landscape, which is the second thing I would tell anyone to do. Literally, that’s the word we used, what’s the landscape? So, we started with our customers. What are we describing about our customers? Who’s using it? Started to build out use cases, started to build out success criteria.
Like a lot of firms, we operate almost entirely on a batch basis. That makes it very difficult to drive some of the more digitally enabled type functions that people want, right? They want notification of whether or not a… They want analytics to calculate the likelihood of renewal. We want analytics on velocity of sales. The whole company strategy, as we’ve started to go down this path, has really been to focus in on leveraging our core assets as a company and generate growth through cross-selling and upselling.
We service major multinational organizations on a global basis. I’m sure it’s the same for many other firms. We’re not doing our customers any favors if we’re not doing a good job in making sure we’re on top of where we’re working with them and what we could be working with them, and that is something that comes back in all our customer surveys is how much they appreciate working with a global firm who can do these sorts of things. So, I think this is just going to keep going but, as we get into it, you start to realize that this cube of data that, traditionally, is used as sort of back-office books and records of the firm is actually quite sophisticated and very multidimensional.
It’s not just the customers. It’s who are the people we’re working with within the customer? It’s the products they buy. It’s any redistribution and arm’s length customers that we’re dealing with with them. It’s the transactions that we book with them. This goes on and on and on. You start going down this path, you start to develop quite a big vision of what that landscape of data is and who’s using it.
On the back of that, you start to see the opportunities, and it’s really, really important that you don’t view this as a technology exercise. You need to view this as a business exercise. You’re going to apply technology to get the economies of scale and [inaudible 00:15:46] that you want but, at the end of the day, you really do have to have that business strategy, and you have to commit to it.
If you don’t commit to it and, by commit to it, I mean budgeting for it and planning and holding people to account for it, you’re going to go around in circles. I’ve seen it everywhere else I’ve ever worked, and it’s a real risk and precisely the point you were bringing out. Unless you can demonstrate and manifest actual progress and improvements and unless you can quote an ROI value back to your finance guys, you’re going to struggle. It’s tough. So, my advice is, do the business work up front. That really is important. Have a vision. Have a plan.
Be very clear on how you’re going to generate return. That will start to filter down where you should start. In our case, it was getting our product and pricing information under control. That’s incredibly important information to drive how we simplify our commercial terms with our customers. That’s a big thing that comes back from our surveys with customers is they’d really like simpler commercial terms. They’d like them, especially global accounts and we service a lot of global accounts, they’d like them to be consistent.
We’re a company that grew up by acquiring 300 companies before I got here, and we continue to acquire companies. So, there’s a lot of diversity in there that wasn’t stewarded very well over time, and now we’re being pushed by our customers to really operate much more as a coherent type of partner that they can understand and get scale out of working. That’s demanding. That’s a demanding thing. We ended up with products being our basis for this and then we committed ourselves to standardizing how we do everything, and that’s the shift we’re in the middle of right now, and that’s what’s going to service our digital transformation as we go into 2021.

Clint Richardson:
I wanted to take one step back and dive in, really just focusing on the getting started piece. I think, as you said, once you take a few steps down this path, you can really start to see the myriad of benefits that are sort of coming your way and how you can really start to expand and continue to accelerate, but how do you get started? What role does identifying the key things that you just have to do to serve a business outcome play, versus sort of experiments to see what is possible, when you’re really starting out at the beginning?

Marc Alvarez:
Like a lot of firms, I suspect the answer boils down to whoever’s shouting loudest gets the attention. In our case, it’s blatantly obvious when you look at, what does it take to run a global survey for the firm? The marketing people really struggle with this. It’s the textbook example of the data paradox where, in order to run a global survey, they have to go out and source data from a dozen different systems and put it in spreadsheets and scrub it and try and make sense of it and then basically concatenate it all into one list and get the survey out, by which time that list is now completely out of date, likely full of duplication, likely full of all sorts of redundancies.
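To make the survey-list problem concrete, here is a minimal Python sketch of concatenating contact lists from several systems and collapsing the duplicates. Everything in it is an assumption for illustration: the record fields, the sample data, and the choice of a normalized email address as the matching key. It is not the process described in the session, and exact-key matching is far simpler than the machine learning approach discussed later in the conversation.

```python
from typing import Dict, Iterable, List

Record = Dict[str, str]


def normalize_email(email: str) -> str:
    """Trim whitespace and lowercase so trivially different spellings match."""
    return email.strip().lower()


def merge_contacts(*sources: Iterable[Record]) -> List[Record]:
    """Concatenate contact lists and keep one record per normalized email.

    The first record seen for a given email wins; later duplicates are dropped.
    """
    merged: Dict[str, Record] = {}
    for source in sources:
        for record in source:
            merged.setdefault(normalize_email(record["email"]), record)
    return list(merged.values())


# Hypothetical extracts from two different systems.
crm_a = [{"name": "Ana Diaz", "email": "Ana.Diaz@example.com"}]
crm_b = [
    {"name": "A. Diaz", "email": " ana.diaz@example.com "},
    {"name": "Bo Li", "email": "bo.li@example.com"},
]

survey_list = merge_contacts(crm_a, crm_b)
```

A rule this simple misses duplicates that do not share an exact key, which is one reason hand-scrubbed lists tend to stay full of redundancies.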
So, they make the case quite strongly, and this is a big input to our digital plans that we’re not getting enough bang for our buck as a company. They are one of many stakeholders, so I think the most important thing is to start to identify these problem sets and start to build a stakeholder community in each one of them. So, you look at something like our customer data, we have approximately 50 subscribing applications to that data, and that’s a very wide set of use cases.
I mean, it’s not just dropping data into our CRM platforms so our customers know who they’re dealing with. It touches on everything. It touches into our financial reporting, about how we segment our revenues, what we report to the street with a public company. So, there’s no one that comes to the top of the list. There’s many at the same time. When you start looking at it that way and you start looking at how are you going to get some benefit out of this, that’ll help you get the priorities straight.
Then, I think the industry’s awash in best practices as to what you need to do, and the first thing you need to do is inventory what you’ve got. It’s our sin as a firm. We’re sourcing data from… We’re doing what everybody else does, right, sourcing data from all our operational systems and loading it into dump trucks and backing it up to a big data warehouse and dropping it in there, so you’ve got these big static tables of data, basically reports from other systems, with no coherent integration. So, you can pretty much pick your problem, pick your poison at that point. There are just so many things that are not aligned with being a real-time, quantitatively driven commercial firm.
I think it becomes pretty obvious pretty quickly what your priorities need to be and where you’re going to be. In our cases, it’s one of the biggest drivers of sales planning. We are hugely in-… We were hugely inefficient in our sales planning every year. To top it all off, we reorganized our entire business as a result of our divestment of Refinitiv. So, we went through a pretty tough 2018 to try and get on top of what customers belong to which of our customer segments and who was responsible for them, and it cascades through the whole organization, whether it’s the territories planning or commissions, quotas, all of which is just absolutely fundamental to running the business. That one surfaced pretty quickly.
It’s also quite a data-hungry activity so, when you look at what does it take to be halfway decent at sales planning and have an effect… Really, we want to empower our sales organization. We want to empower our customers with better service. They shouldn’t have to put up with multiple phone calls to deal with one issue. They should be able to deal with it, one email sort of thing. You know, those are real benefits to the firm, and those get buy-in quite quickly, and it’s pretty obvious to start to quantify the ROI you’re going to get from that.
Once you’ve done that, then you inventory it. In our case, we didn’t just do an inventory based on use case. We actually just decided, “Enough’s enough. Let’s catalog everything we have in the firm and let’s get it modeled logically, correctly.” So, now we have a listing of what data we have, where it is, and we have rules for how we normalize it now, and the normalization can be as sophisticated as what’s required for generally accepted accounting procedures through to our methodology for unique identification of a customer.
There’s a very wide variety there, but we now have a normalized view of the data without actually building yet another data warehouse. We did this completely logically, and now we have the discipline where, if we’re going to normalize that content and deliver it to an application, it goes through another best practice, which is we centralized our data management, so I run the data management for the firm.
So, there’s only a single centralized data organization with a consistent set of data stewards, a consistent set of data managers, and we’re now viewing this really as a value add. It’s how do we add value to that? You add value through streamlining delivery, modern interfaces, timeliness, better documentation, governance. These are all the things that come with it, but it’s all one model. It’s not five different models for five different use cases.

Clint Richardson:
So, it sounds like you’re saying a lot of this is sort of figure out, like you said in the beginning, what the landscape actually looks like and then I think you’re saying the problem sort of popped up, which ones to focus on in terms of business value. Was that… A couple of curiosities. How much do the stakeholders come to you versus how much did you have to go find them and, in the same question, with their data, right? How much did the data come to you versus how much did you have to go track it down and find it?

Marc Alvarez:
Last question first, all of it. We had to go track it down. We had to track down the sources. We, as part of our cataloging effort, we built a lineage from source, went and profiled all our backend systems. I mean, we’re running 14 different CRMs, for example. Imagine the amount of duplication across all the CRM systems, files going back and forth. Very difficult to add any value, just what we’re doing with Tamr, trying to add additional value here through our normalization. As far as people coming to me, I think I was the first appointed VP of Data Management. I think the first day there, people were calling me, “What are you going to do? What’s your plan? How are you going to do this?”
There were a few fires already burning, so we were in the process of rolling out a global CRM platform, which we’re still in the process of doing, but we’re getting there. That really highlighted for the organization the need to get its data supply figured out. Until I came in, nobody was actually owning that and making it happen. There was no single point of data supply. It was really a self-service model from the data warehouse, limited governance and oversight.
It had a lot of attention pretty early, especially as we had to decouple from Refinitiv, where we really needed precision on who is a Thomson Reuters customer and who is not. I’m not sure every firm deals with that, but I think everybody would get the benefits of getting to that level of precision and really starting to build out all the additional descriptive content that really helps you get a 360 view of your customer.
That’s it. At the end of the day, on that side of the fence, that’s the most important thing, especially as we go down our digital journey, is making sure we are very clear on who our customer is and that most of these firms are big global firms. They’re not just one legal entity. There can be up to 1,000 legal entities forming a customer for us. Quite honestly, I don’t think the firm understood the complexity of the data that they were dealing with. It’s all legacy baked-in, hard-coded type of stuff. So, the vision as to how much more benefit they could get was really tough to achieve.
It was really hard for them to see what was possible if we did a better job on tracking, which is what we use Tamr for, if we did a better job of inferring who our customer constituents are, who makes up that customer. We use Tamr with machine learning to follow some logic, which helps us maintain our customer hierarchies, which is actually fundamental to us. If we don’t do this right, a sales exec or a sales operations person will get a list of 1,000 accounts that make up their customer. They don’t know which one they’re working with.
So, we really need that precision, and that’s what you guys are working with us on, so it’s maintaining that view of the customer, and that’s got so many different applications. I think with data, once you start to get it right, the more you do, the more you can do, so now we’re in the position where we have analytics that our analytics team runs that really goes through and clusters data for our sales users to make sure they’ve uncovered every cross-sell and upsell opportunity. We haven’t left anything on the vine. We haven’t left a customer hanging, waiting for something.
I think these are types of quantitative and statistical methods that are common to business now, so we’ll be doing a lot more of this in the future, and this is really an example of how you add value to what really is just books and records of the firm, should be books and records of the firm. This is how you add value.

Mark Marinelli:
Question about technology in all of this. You said you have data managers. You’ve got data stewards. You started to organize your team and your structure. Why not just hire a whole bunch more of them and use whatever technology you were using before? The economics of moving that to a different region, you could actually just throw a bunch of bodies at this and slog through using your erstwhile approach to this. What was it about Tamr, specifically or automation generally, that really moved the needle for you?

Marc Alvarez:
The answer is, you can’t. That’s a myth. Sorry. The reality is, you need to centralize your operations to get economies of scale and to get consistency, and you need to normalize your content from a very, very heterogeneous landscape, everything from salesforce platforms to [inaudible 00:29:22] to everything else. We decoupled the Oracle data warehouses. We’ve got it all. Reality is I don’t think you could hire a big enough team and get to that point. We’ve already offshored as much as we can. We run big operations in Costa Rica, big operations in Bangalore, big operations in Europe, Eastern Europe, so we’ve already done that and the reality is, we weren’t generating the value. The value comes from the normalization and the integration.
Normalization, integration, governance, all of that is what you need in order to generate the value that our analytics teams are looking for as they analyze the likelihood of a customer to cancel the subscription, for example. They are moving very quickly. They’re pretty smart. We have large teams of data scientists who do this stuff. They rely on the content. The biggest impact we can do for them is to make sure that the content is accounted for, make sure it’s timely, make sure it’s not duplicated. You can only do that with automation.
In terms of Tamr, specifically, we have a couple of cases where we’re using you, but we really just started as an experiment to see what we could do with our data to add this value. I think we were all surprised at the reaction we got from our business users when we started to show them, “Look. Using Tamr, we can eliminate these causes of duplication, or apparent duplication, for you, even though they’re not duplications.” We were able to say…
For the first time, we could give a reasonably accurate view, and one of the big issues with COVID-19 was, where are the clusters where our revenues are at risk? We’re able to do that because of the work we do with Tamr to build up those customer hierarchies. That’s what we call them. They’re all the legal entities that roll up to a domestic parent that roll up to an ultimate parent. Our global accounts teams operate at the global parent level. For them to know all the subsidiaries and their relationships is really vital. That’s just how you manage big global businesses in the 21st century.
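The roll-up from legal entity to domestic parent to ultimate parent can be pictured with a small Python sketch. The company names and the `PARENT_OF` mapping below are hypothetical, and the sketch assumes the parent links are already known; inferring those links from messy source data is the hard part that the speaker attributes to Tamr, and it is not shown here.

```python
from typing import Dict, List, Optional

# Hypothetical parent links: entity -> immediate parent (None marks an ultimate parent).
PARENT_OF: Dict[str, Optional[str]] = {
    "Acme Holdings": None,
    "Acme US Inc": "Acme Holdings",
    "Acme UK Ltd": "Acme Holdings",
    "Acme Texas LLC": "Acme US Inc",
}


def ultimate_parent(entity: str, parent_of: Dict[str, Optional[str]]) -> str:
    """Follow parent links upward until an entity with no parent is reached."""
    seen = set()
    current = entity
    while parent_of.get(current) is not None:
        if current in seen:
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        current = parent_of[current]
    return current


def group_by_ultimate_parent(
    parent_of: Dict[str, Optional[str]]
) -> Dict[str, List[str]]:
    """Group every entity (including the parent itself) under its ultimate parent."""
    groups: Dict[str, List[str]] = {}
    for entity in parent_of:
        groups.setdefault(ultimate_parent(entity, parent_of), []).append(entity)
    return groups


accounts_by_customer = group_by_ultimate_parent(PARENT_OF)
```

With a view like `accounts_by_customer`, a global accounts team could see every subsidiary under one customer instead of a flat list of unrelated accounts.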
So, Tamr, I mean, that’s really introduced a lot more precision, a lot more accuracy, and the nice thing is that this really was just a goal for us, an objective. It really was a bit of a research exercise to understand less the Tamr technology, more the behavior of the data that we receive. We use Tamr now. We’re just applying it right now, so it runs daily, the hope being that, using machine learning, the accuracy of our clustering model should get better over time the more it does. That’s the hypothesis, and I’m feeling pretty confident about it.
In that case, those two models that we’re running are very much at the heart of our data production plant, so it’s a way of adding value. We’re not just grabbing a bunch of data and normalizing it and sticking it in the database. We’re actually flowing the data, and Tamr’s a layer we can put on top where we can start to put intelligence around the data, and we expect to do a lot more of this going forward before the data actually gets used by an application.
We can actually detect, correct, and reissue data before it actually gets into a business application or somebody, I don’t know, somebody in CRM or something like that where they would perceive it as an error, but we’ve already tracked the error. We’ve already fixed the data, so it’s streamlining all our data management processes. So, that, I think, is a significant value to us, and it’s an area that I think we’ll be investing a lot in.
I mean, I think if there’s any area where machine learning and artificial intelligence is going to work, and you won’t hear this from any other people, it’s in the management of your data. This is what we do every day. This data comes in every day from across the organization. That just sounds like the killer app for AI. It’s not the sexy analytics that’s going to uncover whole new lines of business for us, but it’s going to keep us from building that army of data scrubbers like you suggested.

Mark Marinelli:
Certainly stipulating to automation and, yeah, we think we do it pretty well, why didn’t you just do it yourself? You said you’ve got a whole bunch of data scientists and data stewards. Why not try to build your own ML engine?

Marc Alvarez:
I mean, let’s be frank. I mean, data science is not data management. They’re two different things. We use some data science in analyzing the data but, in today’s industry, data science is not the same as data management. Why would we do it? Look, pretty much machine learning AI model up there is in the public domain, whether it’s Watson or SageMaker or even Tamr, and there’s very little to differentiate between these engines. So, really, our focus is on what’s the fastest path to value in using these applications? Really, it’s the old buy versus build concept and, quite honestly, our instructions as a company, our instructions are to go cloud first, leading edge technology, break the mold.
Those are philosophical courses of action that we’ve been issued. Working with Tamr and then Tamr was working with us on some other things. This was a new area for you guys. It’s been… The last year-and-a-half, we’ve been doing this. It’s demonstrated success and now to the point where we’re looking at really institutionalizing it. Those are the characteristics you’re looking for. You try to build by yourself and you’re just going to be doing a technology exercise. We’re much more focused on getting the results and getting going quickly, and that’s just buy, don’t build.
There’s plenty of machine learning and AI platforms out there. You don’t need to build it yourself. Quite honestly, I don’t think we have… We would not have the commitment, and we wouldn’t get the ROI on it either. It’d be yet another vertically integrated proprietary application that we have to take care of, and we just know from past experience that those things don’t scale economically, so much happier buying than building the stuff.

Mark Marinelli:
Yeah, [inaudible 00:36:14], but it sounds like your mindset dovetails really well with this whole don’t boil the ocean thing. If you’re going to have to build your own thing, you’re already flowing down sort of the wrong track philosophically because you’re already saying, “I’m going to have to do the whole thing” as opposed to being able to pick and choose what tech actually solves the problems that you’re dealing with.

Marc Alvarez:
Yeah. So, I came out of the capital markets, and finance is my background, which has seen lots and lots of big data use. We were doing big data before big data, and we were doing it on a realtime basis to power trading applications and things. Clients’ projects I’ve worked for in the past have been people building in memory databases to capture all bonds in a given geography. I mean, these are big data, tracking every transaction against them and running forecasting models against them. These are not unknown methods. I just don’t think those methods have been applied to this corporate data universe. It’s what we’re doing now, and it’s kind of interesting.
What’s interesting is the divide between applying those methods and visually seeing results is a lot shorter than I thought when I came in. I thought this would take a lot longer to percolate up, so that’s quite interesting, and I think it positions us well for the demands of digital work we’re going to put forward and the ones we’re already seeing. So, yeah, I think these technologies exist. They’ve been around for a long time. They just need to be packaged and deployed against this problem set. I don’t think this problem set is particularly unique. I think every firm, 21st century, has this. I mean, everybody… You have to be slick in your online presence with your customers.
You have to be committed to supporting your customers to be successful. If you’re not, there’s plenty of other competitors out there who are going to, or some smart firm out of Silicon Valley or somewhere else is going to start eating your lunch. I mean, you just cannot afford to not get this right, and that’s it. I think that’s the arms race of the 21st century is everyone’s going to be pushing down these paths to be the most efficient and successful they possibly can, and I don’t think it’s a question of cutting costs or being the leanest organization. It’s about meeting your customer’s expectations. I mean, that’s just dead obvious to us.

Mark Marinelli:
So, Marc, you’ve built this technological and organizational foundation where you’re delivering immediate value and incremental value [inaudible 00:38:54]. What’s next? What’s the next big thing that you want to take on in the organization?

Marc Alvarez:
Oh, that’s scary. It’s not that I want to take on. It’s what we have to take on.

Mark Marinelli:
You have to, yeah.

Marc Alvarez:
That’s the scariest one of them all. We get to the back end of this year, I think we’ve got critical mass of data that’s going through this centralized, normalized model of ours. It’s now going to be the business transformation to use it. It’s the actual integration and generation of value. This is not an area we are particularly strong in. We’re not going into this vertically integrated with a technology team supporting everybody on a finance desk or a sales desk or anybody. These are the users. We have an incredibly complex cross section of users.
We wrote up our requirements before some work we did on our contacts. We ended up interviewing 80 people, generating 60 user requirement use cases that we need to tackle. So, I think the biggest issue is this represents business transformation and cultural transformation to the company. It’s no longer going to be acceptable to take a big square stupid file at 5 o’clock in the afternoon and see if you’ve managed to update the system. That’s not how these systems are going to work next year and the years in the future.
Everybody’s asking for notification of change, event based, transaction based activity. They want to trigger workflows that make sure we get ahead of a customer inquiry or we’ve already resolved a question or a credit in collections or whatever. So, it’s actually downright scary because you actually look at the plumbing of the business. I don’t think it’s been looked at for a long time. Sure, we’ve had the consultants in. We’ve tried all sorts of different things. Even in my short tenure with the firm, I’ve seen a number of fixes attempted, but now you’re talking about something which really is root and branch.
It’s going to change, let alone the fact we’re embarking on the migration of our customer data now, which incorporates the Tamr value add. It sounds silly but, as we walk people through how they’re going to get this new data and what it means and the fact that we’ve actually modified the semantics of the data to make it right, some of these applications, for legacy reasons, actually struggle with getting the right and consistent data. They’re not used to it. They have to make changes to cater to it, so I think the biggest challenge…
Talk to me next year. All we’ll be talking about is integration to our business application stack. It’s going to be a long and challenging journey across the firm. I also think we have new use cases coming out where I think this is going to really get us out of the gates quickly. We have better data, faster data coming through, under control, governed. We’re going to be able to generate better business intelligence, and my bet is that’s going to outweigh the cost and pain of bringing the legacy business applications up to spec.

Mark Marinelli:
Right. Well, thank you very much. Great session. We are running out of time, so I’m going to move on to just sort of close it out. Everyone, hopefully you heard us give you some advice and guidance on how to alleviate or mitigate some of those issues, why big, long, big bang enterprise data management or digital transformation projects fail. Just to leave you with a few points here, one, I think you heard Marc say, right from the get-go, he wanted to show some value, make it a business problem, show some immediate actionable and compelling results from his first project. So, put points on the board early and elevate the visibility of what you’re doing and gather, capture mind share.
You also heard Marc talk about a culture of experimentation. He did POCs, including the exploration of the Tamr technology and whether it succeeded or failed, let it happen quickly and then move on to the next one, not being afraid to fail fast, as they say, but not going too far downfield before you’ve been able to course correct and check what you’re doing. Lastly, breaking the problem down.
I think we also heard Marc talk about initially focusing his cataloging efforts around a subset of the data, around the most relevant data to solve a few problems and then expanding the cataloging effort to cover all of the data, but starting with a subset and really knocking that out of the park before moving on to the broader picture.
So, I’ll thank everyone, first of all Marc for giving us his time and sharing his experience and then to our audience for joining this session. Hopefully, there’s some interesting and actionable insight that came from the session, and we’ll see you soon.