datamaster summit 2020

How Cloud is making it easier to integrate external data sources

 

Matt Holzapfel, Head of Corporate Strategy, Tamr
Chris Napoli, Industry Principal, Financial Services, Head of Wealth & Asset Management at Snowflake
Mike Meriton, CEO at EDM Council
Businesses are growing increasingly reliant on market and alternative data to monitor key results, stay ahead of trends, and gain a competitive edge in the marketplace. While many organisations have solved the problem of identifying and acquiring diverse data sources for potential insight, the question still remains: How do we effectively integrate our sources at scale to unlock their true business value? Join us along with Snowflake to learn:
  • The biggest challenges to effective market data source unification
  • How the Data Cloud and machine learning are helping overcome these challenges to enable continuous, clean, and trusted insights
  • Real-life use cases where data unification has delivered high ROI through new data products and insights

Transcript

00:00:03:17 – 00:00:21:04
Speaker 1
Hello and welcome to the EDM Council and Tamr Webinar titled How Cloud is Making It Easier to Integrate External Data Sources. Mike Meriton, co-founder and CEO of the EDM Council, will be kicking off the presentation. Over to you, Mike.

00:00:22:14 – 00:00:54:06
Speaker 2
Thanks, Joann, and welcome everyone to today’s EDM webinar. So why are we all here? Businesses are growing increasingly reliant on market and alternative data to monitor key results of their business performance and risk, stay ahead of trends, and gain a competitive edge in the marketplace. While some organizations have solved the problem of identifying and acquiring diverse data sources for potential insight, the fundamental question still remains.

00:00:54:15 – 00:01:23:13
Speaker 2
How do we effectively integrate our sources at scale in the cloud to unlock their true business value? So to address this question and opportunity, let me go ahead and introduce our two speakers for today. First, Matt Holzapfel, who’s the Head of Corporate Strategy from Tamr. Welcome, Matt, good to have you on the discussion today. And also Chris Napoli, who is the industry principal for the financial industry from Snowflake.

00:01:24:05 – 00:01:35:01
Speaker 2
So to get us going, Chris, I’m going to turn it over to you to address that first statement here, which is how is cloud making it easier to integrate external data sources? Chris, over to you.

00:01:35:17 – 00:02:04:02
Speaker 1
Thank you, Mike, and very happy to be here. Before we go through how the cloud is helping us bring in and actually incorporate external data sources, and even internal data sources, I thought maybe it would be prudent to take a step back and think about some of the challenges that we have had as data and technology executives prior to the cloud, and then ensure that we’re not duplicating mistakes and that we draw on the lessons learned over the course of this journey.

00:02:04:16 – 00:02:28:14
Speaker 1
So if everyone is taking a look at the slide on the screen here, the left-hand side is about the technology and data silos that have existed in a lot of our organizations, not just in the financial services industry but in all industries, where we tend to have created silos: fit-for-purpose applications, fit-for-purpose technology, and fit-for-purpose data sources for specific business processes.

00:02:28:22 – 00:03:11:13
Speaker 1
So if we are looking at this from the financial services lens, where we have different, say, security masters throughout the organization and different external data feeds that do not actually match or do not map easily, we wound up creating a host of challenges on the right-hand side. The reason why I want to highlight this is, if you look at the second row here, it is now possible that if we have a federated model and numerous cloud providers in the same organization, sharing between those cloud providers has the potential to recreate these legacy inefficiencies.

00:03:11:19 – 00:03:29:15
Speaker 1
So as we go through this webinar, we’ll be highlighting how to get around some of that and how Snowflake, as well as Tamr, are able to assist in that journey. But on the right-hand side, just to make sure everyone’s level set, we do see that if you have staging databases that get created, you create stale or incomplete data.

00:03:30:01 – 00:03:52:08
Speaker 1
There are data versioning and inconsistency issues. In the event that we don’t have a failsafe through the cloud, the data governance challenges will always persist. But there are ways that organizations such as the EDM Council are trying to assist with the best way to manage data within the cloud. But most importantly, what we’re looking to solve and why we want to go on this journey and leverage the cloud for all its capabilities.

00:03:52:17 – 00:04:19:00
Speaker 1
Right, is actually to improve scale and the time to insight, or time to value, of our data pipelines. The last bit, which I’ll go over if we move to the next slide, is about the lack of resiliency. So if we think of this from a financial services lens, we want to be able to incorporate external data as well as share internal data as efficiently as possible in order to fuel any type of algorithm that we’re looking to create.

00:04:19:08 – 00:04:46:09
Speaker 1
It can be a backtesting algorithm. It can be a Customer 360 next-best-action algorithm across the enterprise. But these are the business challenges that our previous infrastructure created that we really want to be mindful of as we migrate to the cloud. As you can also see, there’s the lack of scale: processes that sometimes took 5 hours can now potentially take minutes or seconds in order to provide value to the business.

00:04:46:15 – 00:05:08:10
Speaker 1
Those are some of the capabilities that the cloud is permitting us to use, both with internal as well as external data. And then I think the last bit that we should be mindful of, in the bottom right-hand side here, is really about BCP and resiliency throughout the cloud. In the previous world, if we had on-site, on-prem data centers, we would have to have a failover that

00:05:08:10 – 00:05:33:24
Speaker 1
more likely was physical. What is capable of being done in the cloud, and particularly through Snowflake, is to make sure your mission-critical processes fail over to either different regions of the cloud or potentially even different cloud providers. It’s one of the unique benefits of Snowflake that we’ll go into from the standpoint of incorporating external datasets, but it’s also something to be mindful of when we think about BCP-based processes as we move forward.

00:05:34:16 – 00:05:36:06
Speaker 1
I’ll take a pause there, Matt.

00:05:38:01 – 00:06:03:12
Speaker 2
Thanks, Chris. So to jump in and get our audience involved in this discussion, we have our first polling question, so hopefully everyone sees it pop up on their screen. The question is: how many data cloud providers does your organization have? It’s a single choice, from zero, meaning we haven’t started the journey, all the way up to more than three.

00:06:04:02 – 00:06:27:09
Speaker 2
And we’ll give it a moment for our audience to weigh in and give everyone, I think, a general sense here of our participants and where they stand on implementing cloud solutions. Give it a few more seconds here. Okay. If the team could go ahead and give us the responses, we’ll just take a look at where...

00:06:27:09 – 00:06:56:04
Speaker 2
Okay, here we go. So the number one response, at 43%, is two cloud providers. The second highest is three or more, at 28%, with 23% at one. And the lowest response is... it looks like 92% of our audience is involved, gentlemen, with at least one cloud provider, which is good to see.

00:06:56:04 – 00:06:59:24
Speaker 2
So, Chris and Matt, any surprises on this data?

00:07:00:24 – 00:07:18:18
Speaker 1
Am I surprised? No, but I am actually very happy to see that people are really moving forward with the cloud journey. I’ll talk a little bit more right after this about Snowflake’s ability to normalize across different cloud providers. But Matt, over to you to share your thoughts on this.

00:07:19:12 – 00:07:46:02
Speaker 3
Yeah, it’s interesting. I’m not really sure. I was expecting one and two to be the highest, but three-plus at 28% feels a little high. And so I’d be interested in understanding how people think about a data cloud provider and how people are defining it, because I think that the term is still very much being defined in people’s heads.

00:07:46:02 – 00:08:10:03
Speaker 2
Yeah, agreed. And I think if we had asked this question a year or two years ago, we would have had far fewer people, certainly I would think almost no one, picking three-plus, and at best maybe one, maximum two. So it does show that people are more in pursuit of cloud and have actually begun the operational steps. So this is good, and I think the timing of this discussion couldn’t be better.

00:08:10:03 – 00:08:23:08
Speaker 2
So back to you, Chris, on that phrase, external data sets. People may have a very limited view. I love this slide because it opens up the door, so we’d welcome more context here.

00:08:23:22 – 00:08:47:00
Speaker 1
Yeah, great. And when we mention bringing in external data sets, I think the other way that organizations are starting to think of it is first party, i.e. data created and generated by the enterprise or the institution, and third party, right, taking it in from somewhere else. I think as we continue to evolve, we’re getting to the point where we think in terms of consumer and producer, right?

00:08:47:00 – 00:09:09:15
Speaker 1
So if you think about it, there’s a direct institution where sometimes it’s just not a commercial opportunity, right? What we’re seeing on this slide here are some organizations that commercialize their datasets. And I think the way that we are starting to think about it is: how do we take in data from organizations that aren’t our own in order to enhance our models to help perform a business action?

00:09:10:02 – 00:09:32:22
Speaker 1
So if you take a look here, the way that Snowflake views this world currently is that we have a marketplace on the Snowflake platform that makes it rather easy for organizations to share and monetize their datasets, not only with organizations that are currently using them, but also, by the same token, to raise awareness that their datasets exist in the market.

00:09:32:22 – 00:10:13:06
Speaker 1
So that functionality of Snowflake is really what we’re trying to bring to help organizations actually incorporate their data: to just expose their tables through the Snowflake platform and have the data scientists, the data engineers, and the business owners directly join those tables into their workflow and go about their business process. The reason why I bring this up is that when we think about the cloud and how we share or even incorporate external datasets, part of the challenge that we want to get around is the legacy ETL processes and the data operational processes that normally created friction on the way to that time to value.

00:10:13:20 – 00:10:41:11
Speaker 1
It’s one of the unique value propositions of Snowflake to normalize between the three major cloud providers for sharing data, both in our external marketplace as well as through some of our private sharing capabilities. So what does that mean? If I’m on AWS and my other provider is on Azure, we can actually just share our tables through Snowflake as opposed to going through some of the other methods of delivering information.

00:10:41:11 – 00:11:05:22
Speaker 1
I.e., SFTP, CSVs, APIs. The reason why I bring this up prior to moving forward is, if you think about it also from the vendor perspective, if you platform on AWS but your client is on GCP, you still have the issue of how you get the data and the information from one area to the other in order for them to incorporate that external data.

00:11:06:07 – 00:11:24:22
Speaker 1
Right. So if you think of it in terms of how you actually normalize distribution of data, that’s another value proposition that Snowflake provides to the market: being able to connect these disparate datasets directly to the client where they want to be met, without all of the operational burdens that exist in the cloud.

00:11:26:23 – 00:11:48:00
Speaker 3
And just to jump in on one of these points around collaboration, because I think this is a particularly important point in terms of how to get value out of external data. In particular, most, if not all, external data providers charge and license based on either the number of records or the number of tables that they’re licensing.

00:11:48:00 – 00:12:15:08
Speaker 3
In other words, the number of users doesn’t matter. And so if that data is only used by one person or if it’s used by 100 people, the cost to the organization is the same. And so one of the things that we’ve seen as really powerful with our customers who are trying to use more external data is when they put it in the cloud and really make it easier to access and collaborate, exactly as the slide says.

00:12:15:08 – 00:12:26:17
Speaker 3
And they’re able to unlock a lot of hidden value without incurring any incremental costs, because they’re already licensing that data; it’s just that they haven’t been able to get it into the hands of the right people.

00:12:28:06 – 00:13:10:05
Speaker 2
And the thing, too, is the phrase data marketplace has been used a lot. But what I especially like, if you look at this slide, is that often when people think of external data, they limit their thinking to one area, like geospatial data, because a lot of examples have been used around that. But this opens up the door to all of the different ways that companies can leverage external data sets across the energy sector, the economy, and demographic data, and this opens up the door for a data-literate organization to really get to the next level of using this to create insights to drive better decision making inside of their organization.

00:13:10:05 – 00:13:13:03
Speaker 2
So I think this is exciting.

00:13:13:21 – 00:13:37:23
Speaker 1
And Mike, just to follow up on your thought, one of the most accessed datasets within Snowflake during the COVID pandemic, and the continued pandemic that we have, was actually the COVID information that was coming from the US government. People rather quickly incorporated that, particularly the retail and manufacturing sectors, to figure out where they should be opening their stores.

00:13:37:23 – 00:14:04:12
Speaker 1
And normally it would have been a much more difficult process to bring in that data and run the forward-looking models on it without the capabilities of the cloud. So it’s really not just the commercial aspect. There are numerous other ways and other value-adds, as discussed. I would like to highlight a few points, though, as we go through, about incorporating and accessing data in the cloud.

00:14:04:20 – 00:14:33:11
Speaker 1
And it’s really around security and governance. The main key factor, and one of the key value propositions of Snowflake, the part that truly makes it accessible for everyone to go through a lot of these processes, is really the security and governance capabilities. As Matt just mentioned earlier, knowing who is using your data or knowing how it is entitled is rather important, both from a commercial aspect and from other regulatory aspects as well.

00:14:33:19 – 00:14:51:06
Speaker 1
So we just want to highlight three areas and bring up how we’ve worked with the EDM Council in order to achieve these objectives. It’s really to know your data and understand the metadata tagging, and to know how to protect your data, to make sure that the right people are using it or permitted in the right place, for a myriad of reasons.

00:14:51:13 – 00:15:27:20
Speaker 1
And then the last bit is to integrate your data, right, and to understand how to do it securely through the cloud as opposed to dropping it in storage areas and losing some of that functionality. So as we move to the next slide, we want to just highlight some of the journey that Snowflake has had working with the EDM Council, particularly to develop the cloud strategy and data management framework, where we all, as peers in the industry, realized that this is where the world was heading and we wanted to take the best practices that exist for a data-literate organization to move forward.

00:15:28:00 – 00:15:51:24
Speaker 1
So we engaged with the EDM Council way back in 2021. I’m going to skip reading what’s on the slide here. We’ve worked with auditors as well as advisory organizations in order to come up with the appropriate framework that truly will give the certification that says, hey, we understand that data in the cloud is the same but different, so to speak.

00:15:51:24 – 00:16:11:23
Speaker 1
But there are unique ways and unique functionalities that we need to be mindful of as we move forward into this more open, more data-literate, and more democratic way of sharing data throughout organizations. So Mike, I’ll actually turn it over to you for the next slide. Perhaps you could just give us a little bit more information about the cloud data management framework.

00:16:11:23 – 00:16:41:16
Speaker 2
Yeah, sure. Thanks, Chris. What everyone is seeing on this slide is sort of the bull’s-eye result of over 100 companies and a year-and-a-half effort to define a series of controls and best-practice processes for managing sensitive data in the cloud. And in fact, this was the first publication that came out in September last year, so about 9 to 10 months ago.

00:16:42:00 – 00:17:12:08
Speaker 2
And these are 14 key controls for protecting sensitive data in cloud and hybrid cloud. You’ll see it covers everything from governance, to the requirement that everything in the cloud be cataloged and classified, to setting up proper accessibility and usage rights and the right protection and privacy considerations, to managing data through its full lifecycle, including transformations with full lineage and visibility, all the way to the archival and retention of data over time.

00:17:12:18 – 00:17:41:04
Speaker 2
So this framework is actually a free resource. Credit, by the way, to the organizations on this call. Snowflake was very active in helping to build this framework, along with the four major cloud companies, Amazon, Google, Microsoft, and IBM, the top four in revenue worldwide, which all put in about a dozen engineers to work on this, along with 100 other companies crossing the financial markets and other industries.

00:17:41:04 – 00:17:59:03
Speaker 2
And this was also briefed with regulators throughout the world and was published as a standard resource for everyone. We’re going to put it into the chat; I think we just did a moment ago. If you’d like to access this, you do not need to be a member of the Council. It’s a free resource to all companies.

00:17:59:10 – 00:18:37:04
Speaker 2
If you’re interested, what was notable, and Chris, credit to the Snowflake organization, is that you all took these 14 key controls, set up an environment across Azure, AWS, and Google, and then brought in one of the major audit firms, KPMG, to assess that all 14 key controls shown here were up and running and protecting sensitive data. And the reason we raise this is, if you want to move forward onto those exciting business use cases of highly leveraging your data, the foundational requirement is that you protect your sensitive data first. Then it opens it up for the right use case for the right situation, and that’s the benefit of this type of framework.

00:18:37:12 – 00:18:49:11
Speaker 2
So Chris, thanks for raising it. Thanks for your involvement. And also to Tamr for raising awareness that these types of resources are available to anyone in the world that would like to better manage their cloud environment. So back to you.

00:18:50:14 – 00:19:15:00
Speaker 1
Yeah. And Mike, thank you very much for that. And just to demonstrate how this framework works in practice, just as you had mentioned, Snowflake is, for a large financial services organization, the only approved cloud software or cloud provider that is permitted to have PII stored within it, due to this framework and due to the security and governance pieces of our solution.

00:19:15:06 – 00:19:38:09
Speaker 1
And the reason why I bring that up is it actually does empower that Customer 360 use case that you mentioned, particularly for organizations that are looking to service the same individual across many different areas of the bank. So it’s just a testament to the hard work, to what it means, and to being mindful of all that we need to do in order to empower the next wave of technology and use cases. With that,

00:19:38:16 – 00:19:58:23
Speaker 1
Just to move to the next slide and wrap this up before handing over to Matt: we do have assets available on the Snowflake GitHub. Do feel free to reach out as required if you have any questions around anything that you have seen here. We’re more than happy to provide some insights into how we’ve taken other organizations on that journey.

00:19:59:13 – 00:20:01:15
Speaker 1
With that, Matt, I’ll throw it back over to you.

00:20:02:24 – 00:20:25:05
Speaker 2
I think, Chris, to get Matt’s section going, we’re going to get another poll question going and get our audience keyboarding in some questions for us as well. So the next polling question is: what data sets do you find most useful to incorporate into your analysis? It’s a single choice, so this is the one that you feel is the most useful.

00:20:25:05 – 00:20:51:13
Speaker 2
So the first choice is reference data. Second choice is demographic data; third choice, security data. And if there’s a type of data that you think is most useful but is not represented in these choices, you can click on other. We’ll give it a few moments here to get a sense from our audience as to where you’re bringing in other sources of data to supplement your decision making.

00:20:51:13 – 00:21:20:03
Speaker 2
All right. And our team will now present the results. Here we go. The number one winner, at 51%, is reference data. That’s an interesting response. And right behind it, at 34%, demographic data. So, Matt, do you want to chime in on this one? And certainly, Chris, you may have some comments. Does this make sense?

00:21:20:03 – 00:22:04:15
Speaker 3
Yeah, super exciting to see this, because we’ll be talking about reference data in the next section, but it makes a lot of sense. In particular, I think what we see is that when you’re looking at data about companies or organizations, the best data is data that doesn’t live within your four walls. And so in order to do any sort of really useful analytics, whether it’s around customers and their accounts or companies that are potential investment opportunities or portfolio companies, if you don’t have that reference data and those external data sources, it’s really hard to get much insight.

00:22:04:15 – 00:22:08:22
Speaker 3
So it’s good to see people agree and are trying to make it useful.

00:22:10:01 – 00:22:28:02
Speaker 1
Yeah, I think the only thing I would add is that what we’ve started to discover is that cybersecurity, and security data in general, is really a big data challenge, right? It’s more about getting as much of the information as possible that’s coming from all parts of the walls, and from protecting the walls, into a repository in order to analyze it.

00:22:28:02 – 00:22:42:15
Speaker 1
So it’s nice to see that people are also selecting that, and that people are starting to view even cybersecurity and CISO-type activities as a data challenge too, not just purely a network and technology challenge.

00:22:42:22 – 00:23:05:22
Speaker 2
So Matt, you’re going to dig in at this point on the next level of the opportunities and challenges of integrating external data into the fabric of how companies operate. And for our audience, we encourage your questions. As you have them, just pop them into the Q&A box, and we’re going to do our best to answer as many as we can along the way.

00:23:05:22 – 00:23:23:21
Speaker 2
And we’re also going to dedicate a few minutes at the end. For any question typed in, here’s our promise: you will receive a written response. If for some reason we can’t answer it with all the questions we get today, we will give a written response and send that back to everyone after this webinar is over. So Matt, over to you.

00:23:25:07 – 00:24:09:14
Speaker 3
Yeah. Thanks, Mike. And so for this, we’ll talk a bit about some of the challenges and opportunities with being able to integrate external data with internal data. And just to provide some context on my perspective and the perspective at Tamr: we’re an enterprise data mastering company, and very often we are helping customers who are either in the process of migrating to the cloud or have migrated to the cloud, helping them stitch together disparate internal and external data sources so they can get a clean, complete view of key entities such as companies, people, suppliers, and products.

00:24:10:04 – 00:24:39:23
Speaker 3
And so what you’ll see in this content is really based on our experience helping advise on and implement large-scale solutions for overcoming this challenge in a repeatable manner. And so let’s dig in a bit to how we see the problem: where are we seeing success, and where are we seeing people fail on this journey?

00:24:39:23 – 00:25:18:19
Speaker 3
So just to bring us to what the overall trend is in the market, I think what we see pretty much across the board is that companies are trying to use more and more external data for their analytics. When we talk to some of our customers who are investment managers, what they’ll say is the only way for them to really get an edge is to have external data that helps them understand trends in the market, such as consumer spend, before their competition.

00:25:18:19 – 00:25:51:17
Speaker 3
And so they can really understand whether they should make this investment or not and understand the health of their portfolios. And so across all industries, we’re seeing that people are spending more on external data in order to get some of these new and richer insights that better enable that. And I think the challenge is what we see when people start to really acquire sources on a regular basis. For example, one of our customers is buying 1 to 2 data sources every single month.

00:25:52:10 – 00:26:25:11
Speaker 3
And the first few data sources that they onboarded were the very well-known ones, like Cap IQ, for example. There are a lot of clean identifiers within that data, things like a ticker symbol, a domain, and many alternative company names. But when you start to add multiple data sources, you don’t really have a primary key in order to join all of that external data together. Maybe you get lucky and you have multiple sources that have a domain that is trusted.

00:26:25:23 – 00:26:52:14
Speaker 3
And then you could just connect it through that. But that’s very rare. And even in those cases, there are often situations where you’ll have two entities that share a domain, but you actually want to treat them differently. So, for example, Amazon’s venture arm, the Amazon Alexa Fund, and Amazon the retailer both have amazon.com, and depending on how you actually want to run those analytics, you might want to treat those separately.
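
To illustrate the shared-domain problem described above, here is a minimal Python sketch. The records, source names, and the `kind` field are all hypothetical and made up for illustration; this is not how Tamr or any data provider actually performs matching. It only shows that grouping on domain alone merges the two Amazon entities, while adding a second attribute keeps them apart.

```python
from collections import defaultdict

# Hypothetical records from two external sources.
# Field names and values are illustrative assumptions, not real vendor data.
records = [
    {"source": "source_a", "name": "Amazon.com, Inc.", "domain": "amazon.com", "kind": "retailer"},
    {"source": "source_b", "name": "Amazon", "domain": "amazon.com", "kind": "retailer"},
    {"source": "source_b", "name": "Amazon Alexa Fund", "domain": "amazon.com", "kind": "venture"},
]


def group_by(rows, key_fields):
    """Group record names whose values agree on every field in key_fields."""
    groups = defaultdict(list)
    for row in rows:
        key = tuple(row[field].lower() for field in key_fields)
        groups[key].append(row["name"])
    return dict(groups)


# Domain as the only join key: all three records fall into one group.
by_domain = group_by(records, ["domain"])

# Domain plus a second attribute: the venture arm stays separate.
by_domain_and_kind = group_by(records, ["domain", "kind"])

print(len(by_domain), len(by_domain_and_kind))
```

Which of the two groupings is right depends on the analysis being run, which is the point being made: a single shared identifier does not settle the question by itself.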

00:26:53:19 – 00:27:23:08
Speaker 3
And so some of these issues with inconsistency can make it very difficult to actually get the value out of external data. And in particular, what we see companies trying to do is rely on third-party match services in order to help overcome some of those issues. So I have my internal data set, and I’m going to send my data to each of my ten or 20-plus data providers and have them do the match.

00:27:24:01 – 00:27:57:06
Speaker 3
That could be challenging because then you don’t own that process of doing the match and reconciling the differences between all of those data sources. And so what we’re outlining here is that the amount of complexity really adds up as you start to get more and more sophisticated with external data. The goal that people are driving towards, and where we all want to really get to, is to have data that is organized by logical entity.

00:27:57:06 – 00:28:20:21
Speaker 3
And I’ll drill in a little bit on what we mean here. We were in a meeting five or so years ago with a head of IT at a large medical devices manufacturer, and what they said is, my supply chain team is kind of beating me up to get more external data about social media signals.

00:28:21:01 – 00:28:52:19
Speaker 3
They wanted to try to scrape everything that’s happening on Twitter and Facebook and LinkedIn in order to understand supplier risk and understand what’s happening in the market. And so when we dug into that idea and what they were really trying to do, they thought of the problem initially as: well, if we just have all of this data about everything people are tweeting about, we’re going to find interesting signals that will help inform how we should manage and adjust our supply chain.

00:28:54:05 – 00:29:13:15
Speaker 3
We said, okay, that’s not something we help with, but let’s stay in touch. We talked to them again a couple of years later and they said, we have all of this data. It’s a lot of noise. We don’t know what to do with it. And we said, well, that’s because you need to integrate that data with your internal data.

00:29:13:15 – 00:29:36:17
Speaker 3
And then once they did that, they were actually able to get real, meaningful insights about their business. And so the key point here is, if you start with that internal structured data and then expand from there, you’re going to get much richer insights than if you’re just buying or acquiring data sources for the sake of acquiring them.

00:29:36:22 – 00:29:59:23
Speaker 3
And I think the good news with cloud, and how the cloud is enabling all this, is you could really do both of these in one step. You can acquire the data and integrate it within that cloud environment much more efficiently than in on-prem environments, where you’re passing data around and the place where you’re actually managing that data might live somewhere else.

00:29:59:23 – 00:30:24:09
Speaker 3
And just to bring this home and drill in on what this looks like, if we think of one individual organization and why this is such a challenging problem, we can look at something like Capital One. Capital One is represented in many different ways in internal data as well as external data.

00:30:24:09 – 00:30:45:16
Speaker 3
And so if you can establish that our Capital One in our Salesforce, which is our internal data, is the same as the Capital One in Cap IQ or in Dun and Bradstreet, then what you get is this view at the top where you understand all of the attributes that are available about that company that you could start to run analytics on.
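
The idea of establishing that one company is the same across an internal system and several external sources, and then collecting its attributes into one view, can be sketched in Python as follows. Everything here is an assumption for illustration: the field names, the attribute values, the `same_entity` token-subset rule, and the `unified_view` helper are hypothetical, and the naive name rule is only a stand-in for the machine-learning-based matching mentioned in this webinar.

```python
import re

# Legal-form tokens ignored when comparing names (an illustrative, incomplete list).
LEGAL_SUFFIXES = {"corp", "corporation", "inc", "co", "llc"}


def name_tokens(name):
    """Lowercase a company name and split it into tokens, dropping legal suffixes."""
    tokens = re.findall(r"[a-z0-9]+", name.lower())
    return {t for t in tokens if t not in LEGAL_SUFFIXES}


def same_entity(a, b):
    """Naive rule: treat two names as one entity if one token set contains the other."""
    ta, tb = name_tokens(a), name_tokens(b)
    return bool(ta) and bool(tb) and (ta <= tb or tb <= ta)


# Hypothetical internal record (CRM) and external records; values are made up.
internal = {"source": "salesforce", "name": "Capital One", "account_owner": "j.doe"}
external = [
    {"source": "cap_iq", "name": "Capital One Financial Corp.", "ticker": "COF"},
    {"source": "dnb", "name": "CAPITAL ONE FINANCIAL CORPORATION", "employee_band": "10000+"},
    {"source": "dnb", "name": "Capital Group", "employee_band": "5000+"},
]


def unified_view(internal_rec, external_recs):
    """Start from the internal record and add attributes from matching external records."""
    view = {"name": internal_rec["name"], "sources": [internal_rec["source"]]}
    view.update({k: v for k, v in internal_rec.items() if k not in ("source", "name")})
    for rec in external_recs:
        if same_entity(internal_rec["name"], rec["name"]):
            view["sources"].append(rec["source"])
            for key, value in rec.items():
                if key not in ("source", "name"):
                    view.setdefault(key, value)  # keep the first value seen per attribute
    return view


view = unified_view(internal, external)
print(view)
```

Starting from the internal record mirrors the advice given in this section: begin with the internal structured data and expand outward. A rule this simple would produce false matches on real data, which is why it is offered only as a sketch.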

00:30:45:22 – 00:31:11:15
Speaker 3
And so, just taking it back to this example of the person who was looking to acquire a bunch of social media data, one of the things that would have worked a lot better in that case is starting with what are the attributes that we’re actually trying to acquire, what are the attributes that are actually going to be useful for us to run better analytics, and then buying and onboarding the sources that are relevant to that.

00:31:12:02 – 00:31:28:20
Speaker 3
And it just gets back to this point of what is that internal data that you’re trying to get richer insight on, and continuing to augment it, as opposed to necessarily just starting with, let’s look at all the external data and then try to figure out what’s useful to us.

00:31:28:20 – 00:31:52:19
Speaker 2
So Matt, another question for our audience, just to get a general sense of where we are as an industry, which is: how many sources of third-party data do you use? And by the way, all of this is kept anonymous, so it’s just to give us discussion points for this webinar. We’ll give it a moment or so and the choices will pop up.

00:31:53:11 – 00:32:15:20
Speaker 2
You can say zero if you haven’t started that journey yet, one to five external or third-party data sources, or more than five. We’ll give it a few moments for everyone to click one of those single-choice options. And then, Matt, over to you and Chris for some commentary on the results. And please, we’re getting some interesting questions.

00:32:15:20 – 00:32:45:10
Speaker 2
I’m about to pop one or two in. Add those as you think of them. There are no bad questions in this discussion, so please go for it. And Matt, while we’re waiting, one of the things that struck me is the two dimensions of how you can think about it. Number one is, what are all the external data sources, which opens up the idea of, hey, now that I know I can access the data, that gives me the creative choice to think about how to integrate it back into my internal data. Or the other way around, which is, what business problems do

00:32:45:10 – 00:33:05:01
Speaker 2
I want to solve, and therefore what external data I’d actually need. I could argue both sides of that coin, but I think in the end the other thing you have to think about is how do you match that data so that you could add it in incrementally and actually make it usable faster. So here are our results. The number one response.

00:33:05:19 – 00:33:30:05
Speaker 2
And again, I think this is a forward-leaning audience: 47% said five or more external data sources. Right behind that, at 40%, is one to five. And only 13% of the audience said none at all. So basically 87% of the respondents, which is consistent with where we were before, are already at some stage of doing this type of work.

00:33:30:05 – 00:33:38:17
Speaker 2
So Matt and Chris, are these numbers what you’re seeing in the market? Any reaction to this?

00:33:38:17 – 00:33:58:14
Speaker 1
Yeah, I guess from what I’ve seen in the market over the course of my career, that is rather accurate. I think the part that is much more modern is that it’s no longer being merged inside of an Excel document, right? I’ve seen workbooks with macros that have 30 tabs, right, and API calls through however many different plug-ins.

00:33:58:20 – 00:34:13:04
Speaker 1
Right. And I think the benefit of the cloud and some of the next-generation technology is really just the loading of that data to join and analyze, and leveraging other next-gen tools like Tamr to assist with that. So I’ll turn it over to Matt for his thoughts.

00:34:13:20 – 00:34:34:24
Speaker 3
Yeah. So one of the questions we got was that the stats we showed around external data are a few years old, and so how has it changed? And you can see right here that the numbers are consistent. What we’ve seen is there was a lot of hype around external data maybe five, six years ago. And I think we’re past the hype cycle.

00:34:34:24 – 00:34:59:02
Speaker 3
And at the point now where it’s become a reality, people are actually integrating it into their analytics. It’s part of their core data assets, particularly if they have a lot of data in a data cloud, or the Data Cloud, like Snowflake. And so I think this is very consistent with the trends that we’ve seen.

00:35:00:13 – 00:35:29:19
Speaker 2
So Matt, you know, nothing happens in our world today without the right kind of business case. I think that topic would be really useful to go into in some detail, insofar as our audience is listening and thinking, well, I have some ideas I can carry forward to management and top management. But again, you don’t want to proceed with any new ideas unless you can articulate the business value and business case. Over to you on that.

00:35:29:19 – 00:36:06:02
Speaker 3
Yeah, thanks. I think the good news about external data is that the business case for it is naturally very strong. And so in particular what we would recommend when forming that business case is that there are two aspects of it. There’s what are the business benefits we’re going to get in terms of new insights, and then what is all the time that we’re going to save by having this data. We’ll focus on this first piece, what are these new insights that we’re going to get, first.

00:36:06:02 – 00:36:49:18
Speaker 3
Because that’s what’s really most exciting. I think one of the things that is particularly important, if you’re executing on an external data strategy within the context of a broader cloud strategy, is that the number of users of that data is going to be significantly higher. And so the number of people that you can engage with throughout the organization to understand what business questions they are trying to answer on a regular basis increases dramatically, versus the legacy on-prem world, where it might be difficult to get that data outside the hands of a handful of people.

00:36:50:18 – 00:37:17:08
Speaker 3
And so one of the things that we recommend starting with is just what are the analytics and what are the reports that people are already running on a daily basis, because that’s what gives you insight into the types of questions that people are trying to tease out. And then you could say, how would external data help augment that? Say that’s a total addressable market analysis that’s being done every three months.

00:37:17:08 – 00:37:49:08
Speaker 3
Well, if you have additional data about things like propensity to buy or what other products people in that market own, that’s going to make your TAM analysis much richer, which is going to ultimately make your sales team much more effective. And so start with those actual questions that people are answering today, and then start to figure out, by just talking to the stakeholders who are asking those questions, what other questions are top of mind that, as you learn more, you’ll want to have answered.

00:37:49:20 – 00:38:10:10
Speaker 3
I think one of the things that happens a lot with data, and in particular external data, is when people see an analytic or a dashboard for the first time, the natural question is, why is this happening? And I think if you can work with your stakeholders who are on the business side and using this data on what are those

00:38:10:10 – 00:38:18:12
Speaker 3
“Why” questions that they’re trying to drill into, it can really help firm up the business case for where people are trying to get insights.

00:38:19:02 – 00:38:43:12
Speaker 2
So Matt, I like this very practical approach. You’re already running reports, so what are they serving and how can external data actually make them more valuable? Which means you really have a set of consumers for those reports. And some simple questions could bring out a business case pretty rapidly, especially if it’s a widely consumed report or a critical report.

00:38:43:18 – 00:38:56:17
Speaker 2
One of the questions, by the way, came in from said, which is what is the use case for chief risk officers? And I see that for box on the right around risk management. Do you want to comment on that, if you don’t mind, Matt?

00:38:56:17 – 00:39:29:15
Speaker 3
Yeah, so I think the big use case for risk managers is knowing what you don’t know and knowing what you potentially can’t know. In particular, I think the challenging part of risk is that what you need to be made aware of are typically events that you might not be monitoring on a daily basis.

00:39:29:15 – 00:40:09:02
Speaker 3
And so to a large degree, you’re trying to see things as they are happening in real time, before they actually impact your business. And so some of the external data sources can help provide insight into trends that are happening in the market. So for example, if you’re in the manufacturing world and looking at supplier risk, if you start to see that debt is increasing across the board, well, that’s a pretty big red flag that your suppliers might be at risk.

00:40:09:03 – 00:40:29:00
Speaker 3
Likewise, on the investment management side, with some of the companies that you’re tracking either in your portfolio or as potential investments that you’re trying to make, if you start to see consumer spend getting weaker in certain areas, that can show that there might be risks in your portfolio as it pertains to one segment of the market.

00:40:29:13 – 00:40:38:05
Speaker 3
And the only way that you get that insight is through that kind of aggregate-level view that you’re tying back to, you know, what are the companies or industries that I’m actually focused on.

00:40:38:17 – 00:41:06:21
Speaker 1
Yeah. And Matt and Mike, just rather quickly, to the point of incorporating external data, sometimes the question can be as simple as, what if I was able to expand my market by 10%? What if I was able to reduce my risk by 10%, or be alerted to an event one month sooner? Right. So you don’t have to reinvent the entire value engineering deal, as opposed to just thinking about what if your preexisting process could be improved.

00:41:06:21 – 00:41:12:12
Speaker 1
Right. And I think that’s just an easier way to quickly get to a rapid business case.

00:41:12:15 – 00:41:36:21
Speaker 2
Yeah, and Chris and Matt, for our audience listening in, on this idea of helping people in data management and analytics generate ROI: the EDM Council for the last year has been running a broad work group on data ROI. And if anyone is interested, in the next month or two we’re actually publishing a series of best practice recommendations on ROI for data

00:41:36:21 – 00:41:56:02
Speaker 2
programs, ROI for data-related projects, and ROI for data as an asset on your balance sheet. So stay tuned. If you’d like to learn more, we’ll be putting out a link. I think it’s in chat right now, just to help the audience in moving their programs forward. You do need ROI, and you also need to address risk issues.

00:41:56:02 – 00:41:58:18
Speaker 2
So, all great comments. Matt, back to you.

00:42:00:07 – 00:42:49:23
Speaker 3
Thank you. And so the other part of the business case around external data, which I think is often overlooked, is that it enables you to automate a lot of existing processes. In particular, one of the things that we see pretty regularly with our customers is that the way they’re augmenting their internal data today is basically through the legwork and the elbow grease of their employees, who are just adding additional information to things like Salesforce, for example, about customers or contacts.

00:42:50:11 – 00:43:44:16
Speaker 3
And so by onboarding external data and automating that process of actually feeding it into your internal data, people spend a lot less time cleaning up that data because they need to update revenue figures that are wrong or employee headcounts that are inaccurate. Instead of spending time doing that, they can actually spend their time working on figuring out insights related to the data. So the key message here is that if you really want to have a robust and automated process of enabling downstream data science, data analytics, and data applications, then you’re going to need that

00:43:44:16 – 00:44:21:05
Speaker 3
external data to just be embedded in your data pipelines, and not something that humans are constantly augmenting manually. And so how do you actually do it? I think one of the things that the cloud has enabled is that people can break up this problem more efficiently and effectively. And so now it is much easier to use applications that really manage this process end to end in an automated way.

00:44:21:05 – 00:44:38:24
Speaker 3
And so the first thing you need to do is aggregate the data. We talked a lot at the beginning about how Snowflake is making it much easier to actually establish this raw zone for landing disparate internal and external tables and just breaking down those physical data silos so you could have all your data in one place.

00:44:39:08 – 00:45:22:24
Speaker 3
I think one of the things that’s often overlooked with the integration of external data is cleaning that data. The great thing is there are a number of external data sources, or just external data services, that you can use to help clean up the input data, to ensure that if you are going to be using things like a domain in order to do matching, or you’re going to be using address, you’re feeding in clean and standardized addresses and information. More generally, I just want to emphasize this point that you can’t overlook the importance of data cleaning when you’re going

00:45:22:24 – 00:45:52:23
Speaker 3
into any form of automated entity resolution or record matching. Because if the information that’s being used in order to match these records together is inaccurate, then any algorithm you use or any approach you take to actually match that data is going to be fundamentally inaccurate. And then from there, you need to think about how are we going to create a persistent identifier that’s going to link together all of these different instances of, in this case, a company,

00:45:54:09 – 00:46:15:07
Speaker 3
so that when we look across sources, we have a clean primary key to join that data together and get all of the attributes we need in this mastered record, where we’re actually able to see, for this company of interest, what is all of the information that we have available and how can we use that in order to drive new insights?
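
Illustrative sketch (not from the speakers): the aggregate, clean, and link steps described above can be pictured in a few lines of Python. Everything named here is an assumption for illustration, including the row layout, the choice of a cleaned web domain as the matching key, and the use of a name-based UUID as the persistent identifier. A real pipeline would match on several cleaned attributes rather than one.

```python
import uuid
from urllib.parse import urlparse


def clean_domain(raw: str) -> str:
    """Standardize a website or domain string to a bare lowercase host."""
    raw = raw.strip().lower()
    if "://" not in raw:
        raw = "http://" + raw  # urlparse only finds the host when a scheme is present
    host = urlparse(raw).hostname or ""
    return host[4:] if host.startswith("www.") else host


def persistent_id(domain: str) -> str:
    """Derive a stable identifier: the same cleaned domain always gives the same ID."""
    return str(uuid.uuid5(uuid.NAMESPACE_DNS, domain))


def master_records(rows: list) -> dict:
    """Group rows from all sources under one persistent ID and merge their attributes."""
    mastered = {}
    for row in rows:
        domain = clean_domain(row["domain"])
        entity_id = persistent_id(domain)
        record = mastered.setdefault(entity_id, {"domain": domain, "sources": []})
        record["sources"].append(row["source"])
        for key, value in row.items():
            if key not in ("domain", "source"):
                record.setdefault(key, value)  # first source to supply a value wins
    return mastered


# Hypothetical rows landed from one internal and two external sources.
ROWS = [
    {"source": "crm", "domain": " HTTPS://www.Example.com/about ", "account_owner": "placeholder"},
    {"source": "vendor_a", "domain": "example.com", "employees": 1200},
    {"source": "vendor_b", "domain": "other.example.org", "employees": 40},
]

mastered = master_records(ROWS)
for entity_id, record in mastered.items():
    print(entity_id, record)
```

Without the cleaning step, the first two rows would hash to different identifiers and never join, which is the point being made about cleaning before matching.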

00:46:16:04 – 00:46:40:04
Speaker 3
And so one joint customer of ours that we think has done a tremendous job on this journey is Blackstone. When we started working with Blackstone, the majority of their data was on-prem in SQL Server, and they had many people who were just manually preparing data for analytics. And so all of that work that we showed on the previous slide was primarily being done by humans.

00:46:40:08 – 00:47:17:16
Speaker 3
And so you can imagine when someone has a question about, you know, should we invest in this company, or what are the spending trends related to women’s fashion, because we want to understand if we should invest in this other company, the amount of time that it would take to actually get the data needed to deliver those analytics was so long that in some ways it was like, you know, what’s the point? If I’m not going to get this insight in the next day or the next hour, is it still relevant to the business?

00:47:17:16 – 00:47:51:14
Speaker 3
Because we need to make decisions very quickly. And so what we’ve been able to help them accomplish and achieve is this automated end-to-end pipeline for centralizing data in Snowflake. It now lives in Snowflake, so they have this central view of all of their data sources, external and internal, and they’re using a machine-learning-based approach in Tamr in order to automate the curation of that entity data.

00:47:52:01 – 00:48:14:05
Speaker 3
And so what this enables and empowers is for that data about who are all of the companies we’re tracking, and what is all of the data that we have in that company universe, to be readily available and continuously up to date. On the surface, it sounds like this is very useful just for, hey, my reports are more accurate, and that’s the starting point.

00:48:14:05 – 00:48:40:17
Speaker 3
But back to this business case: once you actually have that data asset that’s clean, trusted, and has a diversity of attributes available, you can start to build data applications on top of it. And that’s exactly what they and other customers of ours have done, is start to think beyond just, let’s look at reports and dashboards, and more into, let’s build applications.

00:48:40:17 – 00:48:41:06
Speaker 2
For.

00:48:41:21 – 00:49:08:08
Speaker 3
Individual stakeholders, where maybe they want to see who are all of the companies we’ve interacted with over the last 30 days where, you know, the stock price has gone up by 10%, or they were impacted by a specific market disruption. All of that data is available at their fingertips. And so they can package that up however they want.

00:49:08:09 – 00:49:14:02
Speaker 3
And the work becomes more about those insights than the actual preparation of the data.
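
Illustrative sketch (not from the speakers): once a mastered company table exists, a stakeholder question like the one above reduces to a simple filter. The field names, the sample companies, and the `recent_movers` helper below are hypothetical and the values are invented for illustration.

```python
from datetime import date, timedelta


def recent_movers(companies, today, days=30, min_gain=0.10):
    """Names of companies contacted within `days` whose price rose by at least `min_gain`."""
    cutoff = today - timedelta(days=days)
    result = []
    for company in companies:
        gain = (company["price_now"] - company["price_then"]) / company["price_then"]
        if company["last_interaction"] >= cutoff and gain >= min_gain:
            result.append(company["name"])
    return result


# Invented rows standing in for a mastered company table.
COMPANIES = [
    {"name": "Company A", "last_interaction": date(2020, 6, 15), "price_then": 100.0, "price_now": 112.0},
    {"name": "Company B", "last_interaction": date(2020, 4, 1), "price_then": 100.0, "price_now": 150.0},
    {"name": "Company C", "last_interaction": date(2020, 6, 20), "price_then": 100.0, "price_now": 105.0},
]

movers = recent_movers(COMPANIES, today=date(2020, 6, 30))
print(movers)
```

The filter itself is trivial. The effort sits in having interaction dates from internal systems and prices from an external source already joined on the same company record.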

00:49:14:02 – 00:49:33:21
Speaker 1
And Matt, just as you had mentioned there, right, to take it away from thinking about reports and toward thinking about actual business decisions: buying a security, buying a firm. These are rather time-constrained opportunities. So getting that information as quickly as possible permits the outcome to be achieved, or at least the opportunity to be explored, whether they can or they cannot.

00:49:33:22 – 00:49:45:21
Speaker 1
If we get back to the risk management example, right, waiting five days to find out I’m running out of liquidity is not exactly ideal. Knowing sooner that I may have a liquidity event better activates the firm to tackle the challenges.

00:49:45:21 – 00:49:46:02
Speaker 2
Of.

00:49:46:07 – 00:49:47:06
Speaker 1
What’s happened to your point.

00:49:48:19 – 00:50:09:15
Speaker 3
Yeah, thanks for that. And just to wrap here before we get into Q&A, some of the lessons learned on this journey. So for the people who responded, and maybe you’re at one to five external data sources and you’re hoping to get to five plus, what are some of the enablers of that?

00:50:09:15 – 00:50:33:02
Speaker 3
Number one is that humans working with machines are really the only way to solve this problem. One of the things that the cloud has enabled is using machine-learning-based approaches much more efficiently and effectively, and, you know, machine learning really isn’t possible without the scalability of the cloud.

00:50:33:02 – 00:51:06:11
Speaker 3
And so embracing that, and not just throwing everything over to an algorithm but also having a human who can actually curate and give feedback on the data, is extremely important. The next is, even when data sources seem trusted, it’s important to remember that some amount of data cleaning matters a lot, because our goal here is to automate as much of this process as possible so we could feed it into downstream analytics and not have to spend a lot of time preparing individual datasets.

00:51:06:18 – 00:51:32:17
Speaker 3
And so the way that we clean that data, and having consistent approaches to cleaning that data regardless of how trusted or untrusted the data is, that’s very important. And then finally, one of the things that we’ve learned along this journey is to break up the difference between primary sources, which is often your internal data that you’re trying to augment, and the secondary sources that you’re trying to join.

00:51:32:18 – 00:52:04:08
Speaker 3
And we think it’s important to understand what are those things, whether it’s companies or customers, that we want to be laser focused on, ensuring that the data is right and that we have all of the attributes we need for that, and then continue to pile on additional attributes over time. Taking this approach just ensures you don’t end up with that kind of swamp of a bunch of data where you’re not really sure how it all links together in order to create meaningful insights.

00:52:04:08 – 00:52:04:22
Speaker 2
For.

00:52:05:08 – 00:52:08:02
Speaker 3
For your stakeholders.

00:52:08:02 – 00:52:39:09
Speaker 2
So Matt and Chris, really excellent insight on thinking through everything. Number one, what are you doing today for your decision support? That might be some low-hanging fruit that the audience can immediately tap into to expand. Number two is the availability of data marketplaces and external data sources. You know, platforms like Snowflake and others will allow that to be a much easier process than self-discovery in the marketplace.

00:52:39:09 – 00:53:06:19
Speaker 2
And then third, Matt, your whole approach to this is that just having the data, both internal and external, isn’t good enough. You have to think through how you will link it, clean it, validate it, and make it trusted and useful. So that’s sort of the third leg in this, and those are the experiences you were sharing. So I think a couple of quick questions and then we’ll wrap up, and thanks to the both of you again for educating us here. One question

00:53:06:19 – 00:53:32:21
Speaker 2
I would just fundamentally have: if our audience is listening in and they’re going, okay, this is where we’re all heading, what can I do as an individual to increase my skills to be more valuable to my company in this area? Any thoughts on things that you should think about upskilling for to be more valuable to your organization?

00:53:33:22 – 00:53:34:05
Speaker 1
Sure.

00:53:34:05 – 00:53:35:23
Speaker 2
So, do you want to go first?

00:53:36:08 – 00:53:59:17
Speaker 1
I’ll tackle that. I mean, a great question, thank you. In general, I think just research, right? I think making sure you lean in to what’s available, right, through some of the societies that we’re all a part of, to make sure that we’re thinking about what’s coming next. Right. There’s more than enough publicly available information to brush up on the capabilities of the cloud.

00:53:59:17 – 00:54:21:14
Speaker 1
I know a lot of the cloud providers, and even Snowflake themselves, offer training courses about these solutions. But even more important is probably trying to find out how to bridge the gap between, as we’ve discussed here, the business outcome you’re trying to achieve and how you leverage these solutions to empower your organization.

00:54:21:14 – 00:54:28:14
Speaker 1
I think that’s always the last bit and there’s a myriad of ways to go about doing that. But over to you, Matt, your thoughts?

00:54:29:02 – 00:54:57:18
Speaker 3
Yeah, I think one of the things that’s important is to just get your hands dirty with trying to generate analytics from three or four or five external data sources. I think it’s becoming increasingly important, as we’ve seen from the data we went through today in the poll questions, to be able to leverage all the external data that’s available in these organizations.

00:54:57:18 – 00:55:20:19
Speaker 3
And I think when you actually sit down and try to bring together the three or four or five and answer interesting business questions, you learn a lot about what is the value of external data, and that can be difficult to learn by just reading a brochure from a data provider. You need to really try to help someone in the business answer a question using these sources.

00:55:21:06 – 00:55:26:07
Speaker 3
There are a lot of interesting challenges that come along with that that will improve your skills quite a bit.

00:55:26:19 – 00:55:32:14
Speaker 2
Yeah, I’m almost hearing using a known statement in the market. Just do it.

00:55:33:00 – 00:55:33:13
Speaker 3
Yeah, yeah.

00:55:34:05 – 00:55:36:02
Speaker 1
Yeah. So super experience.

00:55:36:02 – 00:56:01:18
Speaker 2
But until you start actually doing it, you’re not going to learn. So take courage to actually make that step, and start with things that might be right in front of you, with a friendly business person looking for more. A great place to start. So, gentlemen, thank you both for this great update. In a moment, I’ll just cover some of the resources post this webinar.

00:56:02:01 – 00:56:17:21
Speaker 2
Any final takeaways? First, please let us know how people can get in touch with both of you. You both represent exciting companies. Both are partners with the EDM Council. Matt, Chris, do you want to just describe how they can get in touch?

00:56:17:21 – 00:56:45:09
Speaker 3
Yeah. Feel free to reach out directly or go to Tamr dot com and you can request a demo or set up a meeting. You know, we would love to help you work through whatever external data challenges you have. We understand that there are quite a few gotchas along the way, and so I’m always happy to jump in and consult and advise wherever I could be useful.

00:56:45:24 – 00:57:03:04
Speaker 1
And on the Snowflake side, just feel free to reach out. You can also go to the Snowflake website and download a trial account that permits you to see what the solution is like. Take a look at the marketplace in general, and just reach out. The appropriate team will be in touch with you, similar to what Matt just suggested on his side.

00:57:04:03 – 00:57:37:01
Speaker 2
Awesome, guys, and thank you for letting us know about that. And for those that want to reach any of the EDM Council resources, the website is EDM Council dot org. There you can download and access the CDMC cloud framework. We put that link. And you can join the data ROI work group and get access to the papers, and also visit a whole library of nearly 100 webinars on these and related topics, all available for both our members and guests to learn more about this topic.

00:57:37:09 – 00:58:06:13
Speaker 2
So, two follow-ups to come. Within the next 24 hours, everyone who registered will receive an email that has the recording link. You can pass that around your organization or to any of your friends in other organizations. Both Tamr and Snowflake have been gracious to let us bring this to everyone to get this news out. And number two, within about the next four or five business days, we will be emailing everyone the Q&A responses and a copy of this deck.

00:58:06:19 – 00:58:21:06
Speaker 2
So you’ll have that as a second set of resources as well. Again, thank you, Chris, to you and the Snowflake organization, and Matt, to you and the Tamr organization, for sponsoring this EDM webinar, and we wish everyone a great day. Take care, everyone. Thank you.

00:58:22:14 – 00:58:23:01
Speaker 3
Thank you.