Improve data quality with data products
We break down what business and technical buyers need to know when evaluating tools and building a strategy to improve data quality within the business.
Get on track to improve your data quality with these tips
- Learn how to start conducting a data quality analysis program and core questions to ask vendors
- Discover what features to look for in a data product platform and the data quality benefits they provide
- Understand common pitfalls in implementation and hear real-life implementation success stories
Rather read the transcript? Dive right in.
Speaker: Shannon Kemp
Timestamp: 0:00:00
Hello and welcome. My name is Shannon Kemp, and I'm the Chief Digital Officer of Dataversity. We'd like to thank you for joining this Dataversity webinar, Decoding Data Quality with Data Products, sponsored today by Tamr. Just a couple of points to get us started. Due to the large number of people that attend these sessions, you will be muted during the webinar.
For questions, we will be collecting them via the Q&A panel. And if you'd like to chat with us or with each other, we certainly encourage you to do so. Just a note, Zoom chat defaults to sending to just the panelists, but you may change it to network with everyone. To find the Q&A or chat panels, look for those icons in the bottom middle of your screen.
And as always, we will send a follow up email within two business days containing links to the slides, the recording of this session, and any additional information requested throughout the webinar. Now let me introduce to you our speakers for today, Matt Holzapfel and Nick LaFerriere. Matt is the head of corporate strategy and leads Tamr's technical solutions, working closely with customers on large scale deployments.
Prior to joining Tamr, Matt held positions in strategy at Sears Holdings and strategic sourcing at Dell, where he led the development and implementation of new analytic sourcing tools to significantly lower procurement costs.
Nick is a technology leader focused on building out cloud deployments and infrastructure for early stage companies. Currently, he is the lead cloud architect at Tamr, where he leads cloud infrastructure, technical operations and security efforts. At Tamr, he led cloud-to-cloud migration efforts, bootstrapped their SOC 2 program, and is currently focused on developing Tamr's data products, enabling customers to use data product templates to consolidate messy source data into clean, curated, analytics-ready data sets.
And with that, I'll give the floor to Matt and Nick to get today's webinar started. Hello and welcome.
Speaker: Matt Holzapfel
Timestamp: 0:01:45
Hello and thank you everyone for joining. Before we get into the content here we wanted to just learn a little bit more about where everyone is at with data products. And if you wouldn't mind, we would love if people responded in the chat with where they're at in terms of familiarity with data products.
Is this a new concept? Are you familiar, or are you off to the races and have actually implemented? We want to see some trends here and use that to just make sure we're tailoring the content appropriately. So we'll give a few more seconds here for everyone to key in an answer.
It looks like we do have one: D- multiple data products in production. That's awesome. A lot of A's, good number of B's and C's. So really all over the board, which is not terribly surprising, and it aligns very well with what we see with our customers. Typically we're working with larger enterprises, typically a billion dollars and above, who have a fair amount of complexity within their data ecosystem.
And yeah, just some last pieces here. It looks like we have a nice distribution of people across every level of familiarity, which is great. We're certainly going to cover some of the broader definitional pieces around data products and hopefully get everyone on the same page in terms of where data products are needed and what the value is. And then, in addition, if you're starting on this journey, what are some of the best practices that you can learn from others who have implemented data products and are starting to see value from them? Just to level set all of us, the way that we define data products, and a definition that we've seen that aligns pretty well with how others in the industry think about it, is that a data product is a consumption-ready set of data that's used to help solve business challenges. A key theme you'll see throughout this presentation is that last piece, "used to solve business challenges," and really understanding: what is the purpose of the data product?
First and foremost, what problems is it trying to solve? And then work backwards from there. We'll get into some of the complexities that can make this challenging. But at the simplest level, a product in any sense is something that should fulfill an end customer need. And in this case, the end customer is typically some business stakeholder within the enterprise who's looking to use data in order to make a better decision, automate decision making, drive new insights, whatever it may be. And data products can play a very key role in that.
Before we get into some of the nitty gritty on how you actually implement a data product strategy, and ultimately enable your business to work a lot smarter and more efficiently, it'd be helpful to have some context on how we see the ecosystem today.
So Nick, share some of that context.
Speaker: Nick Laferriere
Timestamp: 0:04:58
Yeah, so if we take a step back and look at this historically, for a lot of the companies that we work with, 15 or 20 years ago they only had a pretty simple architecture for their data. They would have an MDM that usually powered some of the operational applications that are critical to their business, and then they would export views from that to what nowadays we'd call a data warehouse, but back in the day would probably have been an analytical database.
That would power a lot of their BI and reporting for the decision-making process. And what we've seen over the last 10 years is the acceleration of people moving to the cloud, both in where they store their data and in adopting cloud applications. That adoption has really exploded the complexity: the number of different sources that you have to deal with, and also the number of integrations teams expect. Now we see customers that might have multiple Salesforce instances in addition to a HubSpot instance, just for the SaaS applications that manage their customers and their communication with them.
Then oftentimes we'll see individual teams buying external data to try and fill in values for specific reports and to help create better views in their data warehouse. In addition, BI tools are now pretty standardized and expect basic integration with data warehouses.
You might have data science teams that expect certain views and tables to be constructed and consumed, and more tech-focused companies are already starting to build their own AI/ML models inside their organizations. They expect high quality data products as the base for training more of their models, so they can then put them, in time, into existing operational use cases.
It's a much more complicated landscape that really lends itself to treating data like a product, where historically it was just a loose collection of views.
Speaker: Matt Holzapfel
Timestamp: 0:07:08
And if we think about what the headline has been from this migration to the cloud, and really what a lot of the large cloud vendors really focus on, it's been the story on the right of the explosion of data sources. You're going to have a lot more data, and you're going to have a lot more tools in order to consume that data.
One of the things that can get a little bit lost in that story is that becoming data driven and actually getting the full value out of this new ecosystem out of all of the new sources of data, all of the new and fantastic end points for consuming that data, it does have a cost in the form of new requirements for data quality.
And so with BI and reporting, the big need was, we need data to be clean and standardized. We need our date column to be in a consistent format. We need country codes in consistent formats so that we can look at sales by country, etc. And so a lot of data prep tools like Alteryx, for example, were really born out of these needs.
Let's let analysts be more effective in how they clean and standardize data. As new endpoints have come to the market and there are more ways that people are consuming data, things like data science, for example, the need changes. It's not enough anymore to have data that's clean and standardized.
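As an illustration of the "clean and standardized" need described above (a date column in one format, country codes in one format, then sales by country), here is a minimal Python sketch. It is not taken from the webinar: the accepted date formats, the country alias table, and the row layout with `country` and `amount` keys are all assumptions made for the example.

```python
from datetime import datetime

# Hypothetical input formats and country aliases, chosen for illustration only.
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%d.%m.%Y")
COUNTRY_ALIASES = {
    "us": "US", "usa": "US", "united states": "US",
    "uk": "GB", "united kingdom": "GB",
    "de": "DE", "germany": "DE",
}


def standardize_date(raw):
    """Return the date as YYYY-MM-DD, or None if no known format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(raw.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None


def standardize_country(raw):
    """Map a free-text country value to a two-letter code, or None if unknown."""
    return COUNTRY_ALIASES.get(raw.strip().lower())


def sales_by_country(rows):
    """Sum `amount` per standardized country; unmapped values go to UNKNOWN."""
    totals = {}
    for row in rows:
        code = standardize_country(row["country"]) or "UNKNOWN"
        totals[code] = totals.get(code, 0) + row["amount"]
    return totals
```

Values that cannot be mapped are kept under `UNKNOWN` rather than dropped, so the gap in the data stays visible in the report.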
Now I need a lot of attributes, because the more attributes I have, the better model I can build, the more predictive my model can be and the more accurate it can be in its predictions. Yeah. As we've moved to more automated decision making with things like customer data platforms that are really driving and firing marketing events for really large scale customer interactions it's created this need for a single view of a customer.
We need to make sure that we have this integrated view of our customer profiles so that as we're executing on these events, we're doing that accurately and effectively. Next, data apps have really grown and become much more mainstream with the growth of applications like Streamlit, for example, where people with Python and a knowledge of how to manage a database are able to build a simple data app that they can serve as either an internal application or even an external one that they post on their website. This puts end users much closer to the data itself, and also just exposes a lot more of an organization's data.
And so one of the things that this creates is the need for people to be able to fix issues on the fly as they see them. The more people who have access to data, which data apps are really driving, the more nuanced and narrow people's feedback is going to be. I'm looking at an account that doesn't look right, a customer 360, whatever it may be. If I see an issue, I need to have that issue resolved quickly. And then finally, I think nothing has brought this point of data quality to the forefront more than what we're seeing now with these next generation AI/ML models: things like large language models, for example, and just the broader class of foundation models that people are using to ultimately try to build a better customer experience.
Whether it's a common use case like a chatbot or something else, people are putting AI/ML at the center of their customer experience, and this creates a lot of pressure on all of the data quality requirements, because if you put garbage into one of these models, then your reputation is potentially ruined. These are the types of things that end up in the New York Times or other media outlets, where there is an issue with a model as a result of bad data that could have serious implications. The importance of data quality has really never been higher. I think the good news is that we've been building towards this point.
And organizations aren't caught totally flat-footed. But certainly there's more work to be done, as evidenced by a lot of third-party data that's been collected on the state of data quality and data teams. I think one of the common threads is that data leaders just can't scale their teams quickly enough.
The number of sources that people are working with has exploded, like Nick outlined at the beginning. And a lot of the impact of that has fallen on data teams to fill in the gaps. We don't have data that is accurate and trusted. Our data team is going to fill in that gap, and that's creating heavy strain on the organization. As a result of this, the decision makers aren't getting answers fast enough. And the number of data leaders who say their company is data driven hasn't really moved despite all of this investment.
One of our customers is a chief data officer in the media & entertainment business, and specifically his internal stakeholders are talent agents. For them, the key metric that they measure is really around analytic velocity and time to insight.
For a question that a talent agent has, they track how quickly they would be able to answer it, and they have various scenarios that they evaluate in order to understand how that metric is trending, because they know that if people aren't able to answer their question fast enough, they're not going to use data.
And so their only way of being data driven is by being able to answer those questions quickly. And then finally, just on this point of AI and the importance of it. I think the hype is more than just hype at this point. Organizations are putting serious dollars behind AI and using it in order to stay ahead of the curve, which is awesome.
There are a lot of great opportunities that are possible. The issue is that executives don't trust the data that's going into the system. And so, to the previous point about reputations being at risk and on the line, it'll certainly be interesting to see how the next months and years unfold on this front, because there is a lot of top-down pressure to use AI, but I think a lot of people generally feel like that could be very risky given the state of their data.
Speaker: Nick Laferriere
Timestamp: 0:13:29
Yeah. In addition, AI is not the only technology that's driving some of these conversations. What we see with a lot of our customers is that, in addition to trying to do these data projects to make their company more data driven, it's usually paired with a project to move to the cloud or switch clouds, for a lot of reasons. Usually the first project that a lot of large companies do when they're moving to the cloud is moving their data assets, their data warehouse, over to cloud-based resources to try and take advantage of the elastic compute there, to enable latencies and applications that they couldn't have had before with on-prem hardware.
Now, this is just going to expand even more drastically when people start to build out use cases and try to leverage machine learning more and more in their business operations. They really need high quality data to be able to do that, and they're trying to work through technology problems at the same time as trying just to get to the point where they have good, clean data.
Oftentimes they're going to have to be the ones building out the foundation of their cloud environments and cloud infrastructure and how they want that to work, in addition to also providing integrations for either the data analysts or machine learning teams that are trying to consume that data to build out models.
This really puts a focus on, and should make you think about, how you want to manage your data and what your data product strategy is, so that you can do this repeatably across multiple use cases. That really dovetails into something we've seen with a lot of customers: they start with use case approaches, but the really successful ones are thinking more along the lines of data products.
Speaker: Matt Holzapfel
Timestamp: 0:15:21
Yeah, just on this point of the use case based approach. I think, now more than ever, this is becoming difficult for data leaders. One of our customers at a health care company has been under a lot of budget pressure within the organization. Effectively, he has some important analytic and business initiatives that he needs to support.
But funding now is much more difficult to come by. They're switching to a model where they need to be able to charge back to the business in order to fund some of the investment in modernization that they want to make as they're in the middle of their cloud journey.
And I think one of the first insights that he had was that the only way he and his team are going to be successful in this new model is to operate in a way where, instead of every use case bringing in a dollar and then costing a dollar to deliver, a use case is something that can be used much, much more broadly.
And so if they're getting a dollar of revenue, it's not a dollar of cost. It's 25 cents to serve it, and continuously decreasing. They know that just adding more resources to the problem and trying to fund their initiatives purely through doing more use cases isn't going to be effective for them anymore. They need to shift away from the use case mindset and move towards thinking about how they can be most profitable as a team, which is what's really driven them to think about how they manage their data as a product, so that they can operate more as a P&L as opposed to a consulting staffing agency where you're doing much more of a one-to-one mapping of use case to people.
Speaker: Nick Laferriere
Timestamp: 0:17:24
Yeah. So one of the parallels that at least I naturally gravitate towards when thinking about this problem comes from the perspective of how different SaaS application providers have their offerings, right?
There are companies that will do a traditional hosting-based solution, where they'll spin up a separate set of dedicated infrastructure resources for every customer that they have. That's the more old-school hosting. Say you're running WordPress sites: we onboarded a new customer, so we're going to spin up a whole new WordPress application for them, with dedicated resources and dedicated databases. That doesn't scale very well in terms of how you can leverage your staff. For every 10 new customers, you might need a new employee to be able to manage all of that. Versus multi-tenant SaaS applications, which are the standard now: you can scale your staff, your resources and your costs sublinearly to the number of customers and requests that you're handling.
That's really also what happens with data products. When you start investing in the platform to be able to serve all the different use cases, have everyone pointing at the same set of resources, and standardize a lot of these processes, you can start to scale your data quality and your resources sublinearly to the number of requests that you're getting from your stakeholders.
That makes it a very high leverage solution for how you can meet your business needs and the… Go ahead, Matt.
Speaker: Matt Holzapfel
Timestamp: 0:19:04
No, I was going to say, I think one of the most challenging pieces, or biggest mindset shifts, that we see within the journey towards managing data as a product, ultimately to get much better data quality, is the feedback management cycle.
I think it's become common wisdom that if you're building a product of any kind, one of the first things you should do is, quote unquote, get out of the building: go get feedback, learn from customers and continue to iterate. That's a muscle that can be challenging within data organizations, because it does require a unique set of processes and capabilities to actually manage that feedback loop, incorporate it into the data product, and ensure that the data product is something that's continuously getting better. You don't want people doing the classic move of ‘I see an issue with the data. I'm going to download it into Excel. I'm going to make changes. And then I'm going to be able to get my report done and serve that end user's needs’. Rather, you want good process and governance in place so that people aren't just rushing to make tweaks on the edges to a spreadsheet in order to solve that short-term need, but are really thinking much longer term about how to improve the data on an ongoing basis.
And one of the things that has really, I think, been a bit of a challenge with building data products in the past has been the best practices for building data products can be expensive. I think historically if you wanted to build data products, here, you're looking at the options of: do I just hire more data engineers and stewards in order to take on this new workload in order to put in place new processes to improve our data operations? Do we just reduce our scope? Where instead of trying to hire more people and scale out maybe we say, we are only going to manage our customer data.
I think a lot of traditional approaches to master data management have fallen into this view of we're going to have this narrow set of data. We're going to manage it very tightly, put a lot of top-down governance on it, but it's going to be trusted, high quality. And we just hope it covers enough of what we are trying to do in order to be effective and successful.
I think there's a comment in the chat: are these more like the dumb practices? Certainly it can feel that way. These all seem very expensive, but these are what we see people often doing in order to get out of data debt challenges.
Speaker: Nick Laferriere
Timestamp: 0:21:55
Yeah. Recently, we've also seen an uptick in people trying to incorporate AI to help out with this, where it can make your developers and your data engineers far more effective. A lot of these ML models have code plugins for IDEs, and if you use BigQuery, you can even enable it right in the tool console to generate SQL queries just by asking questions, and it can generate your report views.
That's a really high leverage tool that can make your existing team more productive. We're also starting to see it used in a lot of other use cases where, if you have a good system set up, you can ask it questions about specific rows or clusters of records: is this a person or a company? You can then use that as a DLP tool: hey, we do not want to process people inside of our company's data products, so if this is a person, please filter it out. You can type directly into the system what you would normally ask a data steward or curator, whether that's to go and manually remove that row or to answer the question, what is this doing here?
Another good use case for it is enrichment of sparse data sets. At their core, a lot of these LLMs are meant to just predict the next word, and that can also be used to predict the next cell or attribute in the data. A classic example we see is someone filling in a partial address where Boston's the city, but leaving state and country blank.
Most of the models will be able to guess what those attributes are pretty easily. Tying that into your systems can be a very good tool, where you're not taking up any time for a human, and it can go and fill in a lot of those values for you. And if you have a system where you already have a baked-in task management workflow, where you have tickets coming in and curators or stewards managing those tickets and then applying changes to your data sets, you can plug AI models into that to say, hey, this is what we suggest should happen, and just have your agents accept those changes and auto-commit them to your data. You can really save them a lot of time on manual process work there.
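The suggest-then-accept workflow described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: a small lookup table (`CITY_HINTS`) stands in for whatever model would actually predict the missing values, and the `Suggestion` record, the function names and the row layout with an `id` key are hypothetical, not part of any product mentioned in the webinar.

```python
from dataclasses import dataclass

# Hypothetical lookup standing in for a model's prediction; a real system
# would call whatever model or service the team has chosen.
CITY_HINTS = {"boston": {"state": "MA", "country": "US"}}


@dataclass
class Suggestion:
    row_id: int
    field: str
    value: str


def suggest_fills(rows):
    """Propose values for blank state/country fields; never edits the rows."""
    suggestions = []
    for row in rows:
        city = (row.get("city") or "").strip().lower()
        for field, value in CITY_HINTS.get(city, {}).items():
            if not row.get(field):
                suggestions.append(Suggestion(row["id"], field, value))
    return suggestions


def apply_accepted(rows, suggestions, accepted):
    """Return copies of the rows with only the accepted suggestions applied.

    `accepted` is a set of (row_id, field) pairs approved by a steward.
    """
    by_id = {row["id"]: dict(row) for row in rows}
    for s in suggestions:
        if (s.row_id, s.field) in accepted:
            by_id[s.row_id][s.field] = s.value
    return list(by_id.values())
```

Keeping suggestion and commit as separate steps means a predicted value only reaches the data set after someone, or some explicit rule, has accepted it.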
Speaker: Matt Holzapfel
Timestamp: 0:24:24
And AI certainly is not a silver bullet. It provides a lot of value as described here. But I think the important thing is that what AI really enables within data engineering more broadly is it enables data engineers and data practitioners to really use much more declarative logic in defining a data product.
And it reduces the amount of complexity involved in creating and managing a data product. It's very similar to what we've seen with software engineering practices, where the initial effort was just making software engineering something that was much more tractable, broken down into distinct functions that are easy to understand.
We're seeing something similar with data engineering, where you have distinct transformations. dbt really drove this with analytics engineering, where you have distinct models to handle different parts of a pipeline that are very reusable. And what AI is enabling is taking that to the next level, where now data engineering can be much more declarative in nature.
Instead of operating at the level of needing to write a lot of individual transformations, you're able to push that down to AI and really focus more on the business logic of the data product itself. If we think about where that is ultimately heading, our view is that most organizations will have a data product platform that combines this more declarative approach to data engineering and data transformation with human interfaces in order to drive collaboration on the data itself.
In most cases, the people who are expected to deliver value from data, people in sales ops or people in procurement analytics, for example, might have some understanding of SQL and how to transform data, but really their superpower is understanding their business: understanding customers in the case of sales, and in the case of procurement, understanding suppliers and the nuances of the market.
So if they can interact with their data team at more of the business logic level and then give feedback on individual points within the data, then it really elevates the quality of conversation and ensures that there can be good, effective collaboration. It definitely feels like we are very much heading in this direction, where we are able to have the business and data teams interacting at the business logic level. And then after that data product is delivered, be able to manage it through human interfaces. Very similar to tools like Zendesk, for example, where it'd be impossible to imagine shipping a product without some form of feedback where customers could say, Hey, I'm having this issue.
And then we're able to learn from it. We think the same is true with data products where it'll be unimaginable in four or five years for you to ship a data product and not have a mechanism for collecting feedback, for being able to track usage and adoption of that data product.
That's what's represented in these consumption services, where collaboration is key. And going beyond the benefits described, ultimately having a data product strategy in place and having all these tools and mechanisms for managing those data products effectively should increase the speed and accuracy of decision making and make people feel confident in the data that they're using.
One of the really key drivers of data products, and in my opinion the most important piece of data products, is that it really breaks down the barriers between data teams and business teams and really makes it easier for these teams to collaborate on the data itself, because it's a common set of assets that are continuously improving.
And you have clear rules of engagement for how to give feedback on that data, how it improves and what the SLAs are associated with it. It's antithetical to the traditional approach of we have data that lives in a warehouse and then we have an analyst that sits in the line of business and then they do bespoke data prep in order to serve a use case.
That certainly can help get things done. But what you get is not great to leverage more broadly. And the collaboration there is very siloed, because you have this island of data debt, which is the spreadsheet on the analyst's desktop that other people might not have access to.
And there isn't good built-in collaboration in a lot of these cases.
Speaker: Nick Laferriere
Timestamp: 0:29:08
Yeah, so one of the other questions that often comes up when we start to talk to customers about why they're looking at enhancing their data: recently, in the last six months, there's been a huge focus in the industry, with everyone talking about AI. What's your AI story? How is AI going to disrupt your business, your customers, your industry?
So one of the questions that we'd love to see everyone's response to is in the poll: where are you on your journey? Are you just starting to learn about the technology, starting to use it potentially in some POCs, have it in production in noncritical workloads, where it's just on the side as you're growing your expertise in how to use these models, how to run them and how to integrate them into your business, or is it already a core part?
The reason we're asking this is because, usually when we're having conversations with customers about why they're using data products or why they're starting this journey on data products, the very first thing we think they should know is why they're doing it and what their goals are.
A lot of times that is to enable some of these things around being able to deliver AI or having a story around that or they may even have a vision of where they can use that.
Wow, we're seeing a really wide range of results. It looks like most people are just starting at the beginning of their journey.
Matt, you want to go into this?
Speaker: Matt Holzapfel
Timestamp: 0:31:02
Yeah, I appreciate the question on what do you mean by AI, and do statistical predictive models qualify, because I think that is an important point: with a lot of these technologies around AI and machine learning, people are using them in some way, shape or form.
It might just be some of the quote, unquote earlier versions of it. But I think from an organizational readiness standpoint those examples and being able to point to some of the statistical models that do drive decision making in an automated way can help really get buy-in on a much broader strategy related to, for example, using AI in a data product strategy.
Financial services is a good example. Very regulated industry. Certainly a lot of risk. And definitely has been one of the industries at the forefront of using AI and ML through some of the modeling that they do in things like underwriting, for example.
And I think being able to point to examples like that through an organization really helps a lot with getting buy-in and budget. Above all, whether it's an AI strategy or a data product strategy specifically, starting with your why is by far the most important piece.
We love when we get on a call with a prospect or a customer and they say, we need to implement a data product for our customers because we are trying to improve the effectiveness of a CDP that we implemented. Very clear why. We know that it's going to be a very successful engagement. They're not just shopping for a stack, if you will; they have a clear business problem that they're looking to solve.
As you get going with the data product strategy, some of the specific challenges that we see organizations facing include how to aggregate disparate sources. And I think more importantly, it's just understanding when this is going to be needed in order to really drive meaningful outcomes. Particularly when you have data sets that are frequently changing, things like external data, and when there are a lot of insights in the long tail of your data, this becomes an important challenge to have top of mind as you start to move towards a data product strategy. In the world of just basic reporting, or trying to understand maybe one-off customers here and there, this type of problem isn't going to be particularly painful. But the more that you're trying to use your data as a product and use it for things like automated decision making, the more you need to have an answer for, one, just how big of a problem is this for you? And two, do you have a solution and a way to solve it? Because data integration is not free. And so being ready for that going into your data product journey is really important in order to ultimately meet the timelines you expect.
Speaker: Nick Laferriere
Timestamp: 0:34:12
Yeah, so when we start talking to our customers, one of the things they get a little bit afraid of is how big the project can feel when they start talking about what their end goal is, why they want to do it, and where they would love to be at an end state. And what we really want to help them start with is defining their use case: what is their minimally viable data product?
And oftentimes what we want them to focus on early on is data integration and not system integration. For the data integration, the architecture, if I can click to the next slide, has a lot of boxes and arrows, and it looks like there's a lot of moving parts.
At its core, it's really just trying to get all your data into one place, into a data warehouse, where you can then start to clean and aggregate the data and create views that you can point downstream applications at to drive the use case, right? So for one of our customers, that effectively meant that we would just take all the data that they had, whether it's from their Shopify, their Facebook advertising, or their Google advertising, and get it into their data warehouse. Then they used one of our customer-focused data products to clean it up and create a basic view of their customers that they're able to put into their marketing tools to do segment analysis and targeted marketing to their customers, right?
And then they're able to really say, hey, we can create this view, and these are the segment demographics we really want to go after and target. Now, it wasn't tied to any of their operational systems. It's not something that's tied to anything that's trying to generate recommendations inside their website in real time as people are checking out, or trying to do upsells, or necessarily managing a golden record of a customer.
It's just trying to build the smallest use case that can be valuable to the business and answer ‘why now’. If they set up that architecture in a good way and have some forethought on how to do it, doing the system integration piece next is a lot more of an incremental step than the massive project that they were afraid of when they started.
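As a rough illustration of the data integration step described above, the sketch below lands a few exports in one place, applies a basic cleaning step, and exposes a single customer view. It is a minimal sketch under stated assumptions, not the architecture shown on the slide: the sample rows, field layouts, and the names `normalize_email`, `build_customer_view`, and `customer_view` are all hypothetical, and an in-memory SQLite database stands in for a real data warehouse.

```python
import sqlite3

# Hypothetical raw exports; the field layouts are assumptions for illustration.
shopify_orders = [
    ("Ana Diaz", " Ana.Diaz@Example.com ", 120.0),
    ("Ana Diaz", "ana.diaz@example.com", 30.0),
]
facebook_ads = [("ana.diaz@example.com", "spring_sale")]
google_ads = [
    ("bo.chen@example.com", "brand_search"),
    ("ANA.DIAZ@example.com", "brand_search"),
]


def normalize_email(raw):
    """Basic cleaning step: trim whitespace and lowercase."""
    return raw.strip().lower()


def build_customer_view(shopify, facebook, google):
    """Land all sources in one database and expose a basic customer view."""
    conn = sqlite3.connect(":memory:")  # stand-in for a real data warehouse
    conn.execute("CREATE TABLE orders (name TEXT, email TEXT, total REAL)")
    conn.execute(
        "CREATE TABLE ad_touches (email TEXT, source TEXT, campaign TEXT)"
    )
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?)",
        [(name, normalize_email(email), total) for name, email, total in shopify],
    )
    for source, rows in (("facebook", facebook), ("google", google)):
        conn.executemany(
            "INSERT INTO ad_touches VALUES (?, ?, ?)",
            [(normalize_email(email), source, campaign) for email, campaign in rows],
        )
    # One row per distinct cleaned email, with spend and ad-touch counts.
    conn.execute(
        """
        CREATE VIEW customer_view AS
        SELECT e.email AS email,
               COALESCE((SELECT SUM(o.total) FROM orders o
                         WHERE o.email = e.email), 0) AS lifetime_spend,
               (SELECT COUNT(*) FROM ad_touches a
                WHERE a.email = e.email) AS ad_touch_count
        FROM (SELECT email FROM orders UNION SELECT email FROM ad_touches) e
        """
    )
    return conn


warehouse = build_customer_view(shopify_orders, facebook_ads, google_ads)
for row in warehouse.execute("SELECT * FROM customer_view ORDER BY email"):
    print(row)
```

A marketing tool would then read from `customer_view` rather than from the individual source tables, which is what keeps this a data integration and not a system integration: nothing is written back to Shopify or the ad platforms.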
Now, this is a different customer, Novocure, that we have a lot of external material on about their data journey. There's a QR code on top that you can scan to find out more about what we've done with them. They started off just doing the data integrations. But once they had that working end to end and had trust in that data, they built integrations using MuleSoft to push that data back into their SAP system to actually drive the operational use cases that are mission critical to them. And that's really what we mean by system integration. That's usually a much higher bar to get done, where you start needing to go through change approval boards and get a lot of other teams to sign off on those changes.
So we tend to have customers focus on data integration at first, build up trust in that data, build the basic views that they want to use, and then eventually switch over to doing the system integration, actually consuming the output of their data products in their operational systems.
Speaker: Matt Holzapfel
Timestamp: 0:37:55
And another pretty common challenge is just around how you sort through this mix of legacy, homegrown, and modern tooling in order to improve your data KPIs across the board, from data quality to analytics and automated decision making.
And we'll share a customer story on how they were able to manage this trade-off effectively. But first I do want to address the question in the chat here on how you define a minimum viable data product. I think that's a great question. One of the things that's really important in thinking through how to define a minimum viable data product is that it should help to prove out the business goal and serve the business need, and then also prove out technical goals or reduce overall technical risk.
And so what I mean by that is let's say that you are building a customer data product where ultimately that data product is going to be used for customer segmentation. It's useful as part of defining the minimum viable data product to also scope out how that customer data product could be used for a different application such as sales territory alignment or within a CDP because one of the things that's really critical to managing data as a product is being able to go through that full end to end loop of adding new attributes, new sources and taking feedback so that the data product can serve multiple use cases.
If the minimum viable data product only maps to one single application or use case, there's a lot of risk that you're back where you were, where you have this data asset that actually is high quality and good, but it serves a very narrow need and doesn't have a clear path towards being able to serve a much broader need. It also doesn't have the broader buy-in that, if we're going to be doing any sort of analytics around our customers, we're going to use this data product as the foundation for it. And so doing some of that extra work up front, just to make sure that what you're building can actually scale and be leveraged across multiple applications, is really important. This gets at a question in the chat on defining your use case and the contradiction between that and moving away from use cases.
I think we definitely presented ‘use case’ as a bad word, but ultimately you do need downstream applications. The problem is the one-to-one mapping of a data asset to an individual use case, which just creates a new silo. It might be an aggregated data set, but it's still a silo and still a source of data debt. Ensuring that you're not going down that path is really the key part of it, along with ensuring that your minimum viable data product and where you start can serve multiple purposes and can prove out that you have a good and effective feedback loop and a way to improve that data product over time: adding sources, adding attributes, curating those attributes. That's really what's challenging and different about a data product versus just, hey, this is a pretty good table that someone is using for a dashboard.
Speaker: Nick Laferriere
Timestamp: 0:41:24
Yeah, so back on this whole concept of the journey of how to do this. This is really a journey of modernization and switching to the cloud. We've been working with Old Mutual now for, I think, a couple of years, and when we first were talking to them, they didn't even have an AWS environment set up yet.
They were still running an MDM system with parts of it running on a mainframe, but they knew they didn't want to continue to support that into the future. So that was really a huge chunk of their why. They wanted a more modern system that they could use, they knew that their data wasn't great, and they thought that there was a lot of value that they could derive from it.
So when we started working with them, the first thing that we focused on was an analytical solution that would generate a clean data set on their customers that they then used in quarterly review processes and in audit-based processes like, hey, we are selling insurance and we have this insurance policy out, and we have a death record showing the policyholder is no longer alive. Can we cancel that contract and close it out, so we stop making payments or stop collecting premiums, and close the loop on that individual customer story or customer journey? Once they had built up good trust in that data, they started to use APIs that were pointing at their data product and the data warehouse at the point of consumption. They built it into their business applications to search for existing customers when people would open accounts, so that they could link it and say, we already have a record of this person, and they're just adding an extra account or an extra person to their account. That way they don't create a whole new identity and ID for that person, which prevents the sprawl of bad data from propagating throughout their system.
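The search-before-create pattern described above can be sketched as follows. This is a minimal, hypothetical illustration: `CustomerDirectory`, `open_account`, and the exact-match key on name plus date of birth are assumptions made for the example, and a real deployment would call the data product's search API and use far more robust matching than an exact key.

```python
from dataclasses import dataclass, field
from itertools import count


@dataclass
class Customer:
    customer_id: int
    name: str
    date_of_birth: str
    accounts: list = field(default_factory=list)


class CustomerDirectory:
    """In-memory stand-in for hypothetical search and create endpoints."""

    def __init__(self):
        self._by_key = {}
        self._ids = count(1)

    @staticmethod
    def _match_key(name, date_of_birth):
        # Naive matching: case- and whitespace-insensitive name plus birth date.
        return (" ".join(name.lower().split()), date_of_birth)

    def search(self, name, date_of_birth):
        """Return the existing customer for this key, or None."""
        return self._by_key.get(self._match_key(name, date_of_birth))

    def open_account(self, name, date_of_birth, account):
        """Search first; only create a new identity when nothing matches."""
        existing = self.search(name, date_of_birth)
        if existing is not None:
            existing.accounts.append(account)
            return existing, False  # linked to an existing record
        customer = Customer(next(self._ids), name, date_of_birth, [account])
        self._by_key[self._match_key(name, date_of_birth)] = customer
        return customer, True  # a new identity was created


directory = CustomerDirectory()
first, created_first = directory.open_account("Thandi  Nkosi", "1980-02-14", "life-001")
second, created_second = directory.open_account("thandi nkosi", "1980-02-14", "savings-002")
print(created_first, created_second, first.accounts)
```

The second call links the new account to the existing record instead of minting a second ID, which is the behavior that stops duplicate identities from spreading into other systems.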
And then they're in the process of, and almost fully done with, switching over to pointing their business applications at update and create endpoints on top of the new real-time source system that's directly tied into their data product. So that architecture, when you zoom out, and if you go to the next slide, this is what it looks like from our perspective: they started out building a landing zone and putting exports from their existing MDM system into it.
And we picked that up, built a reconciliation process that uses our secret sauce of AI/ML to clean that data and make it the highest quality that we could, and then put it into a real-time store that they could run analytical reports off of. They then started tying into APIs, starting mainly with read and search APIs, pointing at that real-time system and that store.
They eventually switched over to using the full CRUD suite and pointing their business applications at those endpoints. And then they were able to slowly move the rest of their applications off their old MDM system to just using the new, latest and greatest system.
But this was a long journey over several years and several distinct phases, but they started with where they wanted to be and we were able to incrementally make our way there, starting out with answers or delivering business value as soon as we could.
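One way to picture the phased cutover described above is a small router that decides, per operation, whether a business application talks to the old MDM system or the new real-time store. This is a hypothetical sketch: the phase names, the `RecordRouter` class, and the `"legacy"` and `"modern"` labels are assumptions for illustration and are not taken from the architecture on the slide.

```python
from enum import Enum


class Phase(Enum):
    ANALYTICS_ONLY = 1   # new store feeds reports; applications stay on legacy
    READ_AND_SEARCH = 2  # applications read and search the new store
    FULL_CRUD = 3        # applications read and write the new store


READ_OPERATIONS = {"read", "search"}
WRITE_OPERATIONS = {"create", "update", "delete"}


class RecordRouter:
    """Chooses a backend for each operation based on the migration phase."""

    def __init__(self, phase):
        self.phase = phase

    def backend_for(self, operation):
        if operation not in READ_OPERATIONS | WRITE_OPERATIONS:
            raise ValueError(f"unknown operation: {operation}")
        if self.phase is Phase.FULL_CRUD:
            return "modern"
        if self.phase is Phase.READ_AND_SEARCH and operation in READ_OPERATIONS:
            return "modern"
        return "legacy"


router = RecordRouter(Phase.READ_AND_SEARCH)
print(router.backend_for("search"), router.backend_for("update"))
```

Keeping the phase as a single setting means each step of the journey is a small, reversible change rather than a one-time switch of every application.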
Speaker: Matt Holzapfel
Timestamp: 0:45:11
Great. And just as you head down this journey, here are some of the questions that we think are really important and are shaping the direction of the market. One is around the domain-specific needs that will come up. One of the things that we've noticed with our customers who've implemented a data product strategy is that the nuances of the industry become increasingly important the broader that data product becomes.
And so the more external data, for example, that you're trying to integrate into your data product, the more important it becomes that you're partnering with people who really understand the domain and are building tooling for the specific needs of those domains.
Speaker: Nick Laferriere
Timestamp: 0:45:58
Yeah, the other thing is that, as AI becomes more and more popular, almost every tech company will have something that they'll say is AI, and they'll say it's even generative AI. If their answer is just that they have more advanced search, that usually just means that they used an AI model to generate embeddings and put them in an extra database, and it's a slightly more advanced search bar. Or they'll have some chatbot.
And sometimes that chatbot is just trained on their doc site and is really just automated support. You should really be looking for whether they actually have use cases where they can say, hey, we use AI in this specific way to deliver this value. It's not just a chat bar that pops up when you log in or an extra search bar added somewhere. It really should be tied into their product and actually have a concrete use case.
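To make the "embeddings in an extra database" point concrete, here is roughly how little is involved in a basic embedding search once the vectors exist. This is a toy sketch: the three-dimensional vectors are hand-written stand-ins for what an embedding model would produce, and `cosine_similarity`, `embedding_search`, and the document names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def embedding_search(query_vector, index, top_k=2):
    """Rank stored documents by similarity to the query vector."""
    ranked = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vector, item[1]),
        reverse=True,
    )
    return [doc_id for doc_id, _ in ranked[:top_k]]


# Hand-written toy vectors; a real system would get these from a model.
doc_index = {
    "reset-password": [0.9, 0.1, 0.0],
    "billing-faq": [0.1, 0.9, 0.1],
    "api-keys": [0.7, 0.2, 0.3],
}
print(embedding_search([1.0, 0.0, 0.1], doc_index))
```

A feature like this can be useful, but on its own it is a search bar, which is why the question to ask a vendor is what concrete outcome the AI drives inside the product.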
Speaker: Matt Holzapfel
Timestamp: 0:46:56
Yeah. And then finally, the data products mindset, managing data as a product, and also layering in AI can require some changes in skill sets. This is something to understand going into your journey: do you have the right people and skills in place in order to make this successful? Really, I think one of the big promises of AI is that it will make it much simpler for people to be very productive with, for example, highly technical solutions. And so it should ultimately simplify the skill set and reduce the diversity of skills that are needed.
But this is something to push on in order to really understand what is the maturity of the kind of application that you're looking to adopt.
Speaker: Nick Laferriere
Timestamp: 0:47:08
Yeah, and just some core takeaways from my perspective to wrap this up. A lot of us here on this call are probably very tech heavy and technologists at our core, and it's very easy to get sucked into what is the latest, greatest technology and spend all your time talking about it.
Oh, should we be using Snowflake or BigQuery, or AWS, Azure, or GCP? Technology at the end of the day is just a tool. What really matters is the outcomes that you're driving and the business cases that you're enabling. Technology isn't the only interface that matters in tying together the systems.
Somewhere at the end of the day, there's humans that are involved in this process. They're consuming the data in some way, shape, or form, whether that's someone at a POS system in a store trying to look up someone's customer loyalty number to do a return or order, or a seller that's trying to get updated information for a contact to renew a deal.
It's humans at the end of the day that are consuming this data. So you have to use the technology to make their jobs easier and to enable them. And another thing that we really try to preach to our customers is don't try to solve everything on day one. Start with something that is as small as possible that can deliver value to your business and iterate from there.
Because if you can deliver value early on, it's easier to get the further investment to continue going down the journey. The bigger you try to go at the start, the more prone the project is to running late, going over budget, or just not getting finished. And then the last thing is a lot of people will look at some of these slides and be like, oh, that's simple.
We think we have the team to do that. We can do that. That's a false economy in my mind. The question is more, do you want to spend time building out these data and system integrations yourself, or figuring out how to tie it into your business applications and your business processes? That work is closer in line to your core business and the value your business offers than tying together these systems is.
Q&A Session: Start listening live at 0:50:19