DataMasters Summit 2020

DataOps Architecture: Keep the Human in the Loop

 

Mark Marinelli, Liam Cleary & Katie Porter

Mark Marinelli, Head of Customer and Partner Enablement @ Tamr
Liam Cleary, Technical Lead @ Tamr
Katie Porter, Sales Engineering Lead @ Tamr

Leverage human intelligence for training the model and receive trusted results, delivered quickly and at large scale.

Transcript

Speaker 1:
Data Masters Summit 2020 presented by Tamr.

Mark Marinelli:
Hi everyone, and welcome to this Data Masters session, DataOps Architecture: Keep the Human in the Loop. I’m Mark Marinelli. I run customer and partner enablement here at Tamr, and I’ll be MCing this session, joined by two of my colleagues, Liam Cleary and Katie Porter. Before we dive into the material, Liam, Katie, can you introduce yourselves, starting with Liam?

Liam Cleary:
Hey Mark. Hey everyone. I’m Liam Cleary. I’m a technical leader for Tamr based in [inaudible 00:00:36]. I’ve been with Tamr since more or less the beginning of the company, and I specialize in post-sales customer success.

Katie Porter:
Hi, I’m Katie Porter. I’m a sales engineer at Tamr based in Cambridge, Massachusetts. I’m currently the financial services vertical lead and a lead on the technical team whose work I’ll be focusing on in today’s conversation.

Mark Marinelli:
Alright, a quick overview. We’re going to talk about what we mean by the human feedback loop in our workflow. Liam is going to dive into a case study that shows some of the novel applications of the technology. Katie is going to give us some more examples of field-driven innovation in this realm, and then I’ll take us through some recent advancements we’ve made in the core Tamr product on this topic.
So starting off, a reminder: Tamr’s mastering capability is based on human-guided machine learning, meaning that we have applied machine learning to do what was historically done in different ways, and our technology relies on humans to provide examples of where schemas should be mapped or how data should be mastered or matched to each other. That is really our bread and butter. We would definitely claim that our approach, where we offload onto machines a lot of what was historically manual, rules-driven labor, accelerates the mastering process.
It far more intimately entwines human expertise into creating an accurate result and has a raft of advantages versus traditional mastering. However, if we’re going to do better mastering through machine learning, that means we really require machines to learn from people. That human expertise is essential. Machine learning is exceedingly powerful, but not particularly bright. It needs a lot of help from people who understand the data and can provide examples of relationships that the machine can then propagate across all the data. So human expertise is absolutely essential. The more expertise we can bring to bear, and the more diverse expertise we can bring from a variety of different constituents for the data, the more training data we can get for the model, the more validation we get of the model, and the more confidence we can ultimately have in the results. And the way we get more expertise is by getting more people involved.
This is a bit of a busy chart, but it shows, broad brush, what we mean by the human feedback loop. What are the areas within the machine-learning-based mastering pipeline where we get human beings involved? How do we get them involved? What do we ask of them?
We’re looking at data developers, scientists, analysts, and citizens, each of them bringing different types of knowledge to bear, and we collate that in this intelligent feedback engine, which knows what to ask of which people at what stage in the pipeline. If you’re used to using our product, you know that in the matching realm we ask you about pairs of records, or groups of records, clusters of records: do they all belong together?
When we’re merging to get golden records, we ask you which source values are best. For enrichment, we ask which category something belongs to and how to tag it in a taxonomy. You’ll be used to getting some of these questions, but what really informs how we think about innovating beyond what we’ve got in the product so far is this: how can we get maximum improvement in the value of our data with minimal effort from the people we have, who hold all of this knowledge?
The answer to that is by asking the right questions of the right people in the right context. If I keep asking someone questions that they can’t answer, they’re going to stop paying attention. If I ask them to traverse a lengthy process to give me that feedback, in a user interface they don’t understand and outside of the context where they’re using the data, I’m also going to have trouble getting as much feedback as I would like.
So, before we jump into the case studies and the field work, here are some guiding principles about encouraging humans to stay in the loop. We want to encourage participation. So, like I said, don’t ask me questions if I don’t know the answer, and don’t keep asking me the same question. Ask me only very pointed questions that will give maximum uplift in the quality of the data. Don’t make me hunt around for the answer. Give me enough context, visual or otherwise, so I can very quickly ascertain whether these two records are a match, for example.
That gets us more participation and more confidence in the results, which also improves the number of people participating and the density of their participation. Tell me what happened when I gave feedback: was it good or bad? Did I make the model better or worse? If I made it worse, tell me why. Tell me who else provided feedback so I can go and collaborate with them, and let me know where these data came from. Really, I’d love to know why the machines are making these suggestions. So, the more we can do to give people very targeted opportunities to give feedback, and the more feedback we can give them about their feedback, the more we’re going to get from them.
So, that’s the premise. I’m going to hand over to Liam now, who can take us through his case study on data mastering.

Liam Cleary:
Thank you very much, Mark. I’m going to walk through a very common case study here: federated data mastering. This is a very, very common pattern for mastering data in an enterprise or a large organization. When we talk about data mastering in a federated sense, it has the common hallmarks. We have source systems, and they’re onboarded to a singular mastering pipeline. We have experts who can curate your datasets, your entities, in this case suppliers, across all your source systems into a single mastered view. We also very typically have, as you saw in the previous slides, external sources, enrichment sources that are used in a central effort to integrate your data, to unify your data. And we have the output of this effort, which we call a single unified dataset, or a unified view of your suppliers or entities.
This can be used for various use cases or downstream consumption points. The next piece, which is not always part of a federated view but is part of a federated mastering view, is that we often partition the output into separate views for different consumption points. So what that might look like, as a very high-level architecture overview, is something like this. Essentially, if we start in the bottom left corner here, we have our source systems. There are many; they could be ERPs, CRMs, SAP systems, et cetera. We bring them to a central point. We then curate them here in the center of the slide, where our SMEs, our subject matter experts, review and curate what in this case we call clusters, but they are your entities, your single golden records of your suppliers. This curation effort typically consumes external sources or internal reference data management systems that hold mastered data.
These external sources are very typically integrated into the pipeline in the central hub, as opposed to in the local spokes. This produces a unified dataset. It’s from this point that we take the central unified view of all of your source data, your suppliers in this case, and consume it in different ways. The two most typical consumption patterns that we see are a global view for your dashboards and analytics, and then your individual views, your partitions back at the source systems, where your end users use that integrated view for various day-to-day business operations. So if we think about this very typical high-level architecture for [inaudible 00:08:55] mastering, and the role that your humans, your experts, have to play in this architecture, we typically see three very common touch points for our experts, our users, our providers of data, and our consumers of data.
The one I’m going to talk about first is in the top right-hand corner: the global view of suppliers. This is often considered very late in the pipeline, very late in the curation effort, but it’s often the first point at which you start to see real quality issues, real consumption issues, with your dataset. So if we click on to the next one, we go back to this idea of the right context, the right question, and the right people.
In this example, we’re looking at a consumer dashboard, something like monthly sales by state. This is very important context for the question that’s about to follow: the right person is looking at this data and is going to ask the right question. What we see here, very importantly, integrated into this dashboard is this little give-feedback button in the central panel of the screen. Why are Wisconsin profits so low? Very importantly, the user, the executive viewing this, is able to click on this button, and very quickly that gets the right people to start asking those questions.
Here in this little illustration, we see Matt, the executive leader. He’s asking a very simple question: why is this profit ratio so low? Very importantly, he’s able to interact and single out, or find, the person who might know the answer to this question. But even more important than identifying the right person to ask the question of, he’s also capturing the correct context for his question. So we see a little data snippet here at the top of the right modal: we’ve got Wisconsin, we’ve got the profit margin, we’ve got longitude and latitude, et cetera. So this data point is captured in the correct context with the question, and it’s been pointed at the right person.
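To make that concrete, a captured feedback event of this kind might be represented as something like the following sketch; the field names and values are hypothetical, not Tamr's actual schema:

```python
# Hypothetical shape of a captured dashboard feedback event: the question,
# the person it is routed to, and the data snippet that gives it context.
feedback_event = {
    "question": "Why are Wisconsin profits so low?",
    "asked_by": "matt.exec",                 # the executive viewing the dashboard
    "routed_to": "regional.sales.analyst",   # the person likely to know the answer
    "context": {                             # the data snippet captured with the question
        "dashboard": "Monthly sales by state",
        "state": "Wisconsin",
        "profit_margin": -0.03,
        "latitude": 44.5,
        "longitude": -89.5,
    },
}
```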
So this is very typical. This is as far out on the consumption side as we can go, and we’re already capturing these three critical pieces. What’s very important is that this captured context, question, and set of people can be fed back, in an integrated and streamlined fashion, to your curation effort. Let’s look at another common example. We’re not going to go as far downstream as the top right side; we’re going to come back towards the source side. So what does this look like in a very different example, a little bit closer to the data? Here we’re looking at something as straightforward as categorizing parts. In this example the context is again extremely important, but it’s not a dashboard; it’s some snippet of metadata you have in the system, presented to the curator.
In this case, we’re showing an image. An image alone is not terribly informative; sometimes you’ll see handwritten descriptions or [inaudible 00:11:38] descriptions entered, keywords, and other search criteria. What we’re looking at here is a category tree: machinery, welding, and soldering. The soldering wire is the image, and you’re trying to categorize your parts data. In the center, what we have is a question from Tamr, something as simple as: hey, is this the right category for this part? The part in this illustration is welding wire, with the supplier’s indication of where it might go. Very importantly, you’re asking this question, in the context of welding wire, of a category manager. The expert in this case is saying, actually no, it’s not a manufacturing service; it’s actually a manufacturing part. So the expert is correcting Tamr, giving it the correct expertise and feedback. And importantly, the context that could have been missing here, but is captured correctly, is that they’re talking about the individual item that was bought or sold, and this is very important for this use case.
In a different use case, services could indeed have been the correct answer: you could have put your welding wire down as a component from a services provider who had done the welding for you. So capturing the correct context, and presenting it to your SMEs, your subject matter experts, lets them feed back into the system correctly, and very importantly, the data once again is captured as part of this. Here we see two pieces of information, the purchase description and the supplier name, and this is linked to the feedback that comes from the human being. Where this becomes very, very powerful is not just where you have static images prompting the SME and setting the context, but where images or other types of metadata are directly linked with the actual data available to the machine learning algorithm.
So in this example, the final one, we’re really talking about the central curation effort. Here we have a very data-driven approach to curating your data, to unifying your data. What we have in the center of the screen is a name and an address. We also have a geometry associated with it, and that geometry, that data point, can actually be mapped out, not just as a static image file but against a real-time map and tile server. So what we see here on the left-hand side is the exact context in which you want to ask your question about mastering two site locations for your suppliers. We have two data points, they’re 4.2 meters apart, and they’re displayed in view for the SME, the curator, to answer a simple question: are these two supplier sites the same? With that question, they’re able to answer very simply, yes or no.
On the right-hand side, you’re asking this question of your procurement leader, for example, your procurement manager, your sourcing experts. And all of the information, the response from the SME, the question, the data points presented, and this image populated by a [inaudible 00:14:29] in this case, is captured correctly in the system and can be leveraged by the algorithms. Those are the three main consumption points that we see in a federated data mastering architecture.
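A distance like the “4.2 meters apart” shown to the curator can be computed from the two sites’ coordinates. A minimal sketch using the standard haversine formula, with made-up coordinates, might look like this:

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two (lat, lon) points."""
    r = 6371000.0  # mean Earth radius in meters
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Hypothetical coordinates for two supplier site records.
site_a = (42.36750, -71.08100)
site_b = (42.36753, -71.08103)
print(f"{haversine_m(*site_a, *site_b):.1f} m apart")  # a few meters, comparable to the 4.2 m example
```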

Mark Marinelli:
Thanks, Liam. Now we’re going to move on to Katie, who’s going to share with us some of the field-oriented innovations that her team has been up to.

Katie Porter:
All right, thanks very much, Mark. I’m excited to present on behalf of some of my team, who have been specifically investigating how we can gain even more improvement from this engagement with subject matter experts, in order to absolutely maximize the lift we get out of their engagement and feedback. So why is this something we should even be talking about? Tamr already has very efficient feedback workflows. We build a lot of context into the product, as Liam was just talking about. We have high-impact questions selected for customers, and we use confidence scores to direct their attention. But we often see the same use case being deployed over and over again at different customer sites. So our team identified that there’s even more value we could be gaining out of our interactions with subject matter experts in the tool, in order to give some of our customers a warm start if they’re doing a familiar use case.
So how can we better leverage these past efforts by SMEs as they engage with the tool? I’d like to start by orienting ourselves a little bit to that feedback, which Liam has already given us a preview of, in terms of how experts actually engage with the tool today. There are two main touch points: one in model training, where experts respond match or no match to pairs of records, and one in cluster curation, where we look at multiple records and take actions like split, merge, and verify. In case any of our listeners have not worked deeply with the Tamr tool, I just want to anchor again what those interactions look like. In the model training process, we make decisions about whether pairs of records are a match or a no-match. The goal is to train our model to identify duplicate representations of the same entity, in this case Randy George.
We also interact with subject matter experts through this type of question and answer when we’re further tuning the model, gaining improvements in how our data is being handled by the machine learning. In cluster feedback, we have a view of all of the records related to an entity. So here you can see eight different variations on Bob Stokes, and here, as an expert, I can take actions such as merging, splitting, and verifying that all eight of these records actually do represent the same person. I can also take exception-handling actions, as well as actions that can actually impact the model. That is a newer feature that I’m going to tease a little bit in this presentation, and then Mark will go into more depth later on.
So going into that model training process, really focusing on that first touch point with our experts: we have subject matter experts come in and look again at those pairs of records, so Diane Arny and Dianne Arnie with different spellings, and we ask them to decide, are these the same person or are these actually different people? The outcome of those decisions is a trained model that can be applied at scale to the entire set of data. This is also an iterative process: as we add more labels, we get a new and improved model. This is a critical link that I want to point out: the data and the labels on the data are an asset that is directly linked to an outcome asset, which is the model. So we have these two different assets related to a particular pipeline, and they are inherently linked.
So suppose we want to leverage the effort from a particular project. Let’s say that we’ve trained a project and a model to master and identify duplicate customers, and then we want to reuse that on a new set of data. We can do that in a couple of different ways. The first way I’m going to talk through uses that first asset, the actual data labels. This is a relatively simple process, where you can copy the data labels from the source project into the target project via API. There’s a little bit of complexity here, in that in order to have labels, we need to associate them with the records that were presented when the expert was making that decision. So we also need to copy some of the data from our source project into that target project. We can then run a train-predict job, get the trained model, and have our mastered data as an outcome.
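A minimal sketch of that label-copy step might look like the following; the endpoint paths and payload fields are hypothetical placeholders, not the actual Tamr API:

```python
import requests

# Hypothetical deployment URL, endpoints, and field names; the real API will differ.
BASE = "https://tamr.example.com/api"
HEADERS = {"Authorization": "Bearer <token>"}  # placeholder credentials

def copy_labels(source_project: str, target_project: str) -> None:
    """Copy expert match/no-match labels (and the records they reference)
    from a source mastering project into a target project, then retrain."""
    # 1. Pull the labeled pairs from the source project.
    labels = requests.get(
        f"{BASE}/projects/{source_project}/labels", headers=HEADERS
    ).json()

    # 2. Labels only make sense alongside the records the expert saw,
    #    so the referenced source records must be copied as well.
    record_ids = {rid for pair in labels for rid in (pair["record_1"], pair["record_2"])}
    records = requests.post(
        f"{BASE}/projects/{source_project}/records:lookup",
        json=sorted(record_ids), headers=HEADERS,
    ).json()

    # 3. Push records and labels into the target project and run train-predict.
    requests.post(f"{BASE}/projects/{target_project}/records", json=records, headers=HEADERS)
    requests.post(f"{BASE}/projects/{target_project}/labels", json=labels, headers=HEADERS)
    requests.post(f"{BASE}/projects/{target_project}/trainPredict", headers=HEADERS)

copy_labels("customer-mastering-source", "customer-mastering-target")
```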
The challenge here is that we don’t always want to copy data from one project or deployment into another, because the data might have some sensitivity to it. We also then have data hanging around from our source project, which we might not want to maintain as we push it through our downstream pipeline and into our analytics.
There is an advantage to this approach, however: we can add new labels in our target project in an additive way and then tune the model. So now we have a second model with a little more variety in it, based on the additional labels from our target project data. That allows us to iterate on and improve our model without losing the initial lift that we got from the source project. That’s one of our options. The other option is to use our second asset from the source project, the trained model itself. Again, we can very easily copy and essentially paste, via API, that trained model into our target project.
That’s a very smooth transition, but it has some assumptions. One of them is that it’s a highly similar use case, because again we have this relationship between the actual data, the labels associated with the data, and the outcome asset of that trained model. Now, if we want to further tune that model because there’s data variety in our target project that was not represented in the source project, we’re essentially going to lose any relationship with that original trained model. However, this does have the advantage of not copying any of the source project data into our new environment.
So those are our two options today, and that was the state of the field as we investigated potential ways to improve on and extend them. Now we understand the constraints a little better: how can we leverage our past SME efforts, but securely? By securely I mean we don’t want to be copying data from one environment into another, and we also don’t want to risk losing the initial progress we made leveraging the SME feedback from the source project. The advantage we gain when leveraging that feedback in this new way, which we’re calling bootstrapped models, is that we can take advantage of any effort our experts put into a particular use case. That allows us to deploy more quickly, as well as improve our overall solution quality, because more time is focused on the edge cases, so to speak: whatever makes that particular deployment of a use case unique, the actual difference in your customer data as compared to other customers or another business unit within your organization.
It also allows more time to focus on other areas of the pipeline, your analytics, et cetera: the overall solution.
So when can we use these bootstrapped models? Why are we putting time into investigating this as a solution? Data localization is one of the challenges we see with our customers, where they are unable to store or move data across regional boundaries. We might train a model on your customers within the U.S. and then want to apply that model to your customer base in South America or Asia. But there’s probably going to be some data variety, for example in how addresses are represented in each of those locations, such that we really would like to be able to further tune the model without losing the initial learnings from our U.S. customer base.
We also see a lot of value in this within mergers and acquisitions, where again you often repeat the same process over and over: identifying overlap between your internal customer, supplier, or product base and a potential acquisition’s customers, suppliers, and products. But again, it’s highly sensitive data that needs to be maintained in isolation. So there are risks to applying a model without the ability to tune it, as well as to sharing data across these different deployments.
Then the final reason is: why not? Anytime you see reuse of use cases internally, across multiple business units or across customers, why wouldn’t you want a jump-start on those deployments?
Here’s the overall flow of the solution we came up with, and then I’ll go into a little more detail on how we actually investigated it. As we set up before with our different options, we have the source project, where we do the initial training of the model. We label pairs, which results in model A. We can then export that model, so the data is not being transferred to our target project, and apply it to the target data. Tamr then provides predictions, based on that model, as to how this data should be labeled in terms of matches and non-matches. Then here’s the innovative part: as a method to gain a relationship between the data in this target project and the model we have imported, we sample some of the pairs, specifically the high-impact pairs that Tamr is recommending, and apply labels based on Tamr’s recommendations.
So now we have labels that are linked to the data in our target project, and we’re able to rerun that model and get a very close copy of the original model from our source project. Now we are able to iterate on that model, have experts come in and add additional labels, and tune it in a much faster way. The outcome is a hybrid model in which we’ve taken advantage of feedback from our source project, but have also made it very targeted to our new deployment.
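A toy, self-contained illustration of that bootstrapping flow is sketched below, using scikit-learn’s LogisticRegression as a stand-in for the mastering model and approximating “high-impact pairs” as the most ambiguous predictions; all of the data, feature counts, and thresholds are made up:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# SOURCE project: pairwise similarity features with expert match/no-match labels.
X_source = rng.random((500, 4))
y_source = (X_source.mean(axis=1) > 0.5).astype(int)
source_model = LogisticRegression().fit(X_source, y_source)      # "model A"

# TARGET project: unlabeled pairs. Apply the exported model to get predictions.
X_target = rng.random((400, 4))
probs = source_model.predict_proba(X_target)[:, 1]

# Bootstrap step: sample the most ambiguous ("high-impact") pairs and adopt the
# model's own calls as labels, so the target project now holds labels tied to
# its own data rather than to copied source records.
impact = np.argsort(np.abs(probs - 0.5))[:100]
y_boot = (probs[impact] >= 0.5).astype(int)
bootstrapped = LogisticRegression().fit(X_target[impact], y_boot)

# Experts can then add genuine target-project labels on top and retrain,
# yielding the hybrid model described above.
print("source coefficients:      ", np.round(source_model.coef_[0], 2))
print("bootstrapped coefficients:", np.round(bootstrapped.coef_[0], 2))
```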
How did we go about proving out this solution? We have our trained model from our source project and our bootstrapped model from our target project, and we leveraged feature importance as a mechanism to evaluate the similarity between these two models. Here’s an example of that output, where you can see the different weights associated with the different features, or attributes, being leveraged in this particular use case.
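One simple way such a comparison could be implemented is to line the two models’ importances up by attribute name and compute a similarity score; the exact metric isn’t specified here, so the cosine similarity and the attribute names below are illustrative only:

```python
# Compare two models by their per-attribute feature importances.
import math

source_importance = {"name": 0.42, "address": 0.31, "city": 0.15, "phone": 0.12}
hybrid_importance = {"name": 0.40, "address": 0.29, "city": 0.18, "phone": 0.13}

features = sorted(set(source_importance) | set(hybrid_importance))
a = [source_importance.get(f, 0.0) for f in features]
b = [hybrid_importance.get(f, 0.0) for f in features]

cosine = sum(x * y for x, y in zip(a, b)) / (
    math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
)
print(f"feature-importance similarity: {cosine:.3f}")  # close to 1.0 means similar models
```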
So we went through several iterations, looking at the outcomes of these new hybrid models. The first thing we tried was sampling and labeling high-confidence predictions of matches and non-matches. The outcome was that very few features, or attributes, were actually being leveraged in the resulting hybrid model. So we tried expanding our sampling to include low-confidence predictions from our input source model. That produced a very high-confidence hybrid model, but again we did not see representation of all the different features that were in our original model. The winner, as I previewed before, was sampling from the high-impact pairs. Those of you who are familiar with [inaudible 00:29:23] will be very familiar with this concept.
That’s where Tamr actively selects and samples the questions that are most representative of your data and have the most ambiguity, in terms of opportunity for learning. To summarize the process: our first two attempts failed to capture a representative sample of our data, and so failed to expose the model to how it should handle all of the different features or attributes within the data. When we leveraged the high-impact pairs, we were able to take advantage of the purpose of that tool within Tamr, which is to sample the most informative pairs for experts to respond to and train the model in the most efficient manner.
So now that we have this solution, what is our long-term vision? It’s really to speed up engagement with subject matter experts, such that ideally, long term, we can skip over that first initial training step. Rather than having experts come in and apply labels from ground zero to train a model, we can leverage these bootstrapped models such that experts come in only to do exception handling within cluster curation, or model tuning. And one of the really exciting things we’ve developed recently in the product is the ability to gain feedback from those cluster curation actions. So not only can we do exception handling within that view, we can also infer additional model training from it. It’s actually a model tuning action within an interface that gives users a holistic view of the entity. We’re very excited about this as a potential solution; we’re testing it further internally right now and are looking for beta use cases. So I hope you’re as excited as I am. I’ll hand it back to Mark.

Mark Marinelli:
Thanks, Katie. I’m pretty excited, and it’s an excellent segue into where I’m going to start talking. I’m going to talk a bit about recent innovations in the product. The things that both Liam and Katie described are field-based: we learn a lot and innovate a lot with our customers, and then over time we harvest those innovations that have been incubated in the field into the core product, so that all of our customers can benefit from the best technique we’ve found for any one of these, in a scalable and durable fashion. I’m going to focus here on clusters, where we’ve done a lot of work over the last year, principally over the last couple of quarters. Clusters are the area closest to the answer: right before we create a golden record, we’ve done clustering and mastering to say that all of these records belong together.
They are a less artificial mechanism for end users to decide whether Tamr got it right. Pairs are very powerful and very intuitive, but they’re somewhat upstream from how the data are actually going to be consumed; clusters are a lot closer to that, and thus are a really great opportunity to get that participation, by providing feedback about what people are doing in clusters to elevate our confidence in the outcomes. Historically we only had one type of cluster feedback, where you could lock a cluster and say, whether Tamr got it right or wrong, I just want this particular cluster relationship to persist in perpetuity. Earlier in the year we gave you more control over that: you now have more agency over whether you continue to receive and act upon suggestions from the machine learning model as to which records belong in which clusters, or whether you’re actually going to override what Tamr is telling you about them.
So you can see here a sort of spectrum of how much you trust Tamr, and what actions you would take in the user interface as you work through clusters, to either accept what Tamr is suggesting, because you trust it, or to refute it and prevent Tamr from overriding your human agency over the outcomes. There’s a screen showing where this fits into the UI; it’s been in there for quite some time. Arguably more important, and very, very useful, are high-impact clusters.
We’ve had high-impact pairs from the beginning: active learning, this great way to ask very targeted questions, algorithmically selected for maximum improvement in the quality of the model. We’ve now moved that whole paradigm out to the edge, to clusters. So we will ask users to weigh in on specific clusters, because now we can take feedback about those clusters not just as an acceptance or refutation of Tamr’s correctness, but also as more validation and training data, which can then contribute to the model. It’s really completing the loop. It used to be a linear process: we’d push out to clusters, and then people would decorate the clusters with their version of the truth. Now we’re actually taking that back, and we’re being smart about how we leverage these data at a few different points in the supply chain to improve the quality, improve the outcome.
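The general idea of turning verified cluster feedback back into training signal can be sketched as follows: records verified into the same cluster imply match pairs, and records verified into different clusters imply no-match pairs. This is a generic illustration with hypothetical record IDs, not necessarily how Tamr derives the labels internally:

```python
from itertools import combinations

# Clusters a curator has verified as correct.
verified_clusters = {
    "cluster-1": ["rec-01", "rec-02", "rec-03"],   # verified: same supplier
    "cluster-2": ["rec-07", "rec-08"],
}

pair_labels = {}
# Records verified into the same cluster become positive (match) examples.
for members in verified_clusters.values():
    for a, b in combinations(sorted(members), 2):
        pair_labels[(a, b)] = 1
# Records verified into different clusters become negative (no-match) examples.
for (_, members_a), (_, members_b) in combinations(verified_clusters.items(), 2):
    for a in members_a:
        for b in members_b:
            pair_labels[tuple(sorted((a, b)))] = 0

print(pair_labels)  # pairwise labels that can feed back into model training
```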
We’re also giving you more visibility into how the clusters are performing. This is really in service of people having more confidence in, and more agency over, how the quality of the clusters improves, and visibility into when it is not improving over time. So here we’ve instrumented some new reports where you can filter on different types of metrics, whether it’s a precision problem or a recall problem with any of the clusters, provide feedback within that filtered set, and then estimate these cluster metrics. There’s a new calculation being done, and you get a history, as we evolve through different iterations on these clusters, of whether the precision and recall of these clusters are improving or falling.
So now we’ve really got a sort of speedometer on the dashboard that lets you see whether you’re accelerating or decelerating, to stretch the analogy, and what’s happening. That allows you to go back as warranted and do more work with those high-impact clusters, the same way that in a pairs context you’d look at precision and recall numbers to determine how much more training you needed to do and when to stop.
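For reference, pairwise precision and recall over clusters, one common way such metrics are defined for entity resolution, can be sketched like this; the clusters and verified groups are hypothetical, and Tamr’s actual estimation from sampled feedback will differ:

```python
from itertools import combinations

def pairs(clusters):
    """All within-cluster record pairs implied by a clustering."""
    return {frozenset(p) for members in clusters.values() for p in combinations(members, 2)}

predicted = {"c1": ["r1", "r2", "r3"], "c2": ["r4", "r5"]}   # Tamr's clusters
verified  = {"v1": ["r1", "r2"], "v2": ["r3"], "v3": ["r4", "r5"]}  # curator-verified truth

pred, truth = pairs(predicted), pairs(verified)
precision = len(pred & truth) / len(pred)   # predicted co-clustered pairs that are correct
recall    = len(pred & truth) / len(truth)  # true pairs the clustering recovered
print(f"precision={precision:.2f} recall={recall:.2f}")
```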
So I’ll wrap it up. That was a whirlwind tour of some of the more recent product functionality. Please go get one of the more recent versions and avail yourselves of this functionality, and get yourselves a Liam or a Katie and avail yourselves of the kinds of innovation that they and their teams have been providing to our customers throughout the journey. Thanks, everyone, for participating, and we look forward to doing more of this with our customers and partners in the future.

Liam Cleary:
Thank you everyone.

Katie Porter:
Thank you.