DataMasters Summit 2020

How Mass. Executive Office of Education (EOE) Earned an ‘A’ in Analytics Using Mastered Customer Data

 

Louise Baldwin, Brian Czajak, Danielle Norton & Danielle Ondrick

Louise Baldwin, Solutions Director @ Tamr
Brian Czajak, Software Architect @ Mass EOE
Danielle Norton, Integrated Digital Data Services Program Manager @ Mass EOE
Danielle Ondrick, Enterprise Cloud Architect & Project Consultant @ Mass EOE

The Massachusetts Executive Office of Education (EOE) works with early childcare and education providers and 29 public colleges and universities to create a ‘connected’ experience across the education system. Like many organizations, Mass EOE faced major data challenges creating unique or ‘golden’ customer records. A legacy data management system failed to remove duplicate records from siloed systems, making identifying relationships between customers extremely difficult. This inhibited Mass EOE’s mission to connect students, teachers, and other members with beneficial services and resources.

To solve this challenge, Mass EOE turned to Tamr to create golden records for every student, teacher, and organization in their customer base. This 360-degree view provides the analytic insights required to better serve their customers.

Read about it in this case study.

Transcript

Louise Baldwin:

Thank you, very much, for joining our session with Massachusetts Executive Office of Education to look at how they use master data to empower analytics to drive impact within the education system. I’m delighted to introduce our speakers for today’s session from Mass EOE. We have Danielle Norton, Integrated Digital Data Services Program Manager; Danielle Ondrick, Enterprise Cloud Architect & Project Consultant; Brian Czajak, Software Architect for EOE.

Louise Baldwin:

My name’s Louise Baldwin and I’m a Solutions Director at Tamr. The EOE does really important work with early childcare and education providers and 29 public colleges and universities to create that connected experience across the education system. So like many organizations, Mass EOE faced major challenges creating unique or golden customer records.

Louise Baldwin:

They had this legacy data management system that failed to remove duplicate records from siloed systems, making it tricky to identify relationships between customers as they moved through the education system. It became a barrier to EOE achieving its goal of connecting students, teachers, and other members with beneficial services and resources within the education system. To solve this challenge, Mass EOE and Tamr worked together to create golden records for every student, teacher, and organization in their customer base.

Louise Baldwin:

Over the next hour, we’re going to have the chance to really deep dive on Mass EOE’s customer data mastering project, and I’m really looking forward to having a discussion as we go. So to kick us off, Danielle N. is going to set the context behind the project objectives and the data management landscape at EOE. Danielle O. will speak about the insights that Mass EOE is looking to drive from that customer 360 view. I think a really interesting part of the project as well, is how Mass EOE used graph databases to capture new insights, and so Brian is going to chat through some of the analytical benefits there. And then finally, I’ll touch on how, at Tamr, we’ve seen other organizations tackle similar challenges to those taken on at EOE. And so with that, let’s get into the content. I’m excited to pass it over to Danielle N. to get us started.

Danielle Norton:

Terrific, thank you. The Executive Office of Education is a secretariat that was founded in 2008. Prior to that, our four agencies, the Children’s Trust, Early Education and Care, and Elementary and Secondary Education, as well as the Department of Higher Education, had no history of working together. Each of those four agencies operated independently and over time built up its own individual set of systems using different standards, and now they’ve come together under a single secretariat: the Executive Office of Education.

Danielle Norton:

One of the things that they’ve agreed upon over time is that they really want to focus on evidence-based policy and decision making. In order to do that, we have a few challenges to overcome. Our big question is, how do we link disparate data that’s grown over time in siloed legacy systems to create an accurate, 360-degree view of our constituency, and do it more accurately than we can right now, using fewer resources and fewer manual interventions, and do that on a budget? Because we are the government and we do have a very tight budget.

Danielle Norton:

So I’ll talk a little bit about our challenges. We do have some. As I said before, all of our systems were homegrown over time. The state makes very, very significant use of its resources, and we have the tendency to continue using our legacy systems way past any planned obsolescence date. We have systems being used that are 20 years old, we have systems in use that are two years old, and we have the challenge of integrating older systems and newer systems over time. We also have a practice of migrating data from older systems to newer systems, and migrating the data quality problems with it. So we tend to keep our data quality problems and migrate them throughout our systems over time.

Danielle Norton:

We also, as we said before, have four different agencies. Each of these four agencies answers to different federal agencies. They have also adopted different standards. The Department of Higher Education uses the PEC standard, DESE uses the [inaudible 00:05:39] standard, and EEC and Children’s Trust use a multitude of standards depending on which of the business units need to report to which federal agency. So we have a lot of challenges with merging data that grew up over time following different standards.

Danielle Norton:

Since our systems weren’t actually meant to be integrated from their inception, we also lack the ability to link across systems. Even within particular agencies, it’s often very difficult to link the data because originally these systems weren’t developed under a program that focused on the data and focused on interoperability.

Danielle Norton:

All of our systems are meant to achieve a certain business goal, and use of those systems in the field is really focused on achieving what we’ll call short-term goals, which are extraordinarily important; it’s the actual business that we’re serving. However, altogether, under the secretariat umbrella, we have a long-term goal of that evidence-based policy making. In order to achieve that goal, all of our systems need to be aligned in order to increase their accuracy, de-duplicate their data, and link all of the entities over time.

Danielle Norton:

Now we do have a history of trying to link our data and we have had some successes in linking our data. Our Department of Elementary and Secondary Education has a very good process of master data management and data governance. However, not all of our agencies have benefited from that type of strenuous effort. We have had previous efforts at implementing software, not exactly like Tamr, but some other bigger names. These efforts, I’ll call them a failure to take off: there was a steep learning curve for our staff, they required complex infrastructure to be implemented, and they were expensive to implement. All of these combined made it very hard to even get off the ground.

Danielle Norton:

A lot of our data stewardship efforts right now require a great deal of manual intervention and manual review. This is something that we feel, over time, we’re not going to be able to keep up with. Right now, we rely very, very heavily on dedicated staff who are committed to reviewing records and following up to establish identity. What we feel is that, over time, we’re going to need to achieve a certain level of accuracy without so much manual intervention.

Danielle Norton:

And I will hand this over to Danielle Ondrick.

Louise Baldwin:

Perfect. Maybe before we jump into it: you touched on so many interesting areas within that, the large-scale data challenges, the systems from two years old to 20 years old, and managing across the four agencies. When you speak about this failure to take off, I think it can be really off-putting when making changes, being worn down by previous attempts. I would love to know a little bit more about how you overcame these and what made you revisit the project with Tamr.

Danielle Norton:

Thank you. I think it is important that we focus on the fact that all of the agencies under the secretariat have agreed that evidence-based policy making is one of our highest priorities, and in order to do that, we really need a high level of accuracy in our reference data. We need a high level of accuracy in our linkages over time in order to support that. If we don’t have the accurate data, if we don’t have good data, then we’re not going to have good results for policy making. That’s really what has brought us back to the table.

Danielle Norton:

Danielle Ondrick had some early experiences with the Tamr software, so we were very interested, and we were interested in what Tamr could provide to us in the form of artificial intelligence supported de-duplication. We’re definitely interested in, can we save time and money using a different process than we’ve done in the past? A lot of our previous efforts have focused on hierarchical rule sets, and we’ve put a lot of focus on it, we’ve spent a lot of time refining the rules over time, but we’ve only achieved a certain level of accuracy, and based on the proof of value we did with Tamr early on, we definitely thought we could achieve a higher level of accuracy. That’s a long answer for what brought us back to the table.

Louise Baldwin:

That’s fantastic. I love how you kicked it off as well by bringing it back to the overarching goal of evidence-based policy decision making, and that being at the heart of it as well. Danielle O., I know you’ve been very [inaudible 00:12:22], and we’re about to launch into it and go into a bit more detail. Would love for you to take us through the approach.

Danielle Ondrick:

Sure. Hi, this is the second Danielle. At EOE, we believe in high availability, so I’m here today as the additional Danielle on our goals of mastering and how we’re using Tamr to do that. First I’d like to talk a little bit more about, what is our use-case? Our use-case is all about customer mastering. For us the customer consists of both organizations, such as childcare providers and schools, as well as individual people, such as students or educators or even childcare recipients. So when I use the term customer, I could mean it in both a person or an organization context. What we’ve decided to do here is we’ve chosen to use some very common demographic information such as names and addresses, contact info, and a lot of those can often be shared between both organizations and people, in terms of having maybe the same physical address or things like that. So out of our initial work we set out to master organizations and people, and we also decided that one of the dependencies on this was address and something we call location. So we end up mastering organizations and people, and we get golden records for organizations and addresses and people as well.

Danielle Ondrick:

I had a background in customer marketing at large global retail companies before I came to work at state government, and what I found was that many of the strategies that were used in retail for identifying customers and creating a customer 360 view could also be used and applied to our education customers. I was pretty anxious to see if we could use some of the same techniques here at state government that we do out in the retail world.

Danielle Ondrick:

One of the things we also decided to do as part of our mastering process was to assign a unique identifier to each mastered entity, whether it’s a customer ID or an address ID. You may hear us referring in today’s discussion to this thing we call an EOE ID, and this is going to be our unique education-assigned ID. That becomes very important as you’re looking through some of these slides and hearing us talk today about a master [inaudible 00:15:03] identifier, some of those golden record components, and how we do all the linking that Danielle N. was just referring to; that’s very important to the whole success of the project.

Danielle Ondrick:

What are our plans for the downstream consumption of this Tamr master data? To start, one of the things we want to do is, we want to make our application data better at the sources if possible. Better source data is always at the top of the list for success. At a minimum, one of the things we can do right at the beginning of the project is provide quality reports back to these source applications to show them what we’ve learned about the data and how we think it could be improved. We’re getting a lot of inconsistent data, shall we say, from a variety of those data silos. The data stewards and the managers of these various applications like to believe that the data is always in the best shape it can be, but we often find, after running it through a process like mastering, that improvements could be made and that unfortunately there’s duplicate data, inconsistent data, or incomplete data. We’re hoping to put in a data publishing service that will supplement and augment the source application data using our golden records.

Danielle Ondrick:

The next plan we have is a very externally focused goal and that’s to provide secured and key-identified data access to researchers and analysts so they can use the information to make policy decisions based on evidence in the data that we’re seeing. It’s very important that we not only understand who our customers are and who we’re providing services to, but also how effective the services that we’re providing are. For example, we might want to know what aspects of early intervention lead to better long-term outcomes. This will only be possible if we have a good view of who the customers are, how many we have, and what type of relationships we have in our systems. And that, as I said, is a very externally facing goal, whereas that first one is a very internally facing goal to try and fix our own source applications.

Danielle Ondrick:

Then the third big plan we have for the Tamr master data is to make the data available in what we’re calling a data hub across our education enterprise. If I were out in the private sector, I’d be talking about this as the corporate view, but here in state government we call this a secretariat. This is our cross-agency, inter-agency view of the data as it is. What we’re hoping to do here is create a case for linking those data silos and provide a consistent set of metrics, KPIs, reports, and dashboards, very similar to how you’d run a company from the top level. We don’t have a lot of cross-agency visibility today and that’s one of our goals for sure. Again, we want to be able to say with greater confidence how many customers do we serve, and how well were they served? Those are some of the plans we have. I’m sure we’re going to have more ideas as we go along but we’re anxious to try and see if we can’t make each of these a success as we go forward.

Danielle Ondrick:

So let’s talk a little bit about universal challenges: developing better customer insights. Danielle N. already spoke to us about some of these challenges, and they definitely come into play as we’re working through the data here at the Executive Office of Education. As she said, one of our biggest challenges is that we’ve got data collected in many different systems. By different, as she said, I mean systems can range from decades old to modern, they can be developed by different vendors or custom coded, and they often have a specific view of the customer that’s unique to that application. You can’t usually expect to [inaudible 00:19:15] all your different systems in the portfolio and expect somebody to call it out and say “here’s the customer table”. If you’re lucky, that could be the case, but often it’s not. It’s hidden across multiple tables, multiple systems, multiple sources in reference values, and you have to go looking and really understand where your customer data is stored and how it’s being stored.

Danielle Ondrick:

We also realized that there’s a level of de-duping and finding the same customer or identifying customers even within the same application. I think a lot of folks, ourselves included, thought we could just start running our sources through and start de-duping across our sources and across our applications, but we found that it’s very important that you run these rules and try to identify customers even within the same system, because they’re not always as clean as we’d hope.

Danielle Ondrick:

So once we start figuring out, or have the challenge of overcoming, de-duping a single person or a single organization, we also wanted to focus on the relationships that these customers have in our systems. It hasn’t always been a holistic view of our customers. What we see is that applications often focus on tracking a customer interaction in a particular role. We might have a student collection system or an educator licensing system. So these are examples of a person who is acting in a particular role, and it’s not always easy to tell that they may in fact be the same person who was both a student and an educator.

Danielle Ondrick:

Recognizing our person or organization in a variety of roles is another pretty universal challenge for all of us. We also noticed that roles may collect different data depending on what they’re trying to accomplish. We have some systems that might have great name and address data, but no date of birth, while others might collect great data on email and phone, but have nothing on address. Trying to gather as complete a picture as we can from a particular role about a particular person is another challenge we have.

Danielle Ondrick:

Finally, tracking a person or an organization in various roles and relationships over time compounds the challenge, since now we not only have to re-identify them in a particular application, but across applications, across roles, and now across time. Things like a person’s contact info, or role, or even name may change over time. It would be wonderful, again, if all systems had this notion of a temporal aspect where you could see the customer as they’ve evolved over time, but the reality is that this is often a challenge. Not every system records things like a date and time stamp, so you need to understand whether the information you’re getting into the mastering process is [inaudible 00:22:14] or stale. Should I use this, should I rely on it, or has it changed? These are examples of some of the more complex challenges that we face, and again, none of these challenges are unique to education or even really…

Danielle Ondrick:

What’s next? We talked a little bit about leveraging these customer 360 views to anticipate and recommend services; while we’re only at the beginning of our journey, [inaudible 00:22:47] where we do plan to leverage this customer 360 view. Again, many of these are familiar to the retail market, such as anticipating your next shopping move, or what you might need, and if we have services we could recommend. These are all things that are very familiar to other industries and they are certainly applicable here in the government sector. You may, for example, be enrolled in a particular education program but might not be aware that you qualify for additional programs or services. This is an area where we can help guide customers to a more complete view of their eligibility and some of our service offerings.

Danielle Ondrick:

Another area is in understanding potentially underserved communities, based potentially on demographics or geography. Having an accurate customer list is essential when we’re allocating resources, and getting that view into our customers and maybe identifying what micropopulations might look like. This is an area where Brian will talk a little bit about where we have greater insight, possibly using some of the graph technology and trying to see some of those relationships between entities. But even at the basic level of mastering, having that list is certainly very, very important for targeting our populations with the limited, finite resources that we often have to work with at the state level.

Danielle Ondrick:

Finally, the third area we’re hoping to work on is optimizing service delivery and enhancing some [inaudible 00:24:24] team management. If you’ve ever tried to manage, say, a large budget of 500 million or more, you know it can be pretty difficult to make sure that you plan how to use the money. The government is no different. The budget cycle is often planned far in advance, while the conditions faced by the customer are constantly evolving. We could not have predicted COVID would happen, so the customer’s needs are changing and the services that we may even be able to offer could be changing over time. Unfortunately we’ve had to plan our budget maybe a year or even longer prior to these cycles. This is an area where we really feel that having an accurate customer 360 view would have the best impact, because we can now track how the customer is evolving in real time, or near real time I should say, even though our budget is established long in advance and we have to work with the money we have in a constantly evolving customer world.

Louise Baldwin:

Danielle, it’s great to hear you bring all of that work to life. With what you mentioned around early interventions, how to serve underserved communities, and how to target, I know there are lots of ways that master data and customer data can be used, and it’s hard to imagine one more impactful than how EOE is trying to use it within our education system in Massachusetts. I think it stood out as well how you mentioned the lessons that you yourself took from having that private sector experience and bringing it to the role, and then vice versa, taking the lessons from state government and thinking about applying them within the private sector. Is there any advice in particular that you would think about offering back to the private sector when you look at how you successfully launched this project?

Danielle Ondrick:

Oh, definitely, sure. Thank you. I think some of the lessons learned would be: one thing we did was we went in with a conceptual model, with a really high-level view of a person having relationships with organizations. We pulled out person and organization as very key entities and then grew the relationships between them and recognized them in various roles.

Danielle Ondrick:

One thing we also ran into, probably common in other sectors, is that we sometimes have people that are acting in a capacity as an organization, such as a person doing business as, the DBA role, or a sole proprietorship. We had to think through how we wanted to treat this information. Did we treat them as a person? Did we treat them as a master [inaudible 00:27:04]? Are they in some sense both: a person having a role in an organization, with that organization tied back to the same person? We really had to understand the structure of the data and the nature of the kinds of roles that the people and organizations were playing. I would say definitely understand how your data’s structured and put together, and figuring out whether you can use that data as both an organization and a person is a key one.

Danielle Ondrick:

We didn’t have a standard ontology or taxonomy, and those are just fancy words for saying we didn’t call things the same words across the various applications or even agencies. We tried to go in saying we’re using five different words to refer to this, and of course each group wants to keep their own word for something, so you have to try to navigate that with the data stewards and application owners, but try to come up with a common glossary of terms and a common set of values. Even things as seemingly simple and straightforward as gender codes, race and ethnicity codes. It’s one thing to say, oh, I’m going to go borrow the ISO standards for these types of values, but when you look at your own data, they may not fit the standard.

Danielle Ondrick:

You really do have to look at the details, open up all the codes, and walk through and try to do some cross-walking between some of these things at that level. Having a common glossary is certainly one of those things that, if you don’t start with it, you’re going to end up with it anyway, so you might as well try to focus on that as early as you can.

Danielle Ondrick:

I think the last lesson learned I would say is that it’s important to let the customers know, and the customers in this case aren’t our external customers but the business owners of the projects whose data we’re bringing in, that the mastering is not changing their data. Everybody is worried that, oh my gosh, you’re going to change the name or an address. What if you change an email address from another system? We worked pretty hard at saying that’s not the focus, that’s not our goal, and we are not changing your source application data. Instead, what we do is provide a set of crosswalks or links between systems. It’s definitely more of a reference-architecture cross-reference, and we’re able to link and crosswalk back and forth. That seems to help lower the anxiety of people who might be hesitant to start running their data through this mastering process.
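
A rough sketch of that cross-reference idea follows. The table shape, the sample system names, and the helper functions are invented for illustration, not EOE’s actual schema; the point is that the mastering output can live as a standalone link table that points back at untouched source records.

```python
# Hypothetical crosswalk: each row links one untouched source record to a
# mastered identifier. No source-system data is modified; downstream users
# join through this table whenever they need the linked view.
crosswalk = [
    {"eoe_id": "E-000017", "source_system": "educator_licensing", "source_id": "LIC-4521"},
    {"eoe_id": "E-000017", "source_system": "student_sis",        "source_id": "SIS-90233"},
    {"eoe_id": "E-000042", "source_system": "provider_app",       "source_id": "CCP-1108"},
]

def source_records_for(eoe_id):
    """All source-system pointers for one mastered person or organization."""
    return [row for row in crosswalk if row["eoe_id"] == eoe_id]

def eoe_id_for(source_system, source_id):
    """Crosswalk in the other direction: source record -> mastered ID."""
    for row in crosswalk:
        if row["source_system"] == source_system and row["source_id"] == source_id:
            return row["eoe_id"]
    return None

print(source_records_for("E-000017"))
print(eoe_id_for("provider_app", "CCP-1108"))   # E-000042
```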

Louise Baldwin:

You really hit a lot of the meaty challenges, I think. At the heart of it, it’s often those things that are the most tricky: putting people at ease around the changes around master data, and that alignment as well on the glossary of terms. I really like how you broke down the jargon, because we do throw around terms like taxonomy and whatnot, but really just getting that understanding of how a person is thought about or defined versus an organization and those different levels, it does feel, from projects at Tamr, that it can be so key to success.

Louise Baldwin:

I’m excited we’re getting further and further into the detail. Maybe with that, Brian, should we hand it over to you to bring us through the pipeline?

Brian Czajak:

Sure. Hi. Thank you, Danielles, for setting this all up. We’ve heard why we’re doing it, now let’s talk about what we’re doing: the pipeline. This is an internal view of how we actually assembled the data to get an answer from Tamr on a golden record. You’ll see that the source and result databases, which would be external to Tamr, are sort of blobs at this point. The activity inside Tamr starts with address, and the reason we chose to do that is because by standardizing the address and coming up with the golden record there, we end up with a signal we can apply to org and person. We end up with signals that are more accurate at getting at a better answer for de-duping or mastering org and person.

Brian Czajak:

We are looking to create a person-centric view, so we did person last. Org comes next, after address, and we then feed the org along with address into person. So we have the relationships the person has with an organization to provide us with a detailed view of the person as we master it. The golden records from address and org become signals in the person. It’s all daisy-chained together.

Brian Czajak:

What we also see in this picture is that we have separated out each individual source and then we bring them together for a multi-source mastering. The reason we did that is the data creates different signals for the AI in Tamr and the results can counterbalance each other. So what would be a correct answer for source one might not be a correct answer for source two. By separating them out, we’re able to get an effective de-duplication mastering at the source level. It also provides us with a view into data quality for a given source. Then the mastered result of each source is fed in and they all work together to provide a multi-source master and a golden record. That’s effectively what goes on here. As you can see, we feed the results in.
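
A minimal sketch of that daisy-chained flow, using a toy stand-in for the mastering step; Tamr’s actual matching is model-driven rather than a key lookup, and the key fields and source layout below are assumptions for illustration only.

```python
# Master each entity per source first, then across sources, in dependency
# order: address -> org -> person.
KEYS = {
    "address": ["street", "city", "zip"],
    "org":     ["name", "zip"],
    "person":  ["first_name", "last_name", "dob"],
}

def master(records, key_fields):
    """Toy stub: keep one record per group that agrees on key_fields."""
    golden = {}
    for rec in records:
        cluster_key = tuple(str(rec.get(f, "")).strip().lower() for f in key_fields)
        golden.setdefault(cluster_key, rec)     # first record in wins, for brevity
    return list(golden.values())

def pipeline(sources):
    """sources: {source_name: {"address": [...], "org": [...], "person": [...]}}"""
    # 1. Per-source mastering, so one system's quirks don't skew another's
    #    signals and each source gets its own data quality picture.
    per_source = {
        name: {entity: master(recs, KEYS[entity]) for entity, recs in tables.items()}
        for name, tables in sources.items()
    }
    # 2. Multi-source mastering over the per-source results.
    addresses = master([r for t in per_source.values() for r in t["address"]], KEYS["address"])
    orgs      = master([r for t in per_source.values() for r in t["org"]], KEYS["org"])
    # The golden address/org records would be joined onto person records here
    # as extra match signals before person mastering; omitted to stay short.
    persons   = master([r for t in per_source.values() for r in t["person"]], KEYS["person"])
    return {"address": addresses, "org": orgs, "person": persons}
```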

Brian Czajak:

On the next slide, you’ll see staging and the destinations, and all of the data quality across the top. What happens here is we collect the data from our source systems and stage it. Why we do that is it allows us to consolidate the source data and standardize its structure, so that firstname in one system, fname in another, and first name with a space in another all become first name. SSN, where it might be encrypted in one pattern in one system and encrypted in another pattern in another system, we can standardize all of that so that we don’t have any inconsistency.
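
A sketch of that staging-time standardization; the source names and column spellings here are made up for illustration, not EOE’s actual systems.

```python
# Map each source's spelling of a field onto one canonical staging name
# before anything reaches the mastering pipeline.
COLUMN_MAP = {
    "licensing_db": {"FIRSTNAME": "first_name", "LASTNM": "last_name", "SSN_ENC": "ssn"},
    "student_sis":  {"first name": "first_name", "last name": "last_name", "ssn": "ssn"},
    "provider_app": {"fname": "first_name", "lname": "last_name", "social": "ssn"},
}

def standardize(source, record):
    """Rename a raw record's columns to the shared staging schema."""
    mapping = COLUMN_MAP[source]
    return {mapping.get(col, col): value for col, value in record.items()}

raw = {"fname": "Jordan", "lname": "Lee", "social": "***-**-1234"}
print(standardize("provider_app", raw))
# {'first_name': 'Jordan', 'last_name': 'Lee', 'ssn': '***-**-1234'}
```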

Brian Czajak:

We also at this point remove default values because they can create unnecessary false positives inside an AI when matching, so we’ll remove things like an SSN of all zeros, which would show poor data quality, or an “unknown” sitting in a field as its defaulted value when you don’t have the real value. All of these things would be removed at this point because nulls don’t create a signal whereas an actual value would.
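
A sketch of that placeholder-stripping step; the placeholder list and field names are illustrative choices, not EOE’s actual rule set.

```python
# Values like an all-zero SSN or a literal "unknown" are replaced with None
# so they contribute no match signal downstream.
PLACEHOLDERS = {"", "unknown", "n/a", "none"}

def scrub(record):
    cleaned = {}
    for field, value in record.items():
        text = str(value).strip().lower() if value is not None else ""
        if text in PLACEHOLDERS:
            value = None                                   # nulls create no signal
        if field == "ssn" and text and len(set(text.replace("-", ""))) == 1:
            value = None                                   # 000-00-0000, 111-11-1111, ...
        cleaned[field] = value
    return cleaned

print(scrub({"first_name": "Ada", "ssn": "000-00-0000", "email": "unknown"}))
# {'first_name': 'Ada', 'ssn': None, 'email': None}
```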

Brian Czajak:

The next step would then be to feed this standardized data into Tamr, and then it flows through the pipeline that we just discussed. At the top, you’ll see the data quality. At each level, we present data quality reports: a view into the data for the source systems. What we end up seeing is that at the early stages, your data quality issues are more structural. What’s wrong with the data? We find that there are issues with data entry or issues with a migration from a prior system, so we end up removing quotation marks or defaulted values, that kind of thing, or structure.

Brian Czajak:

As you move through the pipeline, you end up seeing data issues that are more about bad business practice. You end up seeing your de-duplicated data and you can identify, say, that you’re entering a person six or seven times and not reusing the same person. There is something wrong with the business workflow that’s causing this to happen, or something wrong with the UI in the system that’s causing it to happen. We can identify those data quality issues by separating out the sources and managing the data incrementally across the lifecycle.
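
A rough sketch of the kind of per-source data quality report described here; the metric choices and field names are our own, for illustration.

```python
from collections import Counter

def quality_report(source_name, records):
    """Summarize structural issues and likely duplicates for one source."""
    keys = [(r.get("first_name"), r.get("last_name"), r.get("dob")) for r in records]
    likely_duplicates = sum(count - 1 for count in Counter(keys).values() if count > 1)
    stray_quotes = sum(
        1 for r in records for v in r.values()
        if isinstance(v, str) and '"' in v
    )
    missing_ssn = sum(1 for r in records if not r.get("ssn"))
    return {
        "source": source_name,
        "records": len(records),
        "likely_duplicates": likely_duplicates,
        "values_with_stray_quotes": stray_quotes,
        "records_missing_ssn": missing_ssn,
    }

sample = [
    {"first_name": "Ada", "last_name": "Lovelace", "dob": "1990-01-01", "ssn": None},
    {"first_name": "Ada", "last_name": "Lovelace", "dob": "1990-01-01", "ssn": "123-45-6789"},
    {"first_name": 'Sam "Sammy"', "last_name": "Ortiz", "dob": "1985-06-15", "ssn": "987-65-4321"},
]
print(quality_report("student_sis", sample))
```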

Brian Czajak:

Once the data is mastered, it flows downstream into the master data hub where the golden record is used to create a map. We now have the EOE ID assigned; we have the golden records for the person, the org, and the address; and we store the original source IDs along with that EOE ID and golden record. I have a pointer back to all of the data that came from the source systems. I have a golden record and the ability to attach all the transactions for longitudinal data studies, analysis, that kind of thing, which you can see here.
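
A sketch of what attaching transactions through that stored ID map could look like; the table shapes, system names, and events below are assumptions, not EOE’s real schema.

```python
# (source_system, source_id) -> EOE ID, as produced by mastering
id_map = {
    ("early_ed",  "EE-123"):  "E-000017",
    ("k12_sis",   "K12-987"): "E-000017",
    ("higher_ed", "HE-555"):  "E-000017",
}

transactions = [
    {"source_system": "early_ed",  "source_id": "EE-123",  "year": 2008, "event": "pre-K enrollment"},
    {"source_system": "k12_sis",   "source_id": "K12-987", "year": 2014, "event": "grade 6 enrollment"},
    {"source_system": "higher_ed", "source_id": "HE-555",  "year": 2021, "event": "community college admit"},
]

def timeline(eoe_id):
    """All transactions for one mastered person, across systems, in time order."""
    rows = [
        t for t in transactions
        if id_map.get((t["source_system"], t["source_id"])) == eoe_id
    ]
    return sorted(rows, key=lambda t: t["year"])

for event in timeline("E-000017"):
    print(event["year"], event["event"])
```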

Brian Czajak:

Next we have a little bit about the actual pairing and mastering process. Here’s a good slide on how to understand your data. In this particular example, you see Tamr was able to identify twins and not match them. You end up with something that has a matching birth date, a matching gender, a matching last name, and first names that are close, but not the same. You’ll also see that it filtered out having “Mister” in the full name, and was able to determine that these are not a match.

Brian Czajak:

The key thing to take away here is to understand what goes on in the data so you can create effective rules. You have to understand how to bucket or bin your data so that you can effectively pair it and train the AI. What fields are necessary to match the data? Are those fields consistently available? Do you need to scrub them? Based on the robustness of the data, should you be separating out your sources? These are all questions you have to answer to get to this stage.
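
A toy illustration of the bucketing and pairing idea; the blocking key, weights, and threshold are invented for this sketch and are not Tamr’s actual model.

```python
# Block records that share a coarse key, then score candidate pairs so that
# near-identical twins (same DOB, surname, gender, similar first names) can
# still be kept apart.
from difflib import SequenceMatcher
from itertools import combinations

def block_key(rec):
    return (rec["dob"], rec["last_name"].lower())

def pair_score(a, b):
    first = SequenceMatcher(None, a["first_name"].lower(), b["first_name"].lower()).ratio()
    # Exact first-name agreement is weighted heavily; "Mia" vs "Mya" scores lower.
    return 0.6 * first + 0.2 * (a["gender"] == b["gender"]) + 0.2 * (a["dob"] == b["dob"])

records = [
    {"first_name": "Mia", "last_name": "Carter", "dob": "2012-03-09", "gender": "F"},
    {"first_name": "Mya", "last_name": "Carter", "dob": "2012-03-09", "gender": "F"},
    {"first_name": "Mia", "last_name": "Carter", "dob": "2012-03-09", "gender": "F"},
]

buckets = {}
for rec in records:
    buckets.setdefault(block_key(rec), []).append(rec)

for bucket in buckets.values():
    for a, b in combinations(bucket, 2):
        verdict = "match" if pair_score(a, b) > 0.95 else "no match"
        print(a["first_name"], "vs", b["first_name"], "->", verdict)
# Mia vs Mya -> no match (the twins); Mia vs Mia -> match (the duplicate)
```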

Brian Czajak:

You can see here Tamr understood aliases. Your source systems may or may not do change data capture. They may or may not store the change to a last name. What we did, back on the previous slide where you saw that we had a staging environment, was execute change data capture there for the systems that aren’t already providing it, which allowed us to create lists of last names. So when someone gets married, we have the maiden name, we have their new last name, hyphenated, or their partner’s name. We may have actual aliases from a background record check system. All of these can be brought together to identify a person. In this case, this person has two completely different last names, but you can see in the aliases, Moran is in both lists, and as a result, Tamr was able to create a match here where one would not come up if you were just looking at first and last name.
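
A small sketch of that alias idea; the names, field layout, and helper function are illustrative only.

```python
# Build up every surname we have ever seen for a record (current name,
# maiden name, names from change data capture or background checks), then
# treat an overlap in those lists as a match signal.
def surnames_overlap(record_a, record_b):
    aliases_a = {n.lower() for n in [record_a["last_name"], *record_a.get("alias_last_names", [])]}
    aliases_b = {n.lower() for n in [record_b["last_name"], *record_b.get("alias_last_names", [])]}
    return bool(aliases_a & aliases_b)

licensing = {"first_name": "Kate", "last_name": "Moran-Diaz",
             "alias_last_names": ["Moran"]}
student   = {"first_name": "Katherine", "last_name": "Sullivan",
             "alias_last_names": ["Moran", "Sullivan-Moran"]}

print(surnames_overlap(licensing, student))   # True: "moran" appears in both lists
```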

Brian Czajak:

How do we use all of this data? Fueling data-driven decisions with Tamr. The idea here is, once you get a golden record, you now have your org, you have your person, you have the address, and you have the person-to-org relationship. All of this is valuable because it provides you with the ability to see all of the different relationships across agencies and across different siloed systems in an agency, so you get an idea of what resources a person is using. You can then also tack on all of the transactions inside a graph environment, inside a master data hub, and this gives you a holistic view of the person and a holistic view of the org, which allows you to do predictive analysis: if a person or an org is using a resource from the state, what are the odds that person will be using resources when they change agencies, as the child gets older? You can then decide, my budget needs to include more money because I have more people using resource A and there’s a high likelihood they will be using resources B, C, and D as they move through the system, so we can better predict a budget for those resources.

Brian Czajak:

We can also better design the person’s access to signing up for those resources by having an integrated environment where an offering is made; they also have access to offerings through the other sources. The same data can be used to initiate those offerings. It allows for a more consistent engagement with the state.

Brian Czajak:

Now you see a higher-level view of the full pipeline. You can see everything flowing into Tamr, and then we marry in the transaction data from the systems, and we’re using Neo4j to do it. This provides us with those graph relationships. Now, a graph database provides you with a view of the relationships you know about, but you can also use graph algorithms to get implicit relationships that you did not know about, and this helps you with compliance to policy, and it helps you determine what policies may need to change to implement better policy. There are a lot of different things going on once you can see how a person interacts with the state, because if you have siloed systems in an agency and then across agencies, you don’t know what that person is doing. Once you merge all of that data and create these relationships across systems, you’re able to see how better to service that customer.
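
A minimal sketch of loading mastered entities and relationships into Neo4j with the official Python driver; the connection details, labels, and property names are placeholders, not EOE’s actual graph model.

```python
from neo4j import GraphDatabase

# Placeholder connection; swap in real credentials and URI.
driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

with driver.session() as session:
    # Upsert one mastered person, one mastered org, and the relationship
    # between them, keyed by their EOE IDs.
    session.run(
        """
        MERGE (p:Person {eoe_id: $person_id})
        MERGE (o:Organization {eoe_id: $org_id})
        MERGE (p)-[:HAS_ROLE {role: $role}]->(o)
        """,
        person_id="E-000017", org_id="ORG-0042", role="student",
    )
    # Explicit relationships can then be queried directly; graph algorithms
    # would be layered on top of this to surface implicit ones.
    result = session.run(
        "MATCH (p:Person {eoe_id: $person_id})-[r:HAS_ROLE]->(o:Organization) "
        "RETURN o.eoe_id AS org, r.role AS role",
        person_id="E-000017",
    )
    for record in result:
        print(record["org"], record["role"])

driver.close()
```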

Brian Czajak:

I think I’ll hand it off now.

Louise Baldwin:

Great. Brian, the more I hear you and Danielle O. and Danielle N. speak, the more I think we need to extend this into an entire day-long session in order to get at all of the ways master data is helping to inform policy and think through policy decisions.

Louise Baldwin:

But you did bring up and introduce the concept of golden records on the slides and would love for you to touch on that in a bit more detail and maybe cover some of the ways that those golden records have added value to [inaudible 00:43:21].

Brian Czajak:

Sure. Golden records are basically a consolidated view of a cluster of source records. We can use rules to determine which values come from which sources as we assemble the individual fields of a record. What this provides you with is the cleanest, most detailed version of a person, or an org, or an address, which you can then use in reporting as standardized data. What we then do with the golden records is attach a persisted ID called the EOE ID, and that EOE ID is also mapped to all of the source records. Now you have the consolidated view of a person, all of their demographic data, and a link back to all of their transactions.
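
A toy version of assembling one golden record from a cluster of matched source records; the survivorship rule (most trusted source that has a value wins) and the source names are invented for illustration.

```python
SOURCE_PRIORITY = ["educator_licensing", "student_sis", "provider_app"]

def golden_record(cluster):
    """cluster: matched records, each carrying a 'source' plus data fields."""
    ranked = sorted(cluster, key=lambda r: SOURCE_PRIORITY.index(r["source"]))
    fields = {k for r in cluster for k in r if k != "source"}
    golden = {}
    for field in fields:
        for rec in ranked:                      # most trusted source wins...
            if rec.get(field):                  # ...but only if it has a value
                golden[field] = rec[field]
                break
    return golden

cluster = [
    {"source": "student_sis",  "first_name": "Katherine", "last_name": "Moran", "email": None},
    {"source": "provider_app", "first_name": "Kate",      "last_name": "Moran", "email": "kate@example.org"},
]
print(golden_record(cluster))
# first_name 'Katherine' and last_name 'Moran' survive from the higher-priority
# source; the email falls through to the only source that actually has one.
```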

Brian Czajak:

How does this help us? Initially, during the data analysis, we were able to see, as an example, significant numbers of duplicates for contact records coming from one of our systems. We were able to implement a business process and a structural change in that system so that the contact records were linked to their different organizations as opposed to being written directly on the rows for the orgs. Instead of getting 60 different versions of the same person, we got one record with 60 relationships to 60 different orgs. It allowed for a much cleaner, more efficient flow of the data and a better view in the source system.
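
A sketch of that structural fix, with toy table shapes of our own: collapse repeated contact rows into one person record plus a person-to-organization link table.

```python
contact_rows = [
    {"org_id": f"ORG-{n:04d}", "contact_name": "Dana Reyes", "contact_email": "dana@example.org"}
    for n in range(1, 61)       # the same contact copied onto 60 org rows
]

people = {}                     # one row per distinct person
person_org_links = []           # one row per (person, org) relationship

for row in contact_rows:
    key = (row["contact_name"].lower(), row["contact_email"].lower())
    person_id = people.setdefault(key, f"P-{len(people) + 1:05d}")
    person_org_links.append({"person_id": person_id, "org_id": row["org_id"]})

print(len(people), "person record;", len(person_org_links), "org relationships")
# 1 person record; 60 org relationships
```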

Brian Czajak:

Another thing that we were able to see: SSN in a system was regularly duplicated or incorrect, all zeros, all ones, that kind of thing. The data was not accurate. We were able to implement new business rules there to clean that up. By building the golden record, and tracking data quality at every stage, we were able to see these issues, discuss them with the managers of the source system, and implement actual change.

Louise Baldwin:

That’s great. I think that hit at the real challenges: you mentioned the errors and the effort needed to create that single view, and often the scale that encompasses it, which really hits home the point we were hoping to make in the follow-up. I think what this slide speaks to is our view that big data is really no longer the primary challenge that organizations are facing, it’s really bad data. There has been this massive shift over the last decade in terms of the progression of moving data to [inaudible 00:46:26] data platforms, and along with data, [inaudible 00:46:29] is also moving to the cloud. It’s really ushering in this new opportunity to scale data management capabilities in pretty unprecedented ways.

Louise Baldwin:

But when we get down to it, there’s still this massive gap between data and analytical outcomes, some of those evidence-based policy decisions that EOE is driving. Traditional MDM solutions often aim to solve this problem, but in some cases overpromise and really fail to deliver cleansed, mastered data at scale.

Louise Baldwin:

Within that, I think the opportunity that we at Tamr believe in today lies in next-gen data management solutions that really power DataOps strategies addressing both speed and scale. We’ve had the chance to work with Global 2000 customers across Tamr, and there are generally pretty consistent patterns that drive these core principles of DataOps, which often do stand in contrast to the traditional, single-vendor, single-platform data processing approaches that are sometimes advocated by more traditional players, the Informaticas, Oracles, and whatnot.

Louise Baldwin:

These next-gen data management solutions are built on principles that describe how people, process, and tools need to work together to deliver successful initiatives. I think the Danielles and Brian, you hit on this so much more concretely in how you described overcoming some of the challenges, and often elements of it are more around people and process, in terms of driving true change.

Louise Baldwin:

But a DataOps approach, and the approach that Tamr takes, is really focused on that: agility, continuity, collaboration, interoperability, and scale at its heart. I think from working through it with some of our customers, sometimes the changes are relatively uncomfortable while getting started, but we think that it’s much, much more effective, especially in the medium to long term. It really represents a winning strategy for maximizing the generation and the use of data in the enterprise to really stay agile to shifting needs.

Louise Baldwin:

Getting into more of the detail on that: we heard Brian describe the actual process in more detail, but Tamr’s approach is very much centered on cloud-native data mastering solutions that enable a human-guided machine learning approach to data curation, enrichment, and publishing. The output that we’re always going for is high-quality, cleansed, mastered data sets that are able to power both analytical and operational use-cases.

Louise Baldwin:

Our philosophy is that machines do some things very, very well and particularly at scale, but that human input and that guidance is still key to getting the data in the format that you want it. The approach combines machine learning with subject matter expert input to train the models and to enable the machine to do the heavy lifting to generate those curated data sets that ultimately are focused on aligning with the business need.

Louise Baldwin:

To go into some of these steps in a bit more detail: on the data curation side, Tamr focuses on combining disparate data sources by mapping different data schemas. I’m conscious I’m going back to some of the jargon, but the attributes and categories, such as in this example, where tables have different naming schemas: it could be name, it could be company, it could be something abstract like CNA1. This process also includes matching and de-duplicating records over thousands or millions of rows, generating that unique Tamr ID for golden records that Brian spoke about, and categorizing the data, leveraging well-defined taxonomies or classifications.

Louise Baldwin:

Within curation, our data mastering workflow makes it easy to engage subject matter experts through simple yes or no questions, the type that Brian just showed. You can imagine if Google was your customer, you might want to understand the business you’re doing with them by location, or by region, or even within the corporate parent hierarchy, for example, within Apple.

Louise Baldwin:

We then have enrichment, and that really focuses on making sure the data quality and accuracy are good, particularly around organizations and people. And finally, data publishing. Tamr has an intuitive user interface that focuses on the ability to publish data to downstream data stores for analytical use-cases with tools like Tableau or Qlik or Looker, and for those operational needs, be it an Oracle or an SAP.

Louise Baldwin:

Our final overarching slide before we get into some Q&A, the platform is essentially built around these data mastering solutions that are focused on tackling the toughest data challenges. This is the part that I get most excited about, is how our customers are going to actually use the data to drive results. We typically think about this in terms of driving growth, reducing risk, and lowering cost, but really about driving key strategic initiatives with the right data.

Louise Baldwin:

We are really proud to have had the chance to work with EOE and play a small role in the work that they are doing. But a lot of the core challenges are typically the same across organizations. We get to work with the public sector, from the US Air Force, to private sector companies like Google and J&J. While Tamr is able to tackle many use-cases, for B2B organizations we always begin with customer data, really for its ability to drive business results quickly.

Louise Baldwin:

With curated, 360 views of customers, companies like [inaudible 00:53:00] or Analog Devices are better positioned to expand their relationships with customers, offer a better customer experience, and lower risk while doing so. Though those sectors seem at times worlds apart, I think our conversation today highlighted a lot of the parallels in terms of driving impact.

Louise Baldwin:

That gave a little bit more detail around the Tamr overview, but I would love to revisit some of the topics that were brought up throughout the session. I think the EOE team hit on so many areas that we would love to deep dive on, so maybe for our last few minutes, Danielle N., I would love to hear you talk more as well. You mentioned this journey to Tamr and spoke about that larger transition, but when you were thinking about looking for a product to help create that 360 customer view, in particular, what features or functions were important to you?

Danielle Norton:

Certainly. Definitely usability is extremely important to us, and the ability for us to implement a program that didn’t require all of our resources and wasn’t too complex for us to keep up to date. As I said before, we had failed in the past at implementing some of the larger master data management software, and it was largely because of the complexity and the amount of effort it took. What we really focused on with the Tamr software was that we were able to get it going and get good results: with our original proof of value, with less than seven hours of training and analysis, we were looking at a really good return, somewhere over 90% accuracy in the level of mapping. This was really important to us because we’re trying to be good stewards of our taxpayer dollars, and leveraging AI to return results more quickly while utilizing fewer resources is hugely important to us.

Louise Baldwin:

Absolutely. It really puts it in the context of driving ROI at the end of the day. Needing to be cost-conscious, especially given the position of the state government and being conscious of taxpayers’ dollars as you mentioned. I love that drive to really make sure that it’s driving return, and driving it quickly.

Louise Baldwin:

We’re in the final few minutes, so maybe one last question. I think the work we’ve seen Mass EOE do really is pioneering, not only the outcomes, but also how it has all come together over time to drive impact. Would love for you to talk a little bit more about it, maybe Danielle O., in terms of how this forward-thinking mindset came to be at Mass EOE and how important you think it is to achieving success.

Danielle Ondrick:

Yes, exactly. As we were just saying, we have limited funding at the state and we try to be good stewards of the money. As we also talked a little bit about earlier, some of our systems are very old, and our legacy systems often have to continue running past their obsolescence date. Sometimes you’ll hear people talk about technical debt; we have to lean on these systems and they become brittle and old. As somebody who’s in charge of enterprise architecture, we’re looking at modernizing the portfolio as quickly and as cost-effectively as we can. A forward-thinking mindset here really let us try and leapfrog over all of the interim steps of getting this customer list together. Tamr’s human-guided AI technology was certainly one of the ways we were able to leapfrog; I always joke it’s where the Flintstones and the Jetsons get together. It was a wonderful experience, and it was easy to use, and that’s why we decided to try doing such a forward-thinking solution here at the state.

Louise Baldwin:

I think that’s a great note to end on. With that, Danielle O., Danielle N., and Brian, as we close out, a big thank you. It’s really incredible to hear more about the work that you’re doing within EOE and we’re just really grateful for having the chance to really deep dive with you and for taking the time. Thank you also to everyone who’s watching. If you’d like to learn more about Tamr, feel free to jump on the website, if you’ve any questions on any of the material covered today, or you’d like to see a demo of Tamr in action, feel free to shoot me an email directly, too. Thanks very much for joining.