datamaster summit 2020

7 Components of the DataOps Ecosystem


Mark Marinelli

Head of Customer and Partner Enablement @ Tamr

DataOps is an emerging set of practices, organizational structures, and technologies for building and automating data pipelines to enable governments to make data-driven decisions quickly. Mark Marinelli, Head of Customer and Partner Enablement at Tamr, shares the key components needed to transform data initiatives.


DataMasters Summit 2020 presented by Tamr.

Five, four, three, two, one. All right, thank you everyone for joining this session. This is the seven components of the DataOps ecosystem with Mark Marinelli. Hi, Mark.

Hey, Megan. Good morning.

Good morning. Mark is the Head of Partner and Customer Enablement here at Tamr and he’s a 20-year veteran of enterprise data management and analytic software. He’s held engineering, product management, and technology strategy roles at Lucent Technologies, Macrovision, and most recently, Lavastorm, where he was Chief Technology Officer. So thank you again, Mark, for joining. I think that you, with your experience, bring a really great perspective on how organizations can implement the people, process, and technology to support the DataOps ecosystem. And we’re excited to hear more about his session, thank you.

All right, thank you, I’ll take it away. Thanks everybody for joining. Today we’re going to talk about DataOps in depth, starting off with our framework that we’ve developed at Tamr over the years in concert with our customers. Then, some tips and tricks on how to get started with DataOps regardless of your own level of data engineering maturity. And then some technology examples, we’re really going to talk generically about some of the components and tools you need to provide DataOps. But we’ll give some specifics under the auspices of helping you get started to choose those tools.
So I’ll start off with the DataOps framework, and to begin I’ll give a DataOps definition, going to the canonical source here, Wikipedia: DataOps is an automated, process-oriented methodology used by analytic and data teams to improve the quality and reduce the cycle time of data analytics. The analog here is DevOps, which was an effort to reduce the cycle time of delivering functionality into new software, like you see with most modern web apps. Why can’t we get the same speed of new data, fresh data, to our end users in an enterprise data environment? So there’s a heavy reliance upon automating away a lot of the stuff that was traditionally done with a lot of manual labor. And it’s not just technology. I’m going to talk a lot about technology, this is a Tamr summit, but we’re also going to talk about how you structure teams and who you bring in there to fulfill a vision of modern data engineering.
And that’s really the framework. As with many frameworks in technology, it is comprised of people, process, and systems: process, technology, and organization. I’ll drill into the agile methodology on process. We’ll talk a bit about both the architectural aspects and the infrastructure underpinnings of the technology approach. And organization: different roles are necessary to fulfill this DataOps vision, different roles than some of the people who have been doing this role historically in your organization. And also ways to structure the teams.
So let’s start off with process, and with how not to do it. I think most of us are familiar with the concept of waterfall development in software and waterfall delivery in data mastering or data management. It’s the same thing: you do an awful lot of modeling and you build a lot of rules to bring your data together. You do an awful lot of testing and then, quarters down the line, out come some data. Those data, hopefully, are useful, and only then will you know whether there is some value in what came out the other end. Only then will you be able to question the quality of the results, and maybe provide feedback that says, well, you got this wrong somewhere along the way. And then you’ve got to go back to the beginning, retrofit a data model, retrofit the rules that bring the data together, test again.
A lot of people need to be involved in this. Very long cycles and it’s very easy in these protracted projects to lose focus, to lose people who are nine months into the gig not seeing any value and have a lot of other things to do. And also to lose the value because the problem or the constituents for that problem have changed in this long duration. And it’s very difficult to course correct when you’ve taken this sort of monolith approach. So the right thing to do in an agile methodology, really, is to break these problems down. Not to do these big bang projects that take a very long time and solve all of the problems of the world. But, instead, to take an incremental approach. Chip away at the problem, breaking it down into smaller parts.
Is there a subset of the data? Is there a region where I can start looking? Is there a subset of the systems that are more available to me, that are actually full of pretty good data, so that I can keep the other, more difficult systems in reserve? Any way that you can break down the problem so that we solve a portion of the problem. We get actual value from it, it’s not just a POC, we’re getting actionable intelligence or insight out of that first step. We can then course correct, say, oh, this would be a lot more interesting if we had this data source, or this is entirely wrong, we got it wrong. Well, if we got it wrong we only got it wrong for two weeks or a month, not for 12 months, and we don’t have to do a big retrofit. So, breaking the problem down is hugely important both in making sure that you get to the end state faster and in keeping people in the boat on a journey that may take a while because of the complexity of the problem and the data underneath.
This again, it’s a Tamr summit so we can’t get too far without talking about machine learning. Why is machine learning so important? Everything that I just described in terms of breaking a problem down, in terms of delivering incremental value, is facilitated by the automation that comes from the application of machine learning to the problem. Instead of groups of people, teams of people, codifying business logic in conditional logic, in SQL statements, or in anything that they’re doing in a master data management system, which is enormously complex and hard to maintain, we have the same experts who know the data sitting and answering questions about their data, and the machine can then divine from those answers how the data are related and bring the data together with drastically accelerated time to results.
It also alleviates the pressure that has been on the people who want to use these data today, the scientists, the data analysts, to spend a lot of their time on getting the data in shape. You hire these people to do data science, not to do data engineering. So the more we can offload this data engineering to machines and algorithms, the better that we can use the high value, and usually pretty expensive, people at solving the real business problems that we brought them in to solve.
Moving on from the sort of process into the technology. These are the seven components that we’ve got in the title of this. We divide the overarching data management tool set into seven categories of components, each of which is necessary. The fullness of a data engineering, or digital transformation, architecture is going to include all of these. At the end of the presentation I’m going to get down into specific examples of technologies and a bit more about each. But very broad brush, we’ll start out on the left, catalog and crawling. We want to have a layer of abstraction from our raw sources. Every time somebody wants to go get customer data, we don’t want them to go to the CRM system or to Salesforce or have to go to both. We want them to be able to consult the catalog that says here’s where this type of customer data reside. And here’s where this type of customer data reside. And be able to broker some of those data without having to get into the idiosyncrasies or specifics of any one platform.
Optimally this is boosted by a crawling engine which can go out and interrogate automatically all of these systems and gather the metadata that allows us to tag and catalog. Once we’ve cataloged the data, then we have to move it usually. So we’re using ETL, ELT, to take the data from all of these different systems and land them somewhere. That is going to be a storage and compute layer. So we’re going to put the data somewhere in raw form, then we’ll have multiple potential staging areas. We’re going to do a lot of work to it, so we need a compute layer, like Spark or a high performing columnar data store. And then the work we’re going to do is mastering and quality. We’re going to remove duplication, we’re going to find linkage in the data, we’re going to improve the quality of the data by enriching the data or by removing issues with the data.
And then publish out to the right. That’s not just allowing people to query some operational data store, it’s providing them with a lot more information about what data are available to them, what happened to these data that made them analytically useful and available to those users. Who did what to these data? Where did the data come from? What version of the data is this and why is this data different from how it was last week? That’s what we mean by publishing and versioning, not just exposing the data for frequent query. Sort of on top and below are adjacent services that allow us, on the governance and policy side, to make sure that as we’re brokering all of these data out to people, we’re doing so in conformance with whatever regulatory or data protection or any other policies apply to who should be able to access this data and under what conditions.
Feedback and usage: as people are starting to use these data that we have published out to them, they should be telling us whether it’s right or wrong. Whether it’s useful or not useful. And in doing so, we can build a large corpus of very interesting information. We can determine which of these sources are the most valuable and be able to focus our energy on improving those sources, and via feedback from ensuing applications, we get everything we need to go and fix the data upstream in those systems. So a real virtuous cycle arises here when we’ve done this in its fullness. So these are technology components. What is it about certain technology components, of which there is a massive landscape? If I showed you a logo chart for this whole ecosystem, there are innumerable options here. Why would we select one technology over another? So we have principles at Tamr that we propose should inform the selection of any one of these technologies. And broadly I’ll say, each one of those boxes may have a different technology.
It’s a lot easier to stitch these technologies together than it used to be, given API-driven design and the sort of modern infrastructure. If I superimposed one logo over all of that and said, I’m just going to get this full stack from one vendor, you’ll probably end up getting the third or fourth best choice for each one of those components. Instead, I want the best choice, and I want the best choice for me, and I want the best choice for me today, versus a year from now, because these things will change. So having these decoupled and choosing a best of breed approach for each is enormously important. It gives you a lot of optionality as your needs grow and as the state of the art changes. So these architectural principles: scale out and distributed. If you can choose cloud first technology, do it. If you can’t, get there as soon as you possibly can.
The economics of scaling large data workloads in the cloud are just night and day compared to what you’d be able to do on [inaudible 00:14:08]. So the faster you can get there, the better, and the quicker you can start there, even better. Collaborative: I talked about machine learning, we do that a lot. Anything that leverages machine learning or AI requires human beings to provide the guidance and training that these algorithms need to produce the right outcome. So very, very important. Just throwing a bunch of unsupervised machine learning at it to try to figure it out is not going to get you very far. It may help you start, but really in the end you need input from the people who use these data, who understand how the data are employed and interpreted, to provide their insight into how the data should be shaped for them.
I talked about open, best of breed, don’t use one platform. The only way this works is if each one of these tools is loosely coupled. It provides open APIs, it provides open or simple data interfaces for exchange, and can be latticed into any sort of orchestration layer. Continuous: the data arrive every minute, every hour, every second. They’re going to change all the time, they change day-to-day, so you need to accommodate how the data evolve. Make sure you know what changed, make sure that you are taking into account both batch and streaming workloads. The big dump that comes out of the CRM system every day versus the click streams that are coming off of our customer service portal every second. And accommodating both federated and aggregated storage. Depending on whether it’s an operational or an analytical workload, people may want individual records or they may want aggregated views. All of those should be brokered out, without anybody at the end consumption having to do that work themselves.
Lastly, if anybody’s going to believe any of these outcomes, which is enormously important, the lineage and provenance. Where did the data come from? Which source systems contributed to the corpus of data that I’m going to use for my downstream application and what was done to it? What data models, what machine learning models, who contributed to those? Who knows the most about these data as evidenced by their contribution to the supply chain that got me to it? All of that stuff is very relevant when somebody’s just trying to figure out whether the data are appropriate for what they’re trying to do.
Infrastructure, there are a raft of tools. There’s the traditional on-prem, a deep stack that allows us to do all of the heavy lifting that is necessary to do data engineering at scale. But we also have, from the three principal stack vendors, Microsoft, Google, and Amazon, a full stack of compute and storage infrastructure. It’s become a lot easier to do management of applications given containerization and container management frameworks like Kubernetes. So the world is your oyster out there. Choose a cloud, and don’t feel like you may have seven years ago, that you’re stuck with that cloud either. We’re evolving to a multi-cloud or a hybrid multi-cloud environment where some of the workloads may be better in one cloud than the other. Or maybe one cloud provides better infrastructure economics and the other one provides better development tooling. Stand astride a few different clouds, don’t feel like you’re making a 10 year bet if you get there.
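To make the lineage questions above concrete, here is a minimal sketch in Python of what a lineage record attached to a published dataset might look like. The class names, field names, and sample values are illustrative assumptions, not something described in the talk or taken from any particular product.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class LineageStep:
    """One thing that was done to a dataset, and who (or what) did it."""
    action: str   # e.g. "deduplicate", "enrich" (hypothetical labels)
    actor: str    # person or model that performed the step
    at: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


@dataclass
class PublishedDataset:
    name: str
    version: int
    sources: list[str]  # contributing source systems
    steps: list[LineageStep] = field(default_factory=list)

    def record(self, action: str, actor: str) -> None:
        """Append one lineage step to this dataset's history."""
        self.steps.append(LineageStep(action, actor))

    def contributors(self) -> dict[str, int]:
        """Count steps per actor: a rough proxy for who knows the data best."""
        counts: dict[str, int] = {}
        for step in self.steps:
            counts[step.actor] = counts.get(step.actor, 0) + 1
        return counts


# Hypothetical example data.
customers = PublishedDataset("customers", version=3, sources=["crm", "billing"])
customers.record("deduplicate", "matching-model-v2")
customers.record("review duplicates", "alice")
customers.record("fix postal codes", "alice")
print(customers.sources, customers.contributors())
```

A consumer deciding whether the data are appropriate could read `sources`, `version`, and `steps` directly; a real system would persist these records rather than keep them in memory.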
Moving on, organization: there are categories of participants in this whole flow. We have always had the data suppliers and we’ve always had the data consumers. Data consumers range from a citizen who just uses Excel every day, through a data scientist who’s writing predictive models, to a developer who’s latticing data into consuming applications, like a CRM portal or whatever. They’ve always been with us, that’s why we’re getting the data together. Data suppliers: who owns the system, who knows the most about how the data are structured, that’s it. What we’ve seen over the last, say, decade or so, as we’ve had more and more data and more and more people trying to do more and more things with it, is the rise of this whole group of specialists. We call them data preparers, who are responsible for making sure that that data supply is provided to the data consumer in the right format, adhering to corporate data governance standards.
It’s just no longer the case that somebody who needs to run a report so that they can do a marketing campaign can just call up the person who owns Marketo and say, can you give me an extract. That falls down at scale when you have, as I said, more constituents and more data platforms. So these data preparers come in the form of data engineers, who build the infrastructure to get data from left to right. Data curators, who are responsible for making sure that the right data are provisioned to the right users and that the users are given enough context about those data so that they can make good decisions. And data stewards, think of them sort of in the other direction, going right to left, ensuring that everything that we find out from the consumption of those data and the corrections of those data is incorporated back into the source systems, or at minimum into the datasets that we’re brokering out to those consumers.
The worst thing that can happen is that somebody out in Excel land finds something wrong with the data, they correct it right there in their Excel spreadsheet and nobody ever finds out about it. We really want to facilitate, through good data stewardship, through feedback collection, that everybody can leverage what one person has learned and corrected in the data. This is just enumerating the different roles that people play in this game. This is not necessarily to say that you’re going to have eight people involved in each of these projects initially. Some of these are more hats that people wear than titles that people have. Say a curator or a steward. But over time, as the organization grows, you’re actually going to nominate somebody whose full-time job is going to be data curation or whose full-time job is going to be data stewardship.
This is just a representation to try to divine right now who in your organization may already be fulfilling these roles, or where you need to go if you need to bring in more muscle and fill out a team. So the tools that they use are a pretty good proxy for the skillset that’s necessary to fill any one of these positions. Then structure, how do we structure the team? We’ve seen two major models. One of which has been with us for a while: shared services, or center of excellence, or center of expertise. That is, you have a centralized competency that has all of those seven categories of tooling set up. They’ve built the infrastructure, they’ve built the methodology, the agile methodology.
And they are going to go to each line of business operation and say, bring me your problems. Then they’re going to prioritize those problems and then they’re going to solve those problems. They’re actually bringing everybody to the party. So as a line of business you’re essentially a constituent for one of these data projects. You’re just giving them your requirements, you’re providing some experts to help guide their hand, but you’re not actually doing any of the work yourself. There’s an advantage to this in that you are centralizing this technical knowledge. It’s a one-stop shop and as you go from project to project they get better at it. And so, project five is going to probably have better tooling and be a bit faster than project one was, because everybody is doing the same work. The trick here is prioritizing when you’ve got 14 different line of business operations coming to you and saying mine is the most important problem, the most valuable problem to solve. How do you prioritize that?
Because you are the one who actually has to prioritize it. The advisory model, or you can say the federated model, is that we have this centralized competency where we have methods and procedures and technology selections, and we’ll go to each line of business and explain to them how they should go about this, but then it’s on the line of business to muster the technology team that’s actually going to perform the work. So, in an advisory model maybe we have a technology strategy division that is going to have all this competency.
Then we have sort of traditional IT who’s going to apply all of that guidance alongside the line of business, but they are segmented. We’re still getting some centralized technology expertise and we’re not having to make the choices and prioritization. However, we are potentially fragmenting some of the knowledge that we’re gaining from the application to each project and not bringing that back in. So if we learned something in project five that might have helped project one, it may never get back there unless we’re really, really rigorous in this advisory model about capturing not just how did the technology work, but how did the team work. This will fluctuate over time. Recommendations usually start with [inaudible 00:23:52] services because you’re going to start with a small team and you want that team to really have to do the whole problem so they learn all of that, and then over time scale out to an advisory model.
Getting started with all of this. I’ve given you sort of an academic treatment of the people, process, and technology. How do you actually get going? The big rule here that informs everything that we talk about, is optimizing around immediate and then incremental value delivery. I showed you that waterfall versus the nice staircase pictorial representation earlier. And that is just enormously important to put points on the board early to say, all right we’re just a few weeks into this project and we’ve already produced an actionable dataset. It’s just a subset of our customer base but this is the subset that you’re going to use for the marketing campaign of the quarter. So go off and do that while we figure out how to get the rest of the systems online and get you the rest of the data.
And here it speaks to prioritization. It’s better to solve the third most valuable problem in the business that has the most accessible data and the most accessible team, than to solve the most important problem in the business that has the third most accessible data and the third most accessible team, because it’s going to take you forever to onboard those data, to get those people in the boat. And we don’t have nine months before we can show some value if we want to make sure that we maintain mind share and investment in the projects and infrastructure that we’re building to support all.
Quick wins or quick losses, just make sure that it’s quick. So again, that sort of staircase methodology: make sure that if we chose the wrong tools to solve the problem or if we chose the wrong problem to solve, we were wrong for a month, not for a quarter. And that allows us to do a lot of course correction, not just because maybe we make bad decisions or the wrong decisions in the context, but also as things change. People are going to be wanting different things from the data six months from now than they said they wanted at the beginning, and we need to be able to accommodate that.
Lastly, breaking the problem down. Break it down into discrete milestones, and do that at the beginning. Say, all right, when we get to milestone one we think we’re going to have this much data to work with. We think this many people are going to be able to use it and we’re going to be able to solve this much of the problem. Those are testable. That says, well, where are we now? We said we were going to do this and we’re three months into it. Did we do that? If we didn’t, let’s have a retrospective and find out why we didn’t get there and course correct. But a priori, try to break this thing down. Don’t boil the ocean, make very small portions of the problem, that’s huge.
So, getting started on the process side, agile is key. Talked to death about that. Inventorying the set of available projects. Go out and look at everything that you could possibly be doing and score them on availability of the data and the people versus the value of the problem. Like I said, it may not just be the most valuable problem that we solve, because it will take us forever to get there. We’ll get there, but let’s start out with something maybe less valuable but more actionable. But when we do solve that problem, we want a high value, data rich project because we want to do something worth doing when we put those first points on the board.
But we also want to make sure that we’re sort of stretching ourselves to make sure that that end-to-end functionality is going to be covered, so that we are starting to build some of that cataloging. And that we’re doing something that is subject to data governance even if we’re not going to slow ourselves down by fulfilling all of the data governance needs. Let’s just be mindful of all of these components and choose a problem that’s actually going to touch each.
Getting started with technology, start off with where you want to be. Find those seven categories, and I’ll speak in a little bit about some of the technologies that we like within them. Create that end state, you know, the vision for your technology, and then inventory where you’re at, gap yourself against it. We may have grand aspirations for all these data to be served up in Microsoft Synapse as this wonderful, analytical database. But right now we’ve got a bunch of old Teradata humming along. Okay, that’s fine, we’re deliberately going to get there. We’re going to chart a course there. We’re going to do that with all of our different tooling and then we’re going to be able to prioritize what sequence we do it in. Do we put a catalog in first or do we try to move the data out of Teradata first? Well, that depends on who’s doing what with the data, how sticky some of those systems are, and when we could conceivably turn them down. So start with the vision, gap yourself against it, decide the path.
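The project inventory and scoring described above can be sketched in a few lines of Python. This is only an illustration: the 1-to-5 scales, the scoring formula (value multiplied by the weaker of data and team accessibility), and the project names are all assumptions chosen for the example, not a method given in the talk.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    value: int        # business value, 1 (low) to 5 (high); assumed scale
    data_access: int  # how available the data are, 1 to 5
    team_access: int  # how available the people are, 1 to 5


def score(c: Candidate) -> int:
    # Accessibility is limited by its weakest part, so take the minimum and
    # multiply by value: a valuable project nobody can reach scores low.
    return c.value * min(c.data_access, c.team_access)


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Return candidates ordered from highest to lowest score."""
    return sorted(candidates, key=score, reverse=True)


# Hypothetical backlog of candidate projects.
backlog = [
    Candidate("global supplier master", value=5, data_access=1, team_access=2),
    Candidate("regional customer list", value=3, data_access=5, team_access=4),
    Candidate("internal report cleanup", value=1, data_access=5, team_access=5),
]

for c in rank(backlog):
    print(f"{score(c):>2}  {c.name}")
```

With these made-up numbers the mid-value project with accessible data and people ranks first, which mirrors the advice to pick the third most valuable problem when it has the most accessible data and team. Any real scoring would need weights agreed with the business.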
On that path, decoupling monolithic processes. If a whole bunch of stuff is sitting in that enterprise data warehouse and you can pull some of that functionality out, do it. Maybe pull some of the data transformation or data wrangling out into tools that are closer to the line of business and pull it out of materialized views. Let’s say that you’ve got an operational data store. That’s great, that’s starting to decouple. The other is to wrap some of these technologies, these legacy technologies that you know you’re going to throw out because of the vision you’ve established, with APIs, so that they are emulating the way that the future technology is going to work and allowing all of those other technologies to be built around them in a loosely coupled sense. So that when, down the line, you finally can turn off that legacy database, you’re not going to have to, you know, retrofit how all of the other tools work with it. And then start building with the new tech: choose a new project, greenfield, and kick the tires on these new technologies. Make sure you chose the right ones.
On the organization side, it’s looking at your current team. You’ve already got people doing this work. We all want to do it better, but let’s inventory what sort of skills we bring to bear. Who knows what technologies. We’ve probably already got a lot of people on the field who can play the sport. We may need to supplement with others. So only when we know what we’ve got do we know where we need to go. Creating a small cross-functional team that will be the nucleus of a DataOps organization. Let’s just start small and incremental. The data consumers have always been there, they’ll be there. But nominate, grab an ETL person or somebody who maybe already is calling themselves a data engineer. You are now the data engineer. You may also be the curator and the steward depending on the scope of the project. But somebody else in the line of business may end up being the curator or the steward. They wear that hat for a while and then we build out and scale up behind them into people actually fulfilling their role, but start small.
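The idea of wrapping a legacy system in an API that emulates the future technology is essentially the adapter pattern. A minimal Python sketch follows; the `CustomerStore` interface, the `LegacyWarehouse` class, its `run_query` method, and the table name are all hypothetical stand-ins invented for illustration.

```python
from typing import Protocol


class CustomerStore(Protocol):
    """The interface the future platform is expected to expose (assumed)."""

    def get_customer(self, customer_id: str) -> dict: ...


class LegacyWarehouse:
    """Stand-in for an old system with its own idiosyncratic access pattern."""

    def __init__(self) -> None:
        self._rows = {"42": ("ACME CORP", "US")}

    def run_query(self, table: str, key: str) -> tuple:
        # A real legacy system would query the named table; this toy ignores it.
        return self._rows[key]


class LegacyCustomerAdapter:
    """Wraps the legacy system so callers only see the CustomerStore shape."""

    def __init__(self, warehouse: LegacyWarehouse) -> None:
        self._warehouse = warehouse

    def get_customer(self, customer_id: str) -> dict:
        name, country = self._warehouse.run_query("CUST_MASTER", customer_id)
        return {"id": customer_id, "name": name.title(), "country": country}


def greeting(store: CustomerStore, customer_id: str) -> str:
    # Downstream code depends on the interface, not on the warehouse.
    return f"Hello, {store.get_customer(customer_id)['name']}"


print(greeting(LegacyCustomerAdapter(LegacyWarehouse()), "42"))
```

When the legacy database is eventually turned off, only the adapter is replaced with a class that talks to the new store; `greeting` and any other consumer written against `CustomerStore` stay unchanged.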
But emulating the model. Start with the shared services model, as I said earlier. Just a one-stop shop, people bring their first few problems and you just solve them. Obviously collaborating with them but doing the work. Another really important thing here is making sure that the CDO or equivalent is aligned on this project. Without executive oversight and accountability and authority to make all of these people work together, a lot of these projects go off the rails. A grassroots movement to improve data engineering and instill the company with DataOps will only get so far. You need somebody who has the executive sway to be able to marshal resources, to be able to reprioritize projects, et cetera, and a chief data officer or equivalent is going to be your icon to represent this group.
What not to do. We’ve got a whole other segment in the summit about not boiling the ocean. And I’ve already talked really about not choosing waterfall and breaking it down and the virtues of doing so. Reiterating not to choose a single platform. You’re going to build a platform and don’t be scared, it’s not that hard to stitch things together given the interoperability that all of these modern tools are built for. But going with a single vendor, as I said, is going to give you suboptimal choices in each one of these components. If you build the thing yourself, then when somebody builds a better mousetrap for one of those components, you have the optionality to just eject the one you have and slot in the other one with minimal incremental effort. That allows you to really molt and regenerate your data architecture every year, every two years. So you’re not just stuck with a decision now for the next 10 years. You look back at some of the legacy systems you’ve got right now, they were chosen a long time ago. You don’t want to be saddled and constrained with them.
So single platform, single vendor, mm-mmm. If you’re going to bring in open source projects that do not have a sort of nominated curator, just don’t underestimate the work in that. Let’s say you’re going to bring in Apache Atlas for example as a catalog. It’s enormously powerful, enormously complex, know what you’re getting yourself into versus something like Alation which is a proprietary yet pretty open platform that just provides a lot more support, guidance, for the cataloging venue than Atlas. Not saying one is better than the other, it’s just you need to know, align your skills and your expectations accordingly. The other thing is we can have all the right people, process, and infrastructure, and organization structure in place, we’re not going to change people.
A lot of the reason why our data are balkanized and why we may be working inefficiently is because of people and data hoarding or, you know, parochialism. We need to be mindful of that always. And you know, that’s kind of manifest, but just reminding everybody that everything that I just described is not going to solve the problem without some cultural shift. So, closing it out there, key principles here: on the process side, quick wins, incremental value delivery on an agile track. On the technology side, loosely coupled, best of breed components with a lot of automation, with humans being able to participate.
Infrastructure: cloud native as soon as possible, given the favorable economics and infinite scalability. On the organization side, specialization and separation of duties. Let people really focus on the task, just like they would in a traditional DevOps chain. You know, you’ve got build people who are not writing the end user software and you’ve got end user software people who are not writing the build system. That separation of concerns allows specialization and a lot of efficiency. And structuring the organization: centralized expertise, and making sure that you’re capturing knowledge across all of those projects and getting better every time you do one.
So in conclusion, I’m going to give some technology examples because we said the seven components up front and I want to drill into the seven components. Technology components, catalog and crawling. So catalog, definitionally, we’re going to inventory all of the different sources and allow the users to easily find where types of data reside. We can do a lot more with that, but that’s really what we’re trying to accomplish. And then crawling can be a way to catalog the data via RPA or just some scripting, we can interrogate the thousands of data sources that you have and incorporate them, give the catalog something to work with, without a lot of human effort. Examples here, Microsoft Azure Data Catalog. We’ve been doing some work recently with that, it’s a great cloud based offering. Alation, really a best of breed tool here. Waterline is still used in a lot of places, they were really one of the first movers in this space for a proprietary yet pretty open system.
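To make the crawling idea concrete, here is a minimal sketch of a script that interrogates a source and hands the catalog something to work with. It assumes the source is a SQLite database; the names `crawl_sqlite`, `find_tables_with_column`, and the `crm` tables are hypothetical and are not taken from any of the products mentioned in the talk.

```python
import sqlite3

def crawl_sqlite(conn, source_name):
    """Return one catalog entry per table: source, table name, and column names."""
    entries = []
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table,) in tables:
        # PRAGMA table_info yields (cid, name, type, notnull, dflt_value, pk) per column.
        columns = [row[1] for row in conn.execute(f'PRAGMA table_info("{table}")')]
        entries.append({"source": source_name, "table": table, "columns": columns})
    return entries

def find_tables_with_column(catalog, column):
    """Answer the basic catalog question: where does this type of data reside?"""
    return [(e["source"], e["table"]) for e in catalog if column in e["columns"]]

# Hypothetical source standing in for one of the many systems to be inventoried.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER, email TEXT, name TEXT)")
crm.execute("CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL)")

catalog = crawl_sqlite(crm, "crm")
email_locations = find_tables_with_column(catalog, "email")
```

A real crawler would loop over many connections and source types; the point of the sketch is only that the inventory is built by a script rather than by hand.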
Movement and automation, what do we mean by that? Well moving the data, so ETL and ELT. Automation is the orchestration of all of this. So, we move the data, we land it here, then what happens and how frequently? Do we take this dump that we get from the CRM system every night and then process it every hour alongside some other system that is producing data every minute? That’s complicated and making sure that as the different systems and different users are working on different timetables, that we’re accommodating and revealing what’s changing with each of those iterations. So here you see Talend, great ETL vendor. KNIME, sort of ETL self-service data prep for data scientists. StreamSets, fantastic best of breed on ETL.
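The timetable problem described here (a nightly dump, hourly processing, a source producing data every minute) can be sketched as a small scheduling check. This is an illustration under stated assumptions, not how any of the named tools work: the job names, the `SCHEDULES` table, and `due_jobs` are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical jobs, each on its own timetable.
SCHEDULES = {
    "crm_nightly_dump": timedelta(days=1),
    "hourly_processing": timedelta(hours=1),
    "event_stream": timedelta(minutes=1),
}

def due_jobs(last_run, now):
    """Return the names of jobs whose interval has elapsed since their last run."""
    return sorted(
        name
        for name, interval in SCHEDULES.items()
        if now - last_run[name] >= interval
    )

now = datetime(2020, 1, 2, 0, 0)
last_run = {
    "crm_nightly_dump": datetime(2020, 1, 1, 0, 0),    # a full day ago
    "hourly_processing": datetime(2020, 1, 1, 23, 30),  # 30 minutes ago
    "event_stream": datetime(2020, 1, 1, 23, 58),       # 2 minutes ago
}
to_run = due_jobs(last_run, now)
```

An orchestrator adds dependencies, retries, and reporting of what changed on each run; the sketch covers only the "how frequently" question.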
Mastering and quality, well there’s a picture of Tamr so that’s taking all of the data, making it better, removing duplication, uncovering relationships that say that these 17 records are actually all me and what’s the most current and accurate picture of me. Well, we’re really good at that and I’m not going to do anybody any favors if I talk about any competing technologies in any depth, so use us.
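As a rough illustration of what mastering means, the sketch below groups records that refer to the same person and keeps the most current one. It is a deliberately simple rule-based stand-in that assumes a normalized email is a reliable match key; it is not a description of how Tamr matches records, and the sample records and function names are hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical records from different systems; two of them are the same person.
records = [
    {"name": "Pat Example", "email": "Pat@Example.com ", "city": "Boston", "updated": date(2019, 3, 1)},
    {"name": "P. Example", "email": "pat@example.com", "city": "Cambridge", "updated": date(2020, 6, 15)},
    {"name": "Sam Sample", "email": "sam@example.com", "city": "Boston", "updated": date(2020, 1, 10)},
]

def match_key(record):
    """Assumed matching rule: records with the same normalized email are one entity."""
    return record["email"].strip().lower()

def master(records):
    """Cluster records by match key, then keep the most recently updated per cluster."""
    clusters = defaultdict(list)
    for record in records:
        clusters[match_key(record)].append(record)
    return {
        key: max(cluster, key=lambda r: r["updated"])
        for key, cluster in clusters.items()
    }

golden = master(records)
```

Real data rarely offers a clean key like this, which is why matching at scale tends to need fuzzier comparison and human feedback.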
Storage and compute, huge ecosystem there. The major cloud vendors and then this sort of modern, hybrid lake, like Snowflake and Databricks that have made this massive database in the cloud that has the favorable attributes both of a traditional relational database management system and a modern data lake. Publishing data, there’s a picture of an application that you could build on top of your data to publish data out to the data citizens. These are people who just want to look at the data and then maybe download it and do something in Excel. Versioning is really important so that they know what, as I said earlier in the presentation, what changed in the last two days in the data. Do I actually want those changes or do I not want them? I need to know about them.
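The versioning question ("what changed in the last two days?") comes down to comparing two snapshots of a dataset. A minimal sketch, assuming each snapshot is a dictionary keyed by a stable record ID; `diff_versions` and the sample data are hypothetical.

```python
def diff_versions(old, new):
    """Compare two snapshots keyed by record ID and report what changed."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    changed = sorted(k for k in old.keys() & new.keys() if old[k] != new[k])
    return {"added": added, "removed": removed, "changed": changed}

# Hypothetical snapshots published two days apart.
two_days_ago = {
    "c1": {"name": "Acme", "region": "East"},
    "c2": {"name": "Globex", "region": "West"},
    "c3": {"name": "Initech", "region": "East"},
}
today = {
    "c1": {"name": "Acme", "region": "East"},
    "c2": {"name": "Globex", "region": "Central"},  # region was updated
    "c4": {"name": "Umbrella", "region": "West"},   # new record
}

changes = diff_versions(two_days_ago, today)
```

With a summary like this, a consumer can decide whether to accept the new version or stay on the previous one.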
For analysts, you’re going to be using more like, your Tableaus, Power BIs, Qlik, et cetera. Publishing out to data scientists, now they’re going to want to put their Jupyter Notebooks or Zeppelin notebook or something like Amazon SageMaker against the technology. Just making sure that they have the datasets in the native formats, like a data frame or just a CSV, whatever it is, but just giving them access to the data the way that they want to use it from those tools. And then developers, they just want to hit an API, so they want to hit a RESTful endpoint, get some JSON out and then do their thing.
Technology on feedback and usage, this is an example of a Tamr technology, or Tamr steward component wherein we are soliciting feedback directly from the end user applications and then treating it systematically like you would treat bug reports in a system like Jira. Really, really interesting if you can get people to participate in the ongoing feedback about the quality of data from the tools where they’re using the data. Rather than have to go into some data stewardship portal and describe what’s broken about the data.
Lastly here, governance and policy. Governance is making sure that you have controlled access to data based upon whatever policies, either internal or external regulatory policies and those policies being the rules under which people are allowed to share and see data. Data governance here, you’ve got Collibra, they’re a big player there. Immuta, Privitar, on the sort of data protection and obfuscation side. These are some of the players in a growing segment.
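To show how a policy can turn into controlled access, here is a small sketch of role-based column rules with masking. It assumes a default-deny stance and a hash-based mask; the `POLICY` table, roles, and column names are hypothetical, and none of this reflects the behavior of the products named above.

```python
import hashlib

# Hypothetical policy: for each role, what may happen to each column.
POLICY = {
    "analyst": {"customer_id": "allow", "email": "mask", "ssn": "deny"},
    "steward": {"customer_id": "allow", "email": "allow", "ssn": "mask"},
}

def mask(value):
    """Obfuscate a value with a truncated SHA-256 digest (illustrative only)."""
    return hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]

def apply_policy(row, role):
    """Return the view of a row that the given role is allowed to see."""
    rules = POLICY.get(role, {})
    visible = {}
    for column, value in row.items():
        action = rules.get(column, "deny")  # anything not listed is denied
        if action == "allow":
            visible[column] = value
        elif action == "mask":
            visible[column] = mask(value)
    return visible

row = {"customer_id": 42, "email": "pat@example.com", "ssn": "000-00-0000"}
analyst_view = apply_policy(row, "analyst")
steward_view = apply_policy(row, "steward")
```

Keeping the rules in data rather than scattered through application code is what lets the same policy be audited and applied consistently wherever the data is published.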
So I will close it off and thank everyone for their participation here. Hopefully, you see, you learned some germane information here as you lay out your own DataOps strategy.

This was great, Mark, thank you so much for this playbook. And for our attendees, it was a lot of content to pack into a short session. So I’m going to propose that we do a follow-up Q and A session later this month and potentially engage Mark in further discussion on this really great topic. So thank you all, thank you Mark, and enjoy the rest of the event.

Thank you.