We recently hosted a webinar with two DataOps industry experts: Mark Marinelli, Head of Product at Tamr, and Wayne Eckerson, President of Eckerson Group, a firm specializing in helping business leaders use data and technology to drive better insights and actions.
Here are some of the key points these two experts shared to help you get started on your DataOps journey.
The Origins of DataOps
Over the past 10 years, we’ve seen the emergence of DevOps: an approach to software development that accelerates the build lifecycle (formerly known as release engineering) through automation. This merging of software development and IT operations shortens time to deployment and time to market, minimizes defects, and reduces the time required to resolve issues. Now, data engineers and data scientists are embracing a sister discipline: data operations (DataOps).
DataOps applies the rigor of DevOps to speed analytic outcomes for the enterprise. It is a set of practices, processes, and technologies for building, operationalizing, automating, and managing data pipelines from source to consumption.
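To make “source to consumption” concrete, here is a minimal sketch of such a pipeline in Python. The stage names and data are hypothetical and purely illustrative; a real pipeline would use orchestration, warehouse, and monitoring tooling, but the shape is the same: discrete stages, run in order, with an automated validation gate built in.

```python
# A minimal, hypothetical source-to-consumption pipeline: each stage is
# a plain function, and the pipeline runs them in order, failing fast
# if a validation gate rejects the data.

def extract():
    # In practice this would read from a database, API, or file.
    return [
        {"customer_id": 1, "revenue": "1200.50"},
        {"customer_id": 2, "revenue": "980.00"},
    ]

def transform(rows):
    # Normalize types so downstream consumers get consistent data.
    return [{**r, "revenue": float(r["revenue"])} for r in rows]

def validate(rows):
    # A DataOps-style automated check, run on every pipeline execution.
    assert all(r["revenue"] >= 0 for r in rows), "negative revenue"
    return rows

def load(rows):
    # Stand-in for writing to a warehouse or serving layer.
    return len(rows)

def run_pipeline():
    return load(validate(transform(extract())))

print(run_pipeline())  # prints 2: rows delivered to consumers
```

Because each stage is a separate, testable unit, checks like `validate` can run automatically on every execution rather than being applied manually after the fact.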
How Do You Know if Your Organization Needs DataOps?
So what are the signs that your organization needs DataOps? Here are some of the most telling indicators based on Wayne’s research at the Eckerson Group:
Your data team is in full burnout mode because they’re being inundated with too many minor request tickets. Business users don’t understand why it takes so long to get data, and even when they do get it, they often don’t trust it because it contains too many errors.
Data analysts write the same jobs and reports with minor variations. Data scientists may wait months for data and computing resources. Your organization may have started self-service initiatives, but this strategy has spawned hundreds of data silos. It may take months to deploy a single predictive model. Your organization also may not understand the trade-offs between on-premises and cloud-based solutions for its projects.
If this sounds familiar, you’re not alone. According to an Eckerson Group survey of 175 respondents conducted in April 2019, 43% of organizations do not have DataOps initiatives, 30% “somewhat” do, and only 27% say they have established active DataOps programs.
Why the Slow Adoption?
There are many benefits to adopting a comprehensive DataOps strategy, including faster cycle times, fewer data defects and errors, faster change requests, scalability and reliability, lower costs, more innovation, improved data governance, and happier business users, among others. Despite these advantages, the percentage of organizations with initiatives remains low.
In this same Eckerson Group survey, organizations cited many common reasons why they have struggled with getting started with DataOps, including difficulties with:
- Establishing formal processes (55%)
- Orchestrating code and data across tools (53%)
- Staff capacity (50%)
- Monitoring the end-to-end environment (50%)
- Building rigorous tests upfront (47%)
- Lack of adequate automation tools (42%)
- Getting business users to buy into the process (35%)
- Adopting agile methods and teams (34%)
- Data too hard to find (26%)
- Getting technical users to buy into the process (23%)
How to Build a DataOps Framework
According to Mark, a DataOps framework rests on three basic components:
1. Technology: the architecture and tools, plus the infrastructure (a platform) to support the architecture
The right architecture will likely be unique to your organization, but there are several important considerations. When designing an architecture, think “cloud first” and assume that data will always change. Choose open, best-of-breed technologies, and keep humans at the core. The infrastructure, in turn, needs several key components: management, search, compute, and storage, built on a cloud-based foundation.
2. People: a division of labor across multi-disciplinary teams
This refers to organizing data suppliers, preparers, and consumers, and establishing a working structure for projects across technical and business teams. Be sure to define the roles of every participant, from data source owners to end users making business decisions.
3. Process: an agile, incremental delivery model
The appropriate model will vary with the scale of your DataOps work. There are several options, each with its own trade-offs: an advisory model that bootstraps projects with best-of-breed tools and approaches, or a shared-services model that delivers full-service data applications developed in collaboration with the business.
The important thing to remember about process is that rules-based approaches that rely on modeling and testing are too labor-intensive, monolithic, and IT-driven. In today’s fast-paced business world, DataOps initiatives that are automated, incremental, and collaborative are a must.
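As an illustration of what “automated, incremental, and collaborative” can look like in practice, here is a minimal sketch of data checks that are small functions versioned alongside the pipeline and run on every change, rather than a monolithic, manually maintained rulebook. All names and sample data here are hypothetical, not from the webinar.

```python
# Hypothetical example: small, automated data checks that run on every
# pipeline change, in place of a monolithic manually-maintained rulebook.

EXPECTED_COLUMNS = {"customer_id", "revenue"}

def check_schema(rows):
    """Every row must carry exactly the expected columns."""
    return all(set(r) == EXPECTED_COLUMNS for r in rows)

def check_no_nulls(rows):
    """No required field may be missing or None."""
    return all(r[c] is not None for r in rows for c in EXPECTED_COLUMNS)

# Adding a new check is incremental: append a function to this list.
CHECKS = [check_schema, check_no_nulls]

def run_checks(rows):
    """Return the names of the checks that failed (empty means pass)."""
    return [c.__name__ for c in CHECKS if not c(rows)]

good = [{"customer_id": 1, "revenue": 10.0}]
bad = [{"customer_id": 2, "revenue": None}]
print(run_checks(good))  # []
print(run_checks(bad))   # ['check_no_nulls']
```

Because each check is an isolated function, analysts and business users can propose new rules one at a time, and the whole suite can run automatically in CI against every pipeline change.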
Get Started Today
People have been managing data for a long time, but we’re at a point now where the quantity, velocity, and variety of data available to a modern enterprise can no longer be managed without a significant change in the fundamental infrastructure and supporting DataOps processes.
Learn more from industry experts about best practices for getting started with DataOps by watching the recording of our latest webinar, Best Practices in DataOps.