Organizations have debated centralized vs. decentralized data for decades. As data warehouses and data lakes grew in popularity, centralized systems became the norm. But with data mesh emerging as a new and effective way to manage data, decentralized systems are gaining momentum.
There are benefits to both centralization and decentralization, but deciding which one is right for your business can be a challenge. The first step is to understand the differences between centralized and decentralized systems. Then, you can better assess the approach that is right for your business.
What is data centralization?
Data centralization emerged from the rise in popularity of data warehouses and data lakes. Simply put, data centralization manages the storage, cleaning, optimization, and consumption of data from a single point. It aims to minimize data silos by pooling disparate data together in a single place and making it accessible to anyone who needs it.
When data is centralized, ownership often resides within the centralized data team. They are responsible for maintaining, cleaning, and optimizing the data, as well as monitoring access to the data based on the organization’s governance and security policies.
Centralized approaches based on data lakes held great promise for many organizations, but the reality is that many pristine data lakes quickly deteriorated into data swamps. To address this issue, data mastering can be employed to clean up the data in the data lake, ensuring that users have access to accurate, consistent, and trustworthy information. Despite these efforts, many organizations are exploring alternative options to data lakes so they can better deliver the clean, integrated data their users need to make decisions. In fact, earlier this year, we predicted that companies will realize that data lakes are dead.
What is data decentralization?
Data decentralization, on the other hand, means that the storage, cleaning, optimization, and consumption of data do not happen in a central repository such as a data warehouse or a data lake. Instead, these responsibilities are spread across the organization. Data mesh is a popular example of a decentralized approach.
In a data mesh ecosystem, the data is distributed, and many more individuals across the organization take responsibility for ensuring the data is clean, integrated, continuously updated, and consumable by those who need it. A data mesh ecosystem embodies a number of key principles:
Bring the data ownership closest to the people who know the data using data ownership by domain
Treat data as a product to avoid silos and make the data teams accountable for sharing the data as a product
Implement a new generation of automation and platforms to drive autonomy, making data available everywhere (self-serve)
Govern data where it is by introducing a new way of governing the data that avoids introducing risk
For data mesh to work, organizations must clean and standardize their enterprise data. Data mastering can help by serving as both a complement and augmentation to distributed data initiatives. Data mastering helps to standardize data to promote better understanding across systems and domains, and creates useful mappings between data identifiers across the organization.
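As a rough illustration of the standardization and identifier mapping described above, the following is a minimal sketch, not a real data mastering implementation. The function names (standardize, build_crosswalk), the sample records, the suffix list, and the master ID format are all hypothetical. It also assumes that records match only when their standardized names are identical, whereas production data mastering typically relies on fuzzy or machine-learning-based matching.

```python
import re
from collections import defaultdict

# Hypothetical list of company suffixes to ignore when comparing names.
SUFFIXES = {"inc", "corp", "llc", "ltd"}


def standardize(name: str) -> str:
    """Lowercase the name, drop punctuation, and remove common company suffixes."""
    cleaned = re.sub(r"[^a-z0-9 ]", "", name.lower())
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)


def build_crosswalk(records):
    """Map each (system, local_id) pair to a shared master ID.

    Records are grouped by their standardized name (exact match only,
    which is a simplifying assumption of this sketch).
    """
    groups = defaultdict(list)
    for system, local_id, name in records:
        groups[standardize(name)].append((system, local_id))

    crosswalk = {}
    # Sort the keys so that master IDs are assigned deterministically.
    for n, key in enumerate(sorted(groups), start=1):
        master_id = f"M{n:04d}"
        for source in groups[key]:
            crosswalk[source] = master_id
    return crosswalk


# Hypothetical records owned by three different domains.
records = [
    ("sales", "S-17", "Acme, Inc."),
    ("billing", "B-204", "ACME Inc"),
    ("support", "T-9", "Globex Corp."),
]

crosswalk = build_crosswalk(records)
for source, master_id in sorted(crosswalk.items()):
    print(source, "->", master_id)
```

In this sketch, the sales and billing records for Acme resolve to the same master ID even though each domain keeps its own local identifier, which is the kind of cross-domain mapping that lets domain-owned data products refer to the same real-world entity.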
Which approach is right for my organization?
Many businesses struggle to decide which approach, centralized or decentralized, is right for their organization. But in reality, it’s not an either/or proposition. Most organizations will actually land somewhere in between, where they can take advantage of the best of both approaches.
Organizations that combine the best of centralization and decentralization can continue to employ approaches such as data mesh and data fabric. These approaches enable them to rationalize and standardize as much as possible, without the need to fully decentralize their data. But to make data mesh successful, organizations need a consistent version of the best data across their organization. And the only way to achieve this at scale is through data mastering.
Data mastering is a key, foundational step when it comes to implementing data mesh successfully. Why? Because data mesh empowers data ownership by a domain owner. But in reality, many domain owners may not want to assume the burden of owning a traditional master data pipeline on their own.
In a traditional pipeline, it’s difficult for data consumers to get feedback on the data issues being addressed. That’s why data mastering is so critical. With it, you gain functionality that enables you to solicit and capture feedback from those who know the data best.
When organizations land somewhere on the spectrum between centralized and decentralized data management styles, they are better positioned to experience the benefits of both, which leads to an increased ability to realize the holistic value of their enterprise data.