What is Data Unification?

To those unfamiliar with it, data unification may seem trivial. After all, it can’t be that hard to unify data, right? Unfortunately, this is a common misconception. Data unification is an incredibly complex process and one of the biggest challenges many large organizations face today.

Before we dig into how it works, let’s take a look at the definition of data unification:

Data Unification (n): The process of ingesting data from various operational systems and combining them into a single source by performing transformations, schema integrations, deduplications, and general cleaning of all the records.

The Data Unification Challenge

To understand the major challenges with data unification, think about all the different systems and programs used at your organization. Each one captures data differently. Now imagine trying to combine all of the data across your organization into one master source. This process is incredibly difficult to achieve at scale, especially when it involves hundreds of thousands of datasets.

To give you a better idea of what this process entails, here’s a high-level breakdown of the data unification process from the viewpoint of Michael Stonebraker:

  • Ingesting data, typically from operational data systems in the enterprise.
  • Performing data cleaning, e.g., -99 is often a code for “null,” and/or some data sources might have obsolete addresses for customers.
  • Performing transformations, e.g., euros to dollars or airport code to city_name.
  • Performing schema integration, e.g., “salary” in one system is “wages” in another.
  • Performing deduplication (entity consolidation), e.g., I am “Mike Stonebraker” in one data source and “M.R. Stonebraker” in another.
  • Performing classification or other complex analytics, e.g., classifying spend transactions to discover where an enterprise is spending money. This requires data unification for spend data, followed by a complex analysis on the result.
  • Exporting unified data to one or more downstream systems.
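
To make the steps above more concrete, here is a minimal Python sketch that reuses the examples from the list: -99 as a null code, euros to dollars, airport code to city_name, “wages” versus “salary,” and the two spellings of Stonebraker. Everything in it is an assumption made for illustration: the exchange rate, the lookup tables, the field names, and the first-initial-plus-last-name matching rule are all hypothetical. The sketch also runs schema integration before the transformations so that the currency conversion can rely on the unified field name. Real deduplication at scale needs far more than a single hand-written key.

```python
# Illustrative values only; none of these come from a real system.
EUR_TO_USD = 1.10                                   # assumed fixed rate
AIRPORT_TO_CITY = {"BOS": "Boston", "CDG": "Paris"}  # hypothetical lookup
FIELD_ALIASES = {"wages": "salary"}                  # source name -> unified name
NULL_CODES = {-99}                                   # sentinel values meaning "null"


def clean(record: dict) -> dict:
    """Replace sentinel null codes such as -99 with None."""
    return {k: (None if v in NULL_CODES else v) for k, v in record.items()}


def integrate_schema(record: dict) -> dict:
    """Rename source-specific fields to the unified field names."""
    return {FIELD_ALIASES.get(k, k): v for k, v in record.items()}


def transform(record: dict) -> dict:
    """Convert euros to dollars and airport codes to city names."""
    out = dict(record)
    if out.get("currency") == "EUR" and out.get("salary") is not None:
        out["salary"] = round(out["salary"] * EUR_TO_USD, 2)
        out["currency"] = "USD"
    if "airport_code" in out:
        out["city_name"] = AIRPORT_TO_CITY.get(out.pop("airport_code"))
    return out


def dedup_key(record: dict) -> tuple:
    """Crude entity key: first initial plus last name, lowercased."""
    parts = record["name"].replace(".", " ").split()
    return (parts[0][0].lower(), parts[-1].lower())


def unify(sources: list[list[dict]]) -> list[dict]:
    """Run every record through the steps and merge records sharing a key."""
    merged: dict[tuple, dict] = {}
    for source in sources:
        for raw in source:
            rec = transform(integrate_schema(clean(raw)))
            key = dedup_key(rec)
            if key not in merged:
                merged[key] = rec
                continue
            # Keep the first record and fill its gaps from later ones.
            for field, value in rec.items():
                if merged[key].get(field) is None:
                    merged[key][field] = value
    return list(merged.values())


hr = [{"name": "Mike Stonebraker", "wages": 100000,
       "currency": "EUR", "airport_code": "BOS"}]
crm = [{"name": "M.R. Stonebraker", "salary": -99,
        "currency": "USD", "email": "m@example.com"}]
unified = unify([hr, crm])
```

With these two toy sources, both records collapse into a single entity that keeps the converted salary from the first source and picks up the email address from the second. The final step in the list, exporting to downstream systems, would simply write `unified` wherever it is needed.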

As you can see, unifying data is complex, which is why the vast majority of today’s organizations face a data mastering crisis.

The Data Preparation Ecosystem

Because of this data mastering crisis, organizations have an immediate need for internal and external datasets that are agile and curated. Organizations that provide such datasets are part of the rapidly expanding data preparation industry, which is expected to grow to $12.89 billion by 2028 [1].

That explosive growth is largely because data scientists spend nearly 50% [2] of their time on data prep alone. New tools aim to greatly reduce this time and are quickly becoming the industry standard, with many new projects incorporating them.

Data unification is an integral part of this new data preparation ecosystem and is an essential input to tools used by analysts and consumers, such as self-serve data prep tools and data catalogs. These users can’t be expected to be productive and generate meaningful business insights without a foundation of trustworthy data, which data unification provides. Further, the emergence of DataOps and the strategic need to increase analytic velocity in the enterprise have accelerated the move towards this modern architecture.

Out With The Old: The Traditional Process to Unify Data Isn’t Effective

Legacy approaches to data unification typically revolve around ETL and MDM.

ETL, or Extract, Transform, and Load, involves writing an upfront global schema and then relying on a programmer to understand the schema and write conversion, cleaning, and transformation routines as well as all necessary record updates.

MDM, or Master Data Management, involves creating a master record where all entities across the organization are defined and then merging all records to match the master.

Both ETL and MDM are incredibly labor-intensive because each requires developing complex rule systems to unify data. These systems have a high upfront cost to develop and are costly to maintain. As a result, data unification efforts are often limited to a select few high-value data sources.
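
A small Python sketch may help show where the labor goes in the ETL style described above. It assumes a three-field global schema and two made-up source systems, “crm” and “billing,” each needing its own hand-written conversion routine. All names and fields are hypothetical and stand in for no particular product.

```python
from typing import Callable

# The global schema is fixed upfront; every source must be mapped onto it.
GLOBAL_SCHEMA = ("customer_id", "full_name", "country")


def from_crm(row: dict) -> dict:
    """Hand-written conversion routine for the hypothetical CRM source."""
    return {
        "customer_id": str(row["id"]),
        "full_name": row["name"].strip(),
        "country": row["country"].upper(),
    }


def from_billing(row: dict) -> dict:
    """Hand-written conversion routine for the hypothetical billing source."""
    return {
        "customer_id": str(row["cust_no"]),
        "full_name": f'{row["first"]} {row["last"]}'.strip(),
        "country": row["ctry"].upper(),
    }


# One entry per source: a programmer must add a routine for each new system.
CONVERTERS: dict[str, Callable[[dict], dict]] = {
    "crm": from_crm,
    "billing": from_billing,
}


def load(source: str, rows: list[dict]) -> list[dict]:
    """Convert rows from a known source into the global schema."""
    if source not in CONVERTERS:
        raise ValueError(f"no conversion routine written for source {source!r}")
    converted = []
    for row in rows:
        record = CONVERTERS[source](row)
        if set(record) != set(GLOBAL_SCHEMA):
            raise ValueError(f"record from {source!r} does not match the global schema")
        converted.append(record)
    return converted


crm_rows = load("crm", [{"id": 7, "name": " Ada Lovelace ", "country": "gb"}])
billing_rows = load("billing", [{"cust_no": 12, "first": "Alan", "last": "Turing", "ctry": "gb"}])
```

Two sources need two routines. Every additional source needs another one, and every change to a source or to the global schema means revisiting the code, which is one way to see why the cost grows so quickly.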

In With The New: A New Approach To Data Unification Is Working Wonders

A new, more effective data unification process has emerged. Using concepts from agile software development, global organizations such as Toyota have fully mastered their data and gained access to powerful insights that have increased efficiencies up to tenfold and saved millions in the process.

The agile approach uses a powerful data unification platform and a combination of machine learning and human expertise to conquer the data. The result is data that is unified, mastered, and up-to-date, something that was nearly impossible with the old methods.
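
One common way to combine machine scoring with human expertise is to let the machine decide the clear cases and send only the uncertain ones to a person. The Python sketch below illustrates that idea under stated assumptions: the string similarity from the standard library’s difflib stands in for a trained model, and the 0.9 and 0.5 thresholds are arbitrary. It is not a description of how any particular platform works.

```python
from difflib import SequenceMatcher
from typing import Callable


def similarity(a: str, b: str) -> float:
    """Stand-in for a learned match score: a string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(
    pairs: list[tuple[str, str]],
    score: Callable[[str, str], float] = similarity,
    accept: float = 0.9,   # assumed threshold: at or above this, auto-match
    reject: float = 0.5,   # assumed threshold: below this, auto-reject
) -> dict[str, list[tuple[str, str]]]:
    """Sort candidate record pairs into match, no_match, and needs_review."""
    decisions: dict[str, list[tuple[str, str]]] = {
        "match": [],
        "no_match": [],
        "needs_review": [],
    }
    for a, b in pairs:
        s = score(a, b)
        if s >= accept:
            bucket = "match"
        elif s < reject:
            bucket = "no_match"
        else:
            bucket = "needs_review"  # uncertain: route to a human expert
        decisions[bucket].append((a, b))
    return decisions


candidates = [
    ("Mike Stonebraker", "mike stonebraker"),
    ("Mike Stonebraker", "M.R. Stonebraker"),
    ("abc", "xyz"),
]
result = triage(candidates)
```

In a fuller system, the answers experts give for the `needs_review` pairs could be fed back as labels to improve the scoring model, so the share of pairs needing a human would shrink over time.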

Customer Data Unification at Toyota Motors Europe

As a Toyota subsidiary that oversees European operations, Toyota Motors Europe (TME) embraced Consumer One to help ensure it was delivering on its customers’ expectations in an increasingly connected, digital age. TME’s goal: to better understand and predict customer needs and to make each interaction feel cohesive, resulting in an enhanced experience with their brand.

To deliver on this goal, TME needed to unify its customer data at both the level of the national marketing and sales companies (NMSCs) and the pan-European level. But the level and method of customer data integration varied greatly by NMSC. While some created centralized customer databases, others built separate databases for different business functions. Integration methods varied, too, from simple, manual integration to full-blown master data management solutions and everything in between. And while effective with small amounts of static data, these approaches could not scale as the volume and complexity of their data increased.

Because of these disparate approaches, massive amounts of customer data of varying quality were spread across multiple silos. As a result, the lack of a consistent view of data hindered TME’s ability to innovate and meet customer expectations.

TME partnered with Tamr because they provided speed and scalability, embraced data “entropy” as a fundamental property, delivered expert sourcing capabilities, and blended accuracy and efficiency. As a result of implementing Tamr, TME has seen a 40% reduction of duplicative customer records, enabling them to increase efficiency and business value. TME views Tamr’s technology as a strategic resource within their organization, helping them to improve sales and marketing initiatives. They also see it as a platform for innovation across the enterprise.

You can hear more about Toyota Motors Europe’s customer data unification using Tamr here.

Final Thoughts

According to Forbes [3], internet users create 2.5 quintillion bytes of data each day, and this number is only growing. Businesses need data unification to make sense of this endless data stream, make smart, data-driven decisions, and compete in a global economy. You’ve heard the expression “knowledge is power.” For modern-day businesses, that knowledge comes from having complete access to reliable, up-to-date data.

To learn more about data unification and how Tamr can help you address these challenges, please schedule a demo.


  1. https://www.grandviewresearch.com/press-release/global-data-preparation-tools-market 
  2. https://www.datanami.com/2020/07/06/data-prep-still-dominates-data-scientists-time-survey-finds/
  3. https://www.forbes.com/sites/forbestechcouncil/2021/08/02/understanding-generation-data/?sh=74fe4d0636b7