What Is Data Unification And How Do You Achieve It?

If you’ve spent some time on Tamr’s website, you’ve most likely come across the phrase “data unification” at least once. But what exactly is data unification?

Data Unification (n): The process of ingesting data from multiple systems throughout an enterprise and preparing it through cleansing, transformation, schema integration, deduplication, and classification, so that the end result is one unified and accurate data source.
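
To make the definition concrete, here is a minimal Python sketch of a few of those steps (schema integration, cleansing, and deduplication) applied to two tiny in-memory sources. Everything in it is a hypothetical illustration rather than how any particular product works: the source names, field names, sample records, and the crude matching key are all assumptions, and classification is left out for brevity.

```python
import re

# Hypothetical records from two source systems that use different field names.
crm_rows = [
    {"Name": "  Acme Corp. ", "City": "boston"},
    {"Name": "Globex LLC", "City": "Springfield"},
]
erp_rows = [
    {"company_name": "ACME CORP", "town": "Boston"},
]

# Schema integration: map each source's field names onto one unified schema.
FIELD_MAPS = {
    "crm": {"Name": "name", "City": "city"},
    "erp": {"company_name": "name", "town": "city"},
}


def to_unified(row, source):
    """Rename source-specific fields to the unified field names."""
    return {FIELD_MAPS[source][key]: value for key, value in row.items()}


def cleanse(record):
    """Collapse stray whitespace in the name and normalize city capitalization."""
    return {
        "name": " ".join(record["name"].split()),
        "city": record["city"].strip().title(),
    }


def match_key(record):
    """Crude deduplication key (an assumption): lowercased name without punctuation, plus city."""
    name = re.sub(r"[^a-z0-9 ]", "", record["name"].lower())
    return (name, record["city"].lower())


def unify(sources):
    """Run every row through mapping and cleansing, keeping the first record per match key."""
    unified = {}
    for source, rows in sources.items():
        for row in rows:
            record = cleanse(to_unified(row, source))
            unified.setdefault(match_key(record), record)
    return list(unified.values())


records = unify({"crm": crm_rows, "erp": erp_rows})
```

Here the two spellings of Acme collapse into a single record, so `records` holds two companies instead of three. Hand-written rules like `match_key` are exactly the kind of logic that becomes hard to maintain as the number of sources grows.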


The 20-Year Challenge With Data Unification

Data unification is essential to ensuring an organization’s data is accurate so that it can be leveraged to gain valuable insights that will drive better business outcomes. But the reality is that most companies lack the resources and technology to accomplish this task.

That’s because large organizations take in a massive amount of data on a daily basis, and that data keeps growing as technology makes it easier (and more valuable) to collect. For major corporations such as GE, it’s simply not feasible for data engineers to keep up using traditional data unification solutions.

These traditional solutions, ETL (extract, transform, and load) and MDM (master data management) systems, have remained largely unchanged for the past 20+ years. Unfortunately, both come with limitations:

  • ETL limitation: Requires substantial bandwidth from programmers (interviewing various business owners, writing rules and scripts, etc.)
  • MDM limitation: Relies on a collection of rules written for one environment and doesn’t take nuances into account

A New Way To Tackle Data Unification

ETL and MDM both serve their purpose as a means of making sense of some of an organization’s data. But today, it’s no longer acceptable to have only a partial understanding of your data and how it relates to your business goals. Competition is fierce, and when you leave a large portion of your data unmastered, you’re vulnerable to a host of problems, including inefficiencies and data breaches.

To tackle the present-day data mastering problem, engineers must look at data unification in a different light and find a solution that addresses the following shortcomings of ETL and MDM:

  • Scalability: At a certain point, data records are collected faster than any feasible amount of manpower can clean and master them. Being able to scale your data unification process is key to success as you grow.
  • Schema: Building the schema-first model that ETL and MDM rely on would take an infeasible amount of time and bandwidth. Having a schema-last model is key to success here.
  • Collaboration: A data scientist can create the framework for data mastering but may not know how to correctly interpret all the data, making collaboration essential to getting accurate results when mastering data.
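
As a rough illustration of the schema point above, the Python sketch below grows a unified schema from the attributes that actually appear in each source as it arrives, instead of designing the full target schema up front. This is one loose reading of "schema-last", not a description of any specific product. The `add_source` helper, the sample sources, and the `SYNONYMS` table are hypothetical; the table stands in for whatever matching logic or expert input decides that two attributes mean the same thing.

```python
# Hypothetical attribute synonyms: source attribute (lowercased) -> unified attribute.
SYNONYMS = {"company_name": "name", "town": "city"}


def add_source(unified_schema, source_name, rows, synonyms):
    """Fold one source's attributes into the unified schema after its data has arrived."""
    for row in rows:
        for attribute in row:
            # Unknown attributes become new unified attributes under their own name.
            target = synonyms.get(attribute.lower(), attribute.lower())
            unified_schema.setdefault(target, set()).add((source_name, attribute))
    return unified_schema


schema = {}
add_source(schema, "crm", [{"Name": "Acme Corp.", "City": "Boston"}], SYNONYMS)
add_source(
    schema,
    "erp",
    [{"company_name": "ACME CORP", "town": "Boston", "vat_id": "X-1"}],
    SYNONYMS,
)
```

After both calls, `schema` maps `name` and `city` to the matching attributes in both sources, and `vat_id` appears as a new unified attribute without anyone having planned for it in advance.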

Michael Stonebraker’s 7 Tenets Of Scalable Data Unification

The above is just the tip of the iceberg when it comes to what is needed to successfully accomplish data unification at scale.

Tamr’s founder and renowned data scientist Michael Stonebraker recently put together a whitepaper on data unification and how to achieve it successfully today.

Click here to read the whitepaper and learn about the 7 tenets of scalable data unification today.