Tamr Insights
May 11, 2023

Master Your Data with Tamr & Gigasheet

Master Data Management (MDM) is the process of creating a unified master record for a business or person across an entire enterprise. It involves deduplication, entity resolution, reconciliation, and enrichment. However, mastering data can be a complex and daunting process, especially when dealing with large amounts of data from different sources. In this blog, we will discuss how you can use Tamr Mastering to streamline your MDM process and Gigasheet to inspect the data before and after running the data mastering and enrichment pipelines.

Gigasheet is a powerful tool that allows anyone to explore, analyze, and transform big data (up to one billion rows) in a spreadsheet-like interface. Tamr provides an AI-powered data mastering solution that can help you automatically identify and resolve duplicate or conflicting records across the enterprise and enrich your data with additional firmographic data.

In this example, we’ll use Gigasheet to quickly inspect the data before and after the Tamr data mastering and enrichment pipeline. Gigasheet is a handy tool for this because its cloud-based spreadsheet removes the need to set up a staging system for data preparation and visualization, and it handles large data sets without coding or any IT infrastructure.

Before Tamr

We’ll start with data from various enterprise systems, including SAP, Salesforce, and Marketo. This file contains approximately two million records on companies, the spending associated with those vendors, and each record’s source system.

Check out the before-Tamr data here.

At first glance, this data looks relatively straightforward and clean. But after doing some basic grouping in Gigasheet, we see inconsistencies. In this simple example, Leidos shows up as an obvious duplicate: the two entries differ only in capitalization (proper case versus all caps).
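The same kind of check can be sketched outside a spreadsheet. The snippet below is only an illustration of grouping on a case-insensitive key, not how Gigasheet or Tamr works internally; the function name and the sample names are made up for the example.

```python
from collections import defaultdict


def find_case_variants(names):
    """Group names that differ only by letter case or extra whitespace."""
    groups = defaultdict(set)
    for name in names:
        # Collapse whitespace and ignore case to build the grouping key.
        key = " ".join(name.split()).casefold()
        groups[key].add(name)
    # Keep only keys that were spelled more than one way.
    return {
        key: sorted(spellings)
        for key, spellings in groups.items()
        if len(spellings) > 1
    }


# Illustrative values only, not taken from the actual file.
sample = ["Leidos", "LEIDOS", "Acme Corp", "Leidos", "SAIC"]
variants = find_case_variants(sample)
print(variants)
```

A check like this only catches the easy cases. It would not connect names that are spelled differently but refer to the same company.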

Looking closer, we see other potential duplicates that are not as easy to identify:

I happened to remember that Science Applications International Corporation (SAIC) rebranded as Leidos a number of years ago. I also recall that they split out an independent SAIC corporate entity. I only know this because these are large government contractors and I live in the Washington, DC area. But what about all of the other companies that I’m not familiar with? I’m certain there are many similar situations in this data, but I’m not about to comb through two million records by hand.

You can see how data like this can quickly become a huge problem. In this case, the records are inconsistent because of a company rebranding and spinout, but the same issues can occur with mergers and acquisitions. Many other factors can also contribute to these inconsistencies, such as human error. For example, in the CRM, one person may enter the account name as "Leidos, Inc.," while another may enter "Lidos North America." These issues in data across systems can make it difficult to pull together even the most basic reporting. Imagine trying to answer a CEO question like "How much business are we doing with Leidos?"
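To get a feel for why hand-written rules fall short here, consider a rough sketch of fuzzy name comparison using Python's standard difflib module. This is not Tamr's method; the suffix list and the helper names are assumptions made purely for illustration.

```python
import re
from difflib import SequenceMatcher

# Hypothetical, deliberately short list of legal suffixes to ignore.
LEGAL_SUFFIXES = {"inc", "corp", "corporation", "llc", "ltd", "co"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, drop legal suffixes."""
    cleaned = re.sub(r"[^\w\s]", " ", name.lower())
    tokens = [tok for tok in cleaned.split() if tok not in LEGAL_SUFFIXES]
    return " ".join(tokens)


def name_similarity(a, b):
    """Return a similarity ratio between 0.0 and 1.0 for two names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Case and suffix differences disappear after normalization.
print(name_similarity("Leidos, Inc.", "LEIDOS"))
# Typos and extra words only lower the score; a threshold would be needed.
print(name_similarity("Leidos, Inc.", "Lidos North America"))
```

Even with a well-tuned threshold, string similarity cannot know that SAIC rebranded as Leidos, since the two names share almost no characters. That is the kind of case where outside knowledge is needed.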

After Tamr

After running the data through Tamr's pipeline, we now have a file with the same number of records but with nearly 100 additional columns. Upon browsing the file, we see the columns contain additional firmographic data that can help organizations better analyze their data and make more informed decisions. For example, Tamr has added extensive information from Dun & Bradstreet, such as the company's legal formation and country. We also see that Tamr cleaned up the addresses and created a clean Tamr-mastered company name.

Check out the after-Tamr file here.

Let’s explore how Tamr resolved the inconsistencies with the Leidos company names and duplicates. Now, when grouping in Gigasheet, we can see that Tamr has standardized the company name Leidos for easier analysis. Tamr's pipeline specializes in entity resolution, which disambiguates data points that may refer to the same entity. This pipeline results in “golden records” that are the cleanest and most accurate representations of an entity in the enterprise data. Organizations can use these records to verify the work that Tamr has done and ensure that they are working with the most accurate data.
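Once a mastered name exists for each record, a question like total spend per company becomes a simple lookup and sum. The sketch below assumes a hypothetical lookup table from source names to mastered names and uses made-up spend values; the field names are not taken from the actual Tamr output file.

```python
from collections import defaultdict

# Hypothetical lookup: name as entered in a source system -> mastered name.
golden_lookup = {
    "Leidos": "Leidos",
    "LEIDOS": "Leidos",
    "Leidos, Inc.": "Leidos",
}

# Made-up records for illustration only.
records = [
    {"company": "Leidos", "source": "SAP", "spend": 1200.0},
    {"company": "LEIDOS", "source": "Salesforce", "spend": 300.0},
    {"company": "Leidos, Inc.", "source": "Marketo", "spend": 500.0},
    {"company": "Acme Corp", "source": "SAP", "spend": 75.0},
]


def spend_by_mastered_name(rows, lookup):
    """Sum spend per mastered name, falling back to the raw name."""
    totals = defaultdict(float)
    for row in rows:
        mastered = lookup.get(row["company"], row["company"])
        totals[mastered] += row["spend"]
    return dict(totals)


totals = spend_by_mastered_name(records, golden_lookup)
print(totals)
```

Falling back to the raw name for unmapped records keeps them visible in the totals, so gaps in the lookup are easy to spot.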

In Conclusion

Tamr Mastering is a powerful tool for transforming enterprise data. By cleansing, enriching, and resolving inconsistencies in the data, Tamr makes it easier for organizations to analyze their data and make informed decisions. With the creation of a lookup file of golden records, organizations can verify the accuracy of their data and have confidence in their analysis. As enterprise data continues to grow in volume and complexity, tools like Tamr will become increasingly essential for organizations that want to make the most of their data. Request a demo to learn more.

Gigasheet offers a number of features that make it a powerful tool for data analysis, including:

  • The ability to import data from a variety of sources, including CSV files, JSON files, and even live databases
  • A variety of tools for cleaning and transforming data, such as pivot tables, formulas, and macros
  • The ability to visualize data using charts and graphs
  • The ability to share data with others and collaborate on projects

Gigasheet is a powerful tool for anyone who needs to analyze large data sets. It is easy to use, affordable, and offers a variety of features that make it a great choice for businesses of all sizes. Watch the demo to learn more about Gigasheet.