Using Tamr’s metadata enrichment, discovery, and standardization to unlock new possibilities in data management

Author: Bernie Kuan
Published Feb 11, 2020

At the start of any data-driven initiative or project, we must first find, access, and pull together the relevant data in our organization. Better yet if the data we find is interoperable and reusable across both new and old use cases.

But how often is this process easy?

While compliance with the FAIR principles (findable, accessible, interoperable, reusable) is a growing trend in data management, experienced data practitioners know that achieving this state is easier said than done.

However, with the help of Tamr, raising data management expectations has become a lot simpler. Tamr’s effectiveness is especially apparent for organizations that rely on large, decentralized data management processes: using machine learning to link datasets across the organization not only makes data more accessible, but also enables large data transformation initiatives that may at first have seemed impossible without a complete changeover in how the company operates.

In a previous post, I described how Tamr’s agile data mastering capabilities can improve data searchability and access when used alongside data catalog and management tools. In this post, I’d like to detail further situations where using Tamr on metadata can:

  • Enrich metadata for actionable analytics
  • Simplify the discovery of new data model schemas, and
  • Support the foundation for integrating disparate data sources into standardized data outputs.

Metadata Enrichment

The previous post’s example applied Tamr to an organization’s metadata to enrich data catalog views with attributes that increase searchability within the catalog. However, it is important to realize that, even outside the cataloging tool, Tamr’s generated metadata can also be used to identify other relevant attributes, streamline the identification of disparate attributes, and support analytics executed on data assets.

Data Relevance – Is one column attribute similar to another?

Even with a large data catalog, it may still not be obvious what data is available for your specific data project without spending a lot of time digging through it. This problem is common among organizations using catalog software; most cataloging tools are made to optimize for data governance rather than access and usability.

To tackle this challenge, Tamr can be used to group data attribute names across datasets at configurable levels of granularity, so that there are multiple ways to find relevant data attributes.

In the example below, it may be difficult to determine at a glance whether study_id, geneid, and ensembl_id are related just from the table and column names.

However, if a user manually reviews other metadata columns such as dataType and sample values, the user might be able to infer that (invivo_table_1, ensembl_id) is similar to (genetics_table1, geneid), and that (invivo_table_1, study_id) is very different.

Tamr automates this manual review with a machine learning model that groups similar attribute names, trained on human judgments of how similar one attribute’s metadata is to another’s; a sample of the Tamr model training interface is shown below.

 

After attribute names are clustered together, Tamr can provide appropriate tagging for easy searchability such that the user can now search for “ensemble_gene_id”, for example, and capture all relevant data attributes across sources.
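To make the idea of grouping attributes by their metadata concrete, here is a minimal sketch in Python. It is not Tamr's model: it swaps in a hand-weighted similarity score and a greedy clustering rule, and the data types, sample values, weights, and threshold are all assumptions chosen for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical metadata records: (table, column, data type, sample values).
# The types and sample values are invented for this sketch.
ATTRIBUTES = [
    ("invivo_table_1", "study_id", "int", ["1001", "1002", "1003"]),
    ("invivo_table_1", "ensembl_id", "string", ["ENSG00000139618", "ENSG00000141510"]),
    ("genetics_table1", "geneid", "string", ["ENSG00000141510", "ENSG00000157764"]),
]


def name_similarity(a, b):
    """Character-level similarity of two column names, between 0 and 1."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def value_overlap(a, b):
    """Jaccard overlap of two lists of sample values."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0


def similarity(x, y):
    """Blend name, type, and sample-value evidence (weights are assumptions)."""
    type_match = 1.0 if x[2] == y[2] else 0.0
    return (0.3 * name_similarity(x[1], y[1])
            + 0.3 * type_match
            + 0.4 * value_overlap(x[3], y[3]))


def cluster(attrs, threshold=0.45):
    """Greedy clustering: join the first cluster that has a close member."""
    clusters = []
    for attr in attrs:
        for group in clusters:
            if any(similarity(attr, member) >= threshold for member in group):
                group.append(attr)
                break
        else:
            # No existing cluster is close enough, so start a new one.
            clusters.append([attr])
    return clusters


for group in cluster(ATTRIBUTES):
    print([f"{table}.{column}" for table, column, _, _ in group])
```

With these made-up records, ensembl_id and geneid land in one cluster because their types match and their sample values overlap, while study_id stays on its own. A trained model would learn the weighting from reviewed examples rather than have it hard-coded.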

Clarity & Context – What does an attribute refer to?

Even if all users and data stewards become subject matter experts on the organization’s wealth of data, having to manually review, tag, and maintain every dataset is not a sustainable or scalable approach.

Another advantage of using Tamr’s machine learning approach to group metadata attributes is that attributes can be tagged automatically, and at scale, as soon as new datasets are added.

In the example below, Tamr processed the attributes study_id, geneid, and ensembl_id into distinct clusters that can be tagged in two different contexts. The first Tamr tag helps identify whether attributes refer to gene ids or study ids, while the second tag helps identify whether the table involves invivo or genetics research. Tamr’s approach to classifying metadata can help organize and maintain large volumes of datasets quickly.
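The two tagging contexts can be sketched as a small Python function. This is only an illustration: the ATTRIBUTE_TAGS lookup stands in for reviewed clustering output, and the rule that derives a research domain from the table name is a naive assumption, not how Tamr assigns tags.

```python
# Hypothetical cluster labels, standing in for reviewed clustering output.
ATTRIBUTE_TAGS = {
    "study_id": "study_id",
    "geneid": "ensemble_gene_id",
    "ensembl_id": "ensemble_gene_id",
}


def domain_tag(table):
    """Derive a research-domain tag from the table name (a naive rule)."""
    for domain in ("invivo", "genetics"):
        if domain in table.lower():
            return domain
    return "unknown"


def tag(table, column):
    """Return both tags for one attribute: what it is and where it lives."""
    return {
        "attribute": ATTRIBUTE_TAGS.get(column, "untagged"),
        "domain": domain_tag(table),
    }


print(tag("genetics_table1", "geneid"))
print(tag("invivo_table_1", "study_id"))
```

Because each new attribute only needs a lookup and a rule evaluation, the same function can run whenever a dataset is added, which is the scalability point made above.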

Analytics – What actionable insights around my data can I generate?

Finally, with a feasible, scalable way to group, tag, and categorize metadata, various types of operational and strategic analytics can be run to further digital transformation goals.

As a very simple example, determining how many datasets within your organization contain data involving gender may not be straightforward without considerable knowledge of what you’re looking for. Searching and parsing catalog data for a simple answer is difficult because gender information can be represented, or misspelled, in many ways.

Using Tamr to generate standardized tags for analytics can help organizations better understand available data, as well as make actionable decisions on resolving any issues with data collection or usage.
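The gender question can be sketched in a few lines of Python once every column carries a standardized tag. The catalog rows below are invented, and the tag column is assumed to come from an earlier attribute-clustering step.

```python
from collections import defaultdict

# Hypothetical catalog rows: (dataset, raw column name, standardized tag).
CATALOG = [
    ("patients_2018", "gender", "gender"),
    ("patients_2019", "sex", "gender"),
    ("trial_subjects", "gendr", "gender"),
    ("trial_subjects", "subject_id", "subject_id"),
    ("lab_results", "sample_id", "sample_id"),
]


def datasets_by_tag(catalog):
    """Map each standardized tag to the set of datasets that carry it."""
    index = defaultdict(set)
    for dataset, _column, tag in catalog:
        index[tag].add(dataset)
    return index


index = datasets_by_tag(CATALOG)

# Searching raw column names only finds the exact spelling.
raw_hits = {dataset for dataset, column, _ in CATALOG if column == "gender"}

print(len(raw_hits))          # 1
print(len(index["gender"]))   # 3
```

The raw-name search misses the "sex" column and the misspelled "gendr" column, while the tag-based count finds all three datasets.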

Schema Discovery

As an extension of generating standardized tags for analytics through metadata enrichment, Tamr can be used to greatly simplify the naming of new data structures.

As we know, research data can be named very differently depending on the context of the project. In the below example, data tables were taken from two different research groups and it is unclear whether the data can be combined in a useful way. Using Tamr to link similar attribute names and metadata, it is possible to find attribute relationships across the two datasets and propose an example of what a unified view may look like.
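As a rough sketch of what proposing a unified view could look like, the Python below combines two column lists using a set of attribute links. The column names and the LINKS mapping are hypothetical; in practice the links would come from reviewed model output rather than a hand-written dictionary.

```python
# Hypothetical column lists from two research groups.
GROUP_A = ["subj_id", "gene", "expr_level"]
GROUP_B = ["subject", "gene_symbol", "dose_mg"]

# Hypothetical links from a column in A to its counterpart in B.
LINKS = {"subj_id": "subject", "gene": "gene_symbol"}


def propose_unified_schema(cols_a, cols_b, links):
    """Return rows of (unified name, column in A, column in B)."""
    # Every column in A appears once, paired with its linked B column if any.
    rows = [(a, a, links.get(a)) for a in cols_a]
    # Columns in B that nothing links to are carried over unchanged.
    linked_b = set(links.values())
    rows += [(b, None, b) for b in cols_b if b not in linked_b]
    return rows


for unified, col_a, col_b in propose_unified_schema(GROUP_A, GROUP_B, LINKS):
    print(unified, col_a, col_b)
```

Unlinked columns are kept with a None on the missing side, so the proposal shows both what the two datasets share and what each one contributes alone.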

Schema Standardization for Data Integration

Finally, building on its capabilities for metadata enrichment and schema discovery, Tamr can cluster common attribute names across disparate datasets to create golden-record attribute names that standardize data models. This built-in advantage is significant because it sets the foundation for downstream data integration and harmonization pipelines (whether using Tamr or other ETL software); traditionally, this preparation would have taken weeks to months of data definition and requirements gathering.

Below is a potential view that Tamr helps build to support any data integration or data harmonization activity. For example, if we want to search the entire R&D data lake to analyze production data for invivo studies, we can probably pull the data from the different sources into one table and start investigating within hours.

 

With such a view readily available, highly manual and mistake-prone activities such as building out entity relationship diagrams, mapping attributes, and auditing data lineage across large volumes of sources become much more manageable!
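To illustrate how golden attribute names feed a downstream integration step, here is a minimal Python sketch that renames source columns and stacks the rows into one table. The GOLDEN mapping and the sample rows are assumptions made up for this example; a real pipeline would read them from the standardized metadata.

```python
# Hypothetical golden attribute names keyed by (source table, source column).
GOLDEN = {
    ("invivo_table_1", "study_id"): "study_id",
    ("invivo_table_1", "ensembl_id"): "ensemble_gene_id",
    ("genetics_table1", "geneid"): "ensemble_gene_id",
}

# Hypothetical source rows.
SOURCES = {
    "invivo_table_1": [{"study_id": 1001, "ensembl_id": "ENSG00000141510"}],
    "genetics_table1": [{"geneid": "ENSG00000157764"}],
}


def harmonize(sources, golden):
    """Rename source columns to golden names and stack rows into one list."""
    columns = sorted(set(golden.values()))
    unified = []
    for table, rows in sources.items():
        for row in rows:
            # Start every record with all golden columns set to None.
            record = {"source_table": table, **dict.fromkeys(columns)}
            for column, value in row.items():
                name = golden.get((table, column))
                if name is not None:
                    record[name] = value
            unified.append(record)
    return unified


for record in harmonize(SOURCES, GOLDEN):
    print(record)
```

Keeping the source_table field on every record preserves a simple form of lineage, so each row in the combined table can still be traced back to where it came from.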

Conclusion

The traditional approach of manual metadata curation is inefficient and simply not sustainable for growing organizations that generate increasing volumes of decentralized datasets over time. With Tamr, organizations can finally take a truly modern, agile approach to metadata management and significantly accelerate digital transformation initiatives, from cataloging data assets to integrating specific dataset records.

Based on Tamr’s capabilities in linking metadata, are there other applications that you can see in your organization?

To learn more about the role of Tamr in enriching, discovering and standardizing metadata, reach out or schedule a demo.
