Tamr F.A.Q

Learn more about the Tamr platform and how it can be used to connect disparate data sources within the enterprise.

Table of Contents

What is Tamr?
What is Data Curation?
What is Data Unification?
Who are Data Stewards?
Is Tamr part of a continuous business process, or is it a one-time solution to a specific problem?
How does Tamr distinguish itself from MDM and other competitors?
How soon can Tamr show value?
How do you know when you’ve finished the model?
What kind of infrastructure does Tamr require?
What data sources can Tamr ingest?
Does Tamr look at the transactions around the data sources or only the reference data?
For extremely large datasets, does Tamr provide a sampling suggestion based on data profiling, or will I have to provide the data sample?
How long does training take?
How do you validate the model generated by Tamr?
How does Tamr parallelize the clustering/model building?
How do I update Tamr with new data? How does Tamr deal with versioning?
Does Tamr maintain history and records of what’s getting changed?
Does Tamr support multiple models?
How does Tamr publish a unified view to different systems?
How can the output from Tamr be used with my system?
How do ETL processes interact with Tamr?
Does Tamr engage with quality or profiling tools?
How does Tamr work with various analytics tools?
Does Tamr help me understand an appropriate ‘type’ for an attribute?
How do you operationalize the model?

What is Tamr?

Tamr lets organizations connect and enrich all of their data sources, both internal and from
partners and third parties. It overcomes the traditional problems of manual data curation and
allows for continuous, cost-effective and timely connectivity of hundreds and thousands of
sources.

What is Data Curation?

Data curation is the process of creating a unified view of your data with the standards of quality, completeness, and focus that you define. A typical curation process consists of:

● Identifying data sets of interest (whether from inside the enterprise or external),
● Exploring the data (to form an initial understanding),
● Cleaning the incoming data (for example, 99999 is not a valid zip code),
● Transforming the data (for example, to remove phone number formatting),
● Unifying it with other data of interest (into a composite whole), and
● Deduplicating the resulting composite.
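
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Tamr's implementation: the record fields (name, zip, phone), the helper names, and the exact-match deduplication key are all assumptions made for the example. A real curation system would match records with learned similarity rather than exact keys.

```python
import re

def clean_zip(zip_code):
    """Return a 5-digit ZIP string, or None for missing or placeholder values."""
    digits = re.sub(r"\D", "", zip_code or "")
    if len(digits) != 5 or digits == "99999":
        return None
    return digits

def normalize_phone(phone):
    """Strip formatting so '(617) 555-0100' and '617.555.0100' compare equal."""
    return re.sub(r"\D", "", phone or "")

def curate(sources):
    """Clean, transform, unify, and deduplicate records from several sources."""
    unified = []
    for records in sources:
        for rec in records:
            unified.append({
                "name": rec["name"].strip(),
                "zip": clean_zip(rec.get("zip")),
                "phone": normalize_phone(rec.get("phone")),
            })
    # Deduplicate on (lowercased name, phone), keeping the first occurrence.
    # Exact-key matching is a simplification chosen for this sketch.
    seen = {}
    for rec in unified:
        key = (rec["name"].lower(), rec["phone"])
        seen.setdefault(key, rec)
    return list(seen.values())

# Two hypothetical sources describing overlapping entities.
crm = [{"name": "Acme Corp", "zip": "02138", "phone": "(617) 555-0100"}]
billing = [
    {"name": "ACME Corp ", "zip": "99999", "phone": "617.555.0100"},
    {"name": "Globex", "zip": "10001", "phone": "212-555-0199"},
]
result = curate([crm, billing])
```

Here the two Acme records collapse into one because their normalized names and phone numbers agree, while the placeholder zip code 99999 is discarded rather than carried into the unified view.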

What is Data Unification?

Data Unification is the part of the curation process in which related data sources are connected to provide a unified view of a given entity and its associated attributes.

Who are Data Stewards?

Data Stewards are those responsible for the management and oversight of an enterprise’s data assets. Their mission is to provide business users with high-quality data that is easily accessible and best represents the underlying entities and their associated attributes. Data Stewards will typically work with developers, business owners, and data architects to ensure that data is consistent with organizational policies and standards.

Is Tamr part of a continuous business process, or is it a one-time solution to a specific problem?

Tamr is a solution to a persistent problem. Data is constantly changing, and new sources are always popping up. The Tamr platform provides a method to continuously update your unified view based on these ongoing changes.

How does Tamr distinguish itself from MDM and other competitors?

Tamr is purpose-built for scale and data variety. Where MDM excels at connecting a limited number of sources, Tamr can connect many more sources in a cost-effective, scalable way. Our platform actively learns and improves with each integration. Unlike MDM, the cost per source should decrease over time.

How soon can Tamr show value?

We usually engage in 30-day pilots to demonstrate value, where the output is production-quality and ready to be operationalized.

How do you know when you’ve finished the model?

Our field engineers will work with you to establish criteria for a successful model, including desired precision and granularity. We can compare the Tamr output to a system currently in place, if you have one.

What kind of infrastructure does Tamr require?

Tamr is a Java-based web app running on Tomcat and is designed to be backend-agnostic (it works with a triple-store data structure). RESTful APIs are used to ingest sources and publish output.

What data sources can Tamr ingest?

Tamr can ingest data from a variety of structured and semi-structured sources, including PostgreSQL databases and JSON and XML files.

Does Tamr look at the transactions around the data sources or only the reference data?

Tamr has the ability to extract features and relationships in both transactional and reference data.

For extremely large datasets, does Tamr provide a sampling suggestion based on data profiling, or will I have to provide the data sample?

Tamr can typically begin building a model using only a few thousand rows of training data. During a pilot, Tamr field engineers will work with you to help identify the appropriate data for your specific use case.

How long does training take?

Training time depends on a variety of factors, including finding suitable experts, the complexity of the underlying data, and the similarity to other Tamr use cases. It can range anywhere from 24 hours to a couple of weeks.

How do you validate the model generated by Tamr?

The model is constantly validated based on expert feedback. Outside of training, we can compare the output of Tamr’s model to a system already in place.
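
One common way to compare two clusterings of the same records is pairwise precision and recall, sketched below. This is a generic evaluation technique and not necessarily how Tamr field engineers score a model. The dictionaries mapping record IDs to cluster IDs are hypothetical, and both are assumed to cover the same set of records.

```python
from itertools import combinations

def matched_pairs(cluster_ids):
    """Return the set of record pairs that share a cluster ID."""
    return {
        (a, b)
        for a, b in combinations(sorted(cluster_ids), 2)
        if cluster_ids[a] == cluster_ids[b]
    }

def pairwise_precision_recall(predicted, reference):
    """Score predicted clusters against a reference clustering.

    Precision: fraction of predicted matches that the reference also matches.
    Recall: fraction of reference matches that the prediction recovers.
    """
    pred = matched_pairs(predicted)
    ref = matched_pairs(reference)
    true_pos = len(pred & ref)
    precision = true_pos / len(pred) if pred else 1.0
    recall = true_pos / len(ref) if ref else 1.0
    return precision, recall

# Hypothetical data: model output versus an existing system of record.
predicted = {"r1": "A", "r2": "A", "r3": "A", "r4": "B"}
reference = {"r1": "X", "r2": "X", "r3": "Y", "r4": "Y"}
precision, recall = pairwise_precision_recall(predicted, reference)
```

In this example the prediction makes three matches, only one of which the reference agrees with, and it recovers one of the two reference matches. Cluster labels do not need to agree between the two systems, because only co-membership is compared.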

How does Tamr parallelize the clustering/model building?

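Tamr has major optimizations that let schema mapping and record matching scale. Compared naively, N records require on the order of N^2 pairwise comparisons, so avoiding an exhaustive comparison is essential for large datasets. We size the back end to meet SLAs.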
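
To illustrate why the N^2 comparison count matters, the sketch below uses blocking, a standard record-linkage technique that only compares records sharing a cheap key. It is shown as a generic example; the source does not state which optimizations Tamr uses, and the record fields and the choice of zip code as the blocking key are assumptions.

```python
from collections import defaultdict
from itertools import combinations

def all_pairs_count(n):
    """Number of comparisons when every record is compared with every other."""
    return n * (n - 1) // 2

def blocked_pairs(records, block_key):
    """Return candidate pairs, comparing only records within the same block."""
    blocks = defaultdict(list)
    for rec_id, rec in records.items():
        blocks[block_key(rec)].append(rec_id)
    pairs = set()
    for ids in blocks.values():
        pairs.update(combinations(sorted(ids), 2))
    return pairs

# Hypothetical records keyed by ID.
records = {
    1: {"name": "Acme Corp", "zip": "02138"},
    2: {"name": "ACME Corporation", "zip": "02138"},
    3: {"name": "Globex", "zip": "10001"},
    4: {"name": "Globex Inc", "zip": "10001"},
    5: {"name": "Initech", "zip": "73301"},
}
candidates = blocked_pairs(records, lambda rec: rec["zip"])
```

With five records, exhaustive comparison needs 10 pairs, while blocking on zip code leaves 2 candidate pairs. Blocks are independent of one another, so they can also be processed in parallel. The trade-off is that true matches with differing keys are never compared.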

How do I update Tamr with new data? How does Tamr deal with versioning?

We currently offer a RESTful API that can both ingest data and notify the system of any changes to the data. The API sends data through the already trained model and assigns or reassigns cluster IDs.

Does Tamr maintain history and records of what’s getting changed?

Tamr is late binding, meaning that we always preserve the original state of the data. Though there may be some transformations and normalizations necessary, the data itself is left in its original state. We do keep records of expert responses and data steward actions in order to learn from these decisions.

Does Tamr support multiple models?

Yes. As an example, one customer currently uses multiple Tamr models to match records at varying levels of granularity. In another engagement, the customer has two differently optimized models currently in production.

How does Tamr publish a unified view to different systems?

The Tamr output is accessible via a RESTful API. Tamr is system-agnostic for both input and output.

How can the output from Tamr be used with my system?

Tamr’s output can be used in a variety of ways depending on your use case. It can be used to create a master referential dataset, map linkages between datasets, and provide a consolidated view across clustered entities.
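
As a rough sketch of the last two uses, the code below groups records by cluster ID, lists where each linked record lives, and builds one consolidated set of attributes per cluster. The row layout (cluster_id, source, record_id, plus attribute fields) is a hypothetical shape chosen for the example and not Tamr's actual output format. Picking the most common non-empty value is just one possible merge rule.

```python
from collections import Counter, defaultdict

def consolidate(rows, attributes):
    """Build a per-cluster view: linked source records plus merged attributes."""
    clusters = defaultdict(list)
    for row in rows:
        clusters[row["cluster_id"]].append(row)
    view = {}
    for cluster_id, members in clusters.items():
        merged = {}
        for attr in attributes:
            # Ignore empty or missing values, then take the most common one.
            values = [m[attr] for m in members if m.get(attr)]
            merged[attr] = Counter(values).most_common(1)[0][0] if values else None
        view[cluster_id] = {
            "linked_records": [(m["source"], m["record_id"]) for m in members],
            "attributes": merged,
        }
    return view

# Hypothetical clustered output drawn from three systems.
rows = [
    {"cluster_id": "c1", "source": "crm", "record_id": 17,
     "name": "Acme Corp", "city": "Cambridge"},
    {"cluster_id": "c1", "source": "billing", "record_id": 4,
     "name": "Acme Corp", "city": ""},
    {"cluster_id": "c1", "source": "erp", "record_id": 90,
     "name": "ACME", "city": "Cambridge"},
    {"cluster_id": "c2", "source": "crm", "record_id": 18,
     "name": "Globex", "city": None},
]
view = consolidate(rows, ["name", "city"])
```

The linked_records list acts as the linkage map between datasets, while the attributes entry is the consolidated view of the entity.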

How do ETL processes interact with Tamr?

Tamr’s output essentially acts as a system of reference or “map” for where data exists in your enterprise environment. ETL processes can then use this output to create a dataset for a given project or use case. Tamr neither transforms the data nor transfers it to a different location.

Does Tamr engage with quality or profiling tools?

Tamr profiles available data sources for the purpose of discovering potential relationships among the various entities and attributes. Tamr can help highlight issues with data quality, such as duplicate records or mislabeled attributes, by producing a unified view of a given entity across multiple data sources.

How does Tamr work with various analytics tools?

Tamr sits upstream of analytics, connecting various data sources to provide a unified view of a given entity that can then be consumed by various business intelligence and analytics tools.

Does Tamr help me understand an appropriate ‘type’ for an attribute?

Tamr is capable of adding data type recommendations to its output, and it is agnostic about where that output is published.

How do you operationalize the model?

Currently we offer a RESTful API that can match incoming records to the already trained model and assign a cluster ID. This API is used in production with several Tamr customers and is used to create a master record of various entities.