Connect: Align Your Datasets Into A Unified Schema
Within the “connect” phase, Tamr aligns the relevant attributes of each source dataset to a unified schema designed around the project’s goals. Human-guided machine learning is used to union these datasets, which offers a significant improvement in speed and scale over traditional methods that rely on developers writing hard-coded rules.
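As a rough illustration of attribute alignment (not Tamr's actual algorithm or API), the sketch below maps source attribute names onto a unified schema. Plain string similarity from Python's standard library stands in for a learned model, and a dictionary of human-confirmed mappings stands in for expert guidance. The attribute names, the `suggest_mapping` and `align` helpers, and the 0.5 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical unified schema chosen for the project.
UNIFIED_ATTRIBUTES = ["company_name", "street_address", "phone"]


def suggest_mapping(source_attr, unified_attrs=UNIFIED_ATTRIBUTES, threshold=0.5):
    """Suggest the closest unified attribute, or None if nothing is close enough."""
    def score(target):
        return SequenceMatcher(None, source_attr.lower(), target.lower()).ratio()

    best = max(unified_attrs, key=score)
    return best if score(best) >= threshold else None


def align(record, confirmed):
    """Rename a record's attributes to the unified schema.

    Human-confirmed mappings take priority over automatic suggestions;
    attributes with no mapping are dropped.
    """
    aligned = {}
    for attr, value in record.items():
        target = confirmed.get(attr) or suggest_mapping(attr)
        if target is not None:
            aligned[target] = value
    return aligned


# An expert has confirmed one mapping that name similarity alone would miss.
confirmed = {"tel": "phone"}

crm_row = {"CompanyName": "Acme Corp", "tel": "555-0100"}
erp_row = {"company": "Acme Corporation", "street_addr": "1 Main St"}

# Both sources now share one schema and can be unioned.
unified = [align(crm_row, confirmed), align(erp_row, confirmed)]
```

The point of the sketch is the division of labor: automatic suggestions handle the bulk of the attributes, and a small number of expert decisions cover the cases the automation gets wrong.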
Clean: Identify Unique Entities Within The Unified Dataset
Tamr’s “clean” phase deduplicates and masters the entities within the unified dataset. The platform automates this work with machine learning and maintains high accuracy by capturing and incorporating the expertise of data stewards. The core output of this phase is a pipeline that delivers a unified dataset of mastered entities to downstream analytical and operational uses.
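To make the deduplication idea concrete, here is a minimal sketch, assuming a simple name-similarity score in place of a trained model. Records judged to match are grouped into clusters with a union-find structure, and an optional dictionary of data-steward verdicts overrides the automatic decision for specific pairs. The `cluster` function, the 0.6 threshold, and the sample names are hypothetical and do not reflect Tamr's implementation.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a, b, threshold=0.6):
    """Stand-in for a learned pairwise matcher: case-insensitive string similarity."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio() >= threshold


def cluster(names, steward_labels=None):
    """Group names that refer to the same entity.

    steward_labels maps an index pair (i, j) with i < j to True or False
    and overrides the automatic decision for that pair.
    """
    steward_labels = steward_labels or {}
    parent = list(range(len(names)))

    def find(i):
        # Follow parent links to the cluster root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(names)), 2):
        verdict = steward_labels.get((i, j))
        if verdict is None:
            verdict = similar(names[i], names[j])
        if verdict:
            parent[find(i)] = find(j)  # merge the two clusters

    groups = {}
    for i, name in enumerate(names):
        groups.setdefault(find(i), []).append(name)
    return list(groups.values())


names = ["Acme Corp", "ACME Corporation", "Globex Inc", "Acme Corp."]
clusters = cluster(names)

# A steward confirms a match that string similarity alone would miss.
aliases = ["IBM", "International Business Machines"]
without_steward = cluster(aliases)
with_steward = cluster(aliases, steward_labels={(0, 1): True})
```

Each resulting cluster corresponds to one mastered entity; in a real pipeline the steward's verdicts would also be fed back as training data rather than only applied as overrides.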
Classify: Categorize The Unified Dataset Records To Any Taxonomy
Once Tamr has produced a clean, unified dataset for a particular entity, the user can optionally “classify” the records against a company-specific or commonly used taxonomy to support more in-depth analytics downstream. The classify phase works the same way as the connect and clean phases, using the product’s blend of human-guided machine learning to rapidly and accurately categorize records to the deepest levels of the provided taxonomy.
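The classification step can be pictured with a deliberately small sketch, assuming a token-voting scheme in place of a real machine learning model. A handful of records labeled by an expert with full taxonomy paths serve as training data, and new records are assigned the path whose training tokens they share most. The taxonomy, the sample records, and the `train` and `classify` helpers are hypothetical.

```python
from collections import Counter, defaultdict


def train(labeled):
    """Count how often each token appears under each taxonomy path."""
    token_counts = defaultdict(Counter)
    for text, path in labeled:
        for token in text.lower().split():
            token_counts[token][path] += 1
    return token_counts


def classify(text, token_counts):
    """Return the taxonomy path with the most token votes, or None if no token is known."""
    votes = Counter()
    for token in text.lower().split():
        votes.update(token_counts.get(token, {}))
    if not votes:
        return None
    return votes.most_common(1)[0][0]


# Expert-labeled examples; each label is a full path down to the deepest level.
labeled = [
    ("steel hex bolt m8", ("Hardware", "Fasteners", "Bolts")),
    ("zinc wood screw", ("Hardware", "Fasteners", "Screws")),
    ("laser printer toner", ("Office", "Printing", "Toner")),
]
model = train(labeled)

predicted = classify("hex bolt m10", model)
unknown = classify("stapler", model)
```

Records that return `None` are the natural candidates to route back to an expert for labeling, which is where the human guidance enters the loop.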