Written by Sohaiyla Khalili
Tamr’s human-guided machine learning platform has an exciting new feature, active learning for categorization, that will increase the accuracy and efficiency of categorization projects by highlighting the high-impact entities to categorize. A categorization project places records into categories: it is a top-down organizational project that classifies individual records into a collection of hierarchical categories, referred to as a taxonomy.
Manual labeling and categorization is cumbersome: it takes a huge amount of time, and human error compromises the accuracy of the output. A major Tamr advantage in categorization is the ease of multi-user collaboration and the benefit that comes from humans training the machine learning models. As part of the workflow of a categorization project, users can collaborate and iterate on the categorization or taxonomy.
Since the human users are the subject matter experts on the input data, Tamr requires a percentage of the entities to be categorized by humans in order to teach Tamr how to categorize the remaining data and any data added later. Active learning eliminates the need for users to provide a balanced number of training examples for each category in the dataset. It removes the guesswork and repeated iterations over how much training the model needs and which categories need it, which accelerates model training.
The screenshot below shows the menu of filters you can apply to the categorized data within Tamr Unify. Users can collaborate and accelerate model training by using a combination of filters.
Let’s focus on the filter named “high impact”. When this filter is selected, Unify produces simple high-impact questions asking whether certain records that are representative of a large portion of the unified dataset are categorized appropriately. Reviewers then give their feedback, driving accuracy and enhancing future automation. High-impact entities are denoted with the lightning bolt symbol. The screenshot below shows what the UI looks like once the “high impact” filter is selected.
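To make the idea of a "high impact" record more concrete, here is a minimal sketch of one common active learning heuristic: rank records by how uncertain the model is about them, weighted by how many other records they represent. This is an illustration only and is not Tamr's actual algorithm. The `Record` class, the `similar_count` field, and the scoring formula are all assumptions made up for this example.

```python
import math
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Record:
    # All fields are hypothetical, invented for this illustration.
    record_id: str
    category_probs: Dict[str, float]  # model output: category -> probability
    similar_count: int                # how many records this one represents


def entropy(probs: Dict[str, float]) -> float:
    """Shannon entropy in bits; higher means the model is less certain."""
    return -sum(p * math.log2(p) for p in probs.values() if p > 0)


def impact_score(record: Record) -> float:
    """Assumed score: model uncertainty times the number of records informed."""
    return entropy(record.category_probs) * record.similar_count


def high_impact(records: List[Record], k: int) -> List[Record]:
    """Return the k records with the highest assumed impact score."""
    return sorted(records, key=impact_score, reverse=True)[:k]


records = [
    Record("r1", {"Hardware": 0.98, "Software": 0.02}, similar_count=200),
    Record("r2", {"Hardware": 0.50, "Software": 0.50}, similar_count=40),
    Record("r3", {"Hardware": 0.55, "Software": 0.45}, similar_count=300),
]

top = high_impact(records, k=2)
print([r.record_id for r in top])
```

In this toy data, `r3` ranks first because the model is nearly undecided about it and it stands in for many similar records, so a single reviewer answer would inform a large part of the dataset. `r1` represents many records too, but the model is already confident about it, so asking a human adds little.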
This active learning for categorization feature, which highlights high-impact entities, is game-changing for categorization projects. It lets users easily see what data needs their attention and automatically balances the training by selecting strong representations of the data to categorize. The time savings and increased confidence in the machine learning model are invaluable.