451 Research Report: “Preparing for the new guard in data preparation”

“Data preparation involving the integration, cleansing and enrichment of data so that it is ready for analysis is a tedious and time-consuming process, and it can occupy up to 80% of an analytical development effort. Because it is vital to analytics, given the ‘garbage in garbage out’ maxim, data preparation can’t be circumvented. But it can be made easier, more efficient and quicker. That’s the promise behind a new breed of data-preparation offerings that are coming online to revitalize this piece of the data management sector, which hasn’t been a focus for any real innovation for some time.”

“Tamr is a key startup in the data-preparation sector. Like Paxata, it has taken a machine-learning-based approach. However, Tamr is specifically focusing on the attribute mapping and record matching process for semi-structured and structured data with the endgame of easing integration and cleansing tasks for data scientists and other technical folks involved in analytics projects. Tamr refers to its offering as a data-curation service. It is available on-premises or in the cloud and doesn’t rely solely on machine learning. The startup has a number of other technical smarts involved in presenting users with integrated data and metadata that is already related, including a homegrown triple-store database to accommodate a wide variety of independently constructed data, as well as schema-less data, dirty or missing data, and information that is poorly formatted.”