How to Clean Noisy and Erroneous Big Data Using Machine Learning

Practical Advice from Both Academic and Commercial Applications

Data unification/deduplication and repair are proving to be difficult for many organizations. In fact, data unification and cleaning account for about 60% to 70% of the work of data scientists. It’s…


Three Enablers For Machine Learning In Data Unification: Trust, Legacy, And Scale

Note: This article was originally posted on the O’Reilly website. Data unification is the process of combining multiple, diverse data sets and preparing them for analysis by matching, deduplicating, and otherwise cleaning the records (Figure 1). This effort consumes more…


From Data Variety to Data Opportunity

Among the well-known challenges of Big Data, we are getting better and better at handling the volume aspect: we buy more machines, we “shard” tables, and we even port solutions to clusters and MapReduce platforms. But it is the “Variety”…
