New Paper from the IEEE Technical Committee on Data Engineering: “Data Integration: The Current Status and the Way Forward”

Tamr was born out of academia, and we pride ourselves on continuing this relationship with top-notch universities as we have grown into a mature company. The relationship is mutually beneficial: we receive cutting-edge research from leading schools like MIT and the University of Waterloo, and in return the universities learn from Tamr’s work in the field with Fortune 500 companies like GE, GSK, and Thomson Reuters. The results of this symbiosis are compelling and are now available in a new academic paper co-written by 2014 Turing Award Winner, MIT Professor, and Tamr CTO Dr. Michael Stonebraker and Tamr Co-Founder and Waterloo Professor of Computer Science Ihab Ilyas.

The paper is titled “Data Integration: The Current Status and the Way Forward” and is featured in the June issue of the Data Engineering Bulletin, published by the IEEE Technical Committee on Data Engineering. According to the two authors, the paper uses multiple real customer examples to:

“highlight the technical difficulties around building a deployable and usable data integration software that tackles the data silos problem. We also highlight the practical aspects involved in using machine learning to enable automating manual or rule-based processes for data integration tasks, such as schema mapping, classification, and deduplication”.

The paper goes on to break out “future directions and challenges” that are likely to have a big impact on building end-to-end, scalable data integration in large enterprise settings. These include:

  • High Provisioning and Tuning Cost
  • Data Discovery
  • Human Involvement
  • Batch vs. Exploratory Workflows
  • Model Reusability

You can access the full paper by these two data integration luminaries, compliments of Tamr, here.