Data Stewardship in the Age of Machine Learning

Suppose you are a data steward, responsible for integrating a collection of data sources, S1, …, Sn. Historically, you would perform the following steps: Have your best programmer define a global schema, GS, which the various sources will accommodate. Have…


Scalable Data Integration: Five Tenets for Success

By Michael Stonebraker, Tamr Co-Founder and CTO

Introduction

Data curation involves: ingesting data sources, cleaning errors from the data (-99 often means null), transforming attributes into other ones (for example, Euros to dollars), performing schema integration to connect up disparate…
