Data Mastering at Scale

Data mastering (sometimes called Master Data Management, or MDM for short) is now 15 years old. It arose because enterprises have long created independent business units (IBUs) with substantial freedom of action. This allows IBUs to…

Data Stewardship in the Age of Machine Learning

Suppose you are a data steward, responsible for integrating a collection of data sources, S1, …, Sn. Historically, you would perform the following steps: Have your best programmer define a global schema GS, which the various sources will accommodate. Have…

Scalable Data Integration: Five Tenets for Success

By Michael Stonebraker, Tamr Co-Founder and CTO

Introduction

Data curation involves: ingesting data sources, cleaning errors from the data (-99 often means null), transforming attributes into other ones (for example, Euros to dollars), performing schema integration to connect up disparate…

