Andy Palmer MIT CDOIQ Talk Slides: Data Quality Through Curation at Big Data Scale

In his talk at the MIT CDO IQ Symposium on July 23, Tamr Co-Founder and CEO Andy Palmer shared his vision for “Data Quality Through Curation at Big Data Scale.”

Click here to view the slides from this talk

Palmer starts by urging CDOs to “always start” big data analytics with “questions and context” — describing the three flavors of analytics context as 1) Prescriptive, 2) Predictive, 3) Descriptive.

After outlining the big data opportunity for enterprises, Palmer defines their “data source problem”: the number and diversity of data sources (private/public, structured/semi-structured, etc.) are exploding.

Multiple approaches have emerged to deal with this Data Variety problem, with the current state dominated by extreme top-down management (95% deterministic to 5% probabilistic). Palmer predicts that the sheer number of data sources and the complexity of change are going to drive us toward a bottom-up approach (80% probabilistic to 20% deterministic).

“The only viable way to tame enterprise data variety,” he argues, “is through bottom-up, collaborative data curation” that complements traditional MDM, ETL, data profiling and data quality methods.

This is the approach that Tamr is taking in building next-generation data curation technology that creates “rich context” for enterprise data variety:

  • Identify relationships across your sources using a machine learning “bottom up” approach
  • Continuous active learning combining machine/human insight
  • Cost effective as you unify more sources – marginal cost of new source = at least linear
  • Deploy context-rich sources for the different LOBs across the enterprise
  • Enterprise metadata catalog – all your attributes, all your sources
  • Services (e.g. APIs) can also be directly deployed in data warehouses/lakes and operational workflows
  • In-situ curation of large sources
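
To make the "bottom up" idea of combining machine and human insight a little more concrete, here is a minimal Python sketch. It is not Tamr's implementation: the function names, the thresholds, and the use of plain string similarity in place of a trained model are all assumptions made for illustration. It scores attribute pairs across two sources, accepts the confident matches automatically, and queues the uncertain ones for a human expert.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity score for two attribute names.

    A simple stand-in (assumption) for a learned matching model.
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(source_a, source_b, accept=0.85, reject=0.40):
    """Score every cross-source attribute pair and sort it into three buckets.

    The accept/reject thresholds are hypothetical values, not tuned ones.
    """
    matched, review, rejected = [], [], []
    for a in source_a:
        for b in source_b:
            score = similarity(a, b)
            if score >= accept:
                matched.append((a, b, score))    # machine is confident: link
            elif score <= reject:
                rejected.append((a, b, score))   # machine is confident: no link
            else:
                review.append((a, b, score))     # uncertain: ask a human expert
    return matched, review, rejected


if __name__ == "__main__":
    # Hypothetical attribute names from two enterprise sources.
    crm = ["customer_name", "email", "postal_code"]
    erp = ["cust_name", "email", "zip"]
    matched, review, rejected = triage(crm, erp)
    print("auto-matched:", matched)
    print("needs review:", review)
```

In a fuller system, the answers experts give on the review queue would be fed back to retrain the matcher, so that each additional source needs less manual effort than the previous one.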



Andy is the co-founder and CEO of Tamr. Previously, Andy was co-founder and founding CEO of Vertica Systems, a pioneering big data analytics company (acquired by HP). During his career as an entrepreneur, Andy has served as founding investor, BOD member or advisor to more than 50 start-up companies in technology, healthcare and the life sciences. He also served as Global Head of Software and Data Engineering at Novartis Institutes for BioMedical Research (NIBR) and as a member of the start-up team and Chief Information and Administrative Officer at Infinity Pharmaceuticals. Additionally, he has held positions at Bowstreet, pcOrder.com, and Trilogy.