From ‘Junk Drawers’ to the ‘Long Tail’

Check out this terrific WSJ CIO Journal column by Randy Bean, CEO and managing partner of NewVantage Partners, on his recent sit-down with Tamr CTO and Co-Founder Mike Stonebraker: “Making the Case for the ‘Long-Tail’ of Big Data.”

Randy offers as clear and concise a definition of the Big Data challenge as we’ve seen …

For most large companies, Big Data is less about managing the “volume” of data they have, and much more about integrating the wide “variety” of data sources that are available to them – which can include data from legacy transaction systems, behavioral data sources, structured and unstructured data, and all sizes of data sets.

… and then lets Mike make the case that neither data lakes (“junk drawers” of un-curated data) nor top-down global data models (“fantasy”) are sufficient for managing variety at scale. Instead, Randy writes, Mike envisions that:

… the future of data management lies in “data curation … aimed directly at the ‘long tail’” – the hundreds or thousands of data silos not captured within the traditional data warehouse, and which can only be captured and integrated at scale by applying automation and machine-learning based on statistical patterns.

This is of course exactly the approach that Mike and his fellow researchers laid out in their 2013 paper “Data Curation at Scale: The Data Tamer System” — and that became Tamr’s core design pattern.

Give Randy’s CIO Journal column a read for all of Mike’s thoughts on why the ‘long tail’ is where game-changing complex data analysis will occur — “and change the Big Data landscape.”