Stonebraker on … Hadoop in Managing Big Data


Forrester sat down recently with Tamr Co-Founder and CEO — and 2014 A.M. Turing Award winner — Mike Stonebraker for a wide-ranging conversation covering “data and innovation issues confronting today’s enterprises.”
Today’s topic: The role of the Hadoop stack in managing big data in the enterprise.
As Mike told Forrester,
Historically, Hadoop was the open-source version of MapReduce running on top of HDFS, with Hive or Pig above that. … In 2009, we wrote a paper saying MapReduce is ridiculous for two reasons.
Number one, Hive equals SQL unless you squint. … If you’re doing SQL, the last thing you want is MapReduce as an interface.
[Number two,] unless you have something that’s embarrassingly parallel, you don’t need MapReduce. … But only about five percent of the problems that anybody’s interested in are embarrassingly parallel. So, basically, MapReduce is this insignificant little corner case and is a terrible internal interface for a higher-level system.
For Mike’s complete Q&A with Forrester, download the report here (a subscription or fee is required).