This morning, at its I/O conference, Google revealed that it no longer uses Map-Reduce to process data internally at all.
The truth is that Map-Reduce as a processing paradigm continues to be severely restrictive, and is no more than a subset of richer processing systems.
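To make the "subset" point concrete, here is a minimal sketch in plain Python (no Hadoop or Google API involved). The `map_reduce` helper and the word-count mapper are hypothetical names chosen for illustration; the idea is only that Map-Reduce amounts to one fixed composition of three general operators (flat-map, group-by-key, fold), whereas a richer dataflow system lets you wire such operators into arbitrary graphs.

```python
from collections import defaultdict
from functools import reduce
from itertools import chain


def map_reduce(records, mapper, reducer):
    """Hypothetical helper: Map-Reduce as one fixed three-stage pipeline."""
    # Map: each record yields zero or more (key, value) pairs.
    pairs = chain.from_iterable(mapper(record) for record in records)

    # Shuffle: group all values that share a key.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)

    # Reduce: fold each group's values down to a single result.
    return {key: reduce(reducer, values) for key, values in groups.items()}


def word_count_mapper(line):
    # Emit (word, 1) for every whitespace-separated word, case-folded.
    return [(word.lower(), 1) for word in line.split()]


lines = ["the quick brown fox", "the lazy dog", "The fox"]
counts = map_reduce(lines, word_count_mapper, lambda a, b: a + b)
print(counts)
```

Anything that does not fit this exact map, shuffle, reduce shape (a join of three inputs, an iterative algorithm, a second aggregation over the first) has to be forced into a chain of separate jobs, which is the restriction the paragraph above is pointing at.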
... ever since Dryad, in 2007 (at least), it was clear to me that Map-Reduce’s days were numbered.
Google has abandoned MapReduce, the system for running data analytics jobs spread across many servers the company developed and later open sourced, in favor of a new cloud analytics system it has built called Cloud Dataflow.
“We don’t really use MapReduce anymore,” Hölzle said in his keynote presentation at the Google I/O conference in San Francisco Wednesday. The company stopped using the system “years ago.”
Yesterday, at Google I/O, you got a sneak peek of Google Cloud Dataflow, the latest step in our effort to make data and analytics accessible to everyone. You can use Cloud Dataflow:
for data integration and preparation (e.g. in preparation for interactive SQL in BigQuery)
to examine a real-time stream of events for significant patterns and activities
to implement advanced, multi-step processing pipelines to extract deep insight from datasets of any size
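As a rough illustration of the second and third use cases, the sketch below chains several steps over a stream of events using ordinary Python generators. It is not the Cloud Dataflow API; the event format, the function names, and the window and threshold values are all assumptions made up for this example.

```python
from collections import deque


def parse(lines):
    # Step 1: turn raw "timestamp,user,action" lines into tuples.
    for line in lines:
        timestamp, user, action = line.split(",")
        yield int(timestamp), user, action


def only_failures(events):
    # Step 2: keep only the events we consider interesting.
    return (event for event in events if event[2] == "login_failed")


def bursts(events, window_seconds=60, threshold=3):
    # Step 3: flag a user once `threshold` events fall inside the window.
    recent = {}  # user -> timestamps still inside the sliding window
    for timestamp, user, _ in events:
        window = recent.setdefault(user, deque())
        window.append(timestamp)
        # Drop timestamps that have fallen out of the window.
        while window and window[0] <= timestamp - window_seconds:
            window.popleft()
        if len(window) >= threshold:
            yield user, timestamp


raw = [
    "0,alice,login_failed",
    "10,bob,login_ok",
    "20,alice,login_failed",
    "30,alice,login_failed",
    "200,bob,login_failed",
]
alerts = list(bursts(only_failures(parse(raw))))
print(alerts)
```

Each step consumes the previous one lazily, so events flow through the whole pipeline one at a time instead of waiting for a batch job to finish.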
You will always get what you always got, if you always do what you always did.