Categories: Hadoop Hive Hortonworks MapR Pig Verizon Yahoo

Yahoo put its massive Hadoop investment on display this week at Dataworks Summit, the semi-annual big data conference that it co-hosts with Hortonworks. While Hadoop is no longer the conference headliner that it once was, the platform is still critical to the daily operations of Yahoo, which officially became part of Verizon Communications this week when the $4.5 billion acquisition finally closed.

With 120,000 servers and 800 PB of storage, few companies have the computing scale of Yahoo. And as the birthplace of the distributed computing platform called Apache Hadoop, it’s worth keeping an eye on what Yahoo is doing with its collection of big data tech.

Sumeet Singh, the senior director of cloud and big data platforms at Yahoo, took to the Dataworks Summit stage on Wednesday to describe how the technological makeup of Yahoo’s massive cloud platform has evolved over the years. For starters, the company is moving solidly away from MapReduce. Over the past 17 months, Tez has replaced MapReduce as the underlying engine for many of the batch-oriented Pig and Hive workloads that Yahoo relies on to serve its 1 billion monthly users. Today, 70% of the Hadoop workloads at Yahoo run under Tez, according to Singh. Use of Apache Spark has also grown, but not nearly as quickly as Tez, he says.

Singh referred to this switch from MapReduce to Tez and Spark as “compute shaping.” “What compute shaping does, it allows us to make better use of the platform,” he says. “This is fantastic for the company and our customers because they can make better use of the capacity.”
