
The new Deployment Guide for Virtualizing Hadoop on VMware vSphere describes the technical choices for running Hadoop and Spark-based applications in virtual machines on vSphere. New technologies and design approaches appear regularly in the big data market, and the pace of innovation has certainly not slowed down. A prime example is the rapid growth in Spark adoption for serious enterprise work over the past year or so, which has seen Spark overtake MapReduce as the dominant way of building big data applications. Spark promises faster application execution times and easier-to-use APIs for building applications. Much innovation work is now going into optimizing the streaming of large quantities of data into Spark, with an eye to the large data feeds that connected cars and other devices will produce in the near future.

This new version of the VMware Deployment Guide for Hadoop on vSphere brings the information up to date with developments in Spark and YARN ("Yet Another Resource Negotiator"). YARN is the general name for the updated job scheduling and resource management functions that have now become mainstream in Hadoop deployments. The older MapReduce-centric approach, in which MapReduce served as Hadoop's central resource management and scheduling layer, has been relegated to just another programming framework. MapReduce is still used for Extract-Transform-Load (ETL) jobs that run in batch mode on a common resource management and scheduling platform (YARN), but it is no longer the dominant paradigm for building applications. Spark is seen as much better suited to interactive queries and applications. Spark also runs on YARN as another application framework, and that combination is popular in enterprises today. It is therefore the focus of much of our current testing, as you will see.
Spark can also run in standalone mode, outside the YARN resource manager, but that option is out of scope for the current Deployment Guide because we see it less often in enterprises today. Of course, that may change in the future.
