To Infinity, and Beyond! – Scaling Your Hadoop Infrastructure
In this special guest feature, Tom Lyon, Chief Scientist at DriveScale, describes how to run demanding analytics applications and/or 1000+ node Hadoop workloads on commodity servers and storage. Tom is a computing systems architect, a serial entrepreneur and a kernel hacker. He most recently co-founded DriveScale, a company that is pioneering flexible, scale-out computing for the enterprise using standard servers and commodity storage. He received a B.S. in Electrical Engineering and Computer Science from Princeton University. Tom was also a founder at Nuova Systems (sold to Cisco) and Ipsilon Networks (sold to Nokia). Additionally, as employee #8 at Sun Microsystems, Tom made seminal contributions to the UNIX kernel, created the SunLink product family, and was one of the NFS and SPARC architects.

So, you’ve had your Hadoop cluster for a while. You’ve got maybe 50 to 100 nodes running stably, you’ve got some mastery of the analytics frameworks – whether Spark or Flink or good old MapReduce. You’ve been able to demonstrate real business value from your cluster and you’re ready to take it to a whole new level with lots more data and a lot more applications and users.

The hardware for your cluster was probably not a big concern as you dove into Hadoop, so you went with the typical racks of commodity servers, each with 12 or 24 hard drives. It works, so why think about different hardware? Well, because as your cluster size approaches many hundreds of nodes, it will certainly be the biggest cluster in your data center, and may even become the majority of your compute infrastructure. At this scale, inefficiencies caused by poorly balanced resources can add up to a lot of wasted time, money, power, heat, and space!