The Evolution and Future of Hypervisors
#Hypervisors are a #virtualization technique that powers #cloudcomputing infrastructure like @Amazon #EC2 and @Google #ComputeEngine. Although #container virtualization technology like @Docker and @Kubernetes have taken the spotlight recently, #containers are often deployed on top of hypervisors on the cloud. In this article, we will first outline the architecture of classical trap-and-emulate hypervisors that were invented in the 1970s. We then describe how hypervisors evolved from the 1970s to today’s cloud computing era. Finally, we will look at future trends that affect hypervisor design. (This article was inspired by an awesome talk on Amazon’s Nitro project by Anthony Liguori, which I highly recommend everyone interested in hypervisors and cloud infrastructure to watch.) Architecture A hypervisor is a piece of system software that provides virtual machines (VMs), which users can use to run their OS and applications on. The hypervisor provides isolation between VMs, which run independent of each other, and also allows different VMs to run their own OS. Like other virtualization techniques, hypervisors provide multitenancy, which simplifies machine provision and administration. One of the main criticisms against hypervisors is that they tend to usually be heavy-weight compared to other virtualization techniques like containers (Morabito et al., 2015). However, it’s also possible to build hypervisors that are light-weight (Manco et al., 2017) and also make the guest OS more light-weight when running in under a hypervisor (Madhavapeddy, 2013). A hypervisor can be decomposed into two major parts: the virtual machine monitor (VMM) and the device model. The VMM is responsible for setting up VMs and handling traps (a.k.a VM exits) caused by the guest OS executing privileged instructions like I/O access. The device model, on the other hand, is responsible for implementing I/O interfaces for all the devices like networking cards, storage, and so on, the hypervisor supports. Hypervisor architecture is illustrated in the following diagram. Hypervisor architecture. The hypervisor can be decomposed into two parts: the virtual machine monitor (VMM) and the device model. (The terms hypervisor and VMM are often used interchangeably. However, we refer to hypervisor as the combination of a VMM and a device model.) Virtual machine monitor (VMM) A VMM must satisfy three properties (Popek and Goldberg, 1973): Equivalence property states that program execution has identical observable behavior on bare metal and under VMM, except for timing and resource availability, which are difficult to preserve because of shared physical hardware. Efficiency property states that the majority of program instructions are executed directly on a physical CPU without interference from the hypervisor. Resource control property states that the VMM manages all hardware resources. Virtual machines require permission from the hypervisor to directly access hardware. As a side note, it’s worth noting that emulators satisfy both equivalence and resource control properties, but does not satisfy the efficiency property. The KVM subsystem in the Linux kernel (and other OS’es it has been ported to), for example, provides the building blocks for implementing a VMM. The KVM subsystem is effectively a portable abstraction over CPU hardware virtualization capabilities, which can be leveraged by userspace applications like QEMU to implement a VMM or a full hypervisor. Device Model The device model is the part of a hypervisor, which provides I/O interfaces for virtual machines. While the VMM is responsible for handling traps, it delegates I/O requests to the appropriate device model. Examples of device models are virtualized NICs and storage devices. Device models can either provide interface for a real hardware device or a paravirtualized device. The device model can be implemented either using software, like the virtio family of drivers, or in hardware, using SR-IOV, for example. I/O Virtualization To implement a device model, I/O virtualization is needed. The two approaches for I/O virtualization are software-based and hardware-assisted. Software-based I/O virtualization implements I/O interfaces in software to allow sharing the same physical devices across multiple virtual machines. Software-based I/O virtualization can be implemented on top of various different backends. For example, a software-based storage device can be layered on top of a block device or a filesystem. One issue with software-based approach is that the device model uses the same CPU resources that the vCPUs, which reduces available CPU capacity and causes jitter. Hardware-assisted I/O virtualization implements I/O interfaces in hardware. This approach requires hardware support for sharing the same physical device across multiple virtual machines. SRV-IO is a PCI extension, which allows a physical PCI function to be partitioned into multiple virtual PCI functions. Evolution The semantics of a trap-and-emulate VMM was formalized in the early 1970s (Popek and Goldberg, 1973) and made popular again in the mid-1990s for running commodity OS’es on multicore machines (Bugnion et al., 1997). However, the most popular machine architecture at the time, Intel x86, was not virtualizable because some of it’s privileged instructions did not trap. The VMware hypervisor, which targeted x86, was first released in 1999. It used binary translation to replace privileged instructions to trap into the hypervisor, while still running unprivileged instructions directly on the physical CPU, which solved x86’s virtualization issues (Adams and Agesen, 2006). This allowed the VMware hypervisor to run unmodified commodity OS’es on x86 hardware in virtual machines without the performance penalty of emulation. The Xen hypervisor released first in 2003 took a different approach to solving the x86 virtualization issue. Instead of binary translation, they modified the source code of the guest OS to trap to the hypervisor instead of executing non-trapping privileged instructions. Intel and AMD released x86 CPUs with virtualization extensions in 2005 and 2006, which made classic trap-and-emulate virtualization possible. KVM, initially developed for Linux, implements a kernel subsystem that in combination with QEMU’s device model provides a full hypervisor. Initially, the KVM project provided software-based device model that emulated full hardware devices, but later acquired paravirtualized I/O device model when the virtio device model was introduced. Future The classic hypervisor architecture has stood the test of time but there some trends that affect hypervisor design. Hardware virtualization is becoming more wide-spread. For example, the Amazon Nitro project (talk by Anthony Liguori) takes an unconventional approach to hypervisor design, which replaces all of the software-based device model with hardware virtualization as illustrated in this diagram. Amazon’s Nitro hypervisor also uses a custom designed VMM that leverage’s Linux KVM. Operating systems have also started to evolve to accommodate hypervisors better. Unikernels are an interesting OS design approach that packages the OS and the application into one bundle, which runs in the same CPU protection level (Madhavapeddy, 2013). This eliminates the traditional separation between kernel and user space, which reduces context switch and system call overheads at the expense of losing some OS functionality. The basic idea was already pioneered earlier in the form of library OSes, but the much simpler device model of a hypervisor compared to bare metal made the idea much more feasible for real world use. Light-weight virtualization is becoming more and more important as the use of cloud computing grows. Containers are excellent technology for providing light-weight virtualization. However, containers are unable to provide the full isolation capabilities of VMs, and have various security problems because containers share the same host OS and have access to the large OS system call interface (Manco et al., 2017). Hypervisors can be slimmed down significantly (Manco et al., 2017) and unikernels provide even larger opportunity to optimize the hypervisor if we relax the equivalence property requirement of VMMs (Williams, 2016). Serverless computing is a new computing model, better described as Functions as a Service (FaaS), that allows application developers to deploy functions instead of applications to a managed platform. One approach to serverless computing is to use hypervisors and unikernels for packaging and deploying the functions (Koller and Williams, 2017). Energy efficiency is another important future direction for hypervisor design. Communications technology, which cloud computing is a large part of, is forecasted to consume around 20% of global electricity by 2030, or as much as 50% in the worst case (Andrae and Edler, 2015)! The energy overhead of a hypervisor can be extremely high depending on workload. One experiment reports between 59% and 273% energy overhead for KVM (Jin et al., 2012)! Kernel-bypass networking has become important recently because NICs are getting faster and traditional TCP/IP and POSIX socket abstraction is proving to have high overheads (Han et al., 2012; Young et al., 2014; Yasukata et al., 2016). Hypervisors that implement the device model using I/O paravirtualized effectively introduce another layer to the networking data path, which increases networking overheads. In Linux, the vhost architecture is one solution to the problem. Vhost moves the virtio paravirtualized I/O device model from QEMU (which is the VMM userspace kernel) to the host kernel (which also hosts the KVM module), which eliminates the exit from host kernel to userspace VMM. Another solution is full hypervisor kernel-bypass using hardware NIC virtualization introduced by the Arrakis project (Peter et al., 2014). Summary The hypervisor architecture invented in the 1970s has stood the test of time. The x86 architecture quirks meant that the first successful hypervisors had to resort into binary translation to handle privileged instructions. Binary translation solutions were followed by paravirtualization (popularized by Xen) but hypervisor architectures were consolidated to the classic model as Intel and AMD added virtualization extensions to the x86 architecture. Although containers have recently become a very popular virtualization technique, emerging computing paradigms like serverless computing could make hypervisors an attractive technique again. Light-weight hypervisor designs, unikernels, and hardware-assisted virtualization all reduce hypervisor overheads, which also makes hypervisors more competitive against containers.