NEVE: Nested Virtualization Extensions for ARM
Jin Tack Lim, Christoffer Dall, Shih-Wei Li, Jason Nieh, and Marc Zyngier
Nested Virtualization
[Figure: software stack, bottom to top: Hardware; Host Hypervisor; VMs, one running a Kernel with Apps and one running a Guest Hypervisor; Nested VMs on the Guest Hypervisor, each with its own Kernel and Apps]
• Run your own VM in public clouds
• Run OSes which have built-in hypervisors in a VM
Key Problem
• No nested virtualization support on current hardware (ARMv8.0)
• Nested virtualization is supported in future hardware (ARMv8.3)
• Nested virtualization performance on ARM is unknown
• ARM hardware virtualization support is different from x86
Key Contributions
• Introduced paravirtualization for architecture evaluation
• Evaluated nested virtualization performance
• Proposed a new architecture extension, NEVE
• NEVE improves performance by up to 10x
• NEVE is included in the next ARM architecture, ARMv8.4
Evaluation Challenges
• No ARMv8.3 hardware, so no idea about performance
• ARMv8.0 is the latest hardware publicly available
• Long development cycles
[Figure: timeline: Architecture Design -> (a few years) -> Hardware Release; Evaluation]
Current Approaches
• Cycle-accurate simulators: costly, too slow, and lacking device support
• Simpler architecture models, e.g. ARM Fast Models: provide only correct hardware functionality, not performance
Paravirtualization for Architecture Emulation
• Possible if existing hardware has instructions to mimic new
architecture features
• Architectural features for virtualization often involve traps
Paravirtualization for Emulation of ARMv8.3
• ARMv8.3 (new hardware): guest hypervisor instructions in the VM that do trap to the host hypervisor
• ARMv8.0 (existing hardware): the same instructions don’t trap
• ARMv8.0 (existing hardware), paravirtualized: the guest hypervisor uses paravirtualized instructions that do trap to the host hypervisor
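The three cases above can be read as a rewrite pass over the guest hypervisor's code. The sketch below is only an illustration of that idea, not the authors' tooling. The instruction strings, the `TRAPS_ON_V8_3` set, and the choice of `hvc` with a side-table index as its immediate are all assumptions made for this example. `hvc` is used because it is an existing ARMv8.0 instruction that traps to EL2.

```python
# Hypothetical set of operations that ARMv8.3 would trap when a
# deprivileged guest hypervisor executes them (illustrative only).
TRAPS_ON_V8_3 = {
    "msr vbar_el2, x0",
    "mrs x0, esr_el2",
    "eret",
}


def paravirtualize(instructions):
    """Replace would-trap instructions with an hvc that traps on ARMv8.0.

    The hvc immediate is an index into a side table, so the host
    hypervisor can tell which original instruction to emulate.
    This encoding is an assumption of the sketch.
    """
    rewritten, side_table = [], []
    for insn in instructions:
        if insn in TRAPS_ON_V8_3:
            side_table.append(insn)
            rewritten.append(f"hvc #{len(side_table) - 1}")
        else:
            rewritten.append(insn)  # ordinary instructions are left alone
    return rewritten, side_table


guest_code = ["mov x0, x1", "msr vbar_el2, x0", "add x2, x2, #1", "eret"]
patched, table = paravirtualize(guest_code)
print(patched)
```

With this rewrite, existing hardware takes a trap at exactly the points where the new architecture would, which is what lets real workloads stand in for the missing hardware.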
Benefits
• Makes it possible to evaluate new architecture features with real workloads on real hardware
• Allows co-design and rapid prototyping of SW and architecture
• Makes development cycles short
[Figure: short cycle between Architecture Design and Evaluation]
Implementation
• Designed and implemented KVM/ARM Nested Virtualization
• First ARM hypervisor supporting nested virtualization
• Similar approach to Turtles [OSDI 2010] - KVM on x86
Application Workloads
Application           Description
Kernbench             Kernel compile
Hackbench             Scheduler stress
SPECjvm2008           Java Runtime
MySQL                 Database management
Memcached             Key-Value store
Netperf TCP_RR        Network performance
Netperf TCP_STREAM    Network performance
Netperf TCP_MAERTS    Network performance
Apache                Web server stress
Nginx                 Web server stress
Experimental Setup
• ARM hardware: APM X-Gene (ARMv8.0)
• x86 hardware: Intel E5-2630 v3
• Native/VM/Nested VM setup: 4-way SMP, Virtio
• Software: KVM on KVM, PV ARMv8.3 (VM/nested VM)
Application Benchmarks
[Figure: bar chart of normalized overhead (lower is better, y-axis 0 to 45) for ARMv8.3 VM, ARMv8.3 Nested VM, x86 VM, and x86 Nested VM across Kernbench, Hackbench, SPECjvm2008, TCP_RR, TCP_STREAM, TCP_MAERTS, Apache, Nginx, Memcached, and MySQL]
Nested Virtualization
Why is it so slow on ARMv8.3?
ARM Virtualization Extensions
• EL0: Apps (in the VM)
• EL1: OS Kernel (in the VM), with EL1 System Registers, e.g. TTBR0_EL1
• EL2: Hypervisor, with EL2 System Registers, e.g. TTBR0_EL2
Nested Virtualization on ARM
• EL0: Apps (in the nested VM)
• EL1: OS Kernel (in the nested VM) and Guest Hypervisor (in the VM)
• EL2: Host Hypervisor
• Nested VM Exit and Nested VM Entry each trap to the Host Hypervisor
Nested VM Entry on ARM
[Figure: between Nested VM Exit and Nested VM Entry, the Guest Hypervisor at EL1 traps to the Host Hypervisor at EL2 repeatedly: Trap, Trap, Trap, …, Trap]
Exit Multiplication
• A single exit from the nested VM leads to many traps to the host hypervisor
• This badly slows down nested VM performance on ARM
• x86 has this problem too, but not as badly as ARM
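A toy model can make exit multiplication concrete. The sketch below counts traps for one nested VM exit, assuming every sensitive operation the guest hypervisor performs while handling that exit traps to the host hypervisor. The operation list `GUEST_HYP_OPS` is hypothetical and much shorter than a real exit path, so the count is illustrative, not a measurement.

```python
from dataclasses import dataclass, field


@dataclass
class HostHypervisor:
    trap_log: list = field(default_factory=list)

    def trap(self, reason):
        # Every trap is a round trip into the host hypervisor at EL2.
        self.trap_log.append(reason)


# Hypothetical operations a guest hypervisor performs while handling one
# exit; the final eret stands for re-entering the nested VM.
GUEST_HYP_OPS = [
    "mrs esr_el2", "mrs elr_el2", "mrs ttbr0_el1",
    "msr ttbr0_el1", "msr elr_el2", "eret",
]


def handle_nested_vm_exit(host, guest_ops=GUEST_HYP_OPS):
    host.trap("nested VM exit")   # the exit itself lands in the host first
    for op in guest_ops:          # then each sensitive guest operation traps
        host.trap(op)
    return len(host.trap_log)


host = HostHypervisor()
total = handle_nested_vm_exit(host)
print(f"1 nested VM exit -> {total} traps to the host hypervisor")
```

In this model the total is one trap for the exit plus one per sensitive operation, so it grows with the amount of register state the guest hypervisor touches.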
NEVE: NEsted Virtualization Extensions for ARM
• Supports unmodified guest hypervisors and OSes
• Improves performance of nested virtualization
• Provides two techniques to avoid traps based on register
classification
Register Classification
• VM registers, which affect VM execution
• Hypervisor control registers, which affect hypervisor execution
VM Registers
[Figure: between VM Exit and VM Entry, the Guest Hypervisor at EL1 accesses EL1 Registers repeatedly (…), above the Host Hypervisor at EL2]
• VM register states are used only at VM entry, when the nested VM runs
VM Registers: Redirection to Memory
• NEVE redirects VM register access instructions to memory
• On nested VM entry, the host hypervisor can get VM register states from memory
[Figure: Guest Hypervisor accesses to VM Registers are redirected to Memory]
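The effect of memory redirection can be sketched with a small model. It is a simplification under stated assumptions: the `Cpu` class, the `memory_page` dictionary standing in for the memory region, and the three registers in `VM_REGISTERS` are illustrative choices, not the architecture's definition. Without redirection each guest hypervisor access traps; with it, the access becomes a plain memory access and the host hypervisor reads the values once at nested VM entry.

```python
VM_REGISTERS = {"SCTLR_EL1", "TTBR0_EL1", "TCR_EL1"}  # illustrative subset


class Cpu:
    def __init__(self, neve_enabled):
        self.neve_enabled = neve_enabled
        self.memory_page = {}  # stands in for the VM register state in memory
        self.hw_regs = {}      # register state seen by the nested VM
        self.traps = 0

    def guest_hyp_write(self, reg, value):
        if reg in VM_REGISTERS and self.neve_enabled:
            self.memory_page[reg] = value  # redirected to memory, no trap
        else:
            self.traps += 1                # trap: host emulates the access
            self.memory_page[reg] = value  # host records the value in software

    def nested_vm_entry(self):
        # The host hypervisor picks up VM register state from memory once.
        self.hw_regs.update(self.memory_page)


def run(cpu):
    for reg, value in [("SCTLR_EL1", 1), ("TTBR0_EL1", 0x4000), ("TCR_EL1", 2)]:
        cpu.guest_hyp_write(reg, value)
    cpu.nested_vm_entry()
    return cpu


baseline, neve = run(Cpu(neve_enabled=False)), run(Cpu(neve_enabled=True))
print(baseline.traps, neve.traps)
```

Both configurations end with the same register state at entry; only the number of traps taken to get there differs.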
Hypervisor Control Registers
• The hypervisor accesses them to control execution
• EL2 registers
• The memory redirection technique used for VM registers can’t be applied
• Traps are handled by redirecting to EL1 registers in software
• Redirect in hardware instead!
[Figure: Guest Hypervisor at EL1 with EL1 Registers; Host Hypervisor at EL2 with EL2 Registers]
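Register redirection can be sketched the same way. In the model below, a guest hypervisor write to an EL2 register ends up in the corresponding EL1 register either way; what changes is whether a trap to the host hypervisor is needed to get it there. The `EL2_TO_EL1` mapping is an illustrative subset and the class name is hypothetical.

```python
# Illustrative subset of EL2 registers with an EL1 counterpart.
EL2_TO_EL1 = {
    "VBAR_EL2": "VBAR_EL1",
    "ESR_EL2": "ESR_EL1",
    "ELR_EL2": "ELR_EL1",
}


class GuestHypervisorCpu:
    def __init__(self, hardware_redirect):
        self.hardware_redirect = hardware_redirect
        self.el1_regs = {}
        self.traps = 0

    def write(self, reg, value):
        target = EL2_TO_EL1.get(reg, reg)
        if reg in EL2_TO_EL1 and not self.hardware_redirect:
            # Software path: trap, then the host hypervisor writes the
            # EL1 register on the guest hypervisor's behalf.
            self.traps += 1
        # Hardware path: the access goes to the EL1 register directly.
        self.el1_regs[target] = value


software = GuestHypervisorCpu(hardware_redirect=False)
hardware = GuestHypervisorCpu(hardware_redirect=True)
for cpu in (software, hardware):
    cpu.write("VBAR_EL2", 0x1000)
    cpu.write("ELR_EL2", 0x2000)
print(software.traps, hardware.traps)
```

The end state is identical in both cases, which is why the redirection is a candidate for moving from the trap handler into hardware.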
NEVE Evaluation
• NEVE is a new architecture extension, but no hardware implements it yet
• Use paravirtualization for architecture evaluation
• Memory redirection emulation: register access instructions -> load/store instructions
• Register redirection emulation: EL2 register access instructions -> EL1 register access instructions
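The two emulation rules above can be sketched as a per-instruction rewrite. This is a minimal illustration under assumptions: the register offsets in `VM_REG_OFFSETS`, the `EL2_TO_EL1` subset, and the use of `x28` as the register holding the memory base address are all hypothetical choices for the example, and the parser only handles the simple `msr reg, xN` and `mrs xN, reg` forms.

```python
VM_REG_OFFSETS = {"ttbr0_el1": 0x00, "tcr_el1": 0x08}       # hypothetical layout
EL2_TO_EL1 = {"vbar_el2": "vbar_el1", "esr_el2": "esr_el1"}  # illustrative subset
PAGE_BASE = "x28"  # hypothetical register holding the memory base address


def emulate_neve(insn):
    """Rewrite one guest hypervisor instruction as NEVE would behave."""
    op, _, operands = insn.partition(" ")
    args = [a.strip() for a in operands.split(",")]
    if op == "msr" and len(args) == 2:
        reg, gpr = args
        if reg in VM_REG_OFFSETS:   # memory redirection: write becomes a store
            return f"str {gpr}, [{PAGE_BASE}, #{VM_REG_OFFSETS[reg]}]"
        if reg in EL2_TO_EL1:       # register redirection: EL2 -> EL1
            return f"msr {EL2_TO_EL1[reg]}, {gpr}"
    elif op == "mrs" and len(args) == 2:
        gpr, reg = args
        if reg in VM_REG_OFFSETS:   # memory redirection: read becomes a load
            return f"ldr {gpr}, [{PAGE_BASE}, #{VM_REG_OFFSETS[reg]}]"
        if reg in EL2_TO_EL1:
            return f"mrs {gpr}, {EL2_TO_EL1[reg]}"
    return insn                     # everything else is unchanged


for insn in ["msr ttbr0_el1, x0", "mrs x1, tcr_el1", "msr vbar_el2, x2", "eret"]:
    print(f"{insn:20} -> {emulate_neve(insn)}")
```

Neither replacement instruction traps on existing hardware, so a guest hypervisor rewritten this way runs with the trap behavior NEVE is designed to provide.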
Application Benchmarks
[Figure: bar chart of normalized overhead (lower is better, y-axis 0 to 45) for ARMv8.3 Nested VM, NEVE Nested VM, and x86 Nested VM across Kernbench, Hackbench, SPECjvm2008, TCP_RR, TCP_STREAM, TCP_MAERTS, Apache, Nginx, Memcached, and MySQL]
Conclusions
• Introduced paravirtualization for architecture evaluation
• Built the first ARM hypervisor supporting nested virtualization
• Nested virtualization on ARMv8.3 performs poorly
• NEVE improved performance by up to 10x
• NEVE is included in the next ARM architecture, ARMv8.4