HyperBench is a set of micro-benchmarks for analyzing how well hardware mechanisms and hypervisor designs support virtualization. We designed and implemented HyperBench from the ground up as a custom kernel. It currently contains 15 micro-benchmarks covering the CPU, the memory system, and I/O. These benchmarks trigger various hypervisor-level events, such as transitions between VMs and the hypervisor, two-dimensional page walks, and notifications from front-end to back-end drivers. HyperBench aims to quantify those costs.
For more information, please read:
Wei S, Zhang K, Tu B. HyperBench: A Benchmark Suite for Virtualization Capabilities[C]//Abstracts of the 2019 SIGMETRICS/Performance Joint International Conference on Measurement and Modeling of Computer Systems. ACM, 2019: 73-74.
Wei S, Zhang K, Tu B. HyperBench: A Benchmark Suite for Virtualization Capabilities[J]. Proceedings of the ACM on Measurement and Analysis of Computing Systems, 2019, 3(2): 24.
Download HyperBench into the directory /opt/.
# cd /opt
# git clone https://github.com/Second222None/HyperBench.git
After making any modifications, run the following command to build HyperBench.
# make
HyperBench measures CPU cycles using the RDTSCP instruction. The Time Stamp Counter (TSC) clock and the CPU core clock can differ. What’s more, the CPU adjusts its core frequency dynamically to save energy. Before starting HyperBench, you should pin the CPU to a fixed frequency to avoid measurement error. Run the first command below several times to confirm that the CPU frequency is stable; the second command checks whether the CPU advertises the constant_tsc flag.
cat /proc/cpuinfo | grep MHz | uniq
cat /proc/cpuinfo | grep constant_tsc
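The two checks above can be wrapped in small helpers so they are easy to repeat. This is only a minimal sketch: the function names `has_constant_tsc` and `distinct_mhz` are hypothetical and not part of HyperBench, and the code assumes the usual Linux /proc/cpuinfo layout (`cpu MHz` lines and a `flags` line).

```shell
# Hypothetical helpers, not part of HyperBench.
# Both accept an optional cpuinfo-style file and default to /proc/cpuinfo.

# Succeed if the constant_tsc flag is listed.
has_constant_tsc() {
    grep -qw constant_tsc "${1:-/proc/cpuinfo}"
}

# Print the distinct "cpu MHz" values, one per line.
distinct_mhz() {
    awk -F': *' '/^cpu MHz/ { print $2 }' "${1:-/proc/cpuinfo}" | sort -u
}
```

If `distinct_mhz` prints more than one value, or the value changes between runs, the core frequency is probably not pinned yet.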
To boot HyperBench on bare metal, add a GRUB menu entry such as the following.
menuentry 'HyperBench'{
insmod part_msdos
insmod ext2
set root='hd0,msdos1'
multiboot /boot/hyperbench.64
}
Enter the HyperBench directory and run one of the following commands: the first for QEMU/KVM, the second for Xen.
# qemu-system-x86_64 -enable-kvm -smp 2 -m 4096 -kernel out/hyperbench.32 -nographic | host/host
# ./script/xen | host/host
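To vary the number of vCPUs or the guest memory size, the QEMU command line above can be generated by a small helper. This is a sketch only: `hyperbench_qemu_cmd` and its parameters are hypothetical names, and the defaults simply mirror the values in the command above.

```shell
# Hypothetical helper, not part of HyperBench: print the QEMU command line.
#   $1 = number of vCPUs (default 2)
#   $2 = guest memory in MiB (default 4096)
#   $3 = kernel image (default out/hyperbench.32)
hyperbench_qemu_cmd() {
    printf 'qemu-system-x86_64 -enable-kvm -smp %s -m %s -kernel %s -nographic\n' \
        "${1:-2}" "${2:-4096}" "${3:-out/hyperbench.32}"
}

# Usage, from the HyperBench directory:
#   eval "$(hyperbench_qemu_cmd 2 4096)" | host/host
```

The IPI benchmark sends an interrupt between two VCPUs, so keeping at least two vCPUs is probably the sensible choice.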
The Idle benchmark performs two consecutive reads of the time counter. It is used to check the stability of the measurement results. Ideally, the result is zero.
Store the corresponding register value into memory repeatedly.
The PUSHF and POPF instructions execute alternately on the current stack. The time between the first PUSHF instruction and the last POPF instruction is measured.
Read the current value of the register during the initialization phase and load the value into the corresponding register repeatedly in the test phase.
Execute an instruction in the VM that causes a transition to the hypervisor, which returns without doing much work.
Issue an IPI from one CPU to another CPU that is in the halt state. The IPI benchmark measures the time from sending the IPI until the sending CPU receives the response from the receiving CPU, without doing much work on the receiving CPU. In a virtualized environment, this benchmark emulates an IPI between two VCPUs running on two separate physical CPUs (PCPUs).
Read many different memory pages twice and measure the time of the second memory access. The default guest page size is 4KB.
This benchmark reserves a large portion of memory that has never been accessed before and performs one memory read at the start address of each page. Reading one address per page eliminates TLB hits caused by the prefetcher, because the prefetcher cannot access data across page boundaries. The default guest page size is 4KB.
Map the whole physical memory 1:1 to the virtual address space. This benchmark creates a lot of page table entries, which is a frequent operation in heavy memory allocation. The default guest page size is 4KB.
……
Polling and interrupts are the two main approaches for notifications from host to guest. This benchmark repeatedly reads a serial port register using register I/O instructions, which emulates the polling mechanism.
The OUT benchmark repeatedly writes a character to a serial port register.
This benchmark outputs a string to the serial port through the I/O address space, which is handled through the string I/O instructions.
The HyperBench kernel is designed as a standalone kernel. It can run as a test VM on various hypervisors or directly on bare metal.
Linear-Address Translation to a 2-MByte Page using 4-Level Paging.