Not too many people know about the intricacies of virtualization CPU scheduling and its impact on the performance of the VMs, so application owners out there – listen up! I’ve written about Ready Time (VMware ESXi) / Wait Time Per Dispatch (Microsoft Hyper-V) in the past, but a different challenge arises with VMs that have large vCPU count footprints. It’s called CPU Co-Stop, and it can devastate your application’s performance in a VM.
Be warned: I have not found an equivalent counter in Microsoft’s Hyper-V platform, so if anyone knows about a counter that is similar, please let me know. This system state is sure to exist in every virtualization platform, and is a possible performance killer.
You’ll need access into VMware’s vCenter Server to see this metric. It’s not visible from inside the VM. At least read-only access is advised, and if you’re a DBA, you need access into this layer anyway so you can better do your job.
First of all, a hypervisor metric called Ready Time / Wait Time Per Dispatch indicates that the hypervisor is queuing up the VM’s requests for executing tasks on the physical CPUs. The time each vCPU spends waiting in the scheduling queue during a sampling interval is measured, and it is reported as the percentage performance hit on that vCPU. This metric is fairly easy to understand if you are given the math and explanation. But what about the scheduling of a widely parallelized process or task, such as a database query that is executed on all vCPUs? Is it different?
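As a quick aside on that math, here is a minimal sketch of the usual conversion from the raw Ready Time value to a percentage. It assumes the value comes from a vCenter real-time chart, which reports a summation in milliseconds over a 20-second sample; the function name `ready_percent` and its parameters are mine, not part of any VMware API. If you read the VM-wide value instead of a single vCPU's, the number is typically summed across vCPUs, so the sketch lets you divide by the vCPU count.

```python
def ready_percent(ready_ms, interval_s=20.0, vcpus=1):
    """Convert a Ready Time summation (ms) into a percentage of the sample.

    ready_ms   -- summation value read from the chart, in milliseconds
    interval_s -- sample length in seconds (assumed 20 s for real-time charts)
    vcpus      -- divide by this when ready_ms is a VM-wide total (assumption)
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Total time available in the sample, in milliseconds, across the vCPUs.
    available_ms = interval_s * 1000.0 * vcpus
    return ready_ms * 100.0 / available_ms


# One vCPU that waited 1000 ms during a 20 s sample lost 5% of that sample.
print(ready_percent(1000))            # 5.0
# A 4-vCPU VM reporting a 4000 ms total averages the same 5% per vCPU.
print(ready_percent(4000, vcpus=4))   # 5.0
```

Longer chart rollups use longer sample intervals, so pass the matching `interval_s` rather than trusting the default.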
In early versions of the hypervisors, if you had multiple vCPUs on a VM and ran a task that needed all of them, the host had to wait until a physical CPU was free for each vCPU before that task could execute in parallel. It was very strict, and it meant that the performance overhead was exceptionally high. Newer versions of the hypervisor have a relaxed CPU ‘co-scheduler’, where the CPU queues for a VM’s vCPUs might not execute exactly in parallel, but the performance impact on the source VM and background VMs on the same host is not as great. The trade-off is that the overall time taken to execute that task is limited by the slowest physical CPU, or most bottlenecked CPU queue, in the group. Even with relaxed co-scheduling, sometimes all the vCPUs need to be scheduled to run simultaneously, and this is where Co-Stop comes into play.
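To see why the most bottlenecked queue sets the pace, here is a toy model, not a description of how any real hypervisor scheduler works. It assumes a parallel task is split evenly across vCPUs and that each vCPU simply waits in its own queue before doing its share; the function name and the numbers are made up for illustration.

```python
def parallel_finish_ms(work_ms, queue_delay_ms):
    """Return when a parallel task finishes under the toy model.

    work_ms        -- per-vCPU execution time for its slice of the task
    queue_delay_ms -- per-vCPU time spent waiting for a physical CPU
    The task is done only when the last vCPU finishes its slice.
    """
    if len(work_ms) != len(queue_delay_ms):
        raise ValueError("need one delay per vCPU")
    return max(work + delay for work, delay in zip(work_ms, queue_delay_ms))


# Four vCPUs, 100 ms of work each. Three are scheduled almost at once,
# but one sits in a busy queue for 60 ms (hypothetical numbers).
print(parallel_finish_ms([100, 100, 100, 100], [0, 5, 0, 60]))  # 160
```

Three of the four slices finish in about 100 ms, yet the query as a whole takes 160 ms because it cannot complete until the delayed vCPU catches up.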