Thanks to Ryan Baker for writing an in-depth explanation of this issue.
vVols use sub-LUNs, with the Protocol Endpoint acting as the regular disk device that the ESXi host sees. A problem arises when QoS is added to the mix.
When a VM hits its QoS limit, the array starts throttling its IOs, as it should. But when the ESXi host sees this, it assumes the array is overloaded and reduces the queue depth for the device. The device queue towards the array on that ESXi host then fills with IOs from the VM that is hitting its QoS limit, causing increased latency for all other VMs on that ESXi host and array combination, regardless of their own QoS limits, since they share the same device queue.
By default, ESXi's DSNRO setting (Disk.SchedNumReqOutstanding) reduces the device queue depth to 32 when it sees congestion on a device, while the default queue depth inside a VM is 64, so a single VM can easily fill it.
To solve this, set DSNRO higher than what a single VM can queue.
For example, set the HBA queue depth to 256 (software iSCSI adapter shown here):
esxcli system module parameters set -m iscsi_vmk -p iscsivmk_LunQDepth=256
Reboot for the change to take effect. You cannot set DSNRO higher than the HBA queue depth.
Afterwards, set DSNRO to 256 for the device:
esxcli storage core device set -d naa.xxxxxxxxxxxxxxxxx -o 256
This must be set on each ESXi host and for each device, so I suggest scripting it if you have a large environment.
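A sketch of what that scripting could look like. The device IDs below are made-up placeholders; on a real host you would replace the sample text with live output from `esxcli storage core device list`, where each device block starts with its naa. identifier in column 0.

```shell
#!/bin/sh
# Sketch: turn device-list output into one DSNRO "set" command per device.
DSNRO=256

# Placeholder sample of the list output (hypothetical device IDs).
sample_list='naa.600508b1001c3a1f0000000000000001
   Display Name: Example Disk 1
naa.600508b1001c3a1f0000000000000002
   Display Name: Example Disk 2'

# Extract device IDs and print (not run) one set command per device,
# so the commands can be reviewed before executing them on each host.
cmds=$(printf '%s\n' "$sample_list" \
  | awk '/^naa\./ {print $1}' \
  | while read -r dev; do
      printf 'esxcli storage core device set -d %s -o %s\n' "$dev" "$DSNRO"
    done)
printf '%s\n' "$cmds"
```

Printing the commands first rather than executing them directly makes it easy to sanity-check the device list before rolling the change out host by host.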
If you keep the VMs at their OS default queue depth of 64, you can now have three VMs hitting their QoS limits on the same ESXi host and array without them causing latency for other VMs.
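As a quick sanity check on those numbers, assuming DSNRO is raised to 256 and each VM keeps the default queue depth of 64:

```shell
#!/bin/sh
# Back-of-envelope check: with DSNRO at 256 and a per-VM queue depth of 64,
# how many throttled VMs fit in the shared device queue?
DSNRO=256
VM_QDEPTH=64
fill=$((DSNRO / VM_QDEPTH))   # VMs needed to saturate the device queue: 4
safe=$((fill - 1))            # one fewer still leaves headroom: 3
echo "$fill VMs saturate the queue; $safe throttled VMs leave headroom"
```

With the default DSNRO of 32, the same arithmetic gives zero headroom: a single VM at queue depth 64 more than fills the throttled device queue on its own.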