Each FC storage array port has a maximum queue depth of 2048; for performance reasons we'll do the math with 1600. When a large number of HBAs (initiators) generate I/Os, a specific port queue can fill up to the maximum. The host's HBA will notice this by receiving queue full (QFULL) messages and seeing very poor response times. How this is handled depends on the operating system: older OSs could lose access to their drives, or even freeze or blue screen. Modern OSs throttle I/Os down to a minimum to work around the condition. VMware ESX, for example, decreases its LUN queue depth down to 1; once the QFULL messages stop, ESX gradually increases the queue depth until it is back at the configured value, which can take up to around a minute.
During the QFULL events the hosts may experience timeouts, even if the overall performance of the CLARiiON is OK. The response to a QFULL is HBA dependent, but it typically results in a suspension of activity for more than one second. Though rare, this can have serious consequences for throughput if it happens repeatedly.
Some Operating Systems and (HBA) drivers can set a ceiling on the queue depths per LUN. This is commonly referred to as the target queue depth. VMware ESX limits the queue depth on a per LUN basis for each path.
An EMC CLARiiON (as well as many other storage arrays) will return a QFULL flow control command under the following conditions:
1 The total number of concurrent I/O requests on the Front-End FC port is greater than 1600.
2 The total number of requests for a LUN is greater than its maximum queue depth (32 + (14 * the LUN's data drive quantity)). For example, for a 5-drive RAID 5 (4+1), maximum queue depth = 32 + 4*14 = 88.
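The per-LUN limit from condition 2 can be sketched as a small helper. The function name is hypothetical; it simply encodes the 32 + 14-per-data-drive formula given above.

```python
def lun_max_queue_depth(data_drives):
    """CLARiiON per-LUN maximum queue depth: a base of 32 plus 14
    per data drive in the LUN's RAID group."""
    return 32 + 14 * data_drives

# A 5-drive RAID 5 (4+1) has four data drives:
print(lun_max_queue_depth(4))  # 88
```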
The HBA execution throttle thresholds on the hosts may be set at too high a value (such as 256).
On a Windows machine with QLogic HBAs, use the SANsurfer utility to change the Execution Throttle for each HBA. This can be done online. In newer versions of SANsurfer, the Execution Throttle is found under "Advanced HBA Settings": select an HBA port, then Parameters, then the "Select Settings section" drop-down. The default Execution Throttle setting in an EMC environment is 256. If it is higher than 256, change it to 256; if it is already 256, try lowering it to 32.
The same target queue length restrictions apply to all other HBA makes and models. With Emulex, these settings can be changed using HBAnyware.
In order to avoid hammering the storage processor FE FC ports, you can calculate the maximum queue depth using a combination of the number of initiators per Storage Port and the number of LUNs ESX uses. Other initiators are likely to be sharing the same SP ports, so these will also need to have their queue depths limited. The math to calculate the maximum queue depth is:
QD = 1600 / (Initiators * LUNs)
QD = the required Queue Depth or Execution Throttle, which is the maximum number of simultaneous I/Os for each LUN on any particular path to the SP.
Initiators = the number of initiators (HBAs) per Storage Port, which is normally equivalent to the number of ESX hosts, plus all other hosts sharing the same SP ports.
LUNs = the quantity of LUNs for ESX which are sharing the same paths, which is equivalent to the number of LUNs in the ESX storage group.
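The formula above can be expressed as a short function. The function name is illustrative, and the example numbers are placeholders; substitute your own initiator and LUN counts.

```python
def max_queue_depth(initiators, luns, port_limit=1600):
    """Per-LUN queue depth that keeps the total possible outstanding
    I/Os on a storage processor front-end port under the practical
    limit of 1600 (out of a hard maximum of 2048)."""
    return port_limit // (initiators * luns)

# e.g. 16 initiators per storage port, 5 LUNs in the storage group:
print(max_queue_depth(initiators=16, luns=5))  # 20
```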
Two ESX parameters should be set to this QD value: the queue depth of the storage adapter and "Disk.SchedNumReqOutstanding." Usually "Disk.SchedNumReqOutstanding" is set to a lower value than the HBA queue depth, to prevent any single virtual machine from completely filling the HBA queue and starving the other virtual machines of I/O. If this is the case in your ESX environment, both settings should be decreased proportionally. For example, if the HBA queue depth is 64 and "Disk.SchedNumReqOutstanding" is 32 (the default), then to reduce the QFULL events the HBA queue depth could be set to 32 and "Disk.SchedNumReqOutstanding" to 16.
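On an ESX 3.x host with QLogic HBAs, these two settings might be changed along the following lines. This is a sketch, not a definitive procedure: the module name (`qla2300` here) and parameter name vary by driver version, so check your installed module and the relevant VMware documentation first.

```shell
# Set the QLogic driver's per-LUN queue depth (module name is driver-
# version dependent; qla2300 is assumed here as an example)
esxcfg-module -s ql2xmaxqdepth=32 qla2300

# Cap outstanding requests per VM at half the HBA queue depth
esxcfg-advcfg -s 16 /Disk/SchedNumReqOutstanding

# Rebuild the boot configuration; the module change takes effect
# after a reboot
esxcfg-boot -b
```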
For example, a farm of 16 ESX servers has four paths to the CLARiiON (via two HBAs each), and these FC ports are dedicated to ESX (which makes keeping queue depths under control easier). There are multiple storage groups in this example to keep each ESX server's boot LUN private, but each storage group has 5 LUNs.
This leads to the following queue depth:
QD = 1600 / (16 * 5) = 20
In practice a certain amount of over-subscription is fine, because all LUNs on all servers are unlikely to be busy at the same time, especially if load balancing is used. So in the example above, a queue depth of 32 should still not cause QFULL events under normal circumstances.
Also see the following VMware knowledge base articles: