Increased response times on VNX when using Windows 2012

Windows 2012 can cause higher response times on VNX

When Windows 2012 issues Trim or Unmap commands to thin LUNs on a VNX, Storage Processor response times can increase, and in severe cases the Storage Processor may bugcheck.

As part of disk operations to reclaim free space from thin LUNs, Windows 2012 Server can issue large numbers of the SCSI command 0x9E/0x12 (Service Action In/Get LBA Status). This SCSI command results in what is called a “DESCRIBE_EXTENTS” I/O on the VNX Storage Processor (SP). These commands are used as part of the Trim/Unmap process to check whether each logical block address (LBA) that has been freed on the host’s file system is allocated on the VNX thin LUN. The host then issues Unmap SCSI commands to shrink the allocated space in the thin LUN, freeing blocks that are no longer in use in the file system. RecoverPoint also issues these same SCSI commands when the Thin LUN Extender mechanism is enabled, which can cause similar performance issues. See knowledge base article KB174052 for more information about the RecoverPoint variation of this issue and how to prevent it.
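To make the command concrete, here is a minimal Python sketch of the 16-byte command descriptor block (CDB) for Get LBA Status. The opcode 0x9E and service action 0x12 come from the bulletin; the remaining field layout follows the SCSI block command standard. The function name and the example values are hypothetical, and the sketch only builds the bytes; it does not send anything to a device.

```python
import struct

SERVICE_ACTION_IN_16 = 0x9E  # opcode named in the bulletin
GET_LBA_STATUS = 0x12        # service action named in the bulletin


def build_get_lba_status_cdb(starting_lba: int, allocation_length: int) -> bytes:
    """Build a 16-byte GET LBA STATUS CDB (hypothetical helper, illustration only).

    Layout: byte 0 opcode, byte 1 service action (low 5 bits),
    bytes 2-9 starting LBA, bytes 10-13 allocation length,
    byte 14 reserved, byte 15 control. All fields are big-endian.
    """
    if not 0 <= starting_lba < 2**64:
        raise ValueError("starting_lba must fit in 64 bits")
    if not 0 <= allocation_length < 2**32:
        raise ValueError("allocation_length must fit in 32 bits")
    return struct.pack(
        ">BBQIBB",
        SERVICE_ACTION_IN_16,
        GET_LBA_STATUS & 0x1F,
        starting_lba,
        allocation_length,
        0,  # reserved
        0,  # control
    )
```

For example, `build_get_lba_status_cdb(0x1000, 24)` asks for the status of the extent starting at LBA 0x1000, with room for 24 bytes of returned data.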

Windows 2012 appears to remain in the initial range of LBAs for its “Get LBA Status” requests, so it does not accomplish the objective of the Trim/Unmap operation. This has been corrected in Windows 2012 R2. The repeated requests generate a high load on the SP CPU, which increases in proportion to the number of volumes/LUNs that have this Trim/Unmap operation in progress at the same time. In some cases, the performance degradation is severe enough to cause one or both storage processors to reboot with a bugcheck. One tell-tale symptom of this particular bugcheck is that it most often occurs at or very near 3:00 AM, which is a default time when Windows 2012 server may perform Trim/Unmap operations.
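One rough way to check for the 3:00 AM pattern is to measure how many of your SP bugcheck or response-time alert timestamps fall close to that hour. The Python sketch below assumes you have already extracted the timestamps from your own logs in local time; the function name and the 15-minute window are illustrative assumptions, not values from the bulletin.

```python
from datetime import datetime

THREE_AM_MINUTES = 3 * 60  # 3:00 AM expressed as minutes after midnight


def fraction_near_3am(timestamps, window_minutes=15):
    """Return the fraction of timestamps within window_minutes of 3:00 AM.

    Hypothetical helper: the window size is an assumption. Seconds are ignored.
    """
    if not timestamps:
        return 0.0

    def minutes_from_3am(ts):
        minutes = ts.hour * 60 + ts.minute
        diff = abs(minutes - THREE_AM_MINUTES)
        # Measure the distance around the clock, not just within one day.
        return min(diff, 24 * 60 - diff)

    hits = sum(1 for ts in timestamps if minutes_from_3am(ts) <= window_minutes)
    return hits / len(timestamps)
```

A high fraction is only a hint that this issue may be involved; it does not replace an investigation by support.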

If you suspect you are seeing these symptoms, open a ticket with VNX support to investigate fully. See knowledge base article KB182688 for extensive information about this known issue. The knowledge base solution offers a workaround that disables the feature triggering the Trim/Unmap operation; the workaround must be applied to all Windows 2012 server hosts attached to the storage system. You can also avoid this issue by exclusively using Thick LUNs as opposed to Thin LUNs. A VNX Block OE remediation for this issue is planned for the next major release within the R33 family of VNX Block OE. A timetable for a fix within the R32 code family has not yet been committed. For further questions, please open a ticket with support.
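The bulletin does not name the feature that the workaround disables, so follow KB182688 for the actual steps. As an illustration only, the Python sketch below assumes the relevant Windows setting is DisableDeleteNotify, which controls whether the file system sends delete (Trim/Unmap) notifications and can be read with `fsutil behavior query DisableDeleteNotify`. The function names are hypothetical, the sketch only reads the setting, and the exact output text of fsutil may differ between Windows versions.

```python
import re
import subprocess


def parse_delete_notify(output: str):
    """Return True if delete notifications are enabled, False if disabled,
    or None if the setting is not found in the given fsutil output text.

    Assumes output containing 'DisableDeleteNotify = 0' or '= 1'.
    """
    match = re.search(r"DisableDeleteNotify\s*=\s*([01])", output)
    if match is None:
        return None
    # A value of 0 means notifications are NOT disabled, i.e. Trim/Unmap is on.
    return match.group(1) == "0"


def trim_unmap_enabled():
    """Query the local Windows host (typically needs an elevated prompt)."""
    result = subprocess.run(
        ["fsutil", "behavior", "query", "DisableDeleteNotify"],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_delete_notify(result.stdout)
```

Running `trim_unmap_enabled()` on each attached Windows 2012 host gives a quick inventory of where the setting is still on, under the assumption above.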

 

Original document: Q2 EMC Uptime Bulletin

  1. You need to be extra careful with RecoverPoint though. This issue hit us even though we only use Thick LUNs because RecoverPoint still issues the commands. Fortunately it can be disabled from the RecoverPoint side of things.

    And thanks for this… If only you had published two months ago we could have saved some grief.
