Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Technical Instruction

Sure Solution 1010555.1: I/O Timeouts Result in Unexplainable Delay
Previously Published As: 214514

Description

The customer generates I/O with

    dd if=/dev/rdsk/c3t0d0s0 of=/dev/null bs=32k

and watches its progress with 'iostat -xtcn 1' to see when I/O starts and stops. The host is a Sun Fire 280R running Solaris[TM] 8 with patch 108528-27, in the following topology:

    Sun Fire 280R --- [Cisco MDS switch] === 3 Inter-Switch Links === [Cisco MDS switch] --- HDS 9970V

While the I/O is running, the customer pulls one of the links between the two MDS switches. Any frames that are on the wire at that moment are lost, so an I/O times out, and iostat shows the I/O stopping and later restarting. The observed pause is sd:sd_io_time + 20 seconds; for example, with sd_io_time=60 the pause is about 80 seconds. This document explains where the extra 20 seconds comes from.

Steps to Follow

The target driver sets a timeout of sd_io_time on each I/O, and the HBA driver uses this value to decide when to time out a command. When a target device stops responding to commands, each I/O can take up to sd_io_time * sd_retry_count to fail.

The SAN configuration in the problem description actually uses the ssd driver, so the relevant tunable is ssd:ssd_io_time. In an all-Sun SAN stack, the HBA driver attaches to the ssd target driver. Third-party HBA drivers (such as lpfc from Emulex, or JNI drivers) attach to the sd target driver instead. The exception is the latest JNI driver, which attaches to ssd because it is Leadville (LV) compliant and uses the Solaris[TM] drivers for the port, transport, and SCSI-FC mapping layers (fp/fctl/fcp).

Within the Leadville stack there is a 20-second delay to guard against accidental cable removal. It ensures that a loss of sync is treated as a failure, and not as someone pulling the wrong cable and then realizing the mistake: the 20 seconds gives time to plug the cable back in.
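As a rough illustration, the timing arithmetic in this document can be sketched in Python. The function names, the synthetic iostat samples, and the retry count used below are hypothetical and are not part of any Solaris tool; io_time stands for sd:sd_io_time or ssd:ssd_io_time, and the 20-second value is the Leadville delay described above. This is a sketch under those assumptions, not a model of the actual driver code.

```python
from typing import Sequence

# Leadville (LV) link-down delay described in this article, in seconds.
LV_LINK_DOWN_DELAY_S = 20


def expected_pause(io_time_s: int, lv_delay_s: int = LV_LINK_DOWN_DELAY_S) -> int:
    """Pause expected in iostat after frames are lost on a pulled link:
    the I/O timeout (sd_io_time or ssd_io_time) plus the LV delay."""
    return io_time_s + lv_delay_s


def worst_case_failure(io_time_s: int, retry_count: int) -> int:
    """Upper bound on how long one I/O can take to fail when the target
    stops responding: io_time * retry_count."""
    return io_time_s * retry_count


def longest_stall(samples_per_sec: Sequence[float]) -> int:
    """Longest run of consecutive zero-throughput samples, in seconds,
    assuming one sample per second (as with 'iostat -xtcn 1')."""
    longest = current = 0
    for value in samples_per_sec:
        if value == 0:
            current += 1
            longest = max(longest, current)
        else:
            current = 0
    return longest


if __name__ == "__main__":
    # The article's example: sd_io_time=60 gives a pause of about 80 s.
    print(expected_pause(60))  # 80
    # Hypothetical retry count, for illustration only.
    print(worst_case_failure(60, 3))  # 180
    # Synthetic per-second read rates (kr/s), not real measurements:
    # 5 s of I/O, an 80 s stall, then I/O resumes.
    samples = [5000.0] * 5 + [0.0] * 80 + [5000.0] * 5
    print(longest_stall(samples) == expected_pause(60))  # True
```

Comparing a measured stall with expected_pause() for the configured io_time is one way to check whether the pause includes the 20-second Leadville delay; with a third-party HBA driver, a different result could suggest that driver uses a different link-down delay, or none.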
If the cable is not reconnected within 20 seconds, error recovery starts. A simple way to determine whether a third-party HBA driver adds a similar delay on top of sd_io_time is to pull the cable, plug it back in, and measure the delay.

The Leadville stack is made up of several Sun[TM] drivers:

    # modinfo | egrep '(SunFC|mpxio|scsi_vhci)'
     21 101fb03a  1002c 150   1  fcp (SunFC FCP v6.0.1-2-1.20)
     22 1020ab02   6f48   -   1  fctl (SunFC Transport v6.0.1-2-1.17)
     23 102101a2   49ac   -   1  mpxio (MDI Library v6.0.1-1-1.7)
     24 1021484c   7ac8 195   1  scsi_vhci (SCSI vHCI Driver v6.0.1-1-1.8)
     25 1021be3c  10cd3 149   1  fp (SunFC Port v6.0.1-2-1.19)
     27 10239c2a  48988 153   1  qlc (SunFC Qlogic FCA v6.0.1-2-1.19)

In this example, 6.0.1 is the version of the LV stack.

Product: Sun Fire 280R Server

Internal Comments: Audited/updated 12/03/09 Silvana.Villamil@SUN.com, Entry Level SPARC Content Team Member

Keywords: I/O, timeouts

Previously Published As: 73581

Change History

    Date: 2010-01-05; User Name: Silvana Villamil; Action: updated; Comments: Currency check, audited by Silvana Villamil, Entry Level SPARC Content Team Member
    Date: 2004-02-05; User Name: C149439; Action: Approved; Comment: Made minor edits to the document. Gail Waldron; Version: 0
    Date: 2004-02-05; User Name: 26074; Action: Approved; Comment: This appears to be technically correct.; Version: 0
    Date: 2004-02-03; User Name: 27190; Action: Approved; Comment: Thanks for the review. Most of the doc came from Renaud Manus and Sanjay Tripathi.; Version: 0
    Date: 2004-01-27; User Name: 27190; Action: Created; Comment: (none); Version: 0

Attachments: This solution has no attachment.