Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1020941.1
Update Date: 2010-11-17
Keywords:

Solution Type  Sun Alert Sure

Solution  1020941.1 :   Solaris scsi_vhci Driver may not Fail Over Devices Properly  


Related Items
  • Sun Storage 6780 Array
  • OpenSolaris Operating System
  • Sun Storage Flexline 380 Array
  • Sun Storage 6540 Array
  • Sun Storage 6580 Array
  • Solaris SPARC Operating System
  • Sun Storage 2530 Array
  • Sun Storage 6140 Array
  • Sun Storage 2540 Array
  • Sun Storage 2510 Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability

PreviouslyPublishedAs
267709


Bug Id
<SUNBUG: 6783286>, <SUNBUG: 6808529>

Product
Solaris 9 Operating System
Solaris 10 Operating System
OpenSolaris

Date of Resolved Release
05-Oct-2009

The Solaris scsi_vhci driver may misinterpret SCSI information from the storage device:

1. Impact

The Solaris scsi_vhci driver may misinterpret SCSI information from the storage device when an externally initiated failover of device paths occurs on any asymmetric array that supplies SCSI sense data in "descriptor" format rather than "fixed" format. As a result, the driver may fail to manage failover between the primary and secondary RAID controllers for a given volume. This could cause a loss of access to data on the storage device.
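
To illustrate the difference between the two sense data layouts, the following is a minimal sketch in Python. It is not the scsi_vhci code. It relies on the SPC-3 layout, in which response codes 0x70/0x71 mark fixed format (sense key in byte 2, ASC and ASCQ in bytes 12 and 13) and 0x72/0x73 mark descriptor format (sense key in byte 1, ASC and ASCQ in bytes 2 and 3). The function names and the sample buffer are hypothetical. The ASC/ASCQ pair 0x2A/0x06 (asymmetric access state changed) is used purely as an illustration and is not taken from the arrays listed in this alert.

```python
FIXED_CODES = (0x70, 0x71)        # fixed-format response codes (SPC-3)
DESCRIPTOR_CODES = (0x72, 0x73)   # descriptor-format response codes (SPC-3)


def parse_sense(buf):
    """Decode sense key, ASC and ASCQ according to the response code."""
    code = buf[0] & 0x7F
    if code in FIXED_CODES:
        return {"format": "fixed", "key": buf[2] & 0x0F,
                "asc": buf[12], "ascq": buf[13]}
    if code in DESCRIPTOR_CODES:
        return {"format": "descriptor", "key": buf[1] & 0x0F,
                "asc": buf[2], "ascq": buf[3]}
    raise ValueError("unknown sense response code: 0x%02x" % code)


def parse_assuming_fixed(buf):
    """Decode the buffer as if it were always fixed format (the mistake)."""
    return {"key": buf[2] & 0x0F, "asc": buf[12], "ascq": buf[13]}


# Hypothetical 18-byte descriptor-format buffer:
# sense key 0x6 (unit attention), ASC 0x2A, ASCQ 0x06, remainder zero.
SAMPLE_DESCRIPTOR = bytes([0x72, 0x06, 0x2A, 0x06]) + bytes(14)

print(parse_sense(SAMPLE_DESCRIPTOR))
print(parse_assuming_fixed(SAMPLE_DESCRIPTOR))
```

Reading the hypothetical buffer with fixed-format offsets yields a different sense key and an ASC/ASCQ of zero, so a consumer that assumes fixed format would not recognize the state change the array reported.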

2. Contributing Factors

This issue can occur in the following releases:

SPARC Platform
  • Solaris 8 with patch 111412-03 or later
  • Solaris 9 with patch 113039-01 or later
  • Solaris 10 without patch 140919-02
  • OpenSolaris based upon builds snv_01 through snv_112
x86 Platform
  • Solaris 10 without patch 140920-02
  • OpenSolaris based upon builds snv_01 through snv_112
Note 1: This issue only occurs on asymmetric RAID array devices that use SCSI Descriptor Sense format and are 1 terabyte (TB) or greater in size.
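
The release criteria above can be expressed as a small check. This is a sketch under stated assumptions: the patch listing is assumed to be text in which each installed patch appears on a line beginning with "Patch: NNNNNN-RR" (as printed by a patch listing command such as showrev -p), and the function names are hypothetical. It covers only the Solaris 10 and OpenSolaris criteria.

```python
import re

# Patch base ID and minimum revision that contain the fix, from this alert.
FIX_PATCH = {"sparc": ("140919", 2), "x86": ("140920", 2)}
FIRST_FIXED_BUILD = 113  # snv_113 is the first OpenSolaris build with the fix


def solaris10_has_fix(patch_listing, platform):
    """Return True if the listing shows the fix patch at revision -02 or later."""
    base, min_rev = FIX_PATCH[platform]
    pattern = re.compile(r"^Patch:\s+(\d{6})-(\d{2})\b", re.MULTILINE)
    for found_base, rev in pattern.findall(patch_listing):
        if found_base == base and int(rev) >= min_rev:
            return True
    return False


def opensolaris_build_is_affected(build):
    """Return True for builds earlier than snv_113."""
    match = re.fullmatch(r"snv_(\d+)", build)
    if match is None:
        raise ValueError("unrecognized build string: %r" % build)
    return int(match.group(1)) < FIRST_FIXED_BUILD
```

A host that passes this check may still be unaffected for other reasons, since the issue also depends on the array type and volume size described in Note 1.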

To determine if an array is asymmetric on Solaris 10 and OpenSolaris, the following command can be used:
    # mpathadm show lu <device> | grep Asymmetric
For example:
    # mpathadm show lu /dev/rdsk/c3t0690A018007144392846C48B30F02F66d0s2 | grep Asymmetric
    Asymmetric:  yes
    #
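
When many logical units need to be checked, the same test can be scripted. The sketch below parses the "Asymmetric:" line from mpathadm show lu output; the function names are hypothetical, and query_lu is only meaningful on a host where mpathadm is installed.

```python
import re
import subprocess


def is_asymmetric(show_lu_output):
    """Return True/False from the 'Asymmetric:' line, or None if it is absent."""
    match = re.search(r"^\s*Asymmetric:\s*(\w+)", show_lu_output, re.MULTILINE)
    if match is None:
        return None
    return match.group(1).lower() == "yes"


def query_lu(device):
    """Run 'mpathadm show lu <device>' locally and interpret the result."""
    result = subprocess.run(["mpathadm", "show", "lu", device],
                            capture_output=True, text=True, check=True)
    return is_asymmetric(result.stdout)
```

Keeping the parser separate from the command invocation allows it to be applied to saved command output as well.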
To determine if an array is asymmetric on Solaris 9, use the method defined in the following document:
Note 2: There is no method to identify whether a storage array uses SCSI Descriptor Sense format. However, the following Sun equipment uses this format:
  • Sun StorageTek 2510 with firmware 07.35.10.10 or later
  • Sun StorageTek 2530 with firmware 07.35.10.10 or later
  • Sun StorageTek 2540 with firmware 07.35.10.10 or later
  • Sun StorageTek 6140 with firmware 07.10.25.10 or later
  • Sun StorageTek 6540 with firmware 07.10.25.10 or later
  • Sun Storage 6580 with firmware 07.30.22.10 or later
  • Sun Storage 6780 with firmware 07.30.22.10 or later
  • StorageTek Flexline 380 with firmware 07.10.25.10 or later
Note 3: The above list is accurate as of the time of this writing. It does not include other storage vendor array products that are not sold by Sun.
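
The firmware thresholds in Note 2 can be captured in a lookup table. The following is a minimal sketch; the model keys and function names are hypothetical, the thresholds are copied from the list above, and firmware versions are assumed to be four dot-separated numeric fields.

```python
# Minimum firmware at which each array uses descriptor sense format (Note 2).
MIN_DESCRIPTOR_FIRMWARE = {
    "2510": "07.35.10.10",
    "2530": "07.35.10.10",
    "2540": "07.35.10.10",
    "6140": "07.10.25.10",
    "6540": "07.10.25.10",
    "6580": "07.30.22.10",
    "6780": "07.30.22.10",
    "Flexline 380": "07.10.25.10",
}


def version_tuple(version):
    """Convert '07.35.10.10' to (7, 35, 10, 10) for numeric comparison."""
    return tuple(int(part) for part in version.split("."))


def uses_descriptor_sense(model, firmware):
    """Return True/False for listed models, or None if the model is not listed."""
    minimum = MIN_DESCRIPTOR_FIRMWARE.get(model)
    if minimum is None:
        return None
    return version_tuple(firmware) >= version_tuple(minimum)
```

A result of None means only that the model is absent from this list, which, per Note 3, does not cover arrays from other vendors.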

3. Symptoms

An external failover is one that is not initiated by the host directly, but by a different host or by the array itself, based on some condition outside the physical connections to this specific host. The host should report this as an externally initiated failover and change its online path(s) accordingly:
    Nov 14 12:15:34 kvsfs2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
    Nov 14 12:15:34 kvsfs2  /scsi_vhci/ssd@g600a0b80004881da0000063b48d8ee41
    (ssd252): path /pci@12,600000/SUNW,qlc@0/fp@0,0 (fp5) target address
    201400a0b84881da,0 is now ONLINE because of an externally initiated failover
    Nov 14 12:15:34 kvsfs2 scsi: [ID 243001 kern.info] /scsi_vhci (scsi_vhci0):
    Nov 14 12:15:34 kvsfs2  /scsi_vhci/ssd@g600a0b80004881da0000063b48d8ee41
    (ssd252): path /pci@2,600000/SUNW,qlc@0,1/fp@0,0 (fp13) target address
    204500a0b84881da,0 is now STANDBY because of an externally initiated failover
After these messages, I/O to the volume should continue on the new online path(s).
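
To confirm that a host logged the expected path transitions, the messages can be extracted with a script. The sketch below is based only on the message layout shown in the sample above; the regular expression and function name are hypothetical and may need adjusting for other driver versions. Whitespace is collapsed first so that the message matches whether or not it is wrapped across lines.

```python
import re

FAILOVER_RE = re.compile(
    r"(?P<device>/scsi_vhci/\S+) \((?P<instance>\w+)\): "
    r"path (?P<path>\S+) \((?P<port>\w+)\) "
    r"target address (?P<target>\S+) "
    r"is now (?P<state>\w+) because of an externally initiated failover"
)


def external_failovers(log_text):
    """Return one dict per externally initiated failover message."""
    flat = " ".join(log_text.split())  # join wrapped lines, collapse spaces
    return [match.groupdict() for match in FAILOVER_RE.finditer(flat)]
```

For the sample above, this yields two entries for ssd252: one path in state ONLINE (fp5) and one in state STANDBY (fp13).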

For volumes or LUNs larger than 1 TB, the messages reported by the array to indicate that ownership has changed are not in the format that the scsi_vhci driver expects. As a result, scsi_vhci does not transition to the new owning path and does not route I/O accordingly.

Additionally, messages similar to the following may be seen in the "/var/adm/messages" file for a given volume from the disk storage array:
    Jul 23 05:32:40 myhost         Error for Command: read(10)                Error Level: Retryable
    Jul 23 05:32:40 myhost scsi: [ID 107833 kern.notice]   Requested Block: 2493911074                Error Block: 0
    Jul 23 05:32:40 myhost scsi: [ID 107833 kern.notice]   Vendor: SUN                                Serial Number:
    Jul 23 05:32:40 myhost scsi: [ID 107833 kern.notice]   Sense Key: Reserved
    Jul 23 05:32:40 myhost scsi: [ID 107833 kern.notice]   ASC: 0x0 (no additional sense info), ASCQ: 0x0, FRU: 0x0
    Jul 23 05:32:40 myhost scsi: [ID 107833 kern.warning] WARNING: /scsi_vhci/ssd@g600a0b80002a2f760000037548ac002c (ssd89):
As a result, Solaris will fail to access the volume at even the most basic levels.
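
As a quick sanity check, the "Requested Block" value in such a message can be converted to a byte offset to see whether the volume falls in the size range this alert describes. The sketch below assumes 512-byte blocks, which the alert does not state, and compares against 2^40 bytes because the alert does not say whether "1 TB" is decimal or binary; the names are hypothetical.

```python
BLOCK_SIZE = 512   # assumption: 512-byte logical blocks
ONE_TIB = 2 ** 40  # the larger of the two common "1 TB" interpretations


def block_offset_bytes(block, block_size=BLOCK_SIZE):
    """Byte offset at which the given logical block starts."""
    return block * block_size


def beyond_one_tb(block, block_size=BLOCK_SIZE):
    """True if the block lies at or past the 1 TiB mark of the volume."""
    return block_offset_bytes(block, block_size) >= ONE_TIB


# Requested block from the sample message above.
print(block_offset_bytes(2493911074))
```

Under the 512-byte assumption, the requested block in the sample lies past the 1 TB mark, which implies that the volume is larger than 1 TB.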

In the following example, format(1M) will show the volume as "Unavailable" and attempts to label the device result as follows:
    [disk unformatted]
    Disk not labeled.  Label it now?
If you enter "y" to label the disk, the attempt fails:
    Warning: error writing EFI.
    Write label failed
There may also be the following associated error report in the messages file:
    Nov 14 13:49:42 kvsfs2 scsi: [ID 107833 kern.warning] WARNING:
    /scsi_vhci/ssd@g600a0b800026b13400000f40461072f9 (ssd225):
    Nov 14 13:49:42 kvsfs2  i/o to invalid geometry
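
To list the devices that have reported this warning, the messages file can be scanned as follows. This is a sketch based only on the sample above; the pattern and function name are hypothetical. Whitespace is collapsed first, and the repeated timestamp and host name before the message text are treated as optional.

```python
import re

GEOMETRY_RE = re.compile(
    r"WARNING: (?P<device>/scsi_vhci/\S+) \((?P<instance>\w+)\): "
    r"(?:\w{3} \d{1,2} \d{2}:\d{2}:\d{2} \S+ )?"
    r"i/o to invalid geometry"
)


def invalid_geometry_devices(log_text):
    """Return the sorted, de-duplicated device paths that logged the warning."""
    flat = " ".join(log_text.split())  # join wrapped lines, collapse spaces
    return sorted({m.group("device") for m in GEOMETRY_RE.finditer(flat)})
```

The result can be compared with the logical units known to be on affected arrays to narrow down which volumes need recovery.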

4. Workaround

There is no workaround to prevent this issue from occurring. To recover from this issue, reboot the host or use the following command:
    # mpathadm failover lu <logical unit name>
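
If several logical units need recovery, the mpathadm failover command can be issued from a script. The following is a minimal sketch; the function names are hypothetical, and run_failover must be executed with sufficient privileges on a host where mpathadm is available.

```python
import subprocess


def failover_command(logical_unit):
    """Build the argument list for 'mpathadm failover lu <logical unit name>'."""
    if not logical_unit:
        raise ValueError("a logical unit name is required")
    return ["mpathadm", "failover", "lu", logical_unit]


def run_failover(logical_unit):
    """Run the failover command; raises CalledProcessError if it fails."""
    return subprocess.run(failover_command(logical_unit), check=True)
```

Passing the arguments as a list, rather than as a shell string, avoids quoting problems with device names.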

5. Resolution

This issue is addressed in the following releases:

SPARC Platform
  • Solaris 10 with patch 140919-02 or later
  • OpenSolaris based upon builds snv_113 or later
x86 Platform
  • Solaris 10 with patch 140920-02 or later
  • OpenSolaris based upon builds snv_113 or later
Solaris 8 and Solaris 9 will require an upgrade to Solaris 10 with the appropriate patches to resolve this issue.




References

<SUNPATCH: 140919-02>
<SUNPATCH: 140920-02>

Internal Comments
Please send technical questions to the following email:
sunalert-tech-questions@sun.com
and CC the following persons:
Internal Contributor/Submitter
Internal Eng Responsible Engineer
Internal Services Knowledge Engineer

Internal Contributor/submitter
curtis.decotis@sun.com

Internal Eng Responsible Engineer
Sheshadri.Vasudevan@Sun.COM

Internal Services Knowledge Engineer
jeff.folla@sun.com

Internal Eng Business Unit Group
OP/N1 RPE (Operating Platforms/N1 Revenue Product Engin.)

Internal Escalation ID
71376902, 70671616, 1-25129773, 70666914, 70678378, 71091096, 71226886, 71397710

Internal Resolution Patches
140919-02, 140920-02

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.