Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000405.1
Update Date:2011-03-01
Keywords:

Solution Type  Sun Alert Sure

Solution  1000405.1 :   The Use of Certain sscs(1M) Commands, Array/StorEdge 3900SL CLI Commands, or Certain StorEdge 3900SL/6320 GUI Actions to Manage a Sun StorEdge 3900SL/6120/6320/T3+ Array, Attached via Certain FC Switches, May Cause Loss of Connectivity to a Host(s)  


Related Items
  • Sun Storage 6320 System
  • Sun Storage T3 Array
  • Sun Storage T3+ Array
  • Sun Storage 6120 Array
  • Sun Storage 3910 Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

Previously Published As
200533


Product
Sun StorageTek 3900 Series
Sun StorageTek T3 Array
Sun StorageTek T3+ Array
Sun StorageTek 6120 Array
Sun StorageTek 6320 System

Bug Id
<SUNBUG: 5061572>

Date of Workaround Release
28-JUL-2004

Date of Resolved Release
19-AUG-2004

Impact

Under rare conditions, loss of connectivity to one or more hosts may occur when certain sscs(1M) commands, array/StorEdge 3900SL CLI commands, or StorEdge 3900SL/6320 GUI actions are used to manage a Sun StorEdge 3900SL/6120/6320/T3+ Array that is attached through certain Fibre Channel (FC) switches (listed below) to hosts whose Host Bus Adapters (HBAs) use the Sun QLC HBA driver. These commands can cause multiple path failures, which could lead to a complete loss of host access to the array.


Contributing Factors

This issue can occur on the following platforms:

SPARC Platform

  • Sun StorEdge 3900SL Array
  • Sun StorEdge 6120/6320 Arrays
  • Sun StorEdge T3+ Array

connected to the following switch models:

  • SG-XSWBRO3250 - 3250 switch with 8 ports without firmware patch 115361-05 (firmware version 4.2.2)
  • SG-XSWBRO3850 - 3850 switch with 16 ports without firmware patch 115361-05 (firmware version 4.2.2)
  • SG-XSWBRO3900 - Silkworm 3900 32-port switch without firmware patch 115361-05 (firmware version 4.2.2)
  • SG-XSWBRO12000-32P - 12000 switch with 32 ports without firmware patch 115361-05 (firmware version 4.2.2)
  • SG-XSWBRO12000-64P - 12000 switch with 64 ports without firmware patch 115361-05 (firmware version 4.2.2)
  • SG-XSWBRO12000-MOD - 12000 switch with 16-port module for expansion
  • SG-XSWBRO24K-32P - 24000 switch with 32 ports without firmware patch 115361-05 (firmware version 4.2.2)
  • SG-XSWBRO24K-MOD - 16-port Fibre Channel switch module for 24000 128-port switch

The issue may occur in the configurations described above when any of the following sscs(1M) commands or array/StorEdge 3900SL CLI commands is issued:

sscs(1M) commands:

  • sscs modify volgroup
  • sscs create volume
  • sscs create initiator
  • sscs create pool
  • sscs modify array
  • sscs add initgroup

StorEdge 6120/T3+ telnet(1) commands:

  • lun perm
  • hwwn
  • volslice
  • vol mount
  • sys mp_support

StorEdge 3900SL Service Processor (SP) CLI commands:

The following menu options in the program "/opt/SUNWsecfg/runsecfg":

  • 3) Configure Sun StorEdge T3+ Array(s)
  • 6) Modify Sun StorEdge T3+ Array Sys Parameters
  • 8) Manage Sun StorEdge T3+ Array LUN Slicing
  • 9) Manage Sun StorEdge T3+ Array LUN Masking

The following commands from the directory "/opt/SUNWsecfg/bin" on the Service Processor (SP):

  • createt3group
  • addtot3group
  • delfromt3group
  • rmt3group
  • createt3slice
  • rmt3slice
  • modifyt3config
  • savet3config
  • modifyt3params
  • sett3lunperm
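The StorEdge 6120/T3+ telnet(1) command list above, together with the read-only exceptions given in Note 2 below, can be encoded in a small check, for example in a change-control wrapper that warns before a risky command is typed on the array. This is a minimal sketch: classify_array_cmd is a hypothetical function name, not part of any Sun tool, and it covers only the 6120/T3+ telnet commands, not sscs(1M), the SP commands or GUI actions.

```sh
# Hypothetical helper (not part of any Sun tool): classify a StorEdge
# 6120/T3+ telnet(1) command line against the lists in this alert.
# Prints "read-only", "trigger" or "other".
classify_array_cmd() {
    # Normalize whitespace: strip leading/trailing blanks, squeeze the rest.
    cmd=`echo "$1" | awk '{ $1 = $1; print }'`
    case "$cmd" in
        # Read-only forms from Note 2: these do not trigger the issue.
        "lun perm list"|"hwwn list"|"hwwn listgrp"|"volslice list")
            echo "read-only" ;;
        # Any other use of the listed commands is treated as a trigger.
        "lun perm"*|hwwn*|volslice*|"vol mount"*|"sys mp_support"*)
            echo "trigger" ;;
        *)
            echo "other" ;;
    esac
}
```

For example, classify_array_cmd "hwwn listgrp" prints read-only, while classify_array_cmd "vol mount v0" (v0 being an illustrative volume name) prints trigger. Forms of the listed commands that do not appear in Note 2 are treated conservatively as triggers.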

Notes:

1. Equivalent StorEdge 3900SL/6320 GUI actions to these commands will also cause the issue to occur.

2. The following read-only commands do not trigger the described issue:

  • lun perm list
  • hwwn list
  • hwwn listgrp
  • volslice list

3. The issue can occur only under the conditions above, and only on hosts that use Sun Fibre Channel Host Bus Adapters (HBAs) with the QLC driver for the connection, through the switch types listed above, to Sun StorEdge 3900SL/6120/6320/T3+ Arrays. Hosts that have the QLC driver loaded but use a different HBA driver for that connection are not affected.

To determine if a system uses the Sun QLC HBA driver for its connection to an array, do the following:

Run the format(1M) command as the "root" user, and choose a LUN on each controller path for each 3900SL/6120/6320/T3+ Array from the output list:

    # format
    Searching for disks...done
    AVAILABLE DISK SELECTIONS:
        0. c0t0d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
           /ssm@0,0/pci@1a,700000/pci@2/SUNW,isptwo@4/sd@0,0
        1. c0t6d0 <SUN18G cyl 7506 alt 2 hd 19 sec 248>
           /ssm@0,0/pci@1a,700000/pci@2/SUNW,isptwo@4/sd@6,0
    --> 2. c3t60003BA27CC6B00040C472BF000262B0d0 <SUN-T4-0301 cyl 11705 alt 2 hd 7 sec 128>
           /scsi_vhci/ssd@g60003ba27cc6b00040c472bf000262b0...
           [lines omitted] ...
    Specify disk (enter its number): ^D
    #

In the above example, format shows an STMS device path for a StorEdge 6120 device on controller c3, so the next step is to find which HBA(s) are used to access that STMS device. This is done with the "luxadm display" command.
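On a host with many disks, the STMS devices can be picked out of the format listing mechanically: they are the entries whose physical path begins with /scsi_vhci. The sketch below assumes the listing layout shown above (an "N. name <label>" line followed by a path line); list_stms_devices is a hypothetical helper name. On Solaris, run it under ksh or /usr/xpg4/bin/sh, and substitute nawk if /usr/bin/awk rejects the script.

```sh
# Hypothetical helper: read format(1M) disk-list output on stdin and print
# the logical names of disks whose physical path is under /scsi_vhci,
# i.e. STMS (MPxIO) devices.
list_stms_devices() {
    awk '
        # Disk entry: "N. <name> <label>"; remember the logical name.
        $1 ~ /^[0-9]+\.$/ { name = $2; next }
        # Path line following an entry: keep only STMS (scsi_vhci) devices.
        name != "" && $1 ~ /^\/scsi_vhci\// { print name; name = "" }
    '
}
```

Assuming format(1M) prints its disk list and exits when its input is /dev/null (a commonly used idiom), the helper can be run as "format < /dev/null | list_stms_devices". On the system shown above, it would report c3t60003BA27CC6B00040C472BF000262B0d0 among its output.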

Repeat the following steps for each controller number on the system that is used to connect to StorEdge 3900SL/6120/6320/T3+ arrays:

Run "luxadm display" on the STMS device path to see the physical HBA(s) used to access that STMS device. Build the path from the STMS device name reported by format (in this example, c3t60003BA27CC6B00040C472BF000262B0d0) by prepending "/dev/rdsk/" and appending the slice suffix "s2". This assumes that the standard Solaris label with a slice 2 is used on the device; if it is not, append a slice number that is in use on the STMS device instead.
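The path-building rule just described fits in a one-line helper, which can make scripted checks of many LUNs less error-prone. stms_raw_path is a hypothetical name used only for illustration; the slice defaults to 2 and can be overridden when the device does not use the standard label.

```sh
# Hypothetical helper: build the raw device path for "luxadm display".
#   $1: STMS device name from format(1M), e.g. c3t60003BA27CC6B00040C472BF000262B0d0
#   $2: optional slice number (default 2, the standard Solaris label)
stms_raw_path() {
    slice=${2:-2}
    echo "/dev/rdsk/${1}s${slice}"
}
```

For example, luxadm display `stms_raw_path c3t60003BA27CC6B00040C472BF000262B0d0` runs the same command as the one shown below.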

As root user:

    # luxadm display /dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2
    DEVICE PROPERTIES for disk:
    /dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2
    Vendor:               SUN
    Product ID:           T4
    Revision:             0301
    Serial Num:           Unsupported
    Unformatted capacity: 5121.812 MBytes
    Write Cache:          Enabled
    Read Cache:           Enabled
    Minimum prefetch:   0x0
    Maximum prefetch:   0x0
    Device Type:          Disk device
    Path(s):
    /dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2
    /devices/scsi_vhci/ssd@g60003ba27cc6b00040c472bf000262b0:c,raw
    -->   Controller
    /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1/fp@0,0
    Device Address              20030003ba27cc6b,6
    Host controller port WWN    210000e08b0aadac
    Class                       primary
    State                       ONLINE
    -->   Controller
    /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0
    Device Address              20030003ba27cc63,6
    Host controller port WWN    210100e08b2aadac
    Class                       secondary
    State                       STANDBY
    #

From the above output of the "luxadm display" command, the 2 physical HBA paths used for connecting to the StorEdge 6120 LUN are:

  1. /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1/fp@0,0
  2. /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0

Both device paths contain the string "SUNW,qlc", which means that these HBAs use the Sun QLC driver.
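When several LUNs and controllers have to be checked, the same inspection can be applied to saved "luxadm display" output. The minimal sketch below (controller_drivers is a hypothetical helper name) prints each controller path that follows a "Controller" label, tagged qlc if it contains "SUNW,qlc" and other otherwise. It accepts the controller path either on the same line as "Controller" or on the following line, since layouts may differ from the example above; substitute nawk on Solaris if /usr/bin/awk rejects the script.

```sh
# Hypothetical helper: read "luxadm display" output on stdin and print each
# HBA controller path with a tag: "qlc" if it contains SUNW,qlc, else "other".
controller_drivers() {
    awk '
        {
            for (i = 1; i <= NF; i++) {
                # Remember that a controller path is expected next.
                if ($i == "Controller") { want = 1; continue }
                # The first /devices/ path after the label is the HBA path.
                if (want && $i ~ /^\/devices\//) {
                    if ($i ~ /SUNW,qlc/) print $i, "qlc"
                    else print $i, "other"
                    want = 0
                }
            }
        }
    '
}
```

Usage: luxadm display /dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2 | controller_drivers. For the output above, both controller paths would be tagged qlc.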


Symptoms

If the issue occurs, "lun failover" messages are displayed in the array syslog, and STMS messages on the host report that LUNs are being offlined and that the paths to those LUNs are degraded because one path has been lost. For example, on the host:

    [date time hostname] scsi: [ID 243001 kern.info] /ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0 (fcp1):
    [date time hostname] offlining lun=1f (trace=0), target=90100 (trace=2800004)
    ...
    [date time hostname] Initiating failover for device ssd (GUID 60003ba27cc6b00040c473c3000525ab)
    [date time hostname] mpxio: [ID 669396 kern.info] /scsi_vhci/ssd@g60003ba27cc6b00040c473d600076246 (ssd0) multipath status: degraded, path /ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0 (fp1) to target address: 20030003ba27cc63,1f is offline. Load balancing: round-robin

Note: The above are examples only. On each system, the LUN numbers, target numbers and device paths will vary. To confirm that this issue is occurring, check for the target trace value ("trace=2800004" above) and for the overall sequence of events, in which many LUNs fail over and a path is reported as "offline" after any of the commands listed under Contributing Factors has been run.
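To check a host's system log for this signature, a simple scan can count the "offlining lun" messages carrying trace=2800004 and the "multipath status ... degraded ... is offline" messages. This is a sketch under assumptions: scan_symptoms is a hypothetical helper name, each matched message is assumed to sit on a single line of the log file, and non-zero counts only point at candidate events that still need to be correlated in time with the commands listed under Contributing Factors.

```sh
# Hypothetical helper: count the symptom messages described in this alert
# in the given log file(s), or in stdin if no file is named.
# Assumes each matched message occupies a single log line.
scan_symptoms() {
    awk '
        # "offlining lun" messages carrying the trace value from this alert.
        /offlining lun=/ && /trace=2800004/ { offl++ }
        # STMS messages reporting a degraded LUN with an offline path.
        /multipath status/ && /degraded/ && /is offline/ { degr++ }
        END { printf "offlined_luns=%d degraded_paths=%d\n", offl, degr }
    ' "$@"
}
```

Usage: scan_symptoms /var/adm/messages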


Workaround

To prevent this issue, ensure that no I/O to the StorEdge 3900SL/6120/6320/T3+ Array is being routed through the FC switch types listed above while any of the commands listed under Contributing Factors is issued.

This can be done by quiescing the application on the host system and running these commands during a maintenance window. Do this on all hosts connected to the StorEdge 3900SL/6120/6320/T3+ through the switch and, in a partner pair configuration, for both array controllers.

If the issue does occur, wait for any LUN failovers to complete, then follow the recommendations below:

On the host(s) where the STMS "offlining lun" and "multipath status: degraded" messages were seen, run the following luxadm(1M) command as root:

    # luxadm -e port
    Found path to 2 HBA ports
    /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1/fp@0,0:devctl    CONNECTED
    /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0:devctl  NOT CONNECTED
    #

The path that says "NOT CONNECTED" is not operational. If it were operational, it would say "CONNECTED".

Note: On some systems, a port shown as "NOT CONNECTED" may genuinely have nothing connected to it. To verify that a port is the affected one, compare it with the path shown in the STMS "multipath status: degraded" message.

To reconnect the path, run "luxadm -e forcelip" on each path that is shown as "NOT CONNECTED" but should be "CONNECTED" in your configuration. In this example, one path is shown as "NOT CONNECTED" but should be "CONNECTED", so the following command is used:

    # luxadm -e forcelip /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0:devctl
    #
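The cross-check recommended in the note above, matching the port against the path from the "multipath status: degraded" message, can be done before running forcelip. In the sketch below, the hypothetical helper forcelip_if_not_connected turns the HBA path from that STMS message into the corresponding devctl path by adding the /devices prefix and the :devctl suffix (as the example paths in this document show), and prints the forcelip command only if "luxadm -e port" reports that port as NOT CONNECTED. It prints the command for review rather than running it. The awk -v option needs nawk or /usr/xpg4/bin/awk on Solaris.

```sh
# Hypothetical helper: $1 is the HBA path from the STMS "degraded" message,
# e.g. /ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0
forcelip_if_not_connected() {
    # Map the STMS message path to the devctl node listed by "luxadm -e port".
    devctl="/devices$1:devctl"
    # Read "luxadm -e port" output on stdin; print the command only if that
    # port is reported as NOT CONNECTED.
    awk -v p="$devctl" '
        $1 == p && $(NF-1) == "NOT" && $NF == "CONNECTED" {
            print "luxadm -e forcelip " p
        }
    '
}
```

Usage: luxadm -e port | forcelip_if_not_connected /ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0. With the example output above, this prints the same forcelip command that was run in the example.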

After running "luxadm -e forcelip" on the path(s) required above, you can confirm that all paths are now usable by running "luxadm -e port" again as shown below:

    # luxadm -e port
    Found path to 2 HBA ports
    /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1/fp@0,0:devctl    CONNECTED
    /devices/ssm@0,0/pci@1a,700000/SUNW,qlc@1,1/fp@0,0:devctl  CONNECTED
    #
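In a script, this final check can be reduced to an exit status. all_ports_connected is a hypothetical helper name; it succeeds only if at least one devctl port is listed and none of them is reported as NOT CONNECTED. Because a port may legitimately be NOT CONNECTED on some systems, treat a failure as a prompt to investigate rather than as proof of this issue. Substitute nawk on Solaris if /usr/bin/awk rejects the script.

```sh
# Hypothetical helper: read "luxadm -e port" output on stdin; exit 0 only if
# at least one devctl port is listed and none is NOT CONNECTED.
all_ports_connected() {
    awk '
        # Count every devctl port line and those flagged NOT CONNECTED.
        /:devctl/ { ports++; if ($(NF-1) == "NOT") bad++ }
        # Succeed only if ports were found and none is NOT CONNECTED.
        END { if (ports > 0 && bad == 0) exit 0; exit 1 }
    '
}
```

Usage: luxadm -e port | all_ports_connected && echo "all HBA ports connected"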

The command to initiate a manual failback for LUNs which have failed over to the alternate path is:

    # luxadm failover primary <device path to LUN>
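Before failing LUNs back, it can help to confirm which ones are actually running on their secondary path. The sketch below assumes, as in the "luxadm display" example earlier, that each path block reports a Class line (primary or secondary) followed by a State line, and that a LUN served normally shows State ONLINE on its primary path; needs_failback is a hypothetical helper name. It exits 0 when a primary path is not ONLINE, that is, when a failback appears to be needed. Substitute nawk on Solaris if /usr/bin/awk rejects the script.

```sh
# Hypothetical helper: read "luxadm display" output for one LUN on stdin;
# exit 0 if a path of class "primary" is not ONLINE (failback candidate).
needs_failback() {
    awk '
        # Remember the class of the current path block.
        $1 == "Class" { cls = $2 }
        # A primary path that is not ONLINE suggests the LUN has failed over.
        $1 == "State" {
            if (cls == "primary" && $2 != "ONLINE") found = 1
            cls = ""
        }
        END { if (found) exit 0; exit 1 }
    '
}
```

Usage, printing candidates for review instead of failing them back automatically:

    dev=/dev/rdsk/c3t60003BA27CC6B00040C472BF000262B0d0s2
    luxadm display $dev | needs_failback && echo "failback candidate: $dev"

Applied to the "luxadm display" example earlier, where the primary path is ONLINE, it exits 1 and nothing is printed.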

Resolution

This issue is addressed on the following platforms:

  • SG-XSWBRO3250 - 3250 Switch with 8 Ports with firmware patch 115361-05 (firmware version 4.2.2) or later
  • SG-XSWBRO3850 - 3850 Switch with 16 Ports with firmware patch 115361-05 (firmware version 4.2.2) or later
  • SG-XSWBRO3900 - Silkworm 3900 32-Port Switch with firmware patch 115361-05 (firmware version 4.2.2) or later
  • SG-XSWBRO12000-32P - 12000 Switch with 32 ports with firmware patch 115361-05 (firmware version 4.2.2) or later
  • SG-XSWBRO12000-64P - 12000 Switch with 64 ports with firmware patch 115361-05 (firmware version 4.2.2) or later
  • SG-XSWBRO24K-32P - 24000 Switch with 32 Ports with firmware patch 115361-05 (firmware version 4.2.2) or later
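Checking whether a switch is already at the fixed firmware level amounts to a version comparison against 4.2.2. In the sketch below, fw_at_least_422 is a hypothetical helper that takes a version string as read from the switch (how it is obtained, for example from the Fabric OS version command, is outside the scope of this alert), strips a leading "v", ignores letter suffixes, and succeeds when the version is 4.2.2 or later. A matching version string is only an indication; the alert states the requirement in terms of patch 115361-05. Use nawk on Solaris if /usr/bin/awk rejects the script.

```sh
# Hypothetical helper: succeed if a firmware version string such as
# "v4.2.2" or "4.2.2a" is at or above 4.2.2. Letter suffixes are ignored,
# so "4.2.2a" compares equal to 4.2.2.
fw_at_least_422() {
    echo "$1" | awk '{
        v = $1
        sub(/^v/, "", v)                 # drop a leading "v"
        n = split(v, p, ".")             # numeric components
        want[1] = 4; want[2] = 2; want[3] = 2
        for (i = 1; i <= 3; i++) {
            # "2a" + 0 is 2 in awk; missing components count as 0.
            x = (i <= n) ? p[i] + 0 : 0
            if (x > want[i]) exit 0
            if (x < want[i]) exit 1
        }
        exit 0
    }'
}
```

For example, fw_at_least_422 v4.2.2 succeeds, while fw_at_least_422 v4.2.0 fails.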


Modification History
Date: 19-AUG-2004
  • State: Resolved
  • Updated Contributing Factors and Resolution sections

Date: 20-AUG-2004
  • Updated Contributing Factors and Resolution sections


References

<SUNPATCH: 115361-05>

Previously Published As
101540
Internal Comments

PTS Escalation Engineers:

jerome.smith@sun.com Escalation# 1-1330848
brian.austin@sun.com Escalation# 1-1024782
terrie.douglas@sun.com Escalation# 1-1246203
sam.gibson@sun.com

Internal Contributor/submitter
dave.bruce@sun.com

Internal Eng Business Unit Group
NWS (Network Storage)

Internal Eng Responsible Engineer
sam.gibson@sun.com, joseph.poon@sun.com

Internal Services Knowledge Engineer
jeff.folla@sun.com

Internal Escalation ID
1-1330848, 1-1024782, 1-1246203

Internal Resolution Patches
115361-05

Internal Sun Alert Kasp Legacy ID
101540, 57609 (Sun Alert)

Internal Sun Alert & FAB Admin Info
Critical Category: Availability ==> Pervasive
Significant Change Date: 2004-07-28, 2004-08-19, 2004-08-20
Avoidance: Patch, Workaround
Responsible Manager: shailesh.patel@sun.com
Original Admin Info: This document has been imported from KMS Creator and may need adjustment before re-publishing.

This imported document has been reviewed/adjusted by:
Review Name:
Review Date:

Original KMS Creator attributes below:

--- PLEASE DO NOT MAKE ANY CHANGES BELOW THIS LINE! ---

Sun Alert ID: 57609
Synopsis: The Use of Certain sscs(1M) Commands, Array/StorEdge 3900SL CLI Commands, or Certain StorEdge 3900SL/6320 GUI Actions to Manage a Sun StorEdge 3900SL/6120/6320/T3+ Array, Attached via Certain FC Switches, May Cause Loss of Connectivity to a Host(s)
Category: Availability
Product: Sun StorEdge 3900SL Array, Sun StorEdge 6120/6320 Array, Sun StorEdge T3+ Array
BugIDs: 5061572
Avoidance: Workaround, Patch
State: Resolved
Date Released: 28-Jul-2004, 19-Aug-2004, 20-Aug-2004
Date Closed: 19-Aug-2004
Date Modified: 19-Aug-2004, 20-Aug-2004
Escalation IDs: 1-1330848, 1-1024782, 1-1246203
Pending Patches:
Resolution Patches: 115361-05
FIN:
FCO:
Date Submitted: 23-Jul-2004
Submitter: dave.bruce@sun.com
Responsible Engineer: sam.gibson@sun.com, joseph.poon@sun.com
Responsible Manager: shailesh.patel@sun.com
CTE group: PTS NWS US
Responsible Writer: jeff.folla@sun.com
Distribution: Public SunSolve

Workflow History:

WF State: Issued, 19-Aug-2004, Jeff Folla
WF Note: Updated sun alert with available patches. This issue is now resolved.
Sent for re-release.

WF State: Issued, 28-Jul-2004, Jeff Folla
WF Note: sent for release.

WF State: Draft, 28-Jul-2004, Jeff Folla
WF Note: sent for release.

WF State: Draft, 26-Jul-2004, Jeff Folla
WF Note: sent for review.

WF State: Draft, 23-Jul-2004, Jeff Folla
WF Note: Article created.

Exported from KMS Creator Sat May 21 09:12:01 2005 GMT, olaf.reineke@sun.com
Internal SA-FAB Eng Submission
The Use of Certain sscs(1M) Commands, Array/StorEdge 3900SL CLI Commands, or Certain StorEdge 3900SL/6320 GUI Actions to Manage a Sun StorEdge 3900SL/6120/6320/T3+ Array, Attached via Certain FC Switches, May Cause Loss of Connectivity to a Host(s)

Product_uuid
04ccc2c2-16a1-11d7-9f9a-f83fdd2e2f1b|Sun StorageTek 3900 Series
2a6d7d50-0a18-11d6-8e0b-f0bd33b24928|Sun StorageTek T3 Array
2a714b10-0a18-11d6-86e2-d56b387d4fbf|Sun StorageTek T3+ Array
2cd2e7d2-2980-11d7-9c3f-c506fe37b7ef|Sun StorageTek 6120 Array
4de60cc2-a08e-4610-b8bf-6a1881cb59c6|Sun StorageTek 6320 System

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.