Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1003736.1
Update Date:2010-10-14
Keywords:

Solution Type  Problem Resolution Sure

Solution  1003736.1 :   Avoiding SCSI transport errors while running explorer/extractor/sccli with Sun Storage 3310 Arrays  


Related Items
  • Sun Storage 3310 Array
Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 3xxx Arrays

PreviouslyPublishedAs
205265


Applies to:

Sun Storage 3310 SCSI Array
All Platforms

Symptoms

When an sccli command is run on a host connected to a Sun Storage 3310 SCSI Array
while I/O is occurring to the array, some configurations may show a
long pause followed by bus resets. The following is an example of the messages:
Dec  6 10:00:08 xyz scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):
Dec 6 10:00:08 xyz SCSI transport failed: reason 'reset': retrying command
Dec 6 10:00:11 xyz scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):
Dec 6 10:00:11 xyz Error for Command: write(10) Error Level: Retryable
Dec 6 10:00:11 xyz scsi: [ID 107833 kern.notice] Requested Block: 310860508 Error Block: 310860508
Dec 6 10:00:11 xyz scsi: [ID 107833 kern.notice] Vendor: SUN Serial Number: 6215C0B5-00
Dec 6 10:00:11 xyz scsi: [ID 107833 kern.notice] Sense Key: Unit Attention
Dec 6 10:00:11 xyz scsi: [ID 107833 kern.notice] ASC: 0x29 (power on, reset, or bus reset occurred), ASCQ: 0x0, FRU: 0x0
Bus resets cause the pending I/Os to be aborted, causing unnecessary command retries 
which may impact array performance.
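
To check how often a host has logged these resets, the messages file can be scanned for the pattern shown above. The following is only a minimal sketch: the function name count_reset_failures is hypothetical, and it assumes each "SCSI transport failed" line directly follows the WARNING line that names the device, as in the example messages.

```python
import re
from collections import Counter

# Matches the device path and instance on a kernel warning line, e.g.
# "... WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):"
WARNING_RE = re.compile(r"WARNING:\s+(\S+)\s+\((\w+)\):")
RESET_RE = re.compile(r"SCSI transport failed: reason 'reset'")


def count_reset_failures(lines):
    """Count 'reset' transport failures per device instance (e.g. sd79)."""
    counts = Counter()
    current = None  # instance named by the most recent WARNING line
    for line in lines:
        match = WARNING_RE.search(line)
        if match:
            current = match.group(2)
            continue
        if current is not None and RESET_RE.search(line):
            counts[current] += 1
    return counts


# Sample taken from the messages shown above.
sample = [
    "Dec  6 10:00:08 xyz scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):",
    "Dec 6 10:00:08 xyz SCSI transport failed: reason 'reset': retrying command",
    "Dec 6 10:00:11 xyz scsi: [ID 107833 kern.warning] WARNING: /pci@1d,700000/pci@1/scsi@4/sd@0,1 (sd79):",
    "Dec 6 10:00:11 xyz Error for Command: write(10) Error Level: Retryable",
]
print(dict(count_reset_failures(sample)))
```

On a live host the same function could be fed the lines of the system messages file instead of the sample list.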

Changes

Sun[TM] Explorer and Sun StorEdge[TM] 3000 Series Extractor
(se3kxtr) use sccli commands to gather necessary information from the
array. This problem affects Sun Storage 3310 SCSI Arrays that use version
1.6.2 or lower of the host management software (SUNWsccli) and firmware revision
4.11 or lower.
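
The version thresholds above can be expressed as a simple comparison. This is a rough sketch only: the helper names are hypothetical, and it assumes revisions compare numerically field by field, ignoring any letter suffix (so "4.13B" is treated as 4.13). The software and firmware checks are kept separate.

```python
import re

# Thresholds taken from the text: sccli 1.6.2 or lower, firmware 4.11 or lower.
AFFECTED_SCCLI_MAX = (1, 6, 2)
AFFECTED_FIRMWARE_MAX = (4, 11)


def version_tuple(version):
    """Turn '1.6.2' into (1, 6, 2); letter suffixes such as 'B' are ignored."""
    return tuple(int(n) for n in re.findall(r"\d+", version))


def sccli_affected(version):
    """True if the SUNWsccli version is at or below the affected threshold."""
    return version_tuple(version) <= AFFECTED_SCCLI_MAX


def firmware_affected(revision):
    """True if the firmware revision is at or below the affected threshold."""
    return version_tuple(revision) <= AFFECTED_FIRMWARE_MAX


print(sccli_affected("1.6.2"), firmware_affected("4.13B"))
```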

Cause

There are two issues associated with this problem:
1. The first issue was a result of a SAF-TE firmware bug in revisions prior to 1159.
2. The second issue was related to a bug in the sscs agent component of the SUNWsccli
software.



Solution



1. Ensure customers have the latest firmware and 2.x sccli software installed. 
Note: Please refer to the patch README for a detailed procedure on how to upgrade
from 3.x firmware to 4.x if required.
2. Avoid SCSI transport errors while running explorer/extractor/sccli by running
sccli out of band, or by ensuring that no host channel has target IDs with
un-mapped logical units (LUNs).

There are two ways of avoiding these errors:
1. Invoke SUNWsccli (sccli) out of band. This requires specifying the IP address of the
array. For example:
   # sccli ip-address
2. Ensure that there are no host channels defined for target IDs with un-mapped
logical units (LUNs).
As an example, for the SCSI messages shown above, the following is the 
associated configuration:
  • channels

Ch  Type   Media    Speed  Width   PID / SID
--------------------------------------------
0   Drive  SCSI     80M    Wide    6 / 7
1   Host   SCSI     80M    Wide    0 / 5    <-- Both Primary/Secondary IDs
2   Drive  SCSI     80M    Wide    6 / 7
3   Host   SCSI     80M    Wide    4 / 3    <-- Both Primary/Secondary IDs
6   Drive  Unknown  1G     Narrow  NA / NA
7   Host   LAN      N/A    Serial  NA / NA

and the lun-maps:

  • lun-maps

Ch  Tgt  LUN  ld/lv  ID-Partition  Assigned  Filter Map
--------------------------------------------------------------
1   0    0    ld0    766975E3-00   Primary
1   0    1    ld1    6215C0B5-00   Primary
3   4    0    ld0    766975E3-00   Primary
3   4    1    ld1    6215C0B5-00   Primary

The above lun-maps output shows that only the Primary controller IDs are utilized, 
although secondary controller IDs are also specified.
To avoid this problem, remove (un-configure) the SID from the host channels. This will
NOT affect anything else.
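
The comparison described above (host channel IDs versus mapped targets) can be sketched in a few lines. This is an illustration only: the function names are hypothetical, and the parsing assumes the whitespace-separated column layout of the channels and lun-maps output shown in this document.

```python
def parse_host_ids(channels_text):
    """Return {channel: [target IDs]} for SCSI host channels."""
    host_ids = {}
    for line in channels_text.splitlines():
        # Drop any trailing "<-- ..." annotation before splitting columns.
        fields = line.split("<--")[0].split()
        # Expected columns: Ch Type Media Speed Width PID / SID
        if len(fields) != 8 or fields[1] != "Host" or fields[2] != "SCSI":
            continue
        pid, sid = fields[5], fields[7]
        host_ids[int(fields[0])] = [int(x) for x in (pid, sid) if x.isdigit()]
    return host_ids


def parse_mapped_targets(lun_maps_text):
    """Return the set of (channel, target) pairs with at least one LUN mapped."""
    mapped = set()
    for line in lun_maps_text.splitlines():
        fields = line.split()
        if len(fields) >= 3 and all(f.isdigit() for f in fields[:3]):
            mapped.add((int(fields[0]), int(fields[1])))
    return mapped


def unmapped_host_ids(channels_text, lun_maps_text):
    """List (channel, target ID) pairs defined on host channels but not mapped."""
    mapped = parse_mapped_targets(lun_maps_text)
    return [(ch, tid)
            for ch, ids in sorted(parse_host_ids(channels_text).items())
            for tid in ids if (ch, tid) not in mapped]


channels = """
Ch Type Media Speed Width PID / SID
0 Drive SCSI 80M Wide 6 / 7
1 Host SCSI 80M Wide 0 / 5 <-- Both Primary/Secondary IDs
2 Drive SCSI 80M Wide 6 / 7
3 Host SCSI 80M Wide 4 / 3 <-- Both Primary/Secondary IDs
6 Drive Unknown 1G Narrow NA / NA
7 Host LAN N/A Serial NA / NA
"""
lun_maps = """
Ch Tgt LUN ld/lv ID-Partition Assigned Filter Map
1 0 0 ld0 766975E3-00 Primary
1 0 1 ld1 6215C0B5-00 Primary
3 4 0 ld0 766975E3-00 Primary
3 4 1 ld1 6215C0B5-00 Primary
"""
print(unmapped_host_ids(channels, lun_maps))
```

For the configuration in this document, the IDs reported are the secondary IDs on host channels 1 and 3, which are the ones the procedure removes.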


Please use the following steps to remove the SID from the above configuration.

Schedule a maintenance window and ensure there is no host activity before
performing these steps, because an array reset is required and will
cause I/O disruption.

1. Telnet into the array
2. Choose "view and edit Scsi channels."
3. Select the host channel on which you want to edit the Primary/Secondary ID.
For our example, it would be channel 1.
4. Choose "view and edit scsi Id."
5. Choose the ID 5 (Secondary Controller)
6. Choose "Delete Channel SCSI ID", select "Yes".
7. You will be prompted to reset the array, select "No".
8. Repeat steps 2 to 6 for host channel 3, choosing ID 3 (Secondary Controller) in step 5.
9. At this time, when prompted to reset the array, select "Yes".

After the array is reset, the channels will now look as follows:

  • channels

Ch  Type   Media    Speed  Width   PID / SID
--------------------------------------------
0   Drive  SCSI     80M    Wide    6 / 7
1   Host   SCSI     80M    Wide    0 / NA
2   Drive  SCSI     80M    Wide    6 / 7
3   Host   SCSI     80M    Wide    4 / NA
6   Drive  Unknown  1G     Narrow  NA / NA
7   Host   LAN      N/A    Serial  NA / NA


As mentioned above, this problem is fixed in firmware patch 4.13B or later and in
version 2.x of the SUNWsccli software. After upgrading to the latest firmware and software, it is
recommended to specify SIDs only when they are actually used in the array configuration.


Please see Bug ID 5007911 and Bug ID 4802207, "SCSI Bus Reset and @daemon.error under heavy load test for 3310".

transport errors, SE3310, explorer, SeExtractor

Change History: Update and Currency
susan.copeland@oracle.com
Change Date: 10/14/10


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.