Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1021045.1
Update Date:2011-02-25
Keywords:

Solution Type  Sun Alert Sure

Solution  1021045.1 :   During "supportdata" Collection Sun Storage Arrays May Experience Loss of Data Access  


Related Items
  • Sun Storage 6540 Array
  • Sun Storage 6580 Array
  • Sun Storage 6140 Array
  • Sun Storage 6780 Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

PreviouslyPublishedAs
269828


Bug Id
<SUNBUG: 6857533>

Date of Resolved Release
14-Oct-2009

During a supportdata collection, both controllers may reboot with a Watchdog timeout ...

1. Impact

During a supportdata collection, both controllers may reboot because of a watchdog
timeout, even when no other problem is present.

The result is a temporary loss of data access for attached hosts until both RAID
controllers finish their reboot cycles.


2. Contributing Factors

This issue may occur on the following platforms:
  • StorageTek Flexline 380 Arrays with 07.50.08.10 or 07.50.13.10 firmware
  • Sun StorageTek 6140 Arrays with 07.50.08.10 or 07.50.13.10 firmware
  • Sun StorageTek 6540 Arrays with 07.50.08.10 or 07.50.13.10 firmware
  • Sun Storage 6580 Arrays with 07.50.08.10 or 07.50.13.10 firmware
  • Sun Storage 6780 Arrays with 07.50.08.10 or 07.50.13.10 firmware

Note 1: Arrays running firmware earlier than the versions listed above are not affected by this issue.
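The version rules above can be expressed as a small POSIX shell check. This is a minimal sketch, not a Sun or CAM tool: the function names fw_key and fw_status are hypothetical, it assumes a version string of four dot-separated fields of at most two digits each, and it reports "unknown" for any version this alert does not classify (for example, a release newer than 07.50.13.10 but older than the fixed 07.60.18.10 firmware named in the Resolution section).

```sh
#!/bin/sh
# Hypothetical helpers (not part of CAM) that classify a controller
# firmware version against the versions named in this Sun Alert.

# fw_key VERSION: turn AA.BB.CC.DD into one comparable integer,
# e.g. 07.50.13.10 -> 7501310. Assumes each field is 0-99.
# awk is used because $(( )) would read "08" as a bad octal number.
fw_key() {
    echo "$1" | awk -F. '{ printf "%d\n", $1 * 1000000 + $2 * 10000 + $3 * 100 + $4 }'
}

# fw_status VERSION: print affected, not-affected, fixed or unknown.
fw_status() {
    case "$1" in
        07.50.08.10|07.50.13.10)
            echo affected          # listed under Contributing Factors
            return 0 ;;
    esac
    key=$(fw_key "$1")
    if [ "$key" -lt "$(fw_key 07.50.08.10)" ]; then
        echo not-affected          # Note 1: below the affected versions
    elif [ "$key" -ge "$(fw_key 07.60.18.10)" ]; then
        echo fixed                 # Resolution: 07.60.18.10 or later
    else
        echo unknown               # not covered by this alert
    fi
}
```

For example, fw_status 07.50.13.10 prints "affected" and fw_status 07.60.18.10 prints "fixed".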

The firmware version is shown in the Firmware Version column of the Storage System
Summary page in CAM.

On Solaris, the firmware version can also be found from the command line. First, list the registered arrays:

	% /opt/SUNWsefms/bin/ras_admin device_list

Example output (the array name is in the Device column):

Monitored On  Device     Type  IP Address     WWN               Active  ASR
--------------------------------------------------------------------------
x4200-21      se6130-01  6130  17n.2n.1nn.46  200400x0x821xxx7  Y       N
x4200-21      se6130-02  6130  17n.2n.1nn.61  200400x0x821x167  Y       N

Then save the array profile to a new file:

	% touch <new file>
	% /opt/SUNWsefms/bin/service -d <ArrayName> -c print -t profile >> <new file>

Then view <new file>; the firmware version is listed in the first Summary section.
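The listing and profile steps above can be combined in a shell sketch that saves one profile per registered array. It assumes the ras_admin device_list layout shown in the example (a line of dashes followed by one row per array, with the array name in the Device column) and uses only the service -c print -t profile command given above. The function names array_names and save_profiles, the RAS_ADMIN and SERVICE variables, and the <array>.profile file naming are hypothetical.

```sh
#!/bin/sh
# Hypothetical wrapper around the two CAM commands shown above.
# Override these variables if CAM is installed elsewhere.
RAS_ADMIN=${RAS_ADMIN:-/opt/SUNWsefms/bin/ras_admin}
SERVICE=${SERVICE:-/opt/SUNWsefms/bin/service}

# array_names: read "ras_admin device_list" output on stdin and print
# the Device column (field 2) of every non-blank row after the dashes.
array_names() {
    awk 'seen && NF >= 2 { print $2 } /^---/ { seen = 1 }'
}

# save_profiles DIR: write one "<array>.profile" file per registered array.
save_profiles() {
    dir=$1
    "$RAS_ADMIN" device_list | array_names | while read -r array; do
        "$SERVICE" -d "$array" -c print -t profile > "$dir/$array.profile"
    done
}
```

For the sample output, save_profiles /var/tmp would write /var/tmp/se6130-01.profile and /var/tmp/se6130-02.profile; the firmware version is then in the first Summary section of each file.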


Note 2: Running the Explorer data collection script on a host that has the
Common Array Manager (CAM) software installed with registered arrays can
trigger supportdata collections on all registered arrays. This can cause the
controller watchdog timeout reboot on any registered array that is running an
affected firmware version.


3. Symptoms

A watchdog timeout reboot occurs. The controller logs the following error:

WARNING: Restart by watchdog time out



4. Workaround

To work around this issue, do the following:

Parts of the supportdata can be collected from the command line without triggering this issue.
The key point is to avoid collecting the stateCaptureData.dmp part of the supportdata.
The following CAM service command examples, run on Solaris, collect parts of the supportdata:

	% /opt/SUNWsefms/bin/ras_admin device_list

Use the above command to get the array name for the commands below:

	% /opt/SUNWsefms/bin/service -d arrayname -c save -t iom -p <path> -o <filename>
	% /opt/SUNWsefms/bin/service -d arrayname -c save -t mel -p <path> -o <filename>
	% /opt/SUNWsefms/bin/service -d arrayname -c save -t profile -p <path> -o <filename>
	% /opt/SUNWsefms/bin/service -d arrayname -c save -t rls -p <path> -o <filename>
	% /opt/SUNWsefms/bin/service -d arrayname -c save -t soc -p <path> -o <filename>
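When several arrays need the same items, the five service -c save commands above can be wrapped in a loop. This is a sketch under stated assumptions: save_partial_supportdata and the SERVICE variable are hypothetical names, the <array>-<item> output file name is an illustrative choice, and the loop deliberately covers only the five item types listed above, so it never requests stateCaptureData.dmp.

```sh
#!/bin/sh
# Hypothetical helper: save the five partial supportdata items listed
# above for one array. It never requests stateCaptureData.dmp.
SERVICE=${SERVICE:-/opt/SUNWsefms/bin/service}

# save_partial_supportdata ARRAY DIR
save_partial_supportdata() {
    array=$1
    dir=$2
    for item in iom mel profile rls soc; do
        # Output name "<array>-<item>" is an illustrative choice.
        "$SERVICE" -d "$array" -c save -t "$item" -p "$dir" -o "$array-$item" ||
            return 1
    done
}
```

For example: save_partial_supportdata se6130-01 /var/tmp. Stopping at the first failing command (return 1) makes an incomplete collection easy to notice.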

The following SANtricity SMcli command examples, run on Solaris, collect parts of the supportdata:

	% /opt/SMgr/client/SMcli -d -i display inventory 

Use the above command to get the array name for the commands below:

	% /opt/SMgr/client/SMcli -n arrayname -c "upload storageArray file=\"mel.log\" content=allEvents;"
	% /opt/SMgr/client/SMcli -n arrayname -c "upload storageArray file=\"config.txt\" content=configuration;"
	% /opt/SMgr/client/SMcli -n arrayname -c "upload storageArray file=\"rls.csv\" content=RLSCounts ;"
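The three SMcli uploads above can also be run in a loop for one array. In this hypothetical sketch, upload_partial_supportdata and SMCLI are illustrative names and the upload commands and content types are copied from the examples above; the only change is that each file name is prefixed with the array name (an assumption of this sketch) so uploads from different arrays do not overwrite each other.

```sh
#!/bin/sh
# Hypothetical helper: run the three SMcli uploads shown above for one
# array, prefixing each output file with the array name.
SMCLI=${SMCLI:-/opt/SMgr/client/SMcli}

# upload_partial_supportdata ARRAY
upload_partial_supportdata() {
    array=$1
    # Each entry is "<file suffix>:<content type>", taken from the alert.
    for spec in mel.log:allEvents config.txt:configuration rls.csv:RLSCounts; do
        file=${spec%%:*}
        content=${spec#*:}
        "$SMCLI" -n "$array" -c "upload storageArray file=\"$array-$file\" content=$content;" ||
            return 1
    done
}
```

For example, upload_partial_supportdata se6130-01 would create se6130-01-mel.log, se6130-01-config.txt and se6130-01-rls.csv.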


5. Resolution

This issue is addressed on the following platforms:

  • StorageTek Flexline 380 Array with 07.60.18.10 firmware or later
  • Sun StorageTek 6140 Array with 07.60.18.10 firmware or later
  • Sun StorageTek 6540 Array with 07.60.18.10 firmware or later
  • Sun Storage 6580 Array with 07.60.18.10 firmware or later
  • Sun Storage 6780 Array with 07.60.18.10 firmware or later

Note: The above firmware is part of the CAM 6.5.0 release, available at:

http://www.sun.com/storage/management_software/resource_management/cam/get_it.jsp


This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

Copyright 2000-2009 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.


Product
Sun StorageTek Flexline 380 Array
Sun StorageTek 6140 Array
Sun StorageTek 6540 Array
Sun Storage 6580 Array
Sun Storage 6780 Array

Internal Comments (for SAs)
There are two possible causes, a watchdog timeout and a memory leak, both of which
can occur during the stateCaptureData.dmp collection.


Problem one is caused by the amount of trace information collected in
the stateCaptureData.dmp section. If the trace is very full and the
controller processor is very busy, printing the trace takes long enough
to trigger the watchdog timeout. This is typically seen on the 6140 and
rarely on the other arrays, which have faster processors.

Problem two is a memory leak caused by the type of dqprint command used
in the stateCaptureData.dmp collection.

Each collection generates a trace file that is kept in controller memory. That
trace file is not removed at the end of the supportdata collection, which causes
the memory leak. Each subsequent supportdata collection adds another trace file
to memory until the controller reboots due to memory problems. This problem can
occur on all of the affected arrays, but it is seen most often on the 6140
because it has less memory than the other arrays.

To check an array for trace files and clear them, use the following serial
shell commands to list and delete any traces in the controller's memory.
Once the trace files are cleared from both controllers, a supportdata
collection can be taken without risk of the controller reboot issue.

dqlist               Shows any trace files in memory.
dqclear "trace"      Clears the current trace.
dqdelete "trace.*"   Deletes the next trace file in memory.

Run these commands on each controller, as in the example below, repeating
dqdelete "trace.*" until the controller prints the following to the shell:
Usage: dqdelete "label"

The following example shows the commands being run:

-> dqlist

---------------------------------------------- Debug Queue -----------------------------------------------
Label (dl) Address Description Size Elements Full Hndlr (df) First Last
----------- ---------- ----------- ---------- -------- ---- ---------- ----------------- -----------------
ddcDq 0x7a58be00 ACTIV:dd=2 94834700 0 000% stop
trace 0x153a6c00 ACTIV:dd=2 5230604 5878 003% none 09/03/09-15:23:23 09/03/09-15:23:43
platform 0x004380b8 ACTIV:dd=2 141324 346 004% none 09/03/09-09:49:25 09/03/09-15:15:05
=> trace.5 0x041818a4 none 608 13 09/03/09-15:23:12 09/03/09-15:23:18
trace.4 0x041151cc none 39364 1193 09/03/09-15:23:10 09/03/09-15:23:12
trace.3 0x04136a08 none 42036 1241 09/03/09-15:23:03 09/03/09-15:23:07
trace.2 0x040a0458 none 119864 3602 09/03/09-15:22:48 09/03/09-15:23:02
trace.1 0x03defadc none 8428 209 09/03/09-15:20:22 09/03/09-15:22:43

value = 1 = 0x1
-> dqclear "trace"
value = 1 = 0x1
-> dqdelete "trace.*"
value = 1 = 0x1
-> dqdelete "trace.*"
value = 1 = 0x1
-> dqdelete "trace.*"
value = 1 = 0x1
-> dqdelete "trace.*"
value = 1 = 0x1
-> dqdelete "trace.*"
value = 1 = 0x1
-> dqdelete "trace.*"
Usage: dqdelete "label"
value = 0 = 0x0
-> dqlist

---------------------------------------------- Debug Queue -----------------------------------------------
Label (dl) Address Description Size Elements Full Hndlr (df) First Last
----------- ---------- ----------- ---------- -------- ---- ---------- ----------------- -----------------
ddcDq 0x7a58be00 ACTIV:dd=2 94834700 0 000% stop
=> trace 0x153a6c00 ACTIV:dd=2 5230604 0 000% none
platform 0x004380b8 ACTIV:dd=2 141324 346 004% none 09/03/09-09:49:25 09/03/09-15:15:05

value = 1 = 0x1
->

Done.
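In the example above, the first dqlist shows five saved trace files (trace.1 through trace.5), five dqdelete "trace.*" commands succeed, and the sixth prints the Usage message. If the serial console session is captured to a file on a host, the expected number of dqdelete commands can be read from the first dqlist output. The sketch below is a hypothetical host-side helper, not a controller command: it assumes dqlist rows like the example, with saved trace files labelled trace.N, it should be fed only the first dqlist output, and it only prints the command sequence for an operator to type. The Usage message remains the real stopping condition.

```sh
#!/bin/sh
# count_trace_files: read captured dqlist output on stdin and count the
# saved trace files (labels of the form trace.N). The live "trace"
# queue itself is not counted.
count_trace_files() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /^trace\.[0-9]+$/) { n++; break }
    } END { print n + 0 }'
}

# dq_cleanup_commands: print the serial shell commands for one controller:
# dqclear "trace" followed by one dqdelete "trace.*" per saved trace file.
dq_cleanup_commands() {
    n=$(count_trace_files)
    echo 'dqclear "trace"'
    i=0
    while [ "$i" -lt "$n" ]; do
        echo 'dqdelete "trace.*"'
        i=$((i + 1))
    done
}
```

For the first dqlist output in the example, dq_cleanup_commands prints dqclear "trace" followed by five dqdelete "trace.*" lines; one further dqdelete on the controller should then return the Usage message.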

Internal Contributor/submitter
don.curren@sun.com

Internal Eng Responsible Engineer
rich.floyd@sun.com

Internal Eng Business Unit Group
Storage Group - Midrange Disk

Internal Services Knowledge Engineer
karen.edwards@sun.com

Internal Sun Alert & FAB Admin Info
13-Oct-2009, karen, submitted yesterday. sending to 24hr review today.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.