Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1001039.1
Update Date:2010-11-10
Keywords:

Solution Type: FAB (standard)

Solution  1001039.1 :   Sun StorEdge 5310 Systems With Pre-built LUNs In the Expansion Unit (EU/JBOD) May Not Function Properly After Installation  


Related Items
  • Sun Storage 5310 NAS Appliance
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Mandatory

PreviouslyPublishedAs
201366


Product
Sun StorageTek 5310 NAS Appliance

Bug Id
<SUNBUG: 6414723>

Part
  • Part No: 594-0597-02
  • Part Description: StorEdge 5310 Expansion Unit
Part
  • Part No: 594-2294-01
  • Part Description: StorEdge 5310
Part
  • Part No: 594-0596-01
  • Part Description: StorEdge 5310 Expansion Unit
Part
  • Part No: 594-0345-01
  • Part Description: StorEdge 5310
Xoption
  • Xoption Number: XTA5300R01A0FU1022
  • Xoption Description: StorEdge 5310 Expansion Unit
Xoption
  • Xoption Number: XTA5300R01A0SY56
  • Xoption Description: StorEdge 5310
Xoption
  • Xoption Number: XTA5300R01A0FU2044
  • Xoption Description: StorEdge 5310 Expansion Unit
Xoption
  • Xoption Number: XTA5300R01A0FU42
  • Xoption Description: StorEdge 5310

Impact

 


Contributing Factors

The following table lists the marketing part numbers and their corresponding manufacturing part numbers:

Marketing Part Number    Manufacturing Part Number    Description
XTA5300R01A0SY56         594-0345-01                  SE5310
XTA5300R01A0FU2044       594-0597-02                  SE5310 EU
XTA5300R01A0FU42         594-2294-01                  SE5310
XTA5300R01A0FU1022       594-0596-01                  SE5310 EU

Only units (as described above) manufactured on or before May 31, 2006 are affected by this issue. The system record date, reflected in the CIS (Customer Information Sheet), shows the date of manufacture for the tray.

Systems shipped on or after June 1, 2006 have the "Important Installation Notice" (Part Number 819-6770-10) included in the ship kit. These units also have a STOP label over the power inlet sockets on the power supplies.

If the SE5310 system(s) are not affected, do not perform the recovery procedure. Systems with only a controller enclosure and no expansion unit are not affected by this issue.
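
As an illustrative pre-screen, the part-number table and date cutoff above can be expressed in a few lines of Python. This is only a sketch: the manufacturing part numbers and the May 31, 2006 cutoff come from this document, while the helper name and the "YYYY-MM-DD" date format are assumptions; the CIS (Customer Information Sheet) remains the authoritative record of the manufacture date.

from datetime import date

# Manufacturing part numbers from the table above; the cutoff is the
# May 31, 2006 build date stated in this document.
AFFECTED_MFG_PARTS = {
    "594-0345-01",  # XTA5300R01A0SY56   - SE5310
    "594-0597-02",  # XTA5300R01A0FU2044 - SE5310 EU
    "594-2294-01",  # XTA5300R01A0FU42   - SE5310
    "594-0596-01",  # XTA5300R01A0FU1022 - SE5310 EU
}
CUTOFF = date(2006, 5, 31)

def possibly_affected(mfg_part_no, cis_record_date):
    """Return True if the tray matches an affected part and build date.

    cis_record_date is assumed to be "YYYY-MM-DD" as read from the CIS.
    """
    year, month, day = (int(x) for x in cis_record_date.split("-"))
    return mfg_part_no in AFFECTED_MFG_PARTS and date(year, month, day) <= CUTOFF

print(possibly_affected("594-0597-02", "2006-04-15"))  # True  - affected
print(possibly_affected("594-0597-02", "2006-06-05"))  # False - built after cutoff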


Symptoms

If one or more of the following symptoms is observed, the system might be affected:

  • One of the RAID controllers resets or reboots while the NAS head is booting.
  • The Telnet LUN Path screen displays slowly (one line at a time), and the controller reboots when the auto-assign LUN path operation is run from this screen.
  • Creation of an SFS2 partition of the maximum size (256 GB) takes more than 5 minutes.
  • There are multiple error messages in the NAS system log indicating that LUNs are being transferred between controllers, or that a LUN is not ready (ASC code 0x4); a log-scan sketch follows this list.
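
For the log-based symptom above, a short scan of a saved copy of the NAS system log can help spot the relevant messages. This is only a sketch: the exact wording of the log entries is not reproduced in this document, so the search patterns and the log file name below are assumptions to adjust for your environment.

import re

# Assumed patterns for the symptoms described above: LUNs moving between
# controllers and "LUN not ready" (ASC code 0x4). Adjust to the actual
# message text seen in the NAS system log.
PATTERNS = [
    re.compile(r"transferr?ed.*controller", re.IGNORECASE),
    re.compile(r"LUN not ready", re.IGNORECASE),
    re.compile(r"ASC\s*(code)?\s*0x0?4", re.IGNORECASE),
]

def suspicious_lines(log_path):
    """Return log lines that match any of the symptom patterns."""
    hits = []
    with open(log_path, errors="replace") as log:
        for line in log:
            if any(p.search(line) for p in PATTERNS):
                hits.append(line.rstrip())
    return hits

for line in suspicious_lines("nas_system.log"):  # file name is an example
    print(line)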

To identify systems that might be experiencing the above issue(s), use the Command Line Interface (CLI):

  1. Telnet to the StorEdge NAS with any telnet client
  2. At the [menu] prompt, type "admin" and press the Enter key
  3. Type the administrator password for the NAS if prompted
  4. You are now at the CLI prompt

Determine the NAS OS that is running by connecting to the NAS head with "telnet" and then running the "version" command:

CRS_H1 > version
StorEdge Model 5310C NAS S/N 1234569 Version 4.11 M0 (Build 12)

Click the following link to download the "SE53x0 and SE53x0C clear array" zip file:

http://pts-storage.west/products/SE5210/code/cleararray.ZIP

Unzipping the downloaded file produces two files, crs_v11.nsm and crs_v12.nsm, which are used in the procedure below.

Note: In place of "crs_v1x.nsm" below, substitute either "crs_v11.nsm" or "crs_v12.nsm" as appropriate for the NAS OS version.
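
A minimal sketch of that choice, assuming the banner format shown in the sample "version" output above: it maps NAS OS 4.11 to crs_v11.nsm and 4.12 to crs_v12.nsm. The function name is hypothetical.

import re

def crs_module_for(version_banner):
    """Map the NAS OS version in the "version" banner to the crs module file."""
    match = re.search(r"Version (4\.1[12])", version_banner)
    if match is None:
        raise ValueError("NAS OS must be 4.11 or 4.12 for this procedure")
    return "crs_v11.nsm" if match.group(1) == "4.11" else "crs_v12.nsm"

banner = "StorEdge Model 5310C NAS S/N 1234569 Version 4.11 M0 (Build 12)"
print(crs_module_for(banner))  # crs_v11.nsm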

Next steps:

1. Boot the NAS head (boot both heads if this is a cluster system).

2. Copy the crs_v1x.nsm to the NAS:/cvol directory.

Note: For clustered systems, copy this module to /cvol of head 1.  Refer to Chapter 8 of the "Sun StorEdge 5310 NAS Appliance Admin Guide" for how to set up and delete an NFS export for /cvol. Be sure to remove this export entry after copying the crs_v1x.nsm module to /cvol.

3. Install the crs_v1x module:

CRS_H1 > load /cvol/crs_v1x.nsm
   Found 2 Engenio RAID Array

4. Run "show_crs" and capture its output.  

The output of "show_crs" might scroll off the screen if there is more than 1 array in the system.  For Windows users, use the "putty" program with logging enabled to capture the output. For Unix users, run the "script" command, then use "telnet" to connect to the NAS head to capture the output.

 

CRS_H1 > show_crs
	*** Checking Array 1 using Ctlr 0...
	Controller F/W version: 06.12.09.10
		Ctlr A: Online
		Ctlr B: Online
	NVSRAM version: N2882-612843-503
	Number of Expansion Tray(s): 3
	TrayID 1 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 1 ESM Location B (Right canister) - F/W version: 9631
	TrayID 2 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 2 ESM Location B (Right canister) - F/W version: 9631
	TrayID 3 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 3 ESM Location B (Right canister) - F/W version: 9631
Executing  cfgUnitList(0,0,0,0,0,0,0,0,0,0):
Vol#   vdUnit State/Stat vdFlags cfgFlags Pcs vdPiece Owner
0 44be8f40 0000/0000 0053270f 00110000  7 44be90f8 prim
1 44bd5b00 0000/0000 0053270f 00110000  6 44bd5cb8 prim
2 44bd73c0 0000/0000 0013270f 00110000  7 44bd7578  alt
3 44bd70a0 0000/0000 0013270f 00110000  6 44bd7258  alt
4 44bd6d40 0000/0000 0053270f 00110000  7 44bd6ef8 prim
5 44bd6a20 0000/0000 0053270f 00110000  6 44bd6bd8 prim
6 44bd6240 0000/0000 0013270f 00110008  7 44bd63f8  alt
7 44bd5ea0 0000/0000 0013270f 00110008  6 44bd6058  alt
128 47c71840 0000/0000 0000000f 00000004  0 00000000  alt
1024 44bac150 0000/0000 0000000f 00000000  2 44bac308  alt
<2>
	*** Volumes with CRS problem in Array 1:
		LUN SSID 6
		LUN SSID 7
	*** Checking Array 2 using Ctlr 2...
	Controller F/W version: 06.12.09.10
		Ctlr A: Online
		Ctlr B: Online
	NVSRAM version: N2882-612843-503
	Number of Expansion Tray(s): 4
	TrayID 1 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 1 ESM Location B (Right canister) - F/W version: 9631
	TrayID 2 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 2 ESM Location B (Right canister) - F/W version: 9631
	TrayID 3 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 3 ESM Location B (Right canister) - F/W version: 9631
	TrayID 12 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 12 ESM Location B (Right canister) - F/W version: 9631
Executing  cfgUnitList(0,0,0,0,0,0,0,0,0,0):
Vol#   vdUnit State/Stat vdFlags cfgFlags Pcs vdPiece Owner
0 44bce5c0 0000/0000 0013271f 00110000  7 44bce778  alt
1 44bcca80 0000/0000 0053271f 00110000  6 44bccc38 prim
2 44bcdf40 0000/0000 0053271f 00110000  7 44bce0f8 prim
3 44bcdc20 0000/0000 0013271f 00110000  6 44bcddd8  alt
4 44bcd8c0 0000/0000 0013271f 00110000  7 44bcda78  alt
5 44bcd100 0000/0000 0053271f 00110000  6 44bcd2b8 prim
6 44bce2a0 0000/0000 0053271f 00110000  6 44bce458 prim
7 44bcc740 0000/0000 0013271f 00110000  6 44bcc8f8  alt
8 44bccda0 0000/0000 0013271f 00110008  7 44bccf58  alt
9 44bcc420 0000/0000 0053270f 00110008  6 44bcc5d8 prim
128 47c361fc 0000/0000 0000000f 00000004  0 00000000  alt
1024 44ba23dc 0000/0000 0000000f 00000000  2 44ba2594  alt
<2>
	*** Volumes with CRS problem in Array 2:
		LUN SSID 8
		LUN SSID 9
	CRS_H1>

5. Verify the output of "show_crs".

Any volume whose cfgFlags value has bit 3 (0x8) set has been impacted by this issue. The command displays the SSIDs of volumes that have a CRS issue. Verify the cfgFlags of all volumes to make sure the list is complete and correct.
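
A minimal sketch of this check, run against a "show_crs" capture saved with putty or script as described in step 4: it walks the cfgUnitList rows and reports the Vol# of any entry whose cfgFlags value has bit 3 (0x8) set. The column layout follows the sample output above; the capture file name and function name are assumptions.

def impacted_volumes(capture_path):
    """Return Vol# values (across all arrays in the capture) whose cfgFlags
    have bit 3 (0x8) set."""
    impacted = []
    with open(capture_path, errors="replace") as capture:
        for line in capture:
            fields = line.split()
            # Data rows: Vol# vdUnit State/Stat vdFlags cfgFlags Pcs vdPiece Owner
            if len(fields) == 8 and fields[0].isdigit():
                if int(fields[4], 16) & 0x8:
                    impacted.append(int(fields[0]))
    return impacted

print(impacted_volumes("show_crs_capture.txt"))  # e.g. [6, 7, 8, 9] for the sample above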


6. Unload the crs_v1x module:

    CRS_H1 > unload crs_v11
    Unloading module crs_v11...

Follow the instructions below to recover any arrays that were identified.


Root Cause

The root cause is the volume group migration procedure used in the manufacturing process. When a volume group was powered off and relocated without following the correct procedure, the symptoms described above can appear.


Workaround

 


Resolution

The procedure to recover the arrays involves identifying the affected volumes and taking the corresponding volume groups offline and then back online. As part of the process, controller B of each array is placed offline before the process starts and brought back online after the process completes.

Note: Downtime must be scheduled to accommodate the reboot of affected systems.

Verify system requirements prior to executing recovery procedure:

1. The NAS head must run OS version 4.11 or 4.12.  <> contains the upgrade to OS version 4.12 and the firmware listed below.

2. The CSM array must have the following F/W revisions:

  • Controller F/W version 06.12.09.10 or newer
  • NVSRAM version N2882-612843-503
  • FC JBOD ESM version 9631
  • SATA JBOD ESM version 9722

Use the output of the "show_crs" command to verify the F/W version numbers.

RECOVERY PROCEDURE

This section describes a nine-step user procedure, using the crs_v1x.nsm module, to recover the affected RAID arrays. The procedure applies to all existing systems, with or without user data stored on the array.

NOTE: The system administrator should schedule downtime for the NAS head, since the system will be unavailable during the nine-step recovery procedure.

Step 1. Force any clustered system into the "Alone" state.

If head 1 is already in the "Alone" state, skip to step 2. Otherwise, power off head 2 and wait until head 1 completes the failover and becomes "Alone". Watch the NAS front panel LCD for the status to display "Alone".

Step 2. Reboot the NAS head.  If clustered - reboot the "Alone" head.

Step 3. Connect to the NAS head and capture all output from the NAS head.

For Windows users: use the "putty" program with logging enabled to capture the output of the NAS head. For Unix users: run the "script" command then use "telnet" to connect to the NAS head to capture the output of the NAS head.

Step 4. Make sure all RAID controllers are "Online".  

Use the output of the "show_crs" command to verify that the status of all controllers is "Online". If any controller is missing, or if any controller's status is not "Online", stop this procedure and reboot the controllers using the following steps (a small verification sketch follows the list):

  1. Using the CLI, GUI or Telnet option, shut down the NAS head
  2. Power cycle both RAID controllers, leaving the JBODs powered on, and wait 3 minutes for both controllers to come online and for all drive LEDs to stop flashing
  3. Restart the recovery procedure from step 1
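
As referenced above, a small check of a saved "show_crs" capture can confirm that every controller reports "Online" before continuing. This is only an illustration; the capture file name and function name are assumptions.

def all_controllers_online(capture_path):
    """Return True only if every "Ctlr A:"/"Ctlr B:" status line reports Online."""
    saw_controller = False
    with open(capture_path, errors="replace") as capture:
        for line in capture:
            stripped = line.strip()
            if stripped.startswith(("Ctlr A:", "Ctlr B:")):
                saw_controller = True
                if "Online" not in stripped:
                    return False
    return saw_controller

if not all_controllers_online("show_crs_capture.txt"):
    print("Stop: power cycle the RAID controllers and restart from step 1.")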

Step 5. Copy the crs_v1x.nsm to the NAS head and install.

After copying the crs_v1x.nsm to /cvol on the NAS head, install the module:

   CRS_H1> load /cvol/crs_v11.nsm
   Found 2 Engenio RAID Arrays

Step 6. Run the fix_crs CLI command to fix the issue in all arrays:

Below is a sample display of running fix_crs command on system with two arrays affected:

CRS_H1 > fix_crs
	*** Checking Array 1 using Ctlr 0...
Executing  cfgUnitList(0,0,0,0,0,0,0,0,0,0):
Vol#   vdUnit State/Stat vdFlags cfgFlags Pcs vdPiece Owner
0 44bd51e0 0000/0000 0053270f 00110000  7 44bd5398 prim
1 44bd3d20 0000/0000 0053270f 00110000  6 44bd3ed8 prim
2 44bd4e80 0000/0000 0013270f 00110000  7 44bd5038  alt
3 44bd4b60 0000/0000 0013270f 00110000  6 44bd4d18  alt
4 44bd4800 0000/0000 0053270f 00110000  7 44bd49b8 prim
5 44bd44c0 0000/0000 0053270f 00110000  6 44bd4678 prim
6 44bd39c0 0000/0000 0013270f 00110008  7 44bd3b78  alt
7 44bd3680 0000/0000 0013270f 00110008  6 44bd3838  alt
128 47c3f4e8 0000/0000 0000000f 00000004  0 00000000  alt
1024 44baa3fc 0000/0000 0000000f 00000000  2 44baa5b4  alt
<2>
	*** Volumes with CRS problem in Array 1:
		LUN SSID 6
		LUN SSID 7
Array 1 has 2 volume(s) with CRS problem
Send cmd to Ctlr 0 to Offline Ctlr 1
	**** Set Ctlr 1 to Offline OK.
CtlrB of Array 1 Offline OK - wait 10 secs.
Offline Volume SSID 6
SSID 6 owned by Ctlr Slot A -> 0
OK
Wait 8 sec...
Online Volume SSID 6
SSID 6 owned by Ctlr Slot A -> 0
OK
Wait 8 sec...
Offline Volume SSID 7
SSID 7 owned by Ctlr Slot A -> 0
OK
Wait 8 sec...
Online Volume SSID 7
SSID 7 owned by Ctlr Slot A -> 0
OK
Wait 8 sec...
Send cmd to Ctlr 0 to Online Ctlr 1
	**** Set Ctlr 1 to Online OK.
CtlrB of Array 1 Online OK - wait 20 secs.
	*** Checking Array 2 using Ctlr 2...
Executing  cfgUnitList(0,0,0,0,0,0,0,0,0,0):
Vol#   vdUnit State/Stat vdFlags cfgFlags Pcs vdPiece Owner
0 44be3120 0000/0000 0013271f 00110000  7 44be32d8  alt
1 44bd0060 0000/0000 0053271f 00110000  6 44bd0218 prim
2 44bd12a0 0000/0000 0053271f 00110000  7 44bd1458 prim
3 44bd0f60 0000/0000 0013271f 00110000  6 44bd1118  alt
4 44bd0c20 0000/0000 0013271f 00110000  7 44bd0dd8  alt
5 44bd0400 0000/0000 0053271f 00110000  6 44bd05b8 prim
6 44bd1600 0000/0000 0053271f 00110000  6 44bd17b8 prim
7 44bcfcc0 0000/0000 0013270f 00110000  6 44bcfe78  alt
8 44bcf900 0000/0000 0013270f 00110008  7 44bcfab8  alt
9 44bcf560 0000/0000 0053270f 00110008  6 44bcf718 prim
128 47c37a80 0000/0000 0000000f 00000004  0 00000000  alt
1024 44ba5978 0000/0000 0000000f 00000000  2 44ba5b30  alt
<2>
	*** Volumes with CRS problem in Array 2:
		LUN SSID 8
		LUN SSID 9
Array 2 has 2 volume(s) with CRS problem
Send cmd to Ctlr 2 to Offline Ctlr 3
	**** Set Ctlr 3 to Offline OK.
CtlrB of Array 2 Offline OK - wait 10 secs.
Offline Volume SSID 8
SSID 8 owned by Ctlr Slot A -> 2
OK
Wait 8 sec...
Online Volume SSID 8
SSID 8 owned by Ctlr Slot A -> 2
OK
Wait 8 sec...
Offline Volume SSID 9
SSID 9 owned by Ctlr Slot A -> 2
OK
Wait 8 sec...
Online Volume SSID 9
SSID 9 owned by Ctlr Slot A -> 2
OK
Wait 8 sec...
Send cmd to Ctlr 2 to Online Ctlr 3
	**** Set Ctlr 3 to Online OK.
CtlrB of Array 2 Online OK - wait 20 secs.
CRS_H1 >

NOTE: Review the output of the "fix_crs" command. If any of the commands above fail or hang, power off the NAS head, follow the instructions in step 4 to power cycle the controllers, and restart the recovery procedure from step 1.

Step 7. Verify the fix by running "show_crs" again:

CRS_H1 > show_crs
	*** Checking Array 1 using Ctlr 0...
	Controller F/W version: 06.12.09.10
		Ctlr A: Online
		Ctlr B: Online
	NVSRAM version: N2882-612843-503
	Number of Expansion Tray(s): 3
	TrayID 1 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 1 ESM Location B (Right canister) - F/W version: 9631
	TrayID 2 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 2 ESM Location B (Right canister) - F/W version: 9631
	TrayID 3 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 3 ESM Location B (Right canister) - F/W version: 9631
Executing  cfgUnitList(0,0,0,0,0,0,0,0,0,0):
Vol#   vdUnit State/Stat vdFlags cfgFlags Pcs vdPiece Owner
0 44bd51e0 0000/0000 0053270f 00110000  7 44bd5398 prim
1 44bd3d20 0000/0000 0053270f 00110000  6 44bd3ed8 prim
2 44bd4e80 0000/0000 0053270f 00110000  7 44bd5038 prim
3 44bd4b60 0000/0000 0053270f 00110000  6 44bd4d18 prim
4 44bd4800 0000/0000 0053270f 00110000  7 44bd49b8 prim
5 44bd44c0 0000/0000 0053270f 00110000  6 44bd4678 prim
6 44bd39c0 0000/0000 0053270f 00110000  7 44bd3b78 prim
7 44bd3680 0000/0000 0053270f 00110000  6 44bd3838 prim
128 47c3f4e8 0000/0000 0000000f 00000004  0 00000000  alt
1024 44baa3fc 0000/0000 0000000f 00000000  2 44baa5b4  alt
<0>
	*** No volume with CRS problem in Array 1 ***
	*** Checking Array 2 using Ctlr 2...
	Controller F/W version: 06.12.09.10
		Ctlr A: Online
		Ctlr B: Online
	NVSRAM version: N2882-612843-503
	Number of Expansion Tray(s): 4
	TrayID 1 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 1 ESM Location B (Right canister) - F/W version: 9631
	TrayID 2 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 2 ESM Location B (Right canister) - F/W version: 9631
	TrayID 3 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 3 ESM Location B (Right canister) - F/W version: 9631
	TrayID 12 ESM Location A ( Left canister) - F/W version: 9631
	TrayID 12 ESM Location B (Right canister) - F/W version: 9631
Executing  cfgUnitList(0,0,0,0,0,0,0,0,0,0):
Vol#   vdUnit State/Stat vdFlags cfgFlags Pcs vdPiece Owner
0 44be3120 0000/0000 0053271f 00110000  7 44be32d8 prim
1 44bd0060 0000/0000 0053271f 00110000  6 44bd0218 prim
2 44bd12a0 0000/0000 0053271f 00110000  7 44bd1458 prim
3 44bd0f60 0000/0000 0053271f 00110000  6 44bd1118 prim
4 44bd0c20 0000/0000 0053271f 00110000  7 44bd0dd8 prim
5 44bd0400 0000/0000 0053271f 00110000  6 44bd05b8 prim
6 44bd1600 0000/0000 0053271f 00110000  6 44bd17b8 prim
7 44bcfcc0 0000/0000 0053270f 00110000  6 44bcfe78 prim
8 44bcf900 0000/0000 0053270f 00110000  7 44bcfab8 prim
9 44bcf560 0000/0000 0053270f 00110000  6 44bcf718 prim
128 47c37a80 0000/0000 0000000f 00000004  0 00000000  alt
1024 44ba5978 0000/0000 0000000f 00000000  2 44ba5b30  alt
<0>
	*** No volume with CRS problem in Array 2 ***

NOTE: If any volume still shows the issue, repeat the recovery procedure until the issue is cleared from all volumes. Depending on the state of the array, the recovery procedure might need to be repeated to clear the arrays. If the recovery procedure fails to clear the arrays after two attempts, stop, contact your Sun service representative, and open an escalation with PTS.
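
The retry logic in the note above can be summarized as a small loop. The two callables are hypothetical stand-ins for the manual steps: one runs "fix_crs" on the NAS CLI, the other re-runs the cfgFlags bit-3 check from step 5 on a fresh "show_crs" capture; neither is provided by the NAS software.

MAX_TRIES = 2  # per the note above: stop and escalate after two attempts

def recover_arrays(run_fix_crs, volumes_still_flagged):
    """Run the recovery up to MAX_TRIES times; return True once the arrays are clean."""
    for attempt in range(1, MAX_TRIES + 1):
        run_fix_crs()                    # stand-in for running "fix_crs" on the CLI
        if not volumes_still_flagged():  # stand-in for the "show_crs" cfgFlags check
            print("Arrays clean after attempt", attempt)
            return True
    print("Still flagged after two attempts: contact Sun service and open a PTS escalation")
    return False

# Example with stand-in callables (an array that comes clean on the first pass):
recover_arrays(lambda: None, lambda: False)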

Step 8. Unload the crs_v1x module:

    CRS_H1 > unload crs_v11
    Unloading module crs_v11...

Step 9. Reboot the NAS head.

For a clustered system, boot the quiet head and perform head recovery to return to the "Normal-Normal" state.

Adding a JBOD to an existing Array

This procedure is required when adding JBODs to an existing array. It is needed only for JBODs that were impacted, but it can also be used when adding any additional JBOD to ensure the array is free of this issue. As new JBODs are added, review this document to confirm that the new JBODs do not have the issue described here.

  • Follow the instructions in the Hardware Installation Guide to add an expansion JBOD to an existing array. Perform the procedure to completion.
  • Determine whether the array has any issues, as described in the steps above. If it does not, the new expansion JBOD is ready for use.
  • If needed, perform the recovery procedure above to fix the array(s).
  • Note the installation requirements below for arrays that were built with the new manufacturing process. See the Contributing Factors section above to determine whether the SE5310 was built with the new manufacturing process.

For arrays that were built with the new manufacturing process, there is a required initial power-up sequence that must be performed before the array is ready for use with the NAS head. This initial power-up sequence (listed below) is needed for the controllers to properly establish the array's RAID configuration.

This requirement applies only to arrays with a controller enclosure and one or more additional expansion enclosures. For arrays with a controller enclosure only and no additional JBOD, this requirement does not apply and the system can be configured using the System Installation Guide.

The initial power-on sequence of the array is as follows:

Perform the H/W installation using the Installation Guide. All cables from the NAS head to the array should be connected. Connect the NAS head to the array, but do not power on the NAS head until step 4 below is complete.

  1. Power up the controller enclosure first. Wait 3 minutes for the controllers to boot up and all drive LEDs stop flashing.
  2. Power up one JBOD and wait for it to come up and for all drive LEDs in the JBOD to stop flashing. This usually takes no more than one minute. Wait until each JBOD is fully powered up (all LEDs steady) before proceeding to power up the next JBOD.
  3. Repeat step 2 until all JBODs are powered up.
  4. Power on the NAS head, both heads if this is a cluster system.

All volumes in the array should now be "Online". The system is now ready to be configured; follow the System Installation Guide to complete the system setup.


Modification History
Date: 21-JUL-2006
  • Updated implementation type from controlled proactive to mandatory (we received ok from legal for the customer letter).


Previously Published As
102453
Internal Comments


Customer Letter can be found at the URL below:




http://sdpsweb.central/FIN_FCO/FIN/FIN102453/Customer_Letter.sxw

Customer List can be found at the URL below:

http://sunwebcollab.central.sun.com/gm/document-1.9.1819517


 



Hardware Remediation Details

 


Related Information
  • URL: http://sdpsweb.central/FIN_FCO/FIN/FIN102453/Customer_Letter.sxw,
    http://pts-storage.west/products/SE5210/index10.html,
    http://sunwebcollab.central.sun.com/gm/document-1.9.1819517

Internal Contributor/submitter
roberta.pokigo@sun.com

Internal Eng Business Unit Group
KE Authors

Internal Eng Responsible Engineer
Dushy.Mahendran@sun.com

Internal Services Knowledge Engineer
sean.hassall@sun.com

Internal Escalation ID
1-17031578 1-16404757

Internal Resolution Patches
119351-06

Internal Kasp FAB Legacy ID
102453

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date:
Avoidance: Service Procedure
Responsible Manager: null
Original Admin Info: null

Internal SA-FAB Eng Submission
StorEdge 5310 system with pre-built LUNs in the Expansion Unit (EU/JBOD) may not function properly after installation

Product_uuid
63654ce5-f88d-11d8-ab63-080020a9ed93|Sun StorageTek 5310 NAS Appliance

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.