Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-72-1312148.1
Update Date:2011-06-08
Keywords:

Solution Type  Problem Resolution Sure

Solution  1312148.1 :   Troubleshooting 25xx Storage[TM] Array Battery Failures During Learn Cycle  


Related Items
  • Sun Storage 2540 Array
  • Sun Storage 2510 Array
  • Sun Storage 2540-M2 Array
  • Sun Storage 2530 Array
  • Sun Storage 2530-M2 Array

Related Categories
  • GCS>Sun Microsystems>Storage - Disk>Modular Disk - 2xxx Arrays




In this Document
  Symptoms
  Cause
  Solution


Applies to:

Sun Storage 2530 Array - Version: Not Applicable and later   [Release: N/A and later ]
Sun Storage 2540 Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 2510 Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 2530-M2 Array - Version: Not Applicable and later    [Release: N/A and later]
Sun Storage 2540-M2 Array - Version: Not Applicable and later    [Release: N/A and later]
Information in this document applies to any platform.

Symptoms

25xx arrays running Controller Firmware Revision 7.35.xx.xx report the
following Learn Cycle notifications and cache battery failures:

Notifications:

A:Fri Oct 15 16:46:42 MEST 2010 : 639 : 0/0/0 : 7310 : Notification : Battery : tray85 : Learn Cycle Started
B:Fri Oct 15 16:46:20 MEST 2010 : 640 : 0/0/0 : 7310 : Notification : Battery : tray85 : Learn Cycle Started
...
B:Fri Oct 15 23:17:37 MEST 2010 : 647 : 0/0/0 : 730E : Notification : Battery : tray85 : Battery capacity is sufficient
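
The notification records above can be split into fields so that the event code (7310, 730E) is easy to pick out. The following is a minimal sketch, not a supported tool. It assumes each record is on a single line and that fields are separated by " : ". The function name `parse_notification` and the dictionary keys are hypothetical, and the meaning of the leading A:/B: prefix and of the second and third fields is not stated in this document, so they are passed through uninterpreted.

```python
def parse_notification(line: str) -> dict:
    """Split one notification record into named fields.

    Assumption: fields are separated by " : " and the record has the
    eight-field layout shown in the Symptoms section.
    """
    fields = [f.strip() for f in line.split(" : ")]
    if len(fields) < 8:
        raise ValueError(f"unexpected notification format: {line!r}")
    # The first field looks like "A:Fri Oct 15 16:46:42 MEST 2010".
    prefix, _, timestamp = fields[0].partition(":")
    return {
        "prefix": prefix,              # "A" or "B" in the examples (not interpreted here)
        "timestamp": timestamp,
        "event_code": fields[3],       # e.g. "7310" or "730E"
        "severity": fields[4],         # "Notification" in the examples
        "component": fields[5],        # "Battery" in the examples
        "location": fields[6],         # e.g. "tray85"
        "message": " : ".join(fields[7:]),
    }
```

Records that wrap across two lines, as they may in a mail or terminal capture, would need to be joined before parsing.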

These notifications are followed by the following Critical Faults:

Reference:  <Document: 1021057.1> Verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 Critical Faults via the User Interface

  • Common Array Manager alarm code xx.66.1006:  A cache backup battery has failed
  • Common Array Manager alarm code xx.66.1040: A controller cache backup battery has failed

Internal note for Oracle support engineers: If you encounter bug 6987616 mentioned in this document, please use Bugster to attach your customer SR to this bug. If you do not have a Bugster account, please send an email to nicolas.wipfli@oracle.com with the following information:

- The SR number.
- The location of the supportdata.

Cause

Defect 6987616 in the controller firmware causes SMART battery failures to occur during the learn cycle.
The failures result from i2c bus faults that are falsely raised by the controller firmware.
This has only been observed in 2500 arrays running 7.35.xx.xx firmware.


Again, this only affects the 2510, 2530, and 2540 array models.  No other models, including 2540-M2 and 2530-M2, are impacted by this fault.  No other firmware releases are impacted by this fault.


Solution


Each battery should be troubleshot individually for this fault. 2500 systems can have one or two batteries depending on the controller configuration.

1.  Verify that a Battery Failure critical fault (xx.66.1006) is present.

Reference:  <Document: 1021057.1> Verify Sun StorageTek[TM] 2500 and Sun Storage[TM] 6000 Critical Faults via the User Interface

  • Common Array Manager alarm code xx.66.1006:  A cache backup battery has failed
  • Common Array Manager alarm code xx.66.1040: A controller cache backup battery has failed

  • If there are no critical faults for Battery Failure, you may have a different issue; see <Document: 1021054.1> Troubleshooting Sun Storage Array Battery Faults.
  • If there is a Battery Failure fault as shown above, continue to Step 2.

2.   Verify array firmware.

Reference <> Verify Storage[TM] Array Firmware via the User Interface
  • If the firmware is 06.xx.xx.xx, you may have a different issue; see <Document: 1021054.1> Troubleshooting Sun Storage Array Battery Faults.
  • If the firmware is 07.35.xx.xx, then continue to Step 3.
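
The firmware check in Step 2 can be sketched as a small function. This is an illustration only: the function name `firmware_next_step` and its return labels are hypothetical, and the "not-covered" result for releases other than 06.xx.xx.xx and 07.35.xx.xx is an assumption based on the statement in the Cause section that no other firmware releases are impacted.

```python
def firmware_next_step(version: str) -> str:
    """Classify a controller firmware version string for Step 2.

    Returns:
      "continue"            for 07.35.xx.xx (go to Step 3)
      "other-battery-issue" for 06.xx.xx.xx (see Document 1021054.1)
      "not-covered"         for any other release (assumption: this
                            document gives no instructions for it)
    """
    parts = version.strip().split(".")
    if len(parts) != 4 or not parts[0].isdigit():
        raise ValueError(f"unrecognized firmware version: {version!r}")
    major = int(parts[0])  # int() accepts both "07" and "7"
    minor = parts[1]
    if major == 7 and minor == "35":
        return "continue"
    if major == 6:
        return "other-battery-issue"
    return "not-covered"
```

The function only compares the first two fields, so it accepts both the literal pattern "07.35.xx.xx" and a full four-part version string.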

3.  Verify that the events occur in this sequence: 7310, followed by 730E, followed by 210C.

Sun StorageTek Common Array Manager:

Browser:

   1. Expand Storage Arrays in the left menu pane.
   2. Expand your storage array name in the left menu pane.
   3. Expand Troubleshooting in the left menu pane.
   4. Click on Events.
   5. In the right pane, click on the -|-> icon.  If you mouse over it, it will state Advanced Filter.
   6. Set Event to Log Events.
   7. Set Event Type to Component.
   8. Set Read the last X Kbytes From Log File to 100.
   9. Set String Filter to Battery.
  10. Click on the Details of any event that is shown.
  11. Review the Description Field.
  12. Get the value of the array log event ID from the description.


Example:

Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started
Description Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient
Description Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed


Note:  The filter in Step 9 is case sensitive.


SSCS CLI:

Get the list of events:

sscs list -d <array_name> -t LogEvent -f Battery event

Get the event details:

sscs list -d <array_name> event <event_id>

Note:  The -f option is case sensitive.



Get the value of the array log event ID from the description:

Example:

Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started
Description Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient
Description Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed


  • If event 0x210C follows 0x730E, continue to Step 4.
  • If event 0x210C does not follow a 0x730E event, contact Oracle to have the battery replaced.
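
The sequence check in Step 3 can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not a supported tool: it assumes the Description lines have been collected in chronological order (oldest first), that they follow the format of the examples above, and that "follows" means 0x210C is the next battery event after 0x730E for the same battery. The names `battery_event_ids` and `failure_follows_sufficient` are hypothetical.

```python
import re

# Event IDs named in Step 3 of this document.
CAPACITY_SUFFICIENT = 0x730E
BATTERY_FAILED = 0x210C

# Patterns based on the example Description lines in this document.
EVENT_ID = re.compile(r"\[ID (0x[0-9A-Fa-f]+)\]")
COMPONENT = re.compile(r"(Tray\.\d+\.Controller\.[AB]\.Battery):")


def battery_event_ids(descriptions):
    """Group event IDs by battery component, preserving input order."""
    events = {}
    for line in descriptions:
        component = COMPONENT.search(line)
        event_id = EVENT_ID.search(line)
        if component and event_id:
            events.setdefault(component.group(1), []).append(
                int(event_id.group(1), 16)
            )
    return events


def failure_follows_sufficient(ids):
    """True if a 0x210C event comes directly after a 0x730E event."""
    return any(
        prev == CAPACITY_SUFFICIENT and cur == BATTERY_FAILED
        for prev, cur in zip(ids, ids[1:])
    )


# The example Description lines from this document.
EXAMPLE = [
    "Description : Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x7310] NOTICE: Learn Cycle Started",
    "Description Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x730E] Battery capacity is sufficient",
    "Description Apr 08 21:31:31 2530-array Tray.99.Controller.A.Battery: [ID 0x210C] Controller cache battery failed",
]
```

Calling `battery_event_ids(EXAMPLE)` groups the three events under `Tray.99.Controller.A.Battery`, and passing that list to `failure_follows_sufficient` indicates whether to continue to Step 4. Grouping per battery matches the guidance that each battery should be troubleshot individually.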

4.  Reset the controller for the failed battery.

Based on the information gathered in the previous steps, defect 6987616 in the controller firmware is indicated.
The defect is caused by i2c bus faults that are falsely raised by the controller firmware.
To clear the i2c bus faults, the controller must be reset.

Browser

Select Physical Devices -> Controllers, then select Reset Controller
for the controller reporting the Cache Battery Failure.

SSCS

sscs reset -a <array_name> controller <A|B>

For example, based on the following alarm, reset Controller A:


Alarm ID   : alarm9
Description: A cache backup battery has failed Tray.85.Battery.A
Severity   : Critical
Element    : t85bat1
GridCode   : 70.66.1006
Date       : 2010-08-23 22:54:25
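
The mapping from the alarm to the reset command can be sketched as follows. This is an illustration only: the function name `reset_command` is hypothetical, it assumes the alarm Description ends with a component of the form Tray.NN.Battery.A or Tray.NN.Battery.B as in the alarm above, and it only builds the command string shown in this document without running it.

```python
import re


def reset_command(alarm_description: str, array_name: str) -> str:
    """Build the sscs reset command for the controller named in an alarm.

    Assumption: the battery letter in "Tray.NN.Battery.X" identifies the
    controller to reset, as in the example alarm in this document.
    """
    match = re.search(r"Tray\.\d+\.Battery\.([AB])\b", alarm_description)
    if not match:
        raise ValueError(f"no battery component found in: {alarm_description!r}")
    return f"sscs reset -a {array_name} controller {match.group(1)}"
```

For the alarm above, passing the Description text and the array name yields a reset command for Controller A. On a SIMPLEX array, review the outage note below before running it.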




NOTE:  For SIMPLEX arrays (single controller), an outage is required, as the data path will be unavailable during the reset.


  • If the battery failure clears after resetting the controller, no further work is required.
  • If the battery failure does not clear, contact Oracle to have the battery replaced.



Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.