Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1128605.1
Update Date: 2010-12-08
Keywords:

Solution Type  Sun Alert Sure

Solution  1128605.1 :   Firmware for RAID Controllers Causes Unscheduled Simultaneous Reboot of Controllers After 828.5 Days of Continuous Operation  


Related Items
  • Sun Storage 6780 Array
  • Sun Storage Flexline 380 Array
  • Sun Storage 6540 Array
  • Sun Storage 6580 Array
  • Sun Storage 2530 Array
  • Sun Storage 6180 Array
  • Sun Storage 6140 Array
  • Sun Storage 2510 Array
  • Sun Storage 2540 Array
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

In this Document
  Description
  Likelihood of Occurrence
  Possible Symptoms
  Workaround or Resolution
  Patches
  Modification History
  References


Applies to:

Sun Storage 2530 Array - Version: Not Applicable and later [Release: NA and later]
Sun Storage 2510 Array - Version: Not Applicable and later [Release: NA and later]
Sun Storage 2540 Array - Version: Not Applicable and later [Release: NA and later]
Sun Storage 6140 Array - Version: Not Applicable and later [Release: NA and later]
Sun Storage 6180 Array - Version: Not Applicable and later [Release: NA and later]
Sun SPARC Sun OS
_______________
SUNBUG 6949589

Date of Resolved Release: 18-Jun-2010

Description

A known issue in the vxWorks RAID controller firmware for Sun StorageTek arrays (listed under "Likelihood of Occurrence") may cause drives associated with host I/O volumes to be failed with write failures when the controllers reboot. The issue can occur after approximately 828.5 days of continuous uptime, when vxWorks (by default) reboots both controllers automatically at nearly the same time.

Likelihood of Occurrence

This issue can occur on the following platforms:
  • Sun StorageTek 2510 Array
  • Sun StorageTek 2530 Array
  • Sun StorageTek 2540 Array
  • Sun StorageTek 6140 Array
  • Sun StorageTek 6180 Array
  • Sun StorageTek 6540 Array
  • Sun StorageTek 6580 Array
  • Sun StorageTek 6780 Array
  • Sun StorageTek Flexline 380 Array
running vxWorks Controller Firmware versions 6.70.xx or earlier.

This issue is not restricted to the above arrays, as this firmware may also be used with other arrays, servers, or switches.

To determine the version of firmware on the controller, please view the Common Array Manager (CAM) Storage System Summary page of the CAM host managing the array.

vxWorks maintains a tick counter, vxAbsTicks, that is one double word long (a 32-bit number, starting at 0x00000000). cfgMonitorTask monitors this counter and reboots the controller once vxAbsTicks reaches 0xff000000, to avoid drive failures during I/O to the disks. When the counter rolls over from 0xffffffff to 0x00000000 (after approximately 828.5 days), the drives associated with any existing host I/O volumes may be failed with a write failure.
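The 828.5-day figure follows from the width of the counter. The sketch below works through the arithmetic, assuming a system clock rate of 60 ticks per second. That rate is not stated in this document; it is used here only because it reproduces the 828.5-day figure. The names TICKS_PER_SECOND and ticks_to_days are illustrative and do not come from the firmware.

```python
# Illustrative arithmetic only; not part of the controller firmware.
# ASSUMPTION: the system clock runs at 60 ticks per second (not stated in this Sun Alert).
TICKS_PER_SECOND = 60
SECONDS_PER_DAY = 86_400
COUNTER_MODULUS = 2 ** 32  # a 32-bit counter wraps after 0xffffffff


def ticks_to_days(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Convert a tick count into days of uptime."""
    return ticks / ticks_per_second / SECONDS_PER_DAY


rollover_days = ticks_to_days(COUNTER_MODULUS)
threshold_days = ticks_to_days(0xFF000000)

print(f"counter rolls over after {rollover_days:.1f} days")
print(f"0xff000000 is reached after {threshold_days:.1f} days")
```

Under the 60 ticks per second assumption, the first line reports a rollover after about 828.5 days, matching the figure quoted in this document.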

Possible Symptoms

On RAID arrays that use software mirroring to mirror data between two arrays, the controllers perform an unscheduled reboot at nearly the same time (at approximately 828.5 days of uptime), causing a write failure.

Workaround or Resolution

To avoid the (unscheduled) controller auto-reboot, reboot the controllers one at a time at any point between 1 day and 800 days of uptime. This restarts the counter before vxAbsTicks rolls over. With a proper failover environment, there should be no interruption of service.
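The reboot window described above can be expressed as a simple check. This is a minimal sketch: the function names, controller labels, and sample uptimes are hypothetical, and only the 1-day and 800-day limits come from this document.

```python
# Hypothetical helper; only the 1-day and 800-day limits come from this Sun Alert.
MIN_UPTIME_DAYS = 1
MAX_UPTIME_DAYS = 800


def in_reboot_window(uptime_days):
    """True if a manual reboot at this uptime falls inside the 1 to 800 day window."""
    return MIN_UPTIME_DAYS <= uptime_days <= MAX_UPTIME_DAYS


def overdue(uptime_days):
    """True if the controller is past the window and approaching the rollover."""
    return uptime_days > MAX_UPTIME_DAYS


def status(uptime_days):
    """Classify a controller by its uptime in days."""
    if overdue(uptime_days):
        return "overdue"
    if in_reboot_window(uptime_days):
        return "ok to reboot"
    return "too early"


# Made-up uptimes (in days) for two controllers, for illustration only.
uptimes = {"A": 640.0, "B": 812.5}
for name, days in uptimes.items():
    print(f"controller {name}: {status(days)}")
```

Rebooting the controllers one at a time, rather than together, is what keeps the failover path available during the procedure.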

Even with the workaround of rebooting each controller before the vxAbsTicks rollover, arrays running a 6.x firmware revision will still experience the issue.

This issue is resolved in Raidcore 2 firmware upgrade 7.35.10.10 or later (Sun StorageTek 25xx Series Firmware Upgrade Utility), which schedules the reboot of each controller at a different time, so no manual reboot is needed. (The reboot date and time can still be scheduled manually.) This firmware can be found at (requires login):

https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/ViewProductDetail-Start?ProductRef=STK25SFU-07.35-OTH-G-F@CDS-CDS_SMI&ProductUUID=ly9IBe.pyuMAAAEdKD5T8cHi&ProductID=ly9IBe.pyuMAAAEdKD5T8cHi&Origin=ViewProductDetail-Start

You cannot upgrade directly from 6.x firmware to 7.35.44.10. You must first upgrade to the firmware bundled in the upgrade utility or in CAM 6.2 (which is required to support this firmware).

Also see: "Sun StorageTek 25xx Series Array Firmware Upgrade Guide" at http://docs.sun.com/app/docs/doc/820-6362-12

The Sun Storage 2510 Firmware Matrix can be found in <Document:1021780.1> in My Oracle Support.

Patches

vxWorks Detail

What is it? - There is a timer in the firmware, specifically
in vxWorks, called vxAbsTicks that is only a double word long
(0x0000 0000).  When this timer rolls over from 0xffff ffff
to 0x0000 0000 (approximately 828.5 days) there is the possibility
that if there is host I/O to volumes, the associated drives will
be failed with a write failure.  This was discovered in 2003, and
CR# 68447 was opened against the issue.  The fix for that CR put
a function in the controller firmware called 'cfgMonitorTask' that
will reboot the controller if the vxAbsTicks value is within 12 days
of 828 days.  This has been in the firmware from 03.xx up to
06.60 firmware.

You can monitor this using the following shell
command:
% vxAbsTicks
vxAbsTicks = 0x2e5540: value = 227183 = 0x3776f
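
The sample output above can be converted into an uptime estimate. The sketch below parses a line of that form and reports how far the counter is from rollover. It assumes a clock rate of 60 ticks per second, which is not stated in this document and is used only because it is consistent with the 828.5-day figure. The function name parse_vx_abs_ticks is hypothetical.

```python
import re

# ASSUMPTION: 60 ticks per second (not stated in this Sun Alert).
TICKS_PER_SECOND = 60
ROLLOVER_TICKS = 2 ** 32  # 32-bit counter wraps after 0xffffffff


def parse_vx_abs_ticks(line):
    """Extract the tick value from a 'vxAbsTicks = ...: value = N = 0x...' line."""
    match = re.search(r"value\s*=\s*(\d+)\s*=\s*0x([0-9a-fA-F]+)", line)
    if match is None:
        raise ValueError(f"unrecognized output: {line!r}")
    ticks = int(match.group(1))
    # The shell prints the value in decimal and hex; check that they agree.
    if ticks != int(match.group(2), 16):
        raise ValueError("decimal and hex values disagree")
    return ticks


sample = "vxAbsTicks = 0x2e5540: value = 227183 = 0x3776f"
ticks = parse_vx_abs_ticks(sample)
uptime_minutes = ticks / TICKS_PER_SECOND / 60
days_left = (ROLLOVER_TICKS - ticks) / TICKS_PER_SECOND / 86_400

print(f"{ticks} ticks, about {uptime_minutes:.0f} minutes of uptime")
print(f"about {days_left:.1f} days until rollover")
```

Under the same assumption, the sample value of 227183 ticks corresponds to roughly an hour of uptime, that is, a controller that was rebooted recently.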

What Happened? - When the conversion from RC1 to RC2 was completed,
the functionality in cfgMonitorTask was not ported into 07.xx CFW.
This reintroduced the ungraceful vxAbsTicks timer rollover
at approximately 828.5 days, with the possibility that if there is
host I/O to volumes, the associated drives will be failed with a
write failure.

Where was it fixed? - CR 138248 was added to the RC2 trunk prior to
Emerald/Exmoor and is in all subsequent releases.  It adds a
proactive reboot of the controllers prior to the ungraceful vxAbsTicks
timer rollover.


Modification History

18-Jun-2010: Document created, issue is Resolved
01-Jun-2010: Updated for minor formatting issues

References

Please send technical questions to the following email:
sunalert-tech-questions@sun.com
Responsible Engineer: rich.floyd@oracle.com

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.