Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Solution Type: Sun Alert Sure Solution

1128605.1: Firmware for RAID Controllers Causes Unscheduled Simultaneous Reboot of Controllers After 828.5 Days of Continuous Operation

Applies to:
Sun Storage 2530 Array - Version: Not Applicable
Sun Storage 2510 Array - Version: Not Applicable and later [Release: NA and later]
Sun Storage 2540 Array - Version: Not Applicable and later [Release: NA and later]
Sun Storage 6140 Array - Version: Not Applicable and later [Release: NA and later]
Sun Storage 6180 Array - Version: Not Applicable and later [Release: NA and later]
Sun SPARC Sun OS
_______________

SUNBUG 6949589
Date of Resolved Release: 18-Jun-2010

Description
A known issue with vxWorks RAID controller firmware for Sun StorageTek arrays (as listed in Section 2) may cause drives associated with host/IO volumes to experience write failures when the controllers reboot. This issue can occur after approximately 828.5 days of uptime, when vxWorks (by default) is scheduled for a simultaneous auto-reboot of the controllers.

Likelihood of Occurrence
This issue can occur on the following platforms:

This issue is not restricted to the above arrays, as this firmware may also be used with other arrays, servers, or switches. To determine the version of firmware on the controller, view the Common Array Manager (CAM) Storage System Summary page on the CAM host managing the array.

There is a timer in vxWorks (vxAbsTicks) that is one double word long (a 32-bit number, 0x00000000 through 0xffffffff). cfgMonitorTask monitors this timer to avoid drive failures during I/O to the disks, and reboots the controller once vxAbsTicks reaches 0xff000000. When this timer rolls over from 0xffffffff to 0x00000000 (approximately 828.5 days), there is a possibility that, if host I/O volumes exist, the associated drives will be failed with a write failure.

Possible Symptoms
RAID arrays using software mirroring to mirror data between the two arrays perform an unscheduled simultaneous reboot at nearly the same time (approximately 828.5 days uptime), causing a write failure.

Workaround or Resolution
To avoid the (unscheduled) controller auto-reboot, reboot the controllers alternately (one at a time) at any time between 1 day and 800 days of uptime to restart the counter prior to the vxAbsTicks rollover. With a proper failover environment, there should be no interruption of service.

Even with a workaround of rebooting each controller prior to the vxAbsTicks rollover, the issue will still be experienced by arrays with 6.x firmware revision.

This issue is resolved in Raidcore 2 firmware upgrade 7.35.10.10 or later (Sun StorageTek 25xx Series Firmware Upgrade Utility), which changes the reboot schedule to different times for each controller without the need for a manual reboot. (Reboot date/time can still be scheduled manually.)

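
When planning the manual reboot described in the workaround, it can help to turn a vxAbsTicks reading into days of uptime and days remaining before rollover. The following is a minimal sketch, not part of the firmware or of CAM. It assumes a tick rate of 60 ticks per second, which this alert does not state; the value is inferred from the quoted rollover time (2^32 ticks at 60 per second is about 828.5 days). The function names are hypothetical, and the sample line is the vxAbsTicks shell output shown in the vxWorks Detail section of this alert.

```python
import re

# ASSUMPTION: the tick rate is not stated in this alert. 60 ticks per second
# is inferred from the quoted figure (2**32 ticks / 60 / 86400 ~= 828.5 days).
TICKS_PER_SECOND = 60
ROLLOVER_TICKS = 2 ** 32      # 32-bit counter: 0xffffffff wraps to 0x00000000
SECONDS_PER_DAY = 86_400


def ticks_to_days(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Convert a vxAbsTicks count to days of controller uptime."""
    return ticks / ticks_per_second / SECONDS_PER_DAY


def days_until_rollover(ticks, ticks_per_second=TICKS_PER_SECOND):
    """Days remaining before the 32-bit counter wraps to zero."""
    return ticks_to_days(ROLLOVER_TICKS - ticks, ticks_per_second)


def in_manual_reboot_window(ticks, ticks_per_second=TICKS_PER_SECOND):
    """True when uptime is inside the 1-to-800-day window from the workaround."""
    return 1 <= ticks_to_days(ticks, ticks_per_second) <= 800


def parse_vxabsticks(line):
    """Extract the decimal tick count from a line of vxAbsTicks shell output."""
    match = re.search(r"value\s*=\s*(\d+)", line)
    if match is None:
        raise ValueError(f"no tick value found in: {line!r}")
    return int(match.group(1))


if __name__ == "__main__":
    # Sample output line quoted in this alert.
    sample = "vxAbsTicks = 0x2e5540: value = 227183 = 0x3776f"
    ticks = parse_vxabsticks(sample)
    print(f"uptime:              {ticks_to_days(ticks):.3f} days")
    print(f"days until rollover: {days_until_rollover(ticks):.1f}")
    print(f"in reboot window:    {in_manual_reboot_window(ticks)}")
```

If the controller's actual tick rate differs from the assumed 60 per second, pass the correct value as `ticks_per_second`; the day counts scale accordingly.
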
The firmware upgrade that resolves this issue can be found at (requires login):
https://cds.sun.com/is-bin/INTERSHOP.enfinity/WFS/CDS-CDS_SMI-Site/en_US/-/USD/ViewProductDetail-Start?ProductRef=STK25SFU-07.35-OTH-G-F@CDS-CDS_SMI&ProductUUID=ly9IBe.pyuMAAAEdKD5T8cHi&ProductID=ly9IBe.pyuMAAAEdKD5T8cHi&Origin=ViewProductDetail-Start

You cannot upgrade directly from 6.x firmware to 7.35.44.10. You must first upgrade to the firmware bundled in the upgrade utility or in CAM 6.2 (required to support this firmware).

Also see: "Sun StorageTek 25xx Series Array Firmware Upgrade Guide" at http://docs.sun.com/app/docs/doc/820-6362-12

The Sun Storage 2510 Firmware Matrix can be found in <Document:1021780.1> in MyOracleSupport.

Patches

vxWorks Detail

What is it?
- There is a timer in the firmware, specifically in vxWorks, called vxAbsTicks that is only a double word long (0x0000 0000). When this timer rolls over from 0xffff ffff to 0x0000 0000 (approximately 828.5 days), there is the possibility that, if there is host I/O to volumes, the associated drives will be failed with a write failure. This was discovered in 2003, and CR# 68447 was opened against the issue. The CR# put a function in the controller firmware called 'cfgMonitorTask' that will reboot the controller if the vxAbsTicks value is within 12 days of 828 days. This has been in the firmware from 03.xx up to 06.60. You can monitor this using the following shell command:

    % vxAbsTicks
    vxAbsTicks = 0x2e5540: value = 227183 = 0x3776f

What Happened?
- When the conversion from RC1 to RC2 was completed, the functionality in cfgMonitorTask was not ported into 07.xx CFW. This reintroduced the ungraceful vxAbsTicks timer rollover at approximately 828.5 days, with the possibility that, if there is host I/O to volumes, the associated drives will be failed with a write failure.

Where was it fixed?
- CR 138248, which adds the proactive reboot of the controllers prior to the ungraceful vxAbsTicks timer rollover, was added to the RC2 trunk prior to Emerald/Exmoor and is in all subsequent releases.

Modification History
18-Jun-2010: Document created, issue is Resolved
01-Jun-2010: Updated for minor formatting issues

References
Please send technical questions to the following email: sunalert-tech-questions@sun.com

Responsible Engineer: rich.floyd@oracle.com

Attachments
This solution has no attachment