Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1000616.1
Update Date:2011-02-04
Keywords:

Solution Type  Sun Alert Sure

Solution  1000616.1 :   On Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 Domains, Time of Day (TOD) May Drift or Jump  


Related Items
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Netra 1280 Server
  • Sun Fire 4800 Server
  • Sun Fire V1280 Server
  • Sun Fire 4810 Server
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

Previously Published As
200817


Product
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server
Sun Fire V1280 Server
Netra 1280 Server

Bug Id
<SUNBUG: 4876369>

Date of Workaround Release
10-SEP-2003

Date of Resolved Release
04-NOV-2003

Impact

On very rare occasions, the Time of Day (TOD) on Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 domains may be susceptible to a clock drift or jump. As a result, any functionality that relies upon the System Controller (SC) timer may be inaccurate.


Contributing Factors

This issue can occur in the following releases:

SPARC Platform

  • Sun Fire V1280 and Netra 1280 with firmware (ScApp) 5.13.0014 or earlier
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.12.x
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.13.x
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.14.x
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.15.0, 5.15.1 and 5.15.2

Note: Systems with firmware 5.11.x are not affected by this issue. Use the "showsc -v" command to display the firmware version of the SC.
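
The affected-version list above for Sun Fire 3800/4800/4810/6800 can be expressed as a small lookup. This is a minimal sketch, not a Sun-supplied tool: the function name "midframe_scapp_status" is hypothetical, the ScApp version string is assumed to be read by hand from the "showsc -v" output and passed in as an argument, and the V1280/Netra 1280 firmware numbering (5.13.0014 and earlier) is deliberately not handled.

```shell
#!/bin/sh
# Hypothetical helper: classify a Sun Fire 3800/4800/4810/6800 ScApp version
# against the affected list in Contributing Factors.
# Usage: midframe_scapp_status VERSION   (e.g. 5.12.5)
midframe_scapp_status() {
    case "$1" in
        # Releases listed as affected in this Sun Alert
        5.12.*|5.13.*|5.14.*|5.15.0|5.15.1|5.15.2)
            echo "affected" ;;
        # Explicitly noted as not affected
        5.11.*)
            echo "not affected" ;;
        # Anything else (for example 5.15.3) is not on the affected list
        *)
            echo "not listed as affected" ;;
    esac
}
```

For example, "midframe_scapp_status 5.12.5" prints "affected", while "midframe_scapp_status 5.15.3" prints "not listed as affected".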


Symptoms

This issue may occur after 528 days of continuous SC uptime, at which point the TOD within a domain in the system may become erratic and unstable. The intervals reported have varied, but the TOD generally jumps backwards by anywhere from approximately one hour to as much as one month. The TOD as seen by the SC itself does not jump.

There are no specific messages that would indicate this issue has occurred. It can only be discovered by the domain exhibiting unexpected behavior due to the domain TOD changing unexpectedly.


Workaround

There are three options available that can be applied to avoid this issue:

  1. Set the variable "tod_broken" to 1 in the domain kernel (see below), or
  2. Reboot the SCs before 528 days of continuous SC uptime (recommended at 500 days), or
  3. Install Patch 112884-04 (ScApp 5.15.3)

To work around the described issue in a running domain, immediate relief can be obtained by setting the variable "tod_broken" to 1 in the domain kernel. This will cause Solaris to ignore the clock data coming from the Serengeti clock driver and use a domain kernel timebase as a reference instead.

The following script can be invoked as "root" on the running domain to change the value of "tod_broken" in that domain's kernel:

    #!/bin/sh
    #
    # Set tod_broken
    #
    echo "tod_broken ?W 1" | adb -w -k /dev/ksyms /dev/mem
    #
    exit 0

Additionally, adding the line "set tod_broken=1" to the domain's "/etc/system" configuration information file will sustain the value of the "tod_broken" variable across a reboot of the domain.
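
Adding the "/etc/system" entry can be made safe to repeat, so that running it twice does not leave duplicate lines. The following is a sketch only: the function name "persist_tod_broken" is hypothetical, it matches only the exact line "set tod_broken=1" quoted above, and it should be tried on a copy of the file before being used on a live "/etc/system".

```shell
#!/bin/sh
# Hypothetical helper: append "set tod_broken=1" to a system file only if
# that exact line is not already present.
# Usage: persist_tod_broken [FILE]   (FILE defaults to /etc/system)
persist_tod_broken() {
    file=${1:-/etc/system}
    # Redirect grep output instead of relying on "grep -q", which older
    # /usr/bin/grep implementations may not support.
    if grep '^set tod_broken=1$' "$file" >/dev/null 2>&1; then
        return 0
    fi
    echo 'set tod_broken=1' >> "$file"
}
```

Run as "root", for example "persist_tod_broken /etc/system". The setting takes effect at the next reboot of the domain.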

At the next maintenance opportunity, the platform SCs should be rebooted. For systems with firmware 5.13 or later and failover configured, this can be accomplished by rebooting the spare SC first. After it has come up again and failover has become enabled and active, run the "setfailover force" command to make it the main SC, then reboot the other SC. When the other SC completes its reboot, running "setfailover force" again will restore it to the main SC state if desired.

For systems with firmware 5.12 or systems without failover enabled, it will be necessary to bring down any running domains before rebooting the SCs (Sun does not recommend rebooting a main SC with running domains as that action may disrupt domain operation).

Once the platform SCs have been rebooted, the domain TOD jumping will not recur for another 500 days. The "set tod_broken=1" line can then be removed from the "/etc/system" file, and the variable can be reset to 0 in a running domain kernel by substituting 0 for 1 in the above script.
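
The reboot option depends on tracking SC uptime against the two figures given in this document: the recommended reboot point of 500 days and the 528-day point at which the problem may appear. A minimal sketch of that check follows. The function name "sc_uptime_status" and its three status words are hypothetical, and the uptime in whole days is assumed to be read by hand from the SC and passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper: classify SC continuous uptime (in whole days)
# against the 500-day recommended reboot point and the 528-day threshold.
# Usage: sc_uptime_status DAYS
sc_uptime_status() {
    days=$1
    if [ "$days" -ge 528 ]; then
        echo "at-risk"              # TOD jumps may already occur
    elif [ "$days" -ge 500 ]; then
        echo "reboot-recommended"   # inside the recommended reboot window
    else
        echo "ok"
    fi
}
```

For example, "sc_uptime_status 530" prints "at-risk" and "sc_uptime_status 120" prints "ok".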


Resolution

This issue is addressed in the following releases:

  • Sun Fire V1280 and Netra 1280 with firmware (ScApp) 5.13.0015 (as delivered in patch 113751-05 or later)
  • Sun Fire 3800/4800/4810/6800 with firmware (ScApp) 5.15.3 (as delivered in patch 112884-04 or later)

Note: The patch must be added to both system controllers to remedy this issue.



Modification History
Date: 18-NOV-2004
  • Firmware version 5.15.0 added to affected platforms in Contributing Factors

Date: 20-OCT-2004
  • Correction made in "Relief/Workaround" section for statement to read: adding "set tod_broken=1" to the domain's "/etc/system" file

Date: 13-OCT-2004
  • Updated Contributing Factors and Resolution sections by adding Sun Fire V1280 and Netra 1280 to affected platforms; added patch for fix

Date: 04-NOV-2003
  • Update Contributing Factors, Relief/Workaround, Symptoms and Resolution sections
  • Re-release as Resolved


References

<SUNPATCH: 112884-04>
<SUNPATCH: 113751-05>

Previously Published As
101335
Internal Comments

08-Sep-2003

Specifically, there is a work-around patch that will be out in 5.15.3 which will prevent the corrupted time-of-day data from reaching the domains. We're still looking for a root cause to the corrupted time-of-day data. (See hideshi's comments below)

    SC: SSC0
    SC date: Mon May 26 13:07:42 JST 2003
    JST GMT+9 Japan Standard Time
    SC uptime: 530 days 2 hours 16 minutes 9 seconds
    ScApp version: 5.12.5
    RTOS version: 19

The date was corrected manually with the "ntpdate" command, however the TOD kept jumping back 1 hour at a time, every hour.

    Sat Jun  7 10:02:43 JST 2003
    Sat Jun  7 10:03:43 JST 2003
    Sat Jun  7 10:04:43 JST 2003
    Sat Jun  7 09:05:43 JST 2003
    Sat Jun  7 09:06:43 JST 2003
    Sat Jun  7 09:07:43 JST 2003

Given that any system that has been up for over 530 days must be running ScApp 5.12.5 or older, we would not yet have seen this problem on later revisions of the firmware. That said, there is nothing to suggest that the bug isn't in 5.15 as well.

PDE's comments:

We don't know if this problem affects 5.13-current, as we haven't been able to root-cause it in 5.12, and the later releases have not yet reached the magic uptime for the problem to manifest itself. - Mark

Also, here is some data of potential 530 day impacts:

-----------------------------------------------------------

Table of ScApp releases, sorted by the date they hit sunsolve, and their corresponding 528 date:

Version  Date         Comments                Date + 528 days
-------  -----------  ----------------------  ---------------
5.12.5   2001-10-26                           2003-04-07
5.12.6   2002-02-01                           2003-07-14
5.13.0   2002-05-05?  T-Patch T112494-01      2003-10-15
5.13.0   2002-06-12                           2003-11-22
5.13.1   2002-06-20                           2003-11-30
5.13.2   2002-07-01                           2003-12-11
5.12.7   2002-08-02                           2004-01-12
5.13.3   2002-09-16                           2004-02-26
5.13.3   2002-10-06   Update to RTOS 26 only  2004-03-17
5.13.4   2002-11-01                           2004-04-12
5.14.0   2002-11-11                           2004-04-22
5.14.1   2002-12-09                           2004-05-20
5.14.2   2002-12-23                           2004-06-03
5.14.3   2003-01-29                           2004-07-10
5.13.5   2003-02-10                           2004-07-22
5.14.4   2003-02-13                           2004-07-25
5.15.0   2003-04-25                           2004-10-04
5.14.5   2003-05-16                           2004-10-25
5.15.1   2003-06-18                           2004-11-27
5.15.6   2003-07-30                           2005-01-08
5.15.2   2003-08-15                           2005-01-24
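
The "Date + 528 days" column can be reproduced with plain integer arithmetic, without depending on any particular "date" implementation. The sketch below is an illustration only and was not part of the original analysis: the function names "days_from_civil", "civil_from_days" and "add_days" are hypothetical, and the day-count conversion is a standard proleptic Gregorian calendar calculation that assumes dates on or after 1970-01-01.

```shell
#!/bin/sh
# Hypothetical helpers: add a number of days to a YYYY-MM-DD date.

# days_from_civil Y M D -> days since 1970-01-01
days_from_civil() {
    y=$1; m=${2#0}; d=${3#0}        # strip a leading zero (avoid octal parsing)
    if [ "$m" -le 2 ]; then y=$((y - 1)); fi
    era=$((y / 400))
    yoe=$((y - era * 400))          # year of era, 0..399
    if [ "$m" -gt 2 ]; then mp=$((m - 3)); else mp=$((m + 9)); fi
    doy=$(( (153 * mp + 2) / 5 + d - 1 ))   # day of year, counted from March 1
    doe=$(( yoe * 365 + yoe / 4 - yoe / 100 + doy ))
    echo $(( era * 146097 + doe - 719468 ))
}

# civil_from_days N -> YYYY-MM-DD for N days since 1970-01-01
civil_from_days() {
    z=$(( $1 + 719468 ))
    era=$(( z / 146097 ))
    doe=$(( z - era * 146097 ))
    yoe=$(( (doe - doe / 1460 + doe / 36524 - doe / 146096) / 365 ))
    y=$(( yoe + era * 400 ))
    doy=$(( doe - (365 * yoe + yoe / 4 - yoe / 100) ))
    mp=$(( (5 * doy + 2) / 153 ))
    d=$(( doy - (153 * mp + 2) / 5 + 1 ))
    if [ "$mp" -lt 10 ]; then m=$((mp + 3)); else m=$((mp - 9)); fi
    if [ "$m" -le 2 ]; then y=$((y + 1)); fi
    printf '%04d-%02d-%02d\n' "$y" "$m" "$d"
}

# add_days YYYY-MM-DD N -> date N days later
add_days() {
    yy=${1%%-*}; rest=${1#*-}; mm=${rest%%-*}; dd=${rest#*-}
    civil_from_days $(( $(days_from_civil "$yy" "$mm" "$dd") + $2 ))
}
```

For example, "add_days 2001-10-26 528" prints 2003-04-07, matching the 5.12.5 row of the table.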



Date of first reported TOD jump:

2003-05-25 jumped back to 2003-04-27

If up 528 days, previous reboot was 2001-12-13.

If the pattern holds, we should only see the problem in systems running 5.12.5 or 5.12.6 until October 15th, then the first 5.13 systems should start seeing the problem, unless the bug was fixed in 5.13.0.

If we have seen the problem in 5.12.7, then the 528 day period until the problem appears has to be recalculated downward.

-Mark

04-Nov-2003 - About the root cause:

When a Serengeti System Controller has logged an uptime of approximately 528 days, the time of day reported by any Solaris domain managed by that system controller exhibits erratic behavior; most commonly, the time jumps back one hour, every hour. The root cause of this problem is a bug discovered in the java.util.Calendar class, which was fixed some time after the Java runtime library snapshot was taken for Serengeti.

The solution is to update the Java class libraries that are used by the System Controller to include the fixed version of java.util.Calendar. This will happen in the next release of each supported version of ScApp, namely 5.16.0, 5.15.3, and 5.14.7. ScApp 5.15.3 was officially released on Oct. 15th, 2003 as Patch# 112884-04.

hideshi


Internal Contributor/submitter
Saaid.Magidi@Sun.Com

Internal Eng Business Unit Group
SSG ES (Enterprise Systems)

Internal Eng Responsible Engineer
Mark.a.Matthews@Sun.Com

Internal Services Knowledge Engineer
david.mariotto@sun.com

Internal Escalation ID
546489

Internal Resolution Patches
112884-04, 113751-05

Internal Sun Alert Kasp Legacy ID
101335, 56680 (Sun Alert)

Internal Sun Alert & FAB Admin Info
Critical Category: Availability ==> Pervasive
Significant Change Date: 2003-09-10, 2003-11-04
Avoidance: Patch, Workaround
Responsible Manager: Ken.Yan@Sun.Com
Original Admin Info: This document has been imported from KMS Creator and may need adjustment before re-publishing.

This imported document has been reviewed/adjusted by:
Review Name:
Review Date:

Original KMS Creator attributes below:

--- PLEASE DO NOT MAKE ANY CHANGES BELOW THIS LINE! ---

Sun Alert ID: 56680
Synopsis: On Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 Domains, Time of Day (TOD) May Drift or Jump
Category: Availability
Product: Sun Fire Servers 3800/4800/4810/6800, V1280, Netra 1280
BugIDs: 4876369
Avoidance: Patch, Workaround
State: Resolved
Date Released: 10-Sep-2003, 04-Nov-2003
Date Closed: 04-Nov-2003
Date Modified: 04-Nov-2003, 13-Oct-2004, 20-Oct-2004, 18-Nov-2004
Escalation IDs: 546489
Pending Patches:
Resolution Patches: 112884-04, 113751-05
FIN:
FCO:
Date Submitted: 08-Sep-2003
Submitter: Saaid.Magidi@Sun.Com
Responsible Engineer: Mark.a.Matthews@Sun.Com
Responsible Manager: Ken.Yan@Sun.Com
CTE group: CPRE ESP EMEA
Responsible Writer: david.mariotto@sun.com
Distribution: Public SunSolve

Workflow History:

WF State: Issued, 18-Nov-2004, David Mariotto
WF Note: version addition to Contributing Factors

WF State: Issued, 13-Oct-2004, David Mariotto
WF Note: added V1280/Netra 1280 to CF and Res. w/patch

WF State: Issued, 04-Nov-2003, David Mariotto
WF Note: Complete all Updates (patch, resolved) and re-release

WF State: Issued, 20-Oct-2003, David Mariotto
WF Note: Recently issued patch only resolves issue for one
firmware revision.

WF State: Issued, 10-Sep-2003, David Mariotto
WF Note: Signoff by PTS, (Saaid), sent for release

WF State: Draft, 10-Sep-2003, David Mariotto
WF Note: Review completed, sent for signoff

WF State: Draft, 09-Sep-2003, David Mariotto
WF Note: Sending for review

WF State: Draft, 09-Sep-2003, David Mariotto
WF Note: waiting on minor clarification from submitter

WF State: Draft, 08-Sep-2003, David Mariotto
WF Note: Article created.

Exported from KMS Creator Sat May 21 09:03:37 2005 GMT, olaf.reineke@sun.com
Internal SA-FAB Eng Submission
On Sun Fire 3800/4800/4810/6800, V1280, and Netra 1280 Domains, Time of Day (TOD) May Drift or Jump

Product_uuid
29d05214-0a18-11d6-92b2-a111614865b5|Sun Fire 3800 Server
29d3a694-0a18-11d6-92da-df959df44cdd|Sun Fire 4800 Server
29d6f808-0a18-11d6-8aa8-943929fbbdd8|Sun Fire 4810 Server
29da7938-0a18-11d6-8a41-9ed1ad6d6779|Sun Fire 6800 Server
6a74b2f9-bbd8-4b2c-870d-b6b73d6e224f|Sun Fire V1280 Server
e41a7084-3dbf-472d-918b-efb50dcbc220|Netra 1280 Server

  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.