Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1020827.1
Update Date:2010-08-27

Solution Type  FAB (standard) Sure

Solution  1020827.1 :   Intermittent Sun Fire X4500 system hangs with watchdog timeouts.  

Related Items
  • Sun Fire X4500 Server
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive


Bug Id
<SUNBUG: 6746949>

Sun Fire X4500 Server

Date of Resolved Release

Intermittent Sun Fire X4500 system hangs with watchdog timeouts (see details below).

Affected Parts:

371-0856-xx   2.6GHz Dual Core CPU, AMD Opteron 285 E6 Stepping (95Watt), RoHS:Y
371-1779-xx   2.8GHz Dual Core CPU, AMD Opteron 290 E6 Stepping, RoHS:YL


System hangs have been observed with certain workloads and 2P configurations using AMD Opteron family 0Fh processors (Rev E, E6 stepping).

Contributing Factors

Sun Fire X4500 systems that contain either of the Affected Parts listed above and run Solaris 10 U4-U7 (ZFS) are impacted by this issue.
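
The two conditions above can be expressed as a simple check. The following is a minimal sketch only, not a Sun-provided tool: the function names and the idea of passing in the Solaris 10 update number as an integer are assumptions made for illustration, and ZFS usage is not modeled. Part numbers are compared by prefix so that any dash level (-xx) matches.

```python
# Part-number prefixes taken from the Affected Parts list in this document.
AFFECTED_PART_PREFIXES = ("371-0856", "371-1779")

# Solaris 10 update releases named in this document (U4 through U7).
AFFECTED_SOLARIS10_UPDATES = range(4, 8)


def is_affected_part(part_number):
    """Return True if the part number is one of the affected CPU FRUs.

    The trailing dash level (for example "-02") is ignored.
    """
    return part_number.strip().startswith(AFFECTED_PART_PREFIXES)


def is_impacted(cpu_part_numbers, solaris10_update):
    """Hypothetical helper: True when both contributing factors are present.

    cpu_part_numbers: iterable of CPU FRU part numbers installed in the system.
    solaris10_update: Solaris 10 update number as an integer (4 for U4, ...).
    """
    has_affected_cpu = any(is_affected_part(p) for p in cpu_part_numbers)
    return has_affected_cpu and solaris10_update in AFFECTED_SOLARIS10_UPDATES


if __name__ == "__main__":
    # Example: two Opteron 285 FRUs on Solaris 10 U5.
    print(is_impacted(["371-0856-02", "371-0856-02"], 5))
```

A result of True only indicates that the configuration matches the contributing factors listed in this document; it does not predict that a hang will occur.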


The expected behavior is a system hard hang requiring a power cycle to reset.  HDT cannot break into the CPUs for analysis.  At times the system becomes sluggish, responding to a few commands before the hard hang occurs.

The SEL will show nothing, since the system has frozen. A sync flood reset cannot happen, and the BIOS cannot report anything.

Root Cause

Debug information has shown that a probe message has hung within the CPU.  While a definitive root cause is not known at this time, evidence points to possible contention between the TLB miss resolution hardware and a probe as the cause of the hang.  This debug information is further backed up by experimental evidence that the hang does not occur when the workaround is applied.

Corrective Action

Upgrade to SW 1.6 (or greater).  If a customer experiences a watchdog timeout hang, use the following workaround:
The AMD recommendation is to disable caching of page table data in the L2 cache.  The default BIOS setting enables TLB caching.  In limited testing, a 3 to 5% performance loss was observed with this caching disabled.

  SW 1.6 (x4500) - BIOS 0ABIG024
  BIOS setup option:
  F2 --> Advanced
       --> CPU Configuration
       --> Force TLB Caching disabled = Enabled

The field should escalate the case to TSC if the workaround is not acceptable to the customer.

Identification of Affected Parts (how to):

The Sun Fire x4500 has two CPU FRUs:
  371-0856  2.6GHz Dual Core CPU, AMD Opteron 285 E6 Stepping (95Watt), RoHS:Y
  371-1779  2.8GHz Dual Core CPU, AMD Opteron 290 E6 Stepping, RoHS:YL
Both of these CPUs are Rev E, E6 stepping.
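
Because both FRUs are identified by their Opteron model number (285 or 290), a CPU brand string can be screened for those models. The sketch below is illustrative only: the helper names are hypothetical, and the brand-string layout (for example, text such as that reported by a tool like `psrinfo -pv`) is an assumption, so the pattern is deliberately loose. It does not verify the stepping; confirm the FRU part number before acting on the result.

```python
import re

# CPU model numbers of the two FRUs listed in this document.
AFFECTED_OPTERON_MODELS = {"285", "290"}

# Finds the first 3- or 4-digit model number that follows the word "Opteron".
# The exact brand-string format is an assumption, hence the loose pattern.
_MODEL_RE = re.compile(r"Opteron\D*?(\d{3,4})\b")


def opteron_model(brand_string):
    """Return the Opteron model number found in brand_string, or None."""
    match = _MODEL_RE.search(brand_string)
    return match.group(1) if match else None


def is_affected_cpu(brand_string):
    """Hypothetical helper: True if the brand string names an Opteron 285 or 290."""
    return opteron_model(brand_string) in AFFECTED_OPTERON_MODELS


if __name__ == "__main__":
    # Sample strings are made up for illustration.
    for sample in ("Dual Core AMD Opteron(tm) Processor 285",
                   "Dual-Core AMD Opteron(tm) Processor 2218"):
        print(sample, "->", is_affected_cpu(sample))
```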


This issue was evaluated by the Sun Alert PMO and found not to meet criteria.

 Escalation ID: 1-24213193, 1-24512721
 Resolution Patches: SW 1.6

Internal Eng Business Unit Group
SSG WGS (Workgroup Systems)

Internal Sun Alert & FAB Admin Info
07-Aug-2009: Completed draft and sent to Extended Review.
12-Aug-2009: No feedback from Ext Rvw - sending to Publish.
19-Nov-2009: Corrected Product Name to resolve swoRDFish inconsistency.

  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.