Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1020990.1
Update Date: 2011-03-21
Keywords:

Solution Type  Sun Alert Sure

Solution  1020990.1 :   BIOS Versions Prior to 3.0.2 May Cause System Hangs on Sun Fire X4150/X4250/X4450 Systems


Related Items
  • Sun Fire X4150 Server
  • Sun Fire X4450 Server
  • Sun Fire X4250 Server
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved

PreviouslyPublishedAs
268668


Bug Id
<SUNBUG: 6871221>, <SUNBUG: 6873737>

Date of Resolved Release
02-Oct-2009

Sun Fire X4150/X4250/X4450 systems may hang as a result of correctable ECC memory errors.

1. Impact

Sun Fire X4150/X4250/X4450 systems with BIOS versions 3.0.1 or earlier may hang because correctable ECC memory errors are not handled properly.

2. Contributing Factors

This issue can occur on the following platforms:
  • Sun Fire X4150/X4250/X4450 systems with BIOS versions 3.0.1 or earlier
Note 1: Sun Fire X4150 and X4250 servers with BIOS versions 1ADQW061 or earlier, and X4450 servers with BIOS versions 3B61 or earlier, have an issue where the SMI (System Management Interrupt) handler never exits when it tries to handle a correctable ECC memory error detected by the patrol scrubber. When this happens, the system locks up with no ILOM SEL entry indicating the problem. This bug does not affect all operating systems, because they differ in how they handle a patrol scrub detected correctable memory error. VMware 3.5, VMware 4.0 and RHEL 5.3 are known to encounter this hang condition because they pass patrol scrub correctable errors on to the BIOS.

Note 2: Correctable errors can occur even in healthy systems. The likelihood of a system hang due to this bug depends on whether an error occurs, when it occurs, how it is detected, and which operating system is running.

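
The "3.0.1 or earlier" criterion above can be checked mechanically. The following Python sketch is only an illustration: it assumes the release number is available as a plain dotted string such as "3.0.1", and the function names are hypothetical rather than part of any Sun tool.

```python
# Hypothetical helper names; this is a sketch, not a Sun-provided utility.

# First release that contains the fix, per this alert (3.0.2 or later).
FIRST_FIXED_RELEASE = (3, 0, 2)


def parse_release(release: str) -> tuple:
    """Turn a dotted release string such as "3.0.1" into (3, 0, 1)."""
    return tuple(int(part) for part in release.strip().split("."))


def release_is_affected(release: str) -> bool:
    """Return True when the release is older than the first fixed release.

    Assumption: release numbers are purely numeric, dot-separated fields
    that compare field by field.
    """
    return parse_release(release) < FIRST_FIXED_RELEASE
```

For example, release_is_affected("3.0.1") is true, while "3.0.2" and "3.1.0" are reported as not affected.
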
3. Symptoms

If the described issue occurs, the system will lock up/hang with no ILOM SEL entry indicating a problem. Access to the ILOM is not affected.

4. Workaround

There is no workaround for this issue.  Please see the Resolution section below.

5. Resolution

This issue is addressed on the following platforms:
  • Sun Fire X4150/X4250/X4450 systems with BIOS revision 3.0.2 or later
It is recommended to update affected systems with the latest BIOS versions located at:

For Sun Fire X4150:
For Sun Fire X4250:
For Sun Fire X4450:
Note: The above releases contain BIOS 1ADQW062 for the Sun Fire X4150/X4250 and BIOS 3B62 for the Sun Fire X4450.
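
As a rough illustration of the BIOS build comparison implied by this alert, the following Python sketch flags a system as affected when its BIOS build number is below the first fixed build (1ADQW062 or 3B62). It assumes that a BIOS string is a fixed platform prefix followed by a numeric build number and that builds are ordered numerically. The table and function names are hypothetical.

```python
import re

# First fixed BIOS build per platform, taken from this alert.
# Assumption: the string is "<prefix><build number>", e.g. "1ADQW" + "062".
FIRST_FIXED_BIOS = {
    "X4150": ("1ADQW", 62),
    "X4250": ("1ADQW", 62),
    "X4450": ("3B", 62),
}


def bios_is_affected(platform: str, bios_version: str) -> bool:
    """Return True when the BIOS build is older than the first fixed build."""
    prefix, fixed_build = FIRST_FIXED_BIOS[platform.strip().upper()]
    match = re.fullmatch(re.escape(prefix) + r"(\d+)", bios_version.strip().upper())
    if match is None:
        # Unknown format: refuse to guess rather than report a wrong answer.
        raise ValueError(f"unexpected BIOS string {bios_version!r} for {platform}")
    return int(match.group(1)) < fixed_build
```

With these assumptions, bios_is_affected("X4150", "1ADQW061") is true and bios_is_affected("X4450", "3B62") is false. A string that does not match the expected prefix raises an error instead of being silently classified.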

This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN. This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.


Copyright 2000-2009 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.


Product
Sun Fire X4150 Server
Sun Fire X4250 Server
Sun Fire X4450 Server

Internal Comments
Additional Information:

There are 2 other known issues that are being fixed in the next (3.1.0) software release:

Issue 1:
Incorrect error messaging
If a correctable ECC memory error is detected by the CPU, you will see this SEL entry as usual:

|67| IPMI | Log | minor | Fri Sep 4 17:04:57 2009 | ID = 1d : 09/04/2009 : 17:04:57 : Memory : BIOS : Correctable ECC; Channel: D, DIMM: 5 |

If the background scrubber detects the correctable ECC memory error, the SEL entry will look like this:

|118| IPMI | Log | *critical*| Tue Sep 8 18:00:47 200 | ID = 3f : 09/08/2009 : 18:00:47 : Memory : BIOS : Memory Scrub Failed; Channel: D, DIMM: 5

This entry incorrectly reports the error as critical. A correctable ECC memory error detected by the scrubber is not a critical error, despite the SEL entry. This will be fixed in the next software release, in which both types will be reported as a minor correctable ECC error.
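
When reviewing SEL output from an affected release, it can help to downgrade the mislabeled entries automatically. The Python sketch below parses lines in the pipe-delimited layout of the two samples above and treats "Memory Scrub Failed" as minor. The field layout is inferred from those two samples only, and all names in the code are hypothetical.

```python
import re
from typing import NamedTuple


class SelEntry(NamedTuple):
    record_id: int
    severity: str
    event: str
    channel: str
    dimm: int


# Matches the tail of the detail field, e.g.
# "... : BIOS : Correctable ECC; Channel: D, DIMM: 5"
_DETAIL = re.compile(
    r"BIOS : (?P<event>[^;]+); Channel: (?P<channel>\w+), DIMM: (?P<dimm>\d+)"
)


def parse_sel_line(line: str) -> SelEntry:
    """Parse one SEL line; assumes six pipe-separated fields as in the samples."""
    fields = [f.strip() for f in line.strip().strip("|").split("|")]
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    detail = _DETAIL.search(fields[5])
    if detail is None:
        raise ValueError("not a BIOS memory event in the expected format")
    return SelEntry(
        record_id=int(fields[0]),
        severity=fields[3].strip("*"),  # the sample shows *critical*
        event=detail["event"].strip(),
        channel=detail["channel"],
        dimm=int(detail["dimm"]),
    )


def effective_severity(entry: SelEntry) -> str:
    """Per this alert, a scrub-detected correctable error is minor, not critical."""
    if entry.event == "Memory Scrub Failed":
        return "minor"
    return entry.severity
```

Applied to the second sample entry above, parse_sel_line returns severity "critical" as logged, while effective_severity returns "minor".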

Issue 2:
DIMMs being falsely mapped out during POST due to correctable ECC memory errors.
POST should not map out a DIMM because it detected a correctable ECC memory error. If a DIMM is mapped out during POST, reboot the system to determine whether the map-out was caused by a correctable ECC memory error. One of three things will then happen:
  1. The DIMM error goes away, indicating the issue was due to a correctable ECC memory error. No further action is needed.
  2. If the same DIMM maps out again, the DIMM is likely bad and the DIMM pair should be replaced.
  3. If a different DIMM maps out, continue to reboot the system
    until the error goes away or one DIMM maps out persistently, in which
    case that DIMM pair should be replaced.
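
The three outcomes above can be written as a small decision function. The Python sketch below is a simplified illustration, not a Sun procedure: it assumes "persistent" means the same DIMM mapped out on two consecutive boots, and the DIMM labels and all names in the code are hypothetical.

```python
from enum import Enum
from typing import Optional, Sequence


class Action(Enum):
    NO_ACTION = "no action needed"
    REBOOT = "reboot and check again"
    REPLACE_PAIR = "replace the DIMM pair"


def triage(history: Sequence[Optional[str]]) -> Action:
    """Decide the next step from the DIMM mapped out on each boot.

    history holds one item per POST, oldest first: the label of the
    mapped-out DIMM, or None when no DIMM was mapped out.
    """
    if not history:
        raise ValueError("need at least one POST result")
    latest = history[-1]
    if latest is None:
        # The error went away: it was a transient correctable ECC error.
        return Action.NO_ACTION
    if len(history) >= 2 and history[-2] == latest:
        # Assumption: the same DIMM on consecutive boots counts as persistent.
        return Action.REPLACE_PAIR
    # First map-out, or a different DIMM than last time: reboot again.
    return Action.REBOOT
```

For example, triage(["D5", "C3"]) asks for another reboot, and triage(["D5", "C3", "C3"]) recommends replacing the DIMM pair.
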
Please send technical questions to the following email:
 sunalert-tech-questions@sun.com
and CC the following persons:
 Internal Contributor/Submitter
 Internal Eng Responsible Engineer
 Internal Services Knowledge Engineer

Internal Contributor/submitter
Jake.Bell@Sun.COM

Internal Eng Responsible Engineer
leigh.chen@sun.com

Internal Services Knowledge Engineer
jeff.folla@sun.com

Internal Eng Business Unit Group
SVS (SPARC Volume Systems, Horizontal Systems (includes T2000/Ontario), NWS (Network Storage), Systems Group-x64 (X4100-X4600 (includes M2), V20z/V40z/V60z/V65z, Ultra20/40)

Internal Sun Alert & FAB Admin Info
WF 02-Sep-2009, jfolla: sent for release
WF 30-Sep-2009, jfolla: sent for review
WF 29-Sep-2009, jfolla: sent to submitter with questions
WF 29-Sep-2009, jfolla: created


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.