Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure
Solution 1000095.1 : X4100 and X4200 May Encounter Unscheduled System Reboots Due to Double-Bit Uncorrectable Memory Errors
Previously Published As 200113
Product Sun Fire X4100 Server, Sun Fire X4200 Server
Bug Id <SUNBUG: 6364001>
Part
Impact
A small proportion of X4100 and X4200 systems have been experiencing unscheduled reboots.
Contributing Factors
The reboot can occur whenever there is heavy traffic between the CPU and the DIMMs.
Symptoms
The BIOS Event Log (DMI) shows "Sync flood error" just prior to the reboot. The System Event Log (SEL) of the ILOM, when queried with ipmitool (available on the Resource CD), shows messages similar to these:
  e00 | 03/21/2006 | 04:58:39 | OEM #0xfb |
  f00 | 03/21/2006 | 04:58:49 | Memory | Memory Device Disabled | CPU 0 DIMM 0
  1000 | 03/21/2006 | 04:58:55 | System Firmware Progress | Motherboard initialization
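The SEL pattern described under Symptoms can be screened for in saved log output. The following is a minimal sketch only, assuming the pipe-separated layout of the sample messages above; the names `SelEntry`, `parse_sel_line`, and `disabled_dimms` are hypothetical helpers introduced for illustration and are not part of ipmitool or ILOM. It does not invoke ipmitool; it only processes text that has already been captured.

```python
from typing import List, NamedTuple, Optional


class SelEntry(NamedTuple):
    """One SEL line, split on the '|' separators seen in the sample output."""
    record_id: str
    date: str
    time: str
    sensor: str
    event: str
    detail: str


def parse_sel_line(line: str) -> Optional[SelEntry]:
    """Split a captured SEL line into fields; return None if it does not fit."""
    fields = [f.strip() for f in line.split("|")]
    if len(fields) < 4 or not fields[0]:
        return None
    # Assumption: short lines (e.g. OEM records) simply lack trailing fields.
    fields += [""] * (6 - len(fields))
    return SelEntry(*fields[:6])


def disabled_dimms(lines: List[str]) -> List[str]:
    """Return the locations reported in 'Memory Device Disabled' events."""
    found = []
    for line in lines:
        entry = parse_sel_line(line)
        if entry and entry.sensor == "Memory" and entry.event == "Memory Device Disabled":
            found.append(entry.detail)
    return found


# Sample lines taken from the Symptoms text of this document.
SAMPLE = [
    "e00 | 03/21/2006 | 04:58:39 | OEM #0xfb |",
    "f00 | 03/21/2006 | 04:58:49 | Memory | Memory Device Disabled | CPU 0 DIMM 0",
    "1000 | 03/21/2006 | 04:58:55 | System Firmware Progress | Motherboard initialization",
]

if __name__ == "__main__":
    for location in disabled_dimms(SAMPLE):
        print("Disabled memory device reported at:", location)
```

A match on its own does not confirm this issue; it only flags logs worth comparing against the BIOS Event Log entry and the installed BIOS revision.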
Root Cause
DDR1 memory on these platforms may mishandle transitions into or out of PowerDown mode, triggering uncorrectable ECC errors that cause system reboots. BIOS 034 and earlier enable PowerDown mode (self-refresh/low-power mode) on the DIMMs with the wrong topology setting for these systems.
Workaround
Resolution
Upgrade to BIOS 036 or later. Statistically, BIOS 036 reduces the probability of an unscheduled reboot with certain registered DIMMs and increases stability. BIOS 036 disables PowerDown mode per AMD's recommendation. BIOS 036 can be obtained from the following website:
http://www.sun.com/servers/entry/x4100/downloads.jsp
Note: Some corner-case registered DIMMs with poor noise immunity, coupled with corner-case noisy motherboards, may not be fixed by BIOS 036. If the issue continues, the CFE or field representative should escalate the case, and the assigned engineer should refer to the TSC VSP - X4100/X4200 website (listed below) for further remediation actions.
http://systems-tsc.uk/twiki/bin/view/Products/ProdIssuesSunFireX4100
Previously Published As 102619
Internal Comments
A field engineer performing a motherboard replacement or other FRU replacement should load BIOS 036 or later. Upgrading to BIOS 036 or later should be the first step in resolving memory-related issues. Customers should be advised to upgrade their LSI firmware/MPT BIOS firmware when moving to BIOS 036. Sun-supplied vendor DIMMs meet all of the JEDEC specs and are not faulty in their own right. Refer to Product Notes 1.2.1 (819-1162-21) and Release Note Supplement 1.2.1 (819-4344-10) for further information.
Hardware Remediation Details
Related Information
Internal Contributor/submitter brian.jones@sun.com
Internal Eng Business Unit Group NSG (Network Systems Group)
Internal Eng Responsible Engineer derek.tsai@sun.com
Internal Services Knowledge Engineer sean.hassall@sun.com
Internal Escalation ID 1-14109922, 1-13950402, 1-15145059, 1-15344836, 1-15612351, 1-15844911, 1-16558422, 1-17641524
Internal Kasp FAB Legacy ID 102619
Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date:
Avoidance: Upgrade
Responsible Manager: beth.s.beasley@sun.com
Original Admin Info: null
Product_uuid
54e2ac49-df71-11d9-89e6-080020a9ed93|Sun Fire X4100 Server
c6e795ef-df6f-11d9-89e6-080020a9ed93|Sun Fire X4200 Server
Attachments
This solution has no attachment