Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1005169.1
Update Date:2009-06-16
Keywords:

Solution Type  Technical Instruction Sure

Solution  1005169.1 :   How to decode System Event Log (SEL) entries for Memory Uncorrectable Errors in Sun Blade[TM] X8400, X8420, X8440 Server Modules  


Related Items
  • Sun Blade X8420 Server Module
  • Sun Blade X8400 Server Module
  • Sun Blade X8440 Server Module
Related Categories
  • GCS>Sun Microsystems>Servers>Blade Servers

PreviouslyPublishedAs
207257


Description

Symptoms:

  • sync flood
  • reboot
  • crash
  • "blue screen of death"
  • purple screen

Purpose/Scope:

When an uncorrectable memory ECC error occurs on a Sun Blade[TM] X8400, X8420, or X8440 server module, a HyperTransport sync flood occurs and the system reboots. BIOS fault diagnosis then faults the DIMM pair and writes an entry to the System Event Log (SEL).

SEL entries identify the faulty DIMM pair slightly differently from the other places where DIMMs are reported.

The faulted DIMMs can also be identified in those other places. All of them should point to the same DIMM pair, which is the customer/field replaceable unit (CRU/FRU).



Steps to Follow

When viewed with the ipmitool command, a SEL entry for the Uncorrectable ECC Error may look like the following example:

/usr/sfw/bin/ipmitool -I lanplus -H <IP address or hostname of Blade SP> -U root sel elist

>>> 1807 | 06/24/2007 | 07:40:26 | Memory | Uncorrectable ECC | Asserted | CPU 1 DIMM 0

The label "DIMM 0" refers to a DIMM Pair, and not a single DIMM.
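As an illustration, the fields of such an entry can be pulled apart with a short Python sketch. It assumes the pipe-separated layout of the sample line above; the function name parse_uncorrectable_ecc and the dictionary keys are hypothetical and are not part of ipmitool or ILOM. Other firmware revisions may format the entry differently.

```python
import re

# Matches the sensor description at the end of the entry, e.g. "CPU 1 DIMM 0".
# Assumption: the description always follows the layout of the sample line.
_LOCATION = re.compile(r"CPU\s+(\d+)\s+DIMM\s+(\d+)")


def parse_uncorrectable_ecc(line):
    """Return a dict for an Uncorrectable ECC SEL entry, or None for other lines."""
    # Drop any leading ">>> " marker, then split the pipe-separated fields.
    fields = [f.strip() for f in line.lstrip("> ").split("|")]
    if len(fields) < 7:
        return None
    if fields[3] != "Memory" or fields[4] != "Uncorrectable ECC":
        return None
    match = _LOCATION.search(fields[6])
    if match is None:
        return None
    return {
        "record_id": fields[0],
        "date": fields[1],
        "time": fields[2],
        "state": fields[5],
        "cpu": int(match.group(1)),
        # This number is a DIMM pair, not a single DIMM.
        "dimm_pair": int(match.group(2)),
    }
```

For the sample entry above, the sketch reports CPU 1 and DIMM pair 0.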

This may be different from how the entry appears when viewing the SEL in the BIOS. When viewed from the blade SP or Chassis Monitoring Module (CMM) unified log, the DIMMs are identified as d#/d# pairs that correspond to the physical labeling of the slots on the board.

On the blade board, pressing the Fault Remind button lights the LEDs at the DIMM sockets of the faulted pair. The CRU/FRU is the pair of DIMMs, which ensures that the DIMMs are matched in size, type, and vendor.

To translate the pair numbering to the slot numbering, use the table below for your blade model:

X8400

DIMM Pair #   DIMM physical slot labeling   DIMM slot colour   DIMM slot location
0             DIMM 0 & DIMM 1               white              furthest from CPU socket
1             DIMM 2 & DIMM 3               black              closest to CPU socket

X8420

DIMM Pair #   DIMM physical slot labeling   DIMM slot colour   DIMM slot location
0             DIMM 0 & DIMM 1               black              closest to CPU socket
1             DIMM 2 & DIMM 3               white              furthest from CPU socket

X8440

DIMM Pair #   DIMM physical slot labeling   DIMM slot colour   DIMM slot location
0             D0 & D1                       black              closest to CPU socket
1             D2 & D3                       white              middle pair
2             D4 & D5                       black              middle pair
3             D6 & D7                       white              furthest from CPU socket
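
The translation in the tables above can also be expressed as a small lookup, sketched here in Python. The values are copied from the tables; the names DIMM_PAIRS and describe_pair are hypothetical and exist only for this illustration.

```python
# (slot labels, slot colour, slot location) for each DIMM pair number,
# copied from the X8400, X8420 and X8440 tables in this document.
DIMM_PAIRS = {
    "X8400": {
        0: ("DIMM 0 & DIMM 1", "white", "furthest from CPU socket"),
        1: ("DIMM 2 & DIMM 3", "black", "closest to CPU socket"),
    },
    "X8420": {
        0: ("DIMM 0 & DIMM 1", "black", "closest to CPU socket"),
        1: ("DIMM 2 & DIMM 3", "white", "furthest from CPU socket"),
    },
    "X8440": {
        0: ("D0 & D1", "black", "closest to CPU socket"),
        1: ("D2 & D3", "white", "middle pair"),
        2: ("D4 & D5", "black", "middle pair"),
        3: ("D6 & D7", "white", "furthest from CPU socket"),
    },
}


def describe_pair(model, cpu, pair):
    """Translate a SEL "CPU n DIMM m" pair number into physical slot labels."""
    # Raises KeyError for an unknown blade model or pair number.
    slots, colour, location = DIMM_PAIRS[model.upper()][pair]
    return f"CPU {cpu}: replace {slots} ({colour} slots, {location})"
```

For the sample SEL entry "CPU 1 DIMM 0" on an X8420 blade, describe_pair("X8420", 1, 0) names slots DIMM 0 & DIMM 1, the black pair closest to the CPU socket.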

Note: When adding or upgrading memory, always populate memory in empty DIMM slots starting with the white pairs furthest from the CPU socket.

Note: On the X8440 blade, problems can occur with some operating systems that cannot handle CPU0 having no DIMMs.



Product
Sun Blade X8400 Server Module
Sun Blade 8000
Sun Blade X8420 Server Module
Sun Blade X8440 Server Module

Internal Comments
On some revisions of ILOM firmware, the Fault Remind button may light the CPU socket fault LED in addition to the DIMM LEDs. If the logged error is related to memory, replace only the DIMMs, not the CPU.

This document contains normalized content and is managed by the Domain Lead(s)
of the respective domains. To notify content owners of a knowledge gap
contained in this document, and/or prior to updating this document, please
contact the domain engineers that are managing this document via the “Document
Feedback” alias(es) listed below:

Domain Lead: Dencho.Kojucharov@sun.com
Feedback Alias: blade_normalizers@sun.com

normalized, blade, 8000, uncorrectable, memory, ecc, error, UE, dimm, x8400, x8420, SEL, x8440
Previously Published As
86563

Change History
Date: 2008-10-20
User Name: 79977
Action: Updated
Comment: Added normalization keywords and wrapper

Version: 7

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.