Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1303679.1
Update Date: 2011-03-21
Keywords:

Solution Type  FAB (standard) Sure

Solution  1303679.1 :   8Gb Dual FC HBA PCIe (Metis) correctable errors on Blade servers.  


Related Items
  • Sun Blade X6275 M2 Server Module
  • Sun Fire X4800 Server
  • Sun Blade X6270 M2 Server Module
  • SPARC T3-1B
  • Sun Blade X6275 Server Module
  • SPARC T3-4
  • Sun Blade X6270 Server Module
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive




In this Document
  Symptoms
  Changes
  Cause
  Solution


Oracle Confidential (PARTNER). Do not distribute to customers
Reason: FABs available to Internals and Partners only

Applies to:

Sun Blade X6275 M2 Server Module - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Blade X6270 Server Module - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Blade X6275 Server Module - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Blade X6270 M2 Server Module - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Fire X4800 Server - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Information in this document applies to any platform.
__________

Affected X-Options:

SG-XPCIEFC8GBE-Q8-N - 8Gb FC HBA, EM, Qlogic
SG-XPCIEFC8GBE-Q8-Z - 8Gb FC HBA, EM, Qlogic
SG-XPCIEFCGBE-E8-Z - 8Gb FC HBA, EM, Emulex

Affected Parts: (FRU/CRU Part Number / Description)

371-4522-01 - 8Gb FC, Dual Ethernet PCIe Express Module
371-4666-01 - 8Gb FC, Dual Ethernet PCIe Express Module

Symptoms

Excessive PCIe correctable errors are detected.

Impact

When 1 or 2 Metis HBAs are installed in a blade server, excessive PCIe correctable errors are detected. With SLES11 and SLES11SP1, these errors, combined with a Linux erratum, will hang installation through the PEM or hang the server at boot time. On all other operating systems, the errors are reported to log files via the Fault Management Architecture mechanisms.

Customer Impact: The number of correctable errors is not within the PCIe Gen 2.0 specified limit of 10^-12 BER (or ~288 per hour maximum). While these correctable errors are not inherently harmful, the signal integrity is unacceptable for implementations. The performance impact of these errors should be negligible. However, the system may not boot when SLES11 (SUSE Linux) is used on a PCIe Gen2 system.
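
The bulletin does not state how the ~288 per hour figure was derived. The following is a minimal sketch of one way the arithmetic could work out, assuming the PCIe Gen 2 raw line rate of 5 GT/s per lane and assuming the limit is summed over 16 lane-directions (for example, an x8 link counted in both directions). The function name and the lane count are illustrative assumptions, not taken from the bulletin.

```python
from fractions import Fraction

GEN2_BITS_PER_SECOND = 5_000_000_000   # PCIe Gen 2 raw line rate, per lane and direction
BER_LIMIT = Fraction(1, 10**12)        # 10^-12 bit error ratio quoted in the bulletin
SECONDS_PER_HOUR = 3600

def max_errors_per_hour(lane_directions: int) -> Fraction:
    """Expected bit errors per hour at the BER limit, summed over lane-directions."""
    bits_per_hour = GEN2_BITS_PER_SECOND * SECONDS_PER_HOUR * lane_directions
    return bits_per_hour * BER_LIMIT

per_lane = max_errors_per_hour(1)       # one lane, one direction
assumed_x8 = max_errors_per_hour(8 * 2) # assumption: x8 link, both directions
print(per_lane, assumed_x8)             # prints: 18 288
```

Under these assumptions, a single lane at the limit would see about 18 errors per hour, and 16 lane-directions would give the 288 quoted above.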

Changes

Contributing Factors

The following listed products are impacted...

  Sun Blade X6270
  Sun Blade X6275
  Sun Blade X6270 M2
  Sun Blade X6275 M2
  Sun Fire X4800 Server
  SPARC T3-4
  SPARC T3-1B

...when one or both of the following occur:

. 1 or 2 Metis HBAs (part numbers as identified above) are installed.
. SLES11 or SLES11SP1 (SUSE Linux) is used on a PCIe Gen2 system.
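
The contributing factors above, together with the Impact and Comments sections, can be read as a small decision table. The sketch below is one hedged interpretation only: the function name, its parameters, the returned strings, and the "Solaris" example value are hypothetical and do not come from any Sun or Oracle tool.

```python
# Platforms and operating systems named in this bulletin.
AFFECTED_PLATFORMS = {
    "Sun Blade X6270", "Sun Blade X6275",
    "Sun Blade X6270 M2", "Sun Blade X6275 M2",
    "Sun Fire X4800", "SPARC T3-4", "SPARC T3-1B",
}
HANG_PRONE_OS = {"SLES11", "SLES11SP1"}

def expected_impact(platform: str, metis_hba_count: int,
                    os_name: str, pcie_gen2: bool) -> str:
    """Hypothetical helper: summarise the exposure described in this bulletin."""
    exposed = (
        platform in AFFECTED_PLATFORMS
        and metis_hba_count in (1, 2)
        and pcie_gen2  # Comments section: exposure is confined to Gen 2 mode
    )
    if not exposed:
        return "not exposed"
    if os_name in HANG_PRONE_OS:
        return "correctable errors; install or boot may hang"
    return "correctable errors reported to log files"

print(expected_impact("Sun Blade X6270 M2", 2, "SLES11SP1", True))
print(expected_impact("SPARC T3-1B", 1, "Solaris", True))
```

The count refers to affected (-01) HBAs; a card already at the -02 dash level would not count toward it.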

Cause

Root Cause

The root cause of this issue is ultimately the inability of this HBA card, and more specifically the IDT PCIe Gen 2 switch on the card, to respond correctly to server DLLP settings during PCIe training. One of the more important settings, referred to as de-emphasis, sets the card's transmission and reception to either -3.5dB or -6dB. This card defaults to -6dB and does not respond properly to signals instructing it to transmit at -3.5dB. There are also secondary signal integrity (SI) issues, relating to amplitude and varying clock jitter, that are not fully understood at this time.

The remedy is to force the upstream switch lanes on the card to PCIe Gen 1 speeds when talking to the root port on the blade. The downstream ports (to the Gigabit Ethernet chip and the Fibre Channel chip) remain at their stock speeds. This mode of operation has already seen significant testing on multiple x86/x64 and SPARC blade systems, as any server capable of only PCIe Gen 1 speeds operates in this mode. A performance degradation of up to 8% is expected when running at full load.

This corrective action was implemented by dash rolling the 371-4522 from -01 to -02 via ECO# E0000932, and purging Services inventories via GSAP# 5410 as of November 15, 2010.

This corrective action was implemented by dash rolling the 371-4666 from -01 to -02 via ECO# E0002549, and purging Services inventories via GSAP# 5501 as of March 11, 2011.

Solution

Workaround

No workaround available - see Resolution section below.

Resolution

. Replace 371-4522-01 with 371-4522-02 upon failure only.
. Replace 371-4666-01 with 371-4666-02 upon failure only.

Identification of Affected Parts (how to)

The fixed HBAs have a different EEPROM program that loads the configuration file for the IDT PCIe switch on the card. This configuration file level cannot be identified by an OS query; therefore, visual inspection of the dash number on the card is the only way to determine whether it is affected.
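
Once the part number has been read off the card, the comparison against the affected dash levels can be sketched as follows. This is a minimal illustration, not a Sun or Oracle utility: the function name and the sample label layout ("P/N ... REV ...") are hypothetical, and only the part numbers themselves come from this bulletin.

```python
import re

# Affected dash level -> fixed replacement, as listed in this bulletin.
REPLACEMENTS = {
    "371-4522-01": "371-4522-02",
    "371-4666-01": "371-4666-02",
}

# Sun part numbers of the form NNN-NNNN-NN (base number plus dash level).
PART_RE = re.compile(r"\b(\d{3}-\d{4}-\d{2})\b")

def check_label(label_text: str):
    """Return (part_number, replacement_or_None) for the first part number found.

    Returns None if the text contains no part number in the expected form.
    """
    match = PART_RE.search(label_text)
    if match is None:
        return None
    part = match.group(1)
    return part, REPLACEMENTS.get(part)

# Hypothetical label text transcribed from a card.
print(check_label("P/N 371-4522-01 REV 50"))
print(check_label("371-4666-02"))
```

A replacement of `None` means the part number is not one of the affected -01 dash levels listed here.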

Comments

This issue was evaluated and determined not to meet FCO criteria, as exposure is very limited and confined to cases where the PCIe bus is running in Gen 2 mode.

References

ECO: E0000932, E0002549
GSAP: 5410, 5501
WW Stop Ship: SSP #20771



For information about FAB documents, their release processes, implementation strategies and billing information, go to the following (Internal Only) URL:

   https://sunspace.sfbay.sun.com/display/Onestop/FAB%20(Field%20Action%20Bulletin)

In addition to the above you may email:

  FAB-Manager_US@oracle.com

Contacts:

Contributor: noel.mckay@oracle.com
Responsible Engineer: beth.edelmaier@oracle.com
Responsible Manager: dave.palmer@oracle.com
Business Unit Group: HBA_P-team@sun.com

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.