Document Audience: | INTERNAL |
Document ID: | A0208-3 |
Title: | 440MHz and 450MHz UltraSPARC II Modules in Ultra 60/80, Enterprise 220R/420R and Netra t platforms may experience Red State Exception, Send Mondo Timeout and hard hangs |
Copyright Notice: | Copyright © 2007 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | Mon Mar 29 00:00:00 MST 2004 |
----------------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
----------------------------------------------------------------------------
FIELD CHANGE ORDER
(For Authorized Distribution by Enterprise Services)
FCO #: A0208-3
Status: inactive
Synopsis: 440MHz and 450MHz UltraSPARC II Modules in Ultra 60/80, Enterprise 220R/420R and Netra t platforms may experience Red State Exception, Send Mondo Timeout and hard hangs
Date: Mar/29/2004
SunAlert: Yes
Top FIN/FCO Report: No
Products Reference: Ultra 60/80, Enterprise 220R/420R, Netra t1120,
Netra t1125, Netra t1400, Netra t1405
Product Category: Server / System Component
Product Affected:
Systems Affected:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
for part number 501-6058-02 (or less):
- A27 All Ultra 80 144xxxxx - 246xxxxx
- A23 All Ultra 60 144xxxxx - 246xxxxx
- A34 All Enterprise 220R 144xxxxx - 246xxxxx
for part number 501-6071-01:
- A27 All Ultra 80 144xxxxx - 246xxxxx
- A23 All Ultra 60 144xxxxx - 246xxxxx
- A34 All Enterprise 220R 144xxxxx - 246xxxxx
- A33 All Enterprise 420R 144xxxxx - 246xxxxx
for part number 501-6209-02 (or less):
- N14 All Netra t1405 144xxxxx - 246xxxxx
- N15 All Netra t1400 144xxxxx - 246xxxxx
- N04 All Netra T1120 144xxxxx - 246xxxxx
- N03 All Netra T1125 144xxxxx - 246xxxxx
- N02 All Netra T1120 144xxxxx - 246xxxxx
for part number 501-5682-03:
- N14 All Netra t1405 220xxxxx - 319xxxxx
- N15 All Netra t1400 220xxxxx - 319xxxxx
- N04 All Netra T1120 220xxxxx - 319xxxxx
- N03 All Netra T1125 220xxxxx - 319xxxxx
- N02 All Netra T1120 220xxxxx - 319xxxxx
X-Options Affected:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
X1197A - - 440MHz UltraSPARC II module -
X1195A - - 450MHz UltraSPARC II module -
Parts Affected:
Part Number Description Model
----------- ----------- -----
501-6058-02 (or less) 450MHz UltraSPARC II module -
501-6071-01 450MHz UltraSPARC II module -
501-6209-02 (or less) 440MHz UltraSPARC II module -
501-5682-03 440MHz UltraSPARC II module -
(SCSI Devices)
Type Vendor Model SerialNumber(Min) SerialNumber(Max) Firmware
---- ------ ------- ----------------- ----------------- --------
N/A
References:
URL:
http://pts-americas.west/vsp/wgs/products/E420R/e420_troubleshooting_guide.pdf
ECO: WO_25557
DPCO: 336.B
ESC: 538507
FIN: I0896-1
FIN: I0755-1
FIN: I0616-1
WWStopShip: P001-20085
Sun Alert: 49945
Issue Description:
Change History
--------------
A0208-3 Date Modified: Mar/26/04
Updates: PRODUCT AFFECTED, AFFECTED PARTS, CORRECTIVE ACTION, COMMENTS
. PRODUCT AFFECTED: see "Systems Affected". Now organized by part number
. AFFECTED PARTS: 501-6209-01 was changed to 501-6209-02
. CORRECTIVE ACTION: 501-6209-01 was changed to 501-6209-02
Also see "Identification of Suspect Part"
. COMMENTS: additional information from internal section of Sun Alert
was added to FCO and removed from Sun Alert
A0208-2 Date Modified: Aug/04/03
Updates: AFFECTED PARTS, CORRECTIVE ACTION
. AFFECTED PARTS: Added part number 501-5682-03
. CORRECTIVE ACTION: Added "replace 501-5682-03 with 501-5682-04"
----------------------------------------------------------------
Certain 440MHz and 450MHz UltraSPARC II Modules supported on Ultra 60/80,
Enterprise 220R/420R and Netra t platforms may experience early life
failures resulting in Red State Exception, Send Mondo Timeout and hard
hang errors.
A limited number of systems manufactured between November 01, 2001 and
November 22, 2002 may contain affected CPU modules. The affected system
serial number range is between 144xxxxx and 246xxxxx. The probability of
experiencing the described issue is considered low, at 4% or less.
The failures caused by this problem are not unique to this issue. Failures
include EDP and WP panics, Red State Exception, Send Mondo Timeout panics,
machine hard hangs, and machine reboots. Typically, after experiencing
such errors, the only means of rebooting the machine is by power cycling
it with the front power button or key switch. To be sure that a failure
is related to this problem, the failure must be repeatable and, when data
is available, must always identify the same CPU module as being at fault. The
time between failures is typically two to five weeks.
Root cause analysis determined that the socket used on the module has a 90 day
shelf-life. Modules assembled with sockets less than 90 days old have
not experienced this problem. Once assembled, socket aging is not an
issue. All loose sockets in manufacturing and repair that were more
than 90 days old have been scrapped, and field spares have been purged
and reworked with good sockets.
Corrective action was implemented in manufacturing by purging and reworking
all modules with new sockets (less than 90 days old) via Worldwide Stopship
Purge P001-20085 on November 22, 2002. Modules were then dash rolled via
ECO# WO_25557 as of December 17, 2002. Corrective action was put in place
in Sun Services via DPCO 336 as of December 17, 2002.
Parts Affected:
February 28th, 2005
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| X | UPON FAILURE
---
Replacement Time Estimate:
0.5 hours
Special Considerations:
Due to material availability issues, APac will not be fully ready
to support this FCO until approximately March 12, 2003. All other
geographies are ready upon release of this FCO.
Because this problem manifests itself as a common system failure,
care should be taken to first use the Best Practices Guide to
help determine the exact failure mode. The E420R Best Practices
Troubleshooting Guide can be located at the following URL:
http://pts-americas.west/vsp/wgs/products/E420R/e420_troubleshooting_guide.pdf
Corrective Action:
NOTE! This FCO does not authorize the proactive replacement of any hardware.
For those customers who insist on having their modules proactively
replaced, please address this through the CIC program. Reference
the following URL for more information:
http://uscq.ebay/Process/cic.html
Upon repeat failure and unsuccessful attempts to correct the failure by
the above mentioned Best Practices, the failing module should be replaced.
Upon failure, replace as follows:
- replace 501-6058-02 (or less) with 501-6058-03 (or above) or
501-6058-02 (with DPCO 336 label), or 501-6071-02, or
501-6071-01 (with DPCO 336 label)
- replace 501-6071-01 with 501-6071-02 (or above) or with
501-6071-01 (with DPCO 336 label)
- replace 501-6209-02 (or less) with 501-6209-03 (or above) or
501-6209-02 (with DPCO 336 label)
- replace 501-5682-03 with 501-5682-04
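The replacement rules above can be expressed as a small lookup, sketched
here in Python for illustration only. The table contents are transcribed
from the list above; the names AFFECTED and replacement_options are
hypothetical and are not part of any Sun tool. The sketch cannot see a
"DPCO 336" label, so a module carrying that label should be treated as
already reworked regardless of what the lookup returns.

```python
# Hypothetical lookup built from the "Upon failure" list in this FCO.
# Each entry: (highest affected dash level, whether "or less" applies,
#              acceptable replacement parts).
AFFECTED = {
    "501-6058": (2, True, [
        "501-6058-03 (or above)",
        "501-6058-02 (with DPCO 336 label)",
        "501-6071-02",
        "501-6071-01 (with DPCO 336 label)",
    ]),
    "501-6071": (1, False, [
        "501-6071-02 (or above)",
        "501-6071-01 (with DPCO 336 label)",
    ]),
    "501-6209": (2, True, [
        "501-6209-03 (or above)",
        "501-6209-02 (with DPCO 336 label)",
    ]),
    "501-5682": (3, False, ["501-5682-04"]),
}


def replacement_options(part_number):
    """Return the replacement parts for a failed module, or [] if the
    part number is not one this FCO lists as affected."""
    base, _, dash_text = part_number.strip().rpartition("-")
    if base not in AFFECTED or not dash_text.isdigit():
        return []
    max_dash, or_less, options = AFFECTED[base]
    dash = int(dash_text)
    # "or less" parts match any dash level up to the listed one;
    # the others match only the exact dash level listed.
    affected = dash <= max_dash if or_less else dash == max_dash
    return list(options) if affected else []
```

For example, replacement_options("501-5682-03") yields the single entry
"501-5682-04", while a part number at a corrected dash level such as
"501-6071-02" yields an empty list.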
Identification of Suspect Part
------------------------------
For Serial Numbers: 144xxxxx - 246xxxxx:
Systems manufactured between November 2001 and November 2002 may contain
potentially affected CPU modules. The affected system serial number range is
144XXXXX through 246XXXXX. (The first 3 digits reflect the year and week of
manufacture.)
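As an illustration of the serial number check above, the following Python
sketch decodes the three-digit prefix and tests it against the range. It
assumes that the first digit is the last digit of the year and that the
next two digits are the week number; this FCO states only that the three
digits "reflect the year and week", so that layout, the decade_start
default and both function names are assumptions made for this example.

```python
def decode_serial_prefix(serial, decade_start=2000):
    """Return (year, week) decoded from the first three serial digits.

    Assumption: digit 1 is the last digit of the year and digits 2-3
    are the week number. decade_start is a guess that fits this FCO.
    """
    prefix = serial.strip()[:3]
    if len(prefix) != 3 or not prefix.isdigit():
        raise ValueError("serial must start with three digits")
    return decade_start + int(prefix[0]), int(prefix[1:])


def in_affected_range(serial, low=144, high=246):
    """True if the three-digit prefix lies within low..high inclusive."""
    prefix = serial.strip()[:3]
    if len(prefix) != 3 or not prefix.isdigit():
        raise ValueError("serial must start with three digits")
    return low <= int(prefix) <= high
```

Under these assumptions, a prefix of 144 decodes to week 44 of 2001 and 246
to week 46 of 2002, which lines up with the November 2001 to November 2002
window given above.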
Locate the CPU module part number. If it is 501-6071-01 or 501-6058-02 (or
earlier) and without a "DPCO 336" sticker, the system is possibly affected by
the described issue.
If it is 501-6209-02 (or earlier), please locate the CPU module serial number.
If it is 0 through 1000 or 2000 through 3500, it may be affected. If it is
1001-1999, or greater than 3500, it will not be affected. To find the CPU
module serial number, look at the barcoded label on the CPU module. There will
be a number above it, for example 50162091300. The last 4 digits (1300 in this
case) are the serial number. In this example, 1300 is within the range of
modules that are not affected.
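The 501-6209-02 serial number check above can be sketched in Python as
follows. The ranges are taken directly from the text; the function name
is hypothetical, and the sketch assumes the label number is supplied as a
string of digits exactly as printed above the barcode.

```python
def is_6209_module_suspect(label_number):
    """Apply the 501-6209-02 (or earlier) serial number ranges.

    The module serial number is the last 4 digits of the number
    printed above the barcode on the CPU module label.
    """
    text = label_number.strip()
    if not text.isdigit() or len(text) < 4:
        raise ValueError("label number must contain at least 4 digits")
    serial = int(text[-4:])
    # Possibly affected: 0 through 1000, or 2000 through 3500.
    # Not affected: 1001 through 1999, or greater than 3500.
    return 0 <= serial <= 1000 or 2000 <= serial <= 3500
```

With the label number from the example above, 50162091300, the serial
number is 1300 and the function returns False, matching the text.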
For Serial Numbers: 220xxxxx - 319xxxxx:
Systems manufactured between May 2002 and May 2003 may contain potentially
affected CPU modules. The affected system serial number range is 220XXXXX
through 319XXXXX. (The first 3 digits reflect the year and week of
manufacture.)
For CPU module part number 501-5682-03, please locate the CPU module serial
number. If it is 103XXX or above, it may be affected. To find the CPU module
serial number, look at the barcoded label on the CPU module. There will be a
number above it, for example 5015682106778. The last 6 digits (106778 in this
case) are the serial number. In this example, 106778 is greater than 103XXX and
the module is therefore possibly affected.
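The 501-5682-03 check above can be sketched the same way in Python. The
threshold "103XXX and above" is read here as 103000 or greater, which is
an interpretation rather than a stated value, and the function name is
hypothetical.

```python
def is_5682_module_suspect(label_number):
    """Apply the 501-5682-03 serial number threshold.

    The module serial number is the last 6 digits of the number
    printed above the barcode on the CPU module label.
    """
    text = label_number.strip()
    if not text.isdigit() or len(text) < 6:
        raise ValueError("label number must contain at least 6 digits")
    serial = int(text[-6:])
    # Assumption: "103XXX and above" means 103000 or greater.
    return serial >= 103000
```

With the label number from the example above, 5015682106778, the serial
number is 106778 and the function returns True, matching the text.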
Comments:
Observed average system MTBF has decreased by approximately 500 hours
on systems containing affected 450MHz CPU modules manufactured between
November 02, 2001 and November 22, 2002.
It is important to note:
1. These failures have been attributable to manufacture variability, not to
the design of the CPU, module or the socket.
2. The vast majority of sockets are still within design specification and will
function as expected.
Additional data that supports a bounded 90 day early life failure condition was
extracted from actual units returned from the field which had failed for a
socket related symptom. In these cases modules were taken back through extended
system testing under varying corner conditions of temperature, voltage and
frequency margining. Modules which failed for a socket contact related issue
showed a TTF averaging less than 38 days of continuous power on run time.
Sun has examined the socket subtier supplier process and imposed the
following changes:
1. A tightened column height tolerance has been implemented on all new sockets
used in all Sun modules. The contact column height tolerance has been changed
from .026-.035" to .029-.035" to ensure a column height that provides optimum
ohmic contact under compression.
2. Inspection/measurement points have been integrated into the socket
manufacturing process flow to ensure sockets are made to Sun's required
tolerances.
"Wholesale" replacement of modules has been shown to create potential for
mechanical contact failures and introduce instability into an already stable
system.
Billing Type:
Warranty: Sun will provide parts at no charge under Warranty
Service. On-Site Labor Rates are based on how the
system was initially installed.
Contract: Sun will provide parts at no charge. On-Site Labor Rates
are based on the type of service contract.
Non Contract: Sun will provide parts at no charge. Installation by
Sun is available based on the On-Site Labor Rates
defined in the Price List.
--------------------------------------------------------------------------
Implementation Footnote:
________________________
i) In case of Mandatory FCOs, Sun Services will attempt to contact
all known customers to recommend the part upgrade.
ii) For controlled proactive swap FCOs, Sun Services mission critical
support teams will initiate proactive swap efforts for their respective
accounts, as required.
iii) For Replace upon Failure FCOs, Sun Services partners will implement
the necessary corrective actions as and when they are required.
--------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
______________
* Access the top level URL of http://sdpsweb.Central/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
_______________________
* Access the SunSolve Online URL at http://sunsolve.Central/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
_______________
* Access the top level URL of https://spe.sun.com
--------------------------------------------------------------------------
General:
________
Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------