Document Audience: | INTERNAL |
Document ID: | I0849-1 |
Title: | New capability is available on E10000 systems to identify MSRAM modules from POST output. |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2002-10-24 |
---------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0849-1
Synopsis: New capability is available on E10000 systems to identify MSRAM modules from POST output.
Create Date: Oct/22/02
SunAlert: No
Top FIN/FCO Report: No
Products Reference: Mirrored SRAM CPU modules
Product Category: Server / Service
Product Affected:
Systems Affected:
-----------------
Mkt_ID   Platform   Model   Description              Serial Number
------   --------   -----   -----------              -------------
  -      E10000     ALL     Ultra Enterprise 10000         -
X-Options Affected:
-------------------
Mkt_ID   Platform   Model   Description              Serial Number
------   --------   -----   -----------              -------------
  -         -         -          -                         -
Parts Affected:
Part Number   Description   Model
-----------   -----------   -----
     -             -          -
References:
BugId: 4401066 - Need to identify mirrored SRAM CPU modules after
bringup/DR.
4419788 - MSRAM processor property for Starfire.
PatchId: ssp3.4 110304-06: SSP 3.4: Updates for hpost, redx, and autoconfig.
110316-04: SSP 3.4: OBP patch.
Issue Description:
Mirrored SRAM CPU modules (IBM Sombra and Sony Espejo) are being
installed as upgrades in existing E10000 systems. In some cases, these
are installed alongside older CPU modules. The Field needs to be able
to determine if a Mirrored SRAM (MSRAM) module is installed in a
particular CPU location in order to more effectively service faulty CPU
modules. This issue does not directly impact customer systems but does
affect serviceability.
RFE 4401066 requested that field personnel be given a way to probe the
E10000 and decipher which CPU modules have MSRAM. New HPOST patches
for SSP3.4 software provide this capability. A patch for SSP3.5 is
pending. It is expected that these patches will enable field personnel
to more easily diagnose and resolve CPU Ecache issues on E10000
systems.
SSP3.4 110304-06
Without these recommended patches, there is a high risk that an
incorrect CPU module could be sent to a customer site or that
inappropriate Best Practices actions could take place, i.e., no
replacement on first Ecache parity error, even though the CPU in error
has Mirrored SRAM. The Best Practice for MSRAM CPU modules is to
replace them on the first error and submit them for RCCA/CPAS.
Once these patches are installed, POST output will change as follows:
1) In phase proc1, POST will try to acquire the Module Capability (MCAP)
value from the UPA_CONFIG register of the CPU. (This value was
previously unused for Sunfire E6000 and Starfire E10000 processor
modules). For currently shipped processor modules, the MCAP bits have
now been hard wired to signify the following assignments:
MCAP    MCAP  MIRRORED ECACHE?
BINARY  HEX   DATA SRAMS  TAG SRAMS  COMMENTS ON PROCESSOR MODULE & ECACHE
==============================================================================
b'0000  0x0   unknown     unknown    Cannot determine anything with MCAP = 0
b'0001  0x1   YES         YES        466mhz "Blaze" with IBM (Sombra) MSRAMs
b'0010  0x2   YES         YES        466mhz "Blaze" with Sony (Espejo) MSRAMs
b'0011  0x3   YES         NO         400mhz "Sapphire" with IBM (Sombra) MSRAMs
b'0100  0x4   YES         NO         400mhz "Sapphire" with Sony (Espejo) MSRAMs
==============================================================================
In addition, bits 4, 5, and 6 of each post2obp processor auxiliary
field are now used as follows:
6: Ecache TAG SRAM is mirrored (1 = YES, 0 = NO)
5: Ecache DATA SRAMs are mirrored (1 = YES, 0 = NO)
4: Ecache SRAMs mirrored info is valid (1 = YES, 0 = NO)
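The bit assignments above can be illustrated with a short sketch. This is
not HPOST code; the function name decode_aux and the returned dictionary
are hypothetical and exist only for illustration. It assumes the field is
given as an integer or as a hex string as printed in the POST log, and that
the two mirrored bits carry no meaning unless the valid bit is set.

```python
# Bit positions in a post2obp processor auxiliary field, as listed above.
AUX_VALID = 1 << 4          # bit 4: mirrored info is valid
AUX_DATA_MIRRORED = 1 << 5  # bit 5: Ecache DATA SRAMs are mirrored
AUX_TAG_MIRRORED = 1 << 6   # bit 6: Ecache TAG SRAM is mirrored


def decode_aux(field):
    """Decode a processor aux field (int, or hex string such as '0074').

    Hypothetical helper: returns None for the mirrored flags when the
    valid bit is clear, on the assumption that they are then meaningless.
    """
    value = int(field, 16) if isinstance(field, str) else field
    valid = bool(value & AUX_VALID)
    return {
        "valid": valid,
        "data_mirrored": bool(value & AUX_DATA_MIRRORED) if valid else None,
        "tag_mirrored": bool(value & AUX_TAG_MIRRORED) if valid else None,
    }
```

For instance, decode_aux("0074") reports valid, data mirrored, and tag
mirrored, while decode_aux("0004") reports that the information is not
valid.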
After obtaining the MCAP value in phase proc1, POST checks it. For
older legacy processor modules, the value will be "0", indicating that
it is unknown whether the ecache is mirrored. If the MCAP value is
non-zero, POST checks whether it is a known value. If the non-zero
MCAP value is unknown, POST issues a WARNING on that value, with a
message that nothing could be determined from it, but does not FAIL
the proc.
An example unknown MCAP value WARNING message will appear as follows:
WARNING: Proc 0.3: Unknown MCAP value: 0x5
Cannot check post2obp proc aux info with unknown MCAP value.
In the above case, the mirrored status information was found to be
valid because it was written during phase jtag_integ (see item #4).
But since the MCAP value is unknown, the mirrored status information
could not be cross-checked. The WARNING is issued, but HPOST continues
without failing the processor with the unknown MCAP value.
Another example WARNING is as follows:
WARNING: Proc 0.3: Unknown MCAP value: 0x5
Cannot set post2obp proc aux info with unknown MCAP value.
In this case, the mirrored status information in the post2obp processor
auxiliary structure was not valid, perhaps because phase jtag_integ
(see item #4) was skipped. A subsequent detection of an unknown MCAP
value results in a message that the post2obp processor auxiliary
information could not be SET based solely on the unknown MCAP value.
Again, HPOST will continue with only the WARNING message and the proc
will not be failed.
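The MCAP handling described in item 1 can be sketched as a table lookup.
This is an illustration only, not the HPOST implementation; the names
MCAP_TABLE and classify_mcap and the returned tuple are hypothetical. The
table entries and the two WARNING texts are copied from this FIN.

```python
# MCAP value -> (data SRAMs mirrored, tag SRAMs mirrored, description),
# taken from the MCAP table in this FIN.
MCAP_TABLE = {
    0x1: (True, True, '466mhz "Blaze" with IBM (Sombra) MSRAMs'),
    0x2: (True, True, '466mhz "Blaze" with Sony (Espejo) MSRAMs'),
    0x3: (True, False, '400mhz "Sapphire" with IBM (Sombra) MSRAMs'),
    0x4: (True, False, '400mhz "Sapphire" with Sony (Espejo) MSRAMs'),
}


def classify_mcap(mcap, proc, aux_valid):
    """Return (status, table_entry, warning_lines) for one processor.

    Hypothetical helper. `proc` is a label such as "0.3"; `aux_valid`
    says whether the post2obp mirrored info was already marked valid.
    The proc is never failed here, matching the behavior described above.
    """
    if mcap == 0:
        # Legacy module: nothing can be determined from MCAP = 0.
        return ("unknown", None, [])
    if mcap in MCAP_TABLE:
        return ("known", MCAP_TABLE[mcap], [])
    # Unknown non-zero value: warn only. The second line depends on
    # whether valid aux info could have been cross-checked or had to be set.
    verb = "check" if aux_valid else "set"
    return ("warn", None, [
        "WARNING: Proc %s: Unknown MCAP value: 0x%x" % (proc, mcap),
        "Cannot %s post2obp proc aux info with unknown MCAP value." % verb,
    ])
```

Calling classify_mcap(0x5, "0.3", True) yields the first WARNING shown
above; passing False for aux_valid yields the second.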
2) A new postrc directive has been added for extra messaging during the
ecache SRAM probe:
A user could make the following entry in the postrc to allow more
messaging during HPOST, regarding probing of the ecache SRAMs for
their mirrored status:
debug_maskx00001000 # Extra messaging proc ecache SRAM mirrored status.
Note that some of the new messaging requires HPOST verbosity to be
at level 120 as well as having the new postrc entry above.
3) The postyymmdd.time.log file format has changed. The detected processor
ecache SRAM status is now printed at the end of the POST log file.
At the end of every POST log file, the post2obp auxiliary info structure
will now be included, with each new line beginning with: "#E". The
following is a real example of the new post2obp information being included
at the end of the new POST log file:
--------------------EXAMPLE postyymmdd.time.log: START------------------------
<...snip...>
phase final_config: Final configuration...
Configuring in 3F, FOM = 92160.00: 10 procs, 8 Scards, 9216 MBytes.
Creating OBP handoff structures...
Configured in 3F with 10 processors, 8 Scards, 9216 MBytes memory.
Interconnect frequency is 83.241 MHz, from SNMP MIB.
Processor external frequency is 124.878 MHz, from SNMP MIB.
Processor internal frequency is 249.724 MHz, from proc clk_mode probe.
NOTE: 2 processors were detected running at least 9.00% below rated speed.
Check system clock values/ratios using the SSP command sys_clock
Boot processor is 3.0 = 12
POST (level=16, verbose=20) execution time 6:53
#E Auxiliary Info structures:
#E brd: cpu3 cpu2 cpu1 cpu0 MCAP ioc1 ioc0 iom type
#E 3: 0013 0013 0013 0013 0000 0000 0000 01: 2 * (SYSIO w/ 2 SBus slots)
#E 4: 0013 0013 0013 0013 0000 0000 0000 01: 2 * (SYSIO w/ 2 SBus slots)
#E 5: 0074 0074 0004 0004 11 0000 0000 01: 2 * (SYSIO w/ 2 SBus slots)
# SMI E10000 POST log closed Wed Mar 20 06:58:46 2002
--------------------EXAMPLE postyymmdd.time.log: END--------------------------
Breaking down an example line:
#E brd: cpu3 cpu2 cpu1 cpu0 MCAP ioc1 ioc0 iom type
#E   5: 0074 0074 0004 0004 11   0000 0000 01: 2 * (SYSIO w/ 2 SBus slots)
CODE    XXAB XXCD XXEF XXGH IJKL XXXX XXXX
(NOTE: "CODE" is just for FIN explanation. It won't be in the POST log)
CODE KEY:
A. For brd-5, cpu3, we have a "7" in that field (bits [4,5,6] are set),
indicating the processor module has mirrored data and mirrored
tag SRAMs, and that information is "valid".
B. Ecache Setting, outside the scope of this FIN
C. For brd-5, cpu2, we have a "7" in that field (bits [4,5,6] are set),
indicating the processor module has mirrored data and mirrored
tag SRAMs, and that information is "valid".
D. Ecache Setting, outside the scope of this FIN
E. For brd-5, cpu1, we have a "0" in that field (0 bits set). See
note for K
F. Ecache Setting, outside the scope of this FIN
G. For brd-5, cpu0, we have a "0" in that field (0 bits set). See
note for L
H. Ecache Setting, outside the scope of this FIN
I. MCAP value for cpu3 = "1"; Proc identified as a 466mhz "Blaze" with
IBM (Sombra) MSRAMs. (see table above)
J. MCAP value for cpu2 = "1"; Proc identified as a 466mhz "Blaze" with
IBM (Sombra) MSRAMs. (see table above)
K. MCAP for cpu1 is blank. The proc is not present or the phase
jtag_integ was skipped AND the proc was FAILED before its MCAP value
could be deciphered in phase proc1.
L. MCAP for cpu0 is blank. The proc is not present or the phase
jtag_integ was skipped AND the proc was FAILED before its MCAP value
could be deciphered in phase proc1.
So for system board three, we can't identify which type of procs they
are (MCAP is 0), but the information is valid (bit 4 is set) and bits 5
and 6 are clear, so they have neither mirrored data SRAMs nor mirrored
tag SRAMs.
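The breakdown above can be sketched as a small parser for "#E" board
lines. This is an illustration, not a Sun tool; parse_aux_line is a
hypothetical name. It assumes whitespace-separated fields in the order
shown in the example, with the board number in decimal. It reads only the
four cpu fields and ignores the MCAP column, because blank MCAP positions
cannot be recovered reliably from whitespace-split text.

```python
AUX_VALID, AUX_DATA, AUX_TAG = 0x10, 0x20, 0x40  # bits 4, 5, 6


def parse_aux_line(line):
    """Parse one '#E <brd>: cpu3 cpu2 cpu1 cpu0 ...' line of a POST log.

    Returns {"brd.cpu": (data_mirrored, tag_mirrored) or None}, where
    None means the mirrored info is not marked valid. Returns None for
    lines that are not board lines (for example the '#E brd:' header).
    """
    tokens = line.split()
    if len(tokens) < 6 or tokens[0] != "#E" or not tokens[1].endswith(":"):
        return None
    try:
        board = int(tokens[1][:-1])                    # assumed decimal
        fields = [int(t, 16) for t in tokens[2:6]]     # cpu3 .. cpu0
    except ValueError:
        return None
    result = {}
    for cpu, value in zip((3, 2, 1, 0), fields):
        if value & AUX_VALID:
            status = (bool(value & AUX_DATA), bool(value & AUX_TAG))
        else:
            status = None
        result["%d.%d" % (board, cpu)] = status
    return result
```

Applied to the brd 5 line above, the sketch reports cpu3 and cpu2 as
mirrored for both data and tag SRAMs, and cpu1 and cpu0 as having no valid
information.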
The following items are needed to understand the explanation above, but
do not directly affect the field:
4) As a result of this patch, phase jtag_integ now:
. Checks the JTAG (scantool) database on the SSP to see if special
MSRAM handling is required for both the ecache data AND now tag SRAMs.
. If the JTAG (scantool) database on the SSP doesn't show that special
MSRAM handling is required for a given processor, cautiously does an
electronic JTAG probe of that processor's ecache SRAM Component IDs
to be sure. This covers the case where the database is incorrect because
autoconfig was never run for a given system board, and that system
board has processor modules with MSRAMs that require special
handling. If so, it fails the system board and instructs the user to run
autoconfig.
. Checks all other processor ecache tag and data SRAMs in the domain to
see if they are mirrored or not.
. Records the ecache data and ecache tag status for each processor in
the post2obp auxiliary info structure, and marks the info as "valid".
5) For the hpost -Q argument: the patch makes sure that the extracted
post2obp mirrored status information is not reported in a misleading way.
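The database and probe decision described in item 4 can be sketched as
follows. This is a simplified illustration, not the HPOST logic itself;
the function name, its two boolean inputs, and the returned strings are
all hypothetical.

```python
def jtag_integ_action(db_requires_msram_handling, probe_finds_msram):
    """Sketch of the per-processor decision in phase jtag_integ.

    db_requires_msram_handling: what the JTAG (scantool) database on the
        SSP says about this processor.
    probe_finds_msram: result of the electronic JTAG probe of the ecache
        SRAM Component IDs (only consulted when the database says no).
    """
    if db_requires_msram_handling:
        return "use special MSRAM handling"
    if probe_finds_msram:
        # Database disagrees with hardware, e.g. autoconfig was never run.
        return "fail system board; instruct user to run autoconfig"
    return "use normal handling"
```

The probe result matters only when the database shows no special handling,
which is the stale-database case the item describes.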
Implementation:
---
| | MANDATORY (Fully Proactive)
---
---
| | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| X | REACTIVE (As Required)
---
Corrective Action:
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above
mentioned problem.
Install the following patches on E10000 systems for the complete solution:
SSP3.4: Patches 110316-04, 110304-06 or later
Comments:
None
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sdpsweb.EBay
--------------------------------------------------------------------------