Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type Problem Resolution Sure Solution 1018832.1 : Sun Fire[TM] Midframe servers: POST fails during IOPOST, marking all I/O Boards (IBs) as bad.
Previously Published As 230625

Symptoms
All I/O Boards (IBs) are marked as bad during IOPOST. This can be misleading when trying to identify the faulty FRU.

Resolution
All I/O Boards (IBs) are sometimes marked as bad because the CPU running IOPOST is itself faulty. The fault in that CPU goes undetected by LPOST (the CPU's own POST), so the failure surfaces only when it tests the IBs. See the excerpt from the console logs below. Replacing the SB containing the faulty CPU resolves the issue.

Note the following from the console logs :
-------------------------------------------
a) SB4/P0 is the processor running the IOPOST
b) SB4/P0 marks IB6/P0 and IB6/P1 - the two IO controllers on IB6 as
"Failed"
c) SB4/P0 marks IB8/P0 and IB8/P1 - the two IO controllers on IB8 as
"Failed"
d) SB4/P0 is actually the bad CPU. Because the CPU itself is faulty, it
cannot reliably test the IBs, and so it marks the controllers on the IBs
as failed.
e) The fault in SB4/P0 goes undetected during its own self-test (LPOST)
f) It is highly unlikely that all of the IO controllers (IB6/P0, IB6/P1,
IB8/P0 and IB8/P1) are bad.
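
The reasoning in points a) to f) can be sketched as a small log check. This is only an illustrative sketch, not a Sun-supplied tool: the function names (summarize_iopost, suspect_cpus) are hypothetical, and the regular expressions assume the console line formats seen in the excerpt below ("Component under test: ..." and "{...} Failed"). It flags a CPU as suspect when that single CPU reported errors against every I/O board that ended up marked as failed, and at least two boards failed.

```python
import re
from collections import defaultdict

# Assumed line format: "{/N0/SB4/P0} Component under test: /N0/IB6/P0 PCI IOC"
UNDER_TEST = re.compile(
    r"^\{(?P<tester>/N\d+/SB\d+/P\d+)\}\s+Component under test:\s+"
    r"(?P<target>/N\d+/IB\d+/P\d+)"
)
# Assumed line format: "{/N0/IB6/P0} Failed"
FAILED = re.compile(r"^\{(?P<ioc>/N\d+/IB\d+/P\d+)\}\s+Failed")


def board_of(component):
    """Strip the trailing /Pn so '/N0/IB6/P0' becomes '/N0/IB6'."""
    return component.rsplit("/", 1)[0]


def summarize_iopost(log_text):
    """Return (testing CPU -> boards it reported errors against, failed boards)."""
    tested_by = defaultdict(set)
    failed_boards = set()
    for raw in log_text.splitlines():
        line = raw.strip()
        match = UNDER_TEST.match(line)
        if match:
            tested_by[match.group("tester")].add(board_of(match.group("target")))
            continue
        match = FAILED.match(line)
        if match:
            failed_boards.add(board_of(match.group("ioc")))
    return dict(tested_by), failed_boards


def suspect_cpus(log_text):
    """List CPUs that reported errors against every failed I/O board.

    Returns an empty list unless at least two boards failed, because a
    single failed board is more plausibly a genuine board fault.
    """
    tested_by, failed_boards = summarize_iopost(log_text)
    if len(failed_boards) < 2:
        return []
    return sorted(cpu for cpu, boards in tested_by.items() if boards >= failed_boards)
```

A result naming one CPU is a hint to retest with that CPU or its System Board disabled. It is not proof that the I/O boards are good.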
Console logs :
--------------
{/N0/SB4/P0} ERROR: TEST=PCI IO Controller Functional Tests,SUBTEST=PCI IO
Controller DMA loopback Tests ID=152.2
{/N0/SB4/P0} Component under test: /N0/IB6/P0 PCI IOC
{/N0/SB4/P0} Data Access Error from address 00000000.08000820. AFSR =
00000002.00000094
{/N0/SB4/P0} Secondary AFAR 00000000.08000820, Secondary AFSR =
00000002.00000094
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 01 63 00000099.80000606 00000000.0001ca48
00000000.0001ca4c
{/N0/SB4/P0} (CE) Correctable system data ECC error
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 02 32 00000044.80001504 00000000.0000f1e0
00000000.0000f1e4
{/N0/SB4/P0} 01 63 00000099.80000606 00000000.0001ca48
00000000.0001ca4c
{/N0/SB4/P0} (CE) Correctable system data ECC error
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 03 34 00000091.80001507 00000000.00014d80
00000000.00014d84
{/N0/SB4/P0} 02 32 00000044.80001504 00000000.0000f1e0
00000000.0000f1e4
{/N0/SB4/P0} 01 63 00000099.80000606 00000000.0001ca48
00000000.0001ca4c
{/N0/SB4/P0} AFSR = 00000000.00000000
{/N0/SB4/P0} AFAR = 00000000.08000820
{/N0/SB4/P0} IMMU SFSR = 00000000.00000000
{/N0/SB4/P0} DMMU SFSR = 00000000.00700009
{/N0/SB4/P0} DMMU SFAR = 00000000.08000820
{/N0/SB4/P0} PState = 00000000.00000015
{/N0/SB4/P0} Dispatch Control =00000000.0000103f
{/N0/SB4/P0} Data Cache Unit Control =0000ce00.0000000e
{/N0/SB4/P0} Safari Config. = 0aaa0028.20200006
{/N0/SB4/P0} EState = 00000000.00000000
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights
reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} Running PCI IO Controller Basic Tests
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 02 32 00000044.80001503 000007ff.f0007cc0
000007ff.f0007cc4
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec
000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} (ME) Multiple Errors of the same type occurred
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 03 32 00000099.80001502 000007ff.f0006a58
000007ff.f0006a5c
{/N0/SB4/P0} 02 32 00000044.80001503 000007ff.f0007cc0
000007ff.f0007cc4
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec
000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights
reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights
reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/IB6/P0} Failed <--- !!
{/N0/IB6/P1} Failed <--- !!
Sep 10 11:05:24 he101 Domain-A.SC: Excluded unusable, unlicensed, failed or
disabled board: /N0/IB6
Copying IO prom to Cpu dram
...................................
{/N0/SB4/P0} Running PCI IO Controller Basic Tests
{/N0/SB4/P0} Jumping to memory 00000000.00000020 [00000010]
{/N0/SB4/P0} System PCI IO post code running from memory
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:28
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights
reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} Subtest: PCI IO Controller Register Initialization for aid
0x1c
{/N0/SB4/P0} Running PCI IO Controller Functional Tests
{/N0/SB4/P0} Subtest: PCI IO Controller IOMMU TLB Compare Tests for aid
0x1c
{/N0/SB4/P0} Subtest: PCI IO Controller IOMMU TLB Flush Tests for aid 0x1c
{/N0/SB4/P0} Subtest: PCI IO Controller DMA loopback Tests for aid 0x1c
{/N0/SB4/P0} ERROR: TEST=PCI IO Controller Functional Tests,SUBTEST=PCI IO
Controller DMA loopback Tests ID=152.2
{/N0/SB4/P0} Component under test: /N0/IB8/P0 PCI IOC
{/N0/SB4/P0} Data Access Error from address 00000000.08000820. AFSR =
00000002.00000094
{/N0/SB4/P0} Secondary AFAR 00000000.08000820, Secondary AFSR =
00000002.00000094
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 01 63 00000099.80000605 00000000.0001c8b4
00000000.0001c8b8
{/N0/SB4/P0} (CE) Correctable system data ECC error
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 02 32 00000044.80001503 00000000.0000f1e0
00000000.0000f1e4
{/N0/SB4/P0} 01 63 00000099.80000605 00000000.0001c8b4
00000000.0001c8b8
{/N0/SB4/P0} (CE) Correctable system data ECC error
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 03 34 00000091.80001506 00000000.00014d80
00000000.00014d84
{/N0/SB4/P0} 02 32 00000044.80001503 00000000.0000f1e0
00000000.0000f1e4
{/N0/SB4/P0} 01 63 00000099.80000605 00000000.0001c8b4
00000000.0001c8b8
{/N0/SB4/P0} AFSR = 00000000.00000000
{/N0/SB4/P0} AFAR = 00000000.08000820
{/N0/SB4/P0} IMMU SFSR = 00000000.00000000
{/N0/SB4/P0} DMMU SFSR = 00000000.00700009
{/N0/SB4/P0} DMMU SFAR = 00000000.08000820
{/N0/SB4/P0} PState = 00000000.00000015
{/N0/SB4/P0} Dispatch Control =00000000.00000000
{/N0/SB4/P0} Data Cache Unit Control =00000000.0000000c
{/N0/SB4/P0} Safari Config. = 0aaa0028.20200006
{/N0/SB4/P0} EState = 00000000.00000000
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights
reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} Running PCI IO Controller Basic Tests
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec
000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 02 32 00000099.80001502 000007ff.f0006a58
000007ff.f0006a5c
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec
000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} tl tt tstate tpc tnpc
{/N0/SB4/P0} 03 32 00000099.80001502 000007ff.f0006a58
000007ff.f0006a5c
{/N0/SB4/P0} 02 32 00000099.80001502 000007ff.f0006a58
000007ff.f0006a5c
{/N0/SB4/P0} 01 32 00000000.80000405 000007ff.f0009bec
000007ff.f0009bf0
{/N0/SB4/P0} (TO) Time-out from system bus
{/N0/SB4/P0} (PRIV) Privileged code access error(s)
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights
reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/SB4/P0} @(#) lpost 5.15.2 2003/08/04 10:27
{/N0/SB4/P0} Copyright 2001-2003 Sun Microsystems, Inc. All rights
reserved.
{/N0/SB4/P0} Use is subject to license terms.
{/N0/IB8/P0} Failed <--- !!
{/N0/IB8/P1} Failed <--- !!
Sep 10 11:05:47 he101 Domain-A.SC: Excluded unusable, unlicensed, failed or
disabled board: /N0/IB8
Sep 10 11:05:47 he101 Domain-A.SC: No usable Io board in domain.
setkeyswitch operation did not complete

Relief/Workaround
Disable the System Board (SB) containing the CPU that fails IOPOST, so that IOPOST runs on a different CPU. This can be done with the "disablecomponent" command from the system controller interface (SC-App). Alternatively, disabling only the processor itself with the "disablecomponent" command is also a valid workaround.

Product
Sun Fire V1280 Server
Sun Fire 6800 Server
Sun Fire 4810 Server
Sun Fire 4800 Server
Sun Fire 3800 Server

IOPOST, IB6/P0, Failed, DMA, Functional, Controller

Previously Published As 72039

Change History
Date: 2009-11-25
User Name: Josh Freeman
Action: Refreshed
Comment: No changes made - refreshed per ESG Content Team effort.

Date: 2004-02-10
User Name: 109197
Action: Update Canceled
Comment: *** Restored Published Content *** Issue under review for possible separate Infodoc
Version: 0

Date: 2003-12-12
User Name: 77740
Action: Rejected
Comment: Paul

Attachments
This solution has no attachment