Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1021721.1
Update Date: 2010-08-27
Keywords:

Solution Type  FAB (standard) Sure

Solution  1021721.1 :   Sun Netra T5440 Server panics under heavy I/O with X4447A installed.  


Related Items
  • Sun Netra T5440 Server
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

Previously Published As
274350


Bug Id
<SUNBUG: 6726521>

Product
Sun Netra T5440 Server

Date of Resolved Release
16-Dec-2009

Netra T5440 panics under heavy I/O (see details below).

Affected Parts:

540-7689-02   PCI Express/XAUI Mezzanine Assembly, RoHS:Y
371-4413-01   2-Slot PCI-X and 2-Slot PCI Express Auxiliary Board, RoHS:Y
371-3647-04   2-Slot PCI-X and 2-Slot PCI Express Auxiliary Board, RoHS:Y

Impact

A Netra T5440 with the Sun Quad GbE x8 PCIe UTP (X4447A-Z) network card installed can occasionally panic when the card is in some of the PCIe slots and the MAXQ test is running. The issue is related to the ACK latency timer setting in the EEPROM firmware of the PLX8525 and PLX8533 bridge chips on the PCI mezzanine (PCI MEZZ) boards, and it is resolved with updated firmware on both chips. The PLX8533 chip is on the PCIM1 board, and the PLX8525 chip is on the PCIM2 board.

Contributing Factors

This issue affects any Sun Netra T5440 Server manufactured before February 2010 that has a Sun Quad GbE x8 PCIe UTP (X4447A-Z) network card installed.
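
The two conditions above can be checked mechanically when triaging a list of systems. The sketch below is a hypothetical helper, not part of any Sun tool; it assumes the manufacture date and the installed option part numbers are already known (for example, from service records), and it treats anything built before 1 February 2010 as affected.

```python
from datetime import date
from typing import Iterable

# Hypothetical triage helper encoding the two conditions stated in this FAB.
FIX_CUTOFF = date(2010, 2, 1)  # systems built before February 2010 are affected
CARD_PREFIX = "X4447A"         # Sun Quad GbE x8 PCIe UTP (X4447A-Z)

def is_affected(manufacture_date: date, installed_cards: Iterable[str]) -> bool:
    """Return True if a Netra T5440 meets both conditions of this FAB."""
    built_before_fix = manufacture_date < FIX_CUTOFF
    # Accept both "X4447A" and "X4447A-Z", since the FAB uses both forms.
    has_card = any(card.strip().upper().startswith(CARD_PREFIX)
                   for card in installed_cards)
    return built_before_fix and has_card

print(is_affected(date(2009, 11, 3), ["X4447A-Z"]))  # True
print(is_affected(date(2010, 3, 15), ["X4447A-Z"]))  # False
```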

Symptoms

The system panicked with "Fatal error has occured in: PCIe fabric" while running MAXQ on the X4447A card.

SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x487e49b7.0x68e8812 (0x1186a4864635)
PLATFORM: SUNW,Netra-T5440, CSN: -, HOSTNAME: atqa62
SOURCE: SunOS, REV: 5.10 Generic_127127-11
DESC: Errors have been detected that require a reboot to ensure system
integrity.  See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved

panic[cpu82]/thread=2a1036e7ca0: Fatal error has occured in: PCIe fabric.

000002a1036e7780 px:px_err_panic+174 (0, 12f1800, 2a1036e7830, 43, 2a1036e7831, 0)
 %l0-3: 0000000000000034 000000000193c400 0000000000000000 0000000000000001
 %l4-7: 000000000193c400 0000000000000000 0000000001870000 ffffffffffffffff
000002a1036e7890 px:px_err_fabric_intr+b8 (30001e8fe00, 30, f00, 30001e6e238, 30001e8ff50, f00000000000000)
 %l0-3: 0000060060928630 0000060060be9200 000011867cfe1568 000000003b9aca00
 %l4-7: 0000000000000043 0000000000000000 0000000000000000 0000000000000001
000002a1036e7980 px:px_msiq_intr+204 (600608339f0, 30001e6e238, 12e4c38, 0, 6006085e608, 60060869800)
 %l0-3: 0000030001e6e238 00000600608303e0 000002a1036e7a50 000002a1036e7a80
 %l4-7: 0000000000000000 0000000000000000 0000060066c40000 0000000000000030

Root Cause

The default ACK latency timer value that PLX programs into the EEPROM of the PLX8533 and PLX8525 bridge chips is too short. It causes the failure when the MAXQ program runs at maximum transfer rates with the X4447A card.

Reprogramming the ACK latency timer in both the PLX8533 and PLX8525 bridge chips on the PCI MEZZ boards corrects this issue.

Corrective Action

Workaround:

No workaround available - see Resolution section below.

Resolution:

The plxtool utility and the PLX firmware images can be obtained from a member of the TSC VSP group. The bridge chips can be reprogrammed in the field using the following procedure.

1. Boot Solaris and log in as root.

2. Place the tar file in the root directory and extract plxtool, plx8533.bin,
   and plx8525.bin. Make plxtool executable:

     chmod 755 plxtool

3. Run plxtool to list the PLX switches under /pci@400 on the Netra T5440:

./plxtool /pci@400 -f
last = 255 1 7
Found PLX switch port: 2,0,0, VID: 10b5, DID: 8548, Model: 8548
Found PLX switch port: 4,0,0, VID: 10b5, DID: 8112, Model: 8112
Found PLX switch port: 8,0,0, VID: 10b5, DID: 8533, Model: 8533
Found PLX switch port: 13,0,0, VID: 10b5, DID: 8525, Model: 8525
Found PLX switch port: 14,0,0, VID: 10b5, DID: 8525, Model: 8525
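
When the port numbers differ from the example above, the addresses for steps 4 and 5 can be pulled from the listing programmatically. This is a minimal sketch that assumes the "Found PLX switch port" line format shown above; other plxtool versions may print something different. It converts each "bus,dev,func" triple into the dotted form that the -d option takes. Note that two PLX8525 ports are reported; step 5 uses the first one (13.0.0).

```python
import re

# Matches lines such as:
#   Found PLX switch port: 8,0,0, VID: 10b5, DID: 8533, Model: 8533
PORT_RE = re.compile(
    r"Found PLX switch port:\s*(\d+),(\d+),(\d+),\s*"
    r"VID:\s*(\w+),\s*DID:\s*(\w+),\s*Model:\s*(\w+)"
)

def find_ports(listing: str) -> dict[str, list[str]]:
    """Map each PLX model to its port addresses in the dotted -d form."""
    ports: dict[str, list[str]] = {}
    for line in listing.splitlines():
        m = PORT_RE.search(line)
        if m:
            bus, dev, func, _vid, _did, model = m.groups()
            ports.setdefault(model, []).append(f"{bus}.{dev}.{func}")
    return ports

listing = """last = 255 1 7
Found PLX switch port: 8,0,0, VID: 10b5, DID: 8533, Model: 8533
Found PLX switch port: 13,0,0, VID: 10b5, DID: 8525, Model: 8525
Found PLX switch port: 14,0,0, VID: 10b5, DID: 8525, Model: 8525"""

ports = find_ports(listing)
print(ports["8533"][0], ports["8525"][0])  # 8.0.0 13.0.0
```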

4. Update the PLX8533 EEPROM:

./plxtool /pci@400 -d 8.0.0 -B ./plx8533.bin -E

Note: 8.0.0 must match the PLX8533 switch port reported in step 3 (8,0,0).

5. Update the PLX8525 EEPROM:

./plxtool /pci@400 -d 13.0.0 -B ./plx8525.bin -E

Note: 13.0.0 must match the PLX8525 switch port reported in step 3 (13,0,0).
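
Steps 4 and 5 differ only in the device address and the image file. The sketch below assembles both commands as argument lists using only the options shown in those steps (-d, -B, -E); the constant names are illustrative. It only prints the commands, so nothing is flashed by accident. Passing each list to subprocess.run(argv, check=True) would execute it, but review the addresses against step 3 first.

```python
import shlex

PLXTOOL = "./plxtool"      # path used in steps 3 to 5
ROOT_COMPLEX = "/pci@400"  # root complex used in this procedure

def update_command(device: str, image: str) -> list[str]:
    """Build the argv for one EEPROM update, mirroring steps 4 and 5."""
    return [PLXTOOL, ROOT_COMPLEX, "-d", device, "-B", image, "-E"]

updates = [
    update_command("8.0.0", "./plx8533.bin"),   # step 4: PLX8533
    update_command("13.0.0", "./plx8525.bin"),  # step 5: PLX8525
]
for argv in updates:
    print(shlex.join(argv))
# ./plxtool /pci@400 -d 8.0.0 -B ./plx8533.bin -E
# ./plxtool /pci@400 -d 13.0.0 -B ./plx8525.bin -E
```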

6. Check the PLX8533 switch's SEEPROM contents:

./plxtool /pci@400 -i
Searching for PLX devices this can take a couple minutes...
last = 255 1 7
Found 5 devices, Select one from the list:
 1 : PLX 8548 @ 2:0:0
 2 : PLX 8112 @ 4:0:0
 3 : PLX 8533 @ 8:0:0
 4 : PLX 8525 @ 13:0:0
 5 : PLX 8525 @ 14:0:0
Choice? 3

Select an option:
 r : Read registers
 R : Read SEEPROM registers
 w : Write user inputed data to registers
 W : Write user inputed data to SEEPROM registers
 B : Import data from binary file
 T : Import data from text file
 D : Diff image file against internal registers
 s : Save register data to file
 b : Change barnum
 v : Change verbosity
 o : Change options
 q : Quit
R
Enter the starting offset and number of words to read, use ':' as a seperator
0:40
Reading registers
Complete

Starting offset: 0x00000000, Ending offset: 0x000000a0
0x000000 | 0030005a 0000041f 201f0028 00380000
0x000010 | 0000241f 27f10030 00000000 0043007e
0x000020 | 047e0000 00000043 0043207e 247e0000
0x000030 | 00000043 ffffffff ffffffff ffffffff
0x000040 | ffffffff ffffffff ffffffff ffffffff
0x000050 | ffffffff ffffffff ffffffff ffffffff
0x000060 | ffffffff ffffffff ffffffff ffffffff
0x000070 | ffffffff ffffffff ffffffff ffffffff
0x000080 | ffffffff ffffffff ffffffff ffffffff
0x000090 | ffffffff ffffffff ffffffff ffffffff
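
The "0:40" reply reads 40 words starting at offset 0. The output above implies that the count is decimal and each word is 32 bits: 40 words of 4 bytes is 160 bytes, which matches the reported ending offset of 0xa0 and the 10 rows of four words. This reading is inferred from the transcript, not from plxtool documentation. A short check of the arithmetic:

```python
WORD_BYTES = 4  # each value in the dump is one 32-bit word
WORDS_PER_ROW = 4

def read_range(start: int, count: int) -> tuple[int, int, int]:
    """Return (start_offset, end_offset, rows) for a read of `count` words."""
    end = start + count * WORD_BYTES
    rows = -(-count // WORDS_PER_ROW)  # ceiling division
    return start, end, rows

start, end, rows = read_range(0, 40)
print(f"Starting offset: 0x{start:08x}, Ending offset: 0x{end:08x}, rows: {rows}")
# Starting offset: 0x00000000, Ending offset: 0x000000a0, rows: 10
```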

7. Check the PLX8525 switch's SEEPROM contents:

./plxtool /pci@400 -i
Searching for PLX devices this can take a couple minutes...
last = 255 1 7

Found 5 devices, Select one from the list:
 1 : PLX 8548 @ 2:0:0
 2 : PLX 8112 @ 4:0:0
 3 : PLX 8533 @ 8:0:0
 4 : PLX 8525 @ 13:0:0
 5 : PLX 8525 @ 14:0:0
Choice? 4

Select an option:
 r : Read registers
 R : Read SEEPROM registers
 w : Write user inputed data to registers
 W : Write user inputed data to SEEPROM registers
 B : Import data from binary file
 T : Import data from text file
 D : Diff image file against internal registers
 s : Save register data to file
 b : Change barnum
 v : Change verbosity
 o : Change options
 q : Quit
R
Enter the starting offset and number of words to read, use ':' as a seperator
0:40
Reading registers
Complete

Starting offset: 0x00000000, Ending offset: 0x000000a0
0x000000 | 002a005a 0000041f 201f0000 00180000
0x000010 | 0000241f 27f10010 00000000 0043007e
0x000020 | 207e0000 00000043 0043247e 00000000
0x000030 | ffffffff ffffffff ffffffff ffffffff
0x000040 | ffffffff ffffffff ffffffff ffffffff
0x000050 | ffffffff ffffffff ffffffff ffffffff
0x000060 | ffffffff ffffffff ffffffff ffffffff
0x000070 | ffffffff ffffffff ffffffff ffffffff
0x000080 | ffffffff ffffffff ffffffff ffffffff
0x000090 | ffffffff ffffffff ffffffff ffffffff
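
This FAB does not say which SEEPROM word holds the ACK latency timer, so the safest check is a whole-dump comparison against a known-good image. The sketch below parses saved "R" output into {byte offset: word} and lists the offsets that differ. Two points are assumptions: that the values printed in steps 6 and 7 are the expected post-update contents, and that 0xffffffff marks an unprogrammed word. plxtool's own "D" option (diff image file against internal registers) may be the more direct route on a live system.

```python
import re

# Matches dump rows such as: 0x000010 | 0000241f 27f10010 00000000 0043007e
DUMP_LINE = re.compile(r"^0x([0-9a-fA-F]+)\s*\|\s*((?:[0-9a-fA-F]{8}\s*)+)$")
ERASED = 0xFFFFFFFF  # assumed value of an unprogrammed SEEPROM word

def parse_dump(text: str) -> dict[int, int]:
    """Return {byte_offset: 32-bit word} from plxtool 'R' output."""
    words: dict[int, int] = {}
    for line in text.splitlines():
        m = DUMP_LINE.match(line.strip())
        if not m:
            continue  # skip prompts, headers, and blank lines
        base = int(m.group(1), 16)
        for i, token in enumerate(m.group(2).split()):
            words[base + 4 * i] = int(token, 16)
    return words

def programmed(words: dict[int, int]) -> dict[int, int]:
    """Keep only words that differ from the assumed erased value."""
    return {off: w for off, w in words.items() if w != ERASED}

def diff_dumps(expected: dict[int, int], actual: dict[int, int]) -> list[int]:
    """Return offsets whose words differ or are missing in either dump."""
    offsets = sorted(set(expected) | set(actual))
    return [off for off in offsets if expected.get(off) != actual.get(off)]

sample_8525 = """Starting offset: 0x00000000, Ending offset: 0x000000a0
0x000000 | 002a005a 0000041f 201f0000 00180000
0x000010 | 0000241f 27f10010 00000000 0043007e
0x000020 | 207e0000 00000043 0043247e 00000000
0x000030 | ffffffff ffffffff ffffffff ffffffff"""

reference = parse_dump(sample_8525)
print(len(programmed(reference)))  # 12 programmed words
```

In use, save the reference and the live output to text files, parse both, and treat an empty diff_dumps() result as a match.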

8. Reboot the host so that the PLX8533 and PLX8525 switches load the new EEPROM values.

Comments

Boards with the lengthened timer will be phased into production through an ECO release and dash roll, currently scheduled for February 2010.



For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:

For Sun Authorized Service Providers go to:

In addition to the above you may email:


Internal Contributor/submitter
Donald.Drygalski@Sun.COM

Internal Eng Responsible Engineer
Jim.Ye@Sun.COM
Responsible Manager: Art.Weigel@Sun.COM

Internal Services Knowledge Engineer
Joe.Davis@Sun.COM

Internal Eng Business Unit Group
SSG NSN (Netra Systems and Networking)

Internal Sun Alert & FAB Admin Info
14-Dec-2009: Completed draft and sent to Extended Review.
16-Dec-2009: No issues from Ext Rvw - sending to Publish.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.