Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure Solution 1021721.1 : Sun Netra T5440 Server panics under heavy I/O with X4447A installed.
PreviouslyPublishedAs: 274350
Bug Id: <SUNBUG: 6726521>
Product: Sun Netra T5440 Server
Date of Resolved Release: 16-Dec-2009

Netra T5440 panics under heavy I/O (see details below).

Affected Parts:
  540-7689-02  PCI Express/XAUI Mezzanine Assembly, RoHS:Y
  371-4413-01  2-Slot PCI-X and 2-Slot PCI Express Auxiliary Board, RoHS:Y
  371-3647-04  2-Slot PCI-X and 2-Slot PCI Express Auxiliary Board, RoHS:Y

Impact

The Sun Quad GbE x8 PCIe UTP (X4447A-Z) network card can occasionally cause a panic in some PCIe slots when running the MAXQ test. The issue is related to the ACK bit setting in the EEPROM firmware of the PLX8525 and PLX8533 PLX chips on the PCI MEZZ boards. The issue is resolved with updated firmware on both chips. The PLX8533 chip is on the PCIM 1 board, and the PLX8525 chip is on the PCIM2 board.

Contributing Factors

This issue affects any Sun Netra T5440 Server that was manufactured prior to February 2010 and has a Sun Quad GbE x8 PCIe UTP (X4447A-Z) network card installed.

Symptoms

The system panicked with "Fatal error has occured in: PCIe fabric" while running MAXQ on the X4447A card.

SUNW-MSG-ID: SUNOS-8000-0G, TYPE: Error, VER: 1, SEVERITY: Major
EVENT-TIME: 0x487e49b7.0x68e8812 (0x1186a4864635)
PLATFORM: SUNW,Netra-T5440, CSN: -, HOSTNAME: atqa62
SOURCE: SunOS, REV: 5.10 Generic_127127-11
DESC: Errors have been detected that require a reboot to ensure system integrity. See http://www.sun.com/msg/SUNOS-8000-0G for more information.
AUTO-RESPONSE: Solaris will attempt to save and diagnose the error telemetry
IMPACT: The system will sync files, save a crash dump if needed, and reboot
REC-ACTION: Save the error summary below in case telemetry cannot be saved

panic[cpu82]/thread=2a1036e7ca0: Fatal error has occured in: PCIe fabric.
000002a1036e7780 px:px_err_panic+174 (0, 12f1800, 2a1036e7830, 43, 2a1036e7831, 0)
  %l0-3: 0000000000000034 000000000193c400 0000000000000000 0000000000000001
  %l4-7: 000000000193c400 0000000000000000 0000000001870000 ffffffffffffffff
000002a1036e7890 px:px_err_fabric_intr+b8 (30001e8fe00, 30, f00, 30001e6e238, 30001e8ff50, f00000000000000)
  %l0-3: 0000060060928630 0000060060be9200 000011867cfe1568 000000003b9aca00
  %l4-7: 0000000000000043 0000000000000000 0000000000000000 0000000000000001
000002a1036e7980 px:px_msiq_intr+204 (600608339f0, 30001e6e238, 12e4c38, 0, 6006085e608, 60060869800)
  %l0-3: 0000030001e6e238 00000600608303e0 000002a1036e7a50 000002a1036e7a80
  %l4-7: 0000000000000000 0000000000000000 0000060066c40000 0000000000000030

Root Cause

The default ACK latency timer value that PLX sets in the EEPROM register of the PLX8533 and PLX8525 bridge chips is too short. This causes a problem when the MAXQ program drives the X4447A card at maximum transfer rates. Reprogramming the ACK latency timer in both the PLX8533 and PLX8525 bridge chips on the PCI MEZZ boards corrects this issue.

Corrective Action

Workaround: No workaround available - see Resolution section below.

Resolution: The tool and PLX code can be obtained from a member of the TSC VSP group. These parts can be programmed in the field by following the procedure below.

1. Boot Solaris and log in as root.

2. Place the tar file into the root directory and untar plxtool, plx8533.bin, and plx8525.bin. Change permissions on plxtool to make it executable:

   chmod 755 plxtool

3. Run plxtool to see the Netra-T5440 PLX switch information on /pci@400:

   ./plxtool /pci@400 -f
   last = 255 1 7
   Found PLX switch port: 2,0,0, VID: 10b5, DID: 8548, Model: 8548
   Found PLX switch port: 4,0,0, VID: 10b5, DID: 8112, Model: 8112
   Found PLX switch port: 8,0,0, VID: 10b5, DID: 8533, Model: 8533
   Found PLX switch port: 13,0,0, VID: 10b5, DID: 8525, Model: 8525
   Found PLX switch port: 14,0,0, VID: 10b5, DID: 8525, Model: 8525

4. Update the PLX 8533:

   ./plxtool /pci@400 -d 8.0.0 -B ./plx8533.bin -E

   Note: 8.0.0 should match the PLX 8533 switch port 8,0,0 reported in step 3.

5. Update the PLX 8525:

   ./plxtool /pci@400 -d 13.0.0 -B ./plx8525.bin -E

   Note: 13.0.0 should match the PLX 8525 switch port 13,0,0 reported in step 3.

6. Check the PLX 8533 switch's EEPROM data:

   ./plxtool /pci@400 -i
   Searching for PLX devices this can take a couple minutes...
   last = 255 1 7
   Found 5 devices, Select one from the list:
   1 : PLX 8548 @ 2:0:0
   2 : PLX 8112 @ 4:0:0
   3 : PLX 8533 @ 8:0:0
   4 : PLX 8525 @ 13:0:0
   5 : PLX 8525 @ 14:0:0
   Choice? 3
   Select an option:
   r : Read registers
   R : Read SEEPROM registers
   w : Write user inputed data to registers
   W : Write user inputed data to SEEPROM registers
   B : Import data from binary file
   T : Import data from text file
   D : Diff image file against internal registers
   s : Save register data to file
   b : Change barnum
   v : Change verbosity
   o : Change options
   q : Quit
   R
   Enter the starting offset and number of words to read, use ':' as a seperator
   0:40
   Reading registers Complete
   Starting offset: 0x00000000, Ending offset: 0x000000a0
   0x000000 | 0030005a 0000041f 201f0028 00380000
   0x000010 | 0000241f 27f10030 00000000 0043007e
   0x000020 | 047e0000 00000043 0043207e 247e0000
   0x000030 | 00000043 ffffffff ffffffff ffffffff
   0x000040 | ffffffff ffffffff ffffffff ffffffff
   0x000050 | ffffffff ffffffff ffffffff ffffffff
   0x000060 | ffffffff ffffffff ffffffff ffffffff
   0x000070 | ffffffff ffffffff ffffffff ffffffff
   0x000080 | ffffffff ffffffff ffffffff ffffffff
   0x000090 | ffffffff ffffffff ffffffff ffffffff

7. Check the PLX 8525 switch's EEPROM data:

   ./plxtool /pci@400 -i
   Searching for PLX devices this can take a couple minutes...
   last = 255 1 7
   Found 5 devices, Select one from the list:
   1 : PLX 8548 @ 2:0:0
   2 : PLX 8112 @ 4:0:0
   3 : PLX 8533 @ 8:0:0
   4 : PLX 8525 @ 13:0:0
   5 : PLX 8525 @ 14:0:0
   Choice? 4
   Select an option:
   r : Read registers
   R : Read SEEPROM registers
   w : Write user inputed data to registers
   W : Write user inputed data to SEEPROM registers
   B : Import data from binary file
   T : Import data from text file
   D : Diff image file against internal registers
   s : Save register data to file
   b : Change barnum
   v : Change verbosity
   o : Change options
   q : Quit
   R
   Enter the starting offset and number of words to read, use ':' as a seperator
   0:40
   Reading registers Complete
   Starting offset: 0x00000000, Ending offset: 0x000000a0
   0x000000 | 002a005a 0000041f 201f0000 00180000
   0x000010 | 0000241f 27f10010 00000000 0043007e
   0x000020 | 207e0000 00000043 0043247e 00000000
   0x000030 | ffffffff ffffffff ffffffff ffffffff
   0x000040 | ffffffff ffffffff ffffffff ffffffff
   0x000050 | ffffffff ffffffff ffffffff ffffffff
   0x000060 | ffffffff ffffffff ffffffff ffffffff
   0x000070 | ffffffff ffffffff ffffffff ffffffff
   0x000080 | ffffffff ffffffff ffffffff ffffffff
   0x000090 | ffffffff ffffffff ffffffff ffffffff

8. Reboot the host so that the new PLX 8533/8525 EEPROM values take effect.

Comments

Boards with the lengthened timer will be phased in to production via ECO release and dash roll, currently scheduled for February of 2010.

For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:
For Sun Authorized Service Providers go to:
In addition to the above you may email:

Internal Contributor/submitter: Donald.Drygalski@Sun.COM
Internal Eng Responsible Engineer: Jim.Ye@Sun.COM
Responsible Manager: Art.Weigel@Sun.COM
Internal Services Knowledge Engineer: Joe.Davis@Sun.COM
Internal Eng Business Unit Group: SSG NSN (Netra Systems and Networking)

Internal Sun Alert & FAB Admin Info
14-Dec-2009: Completed draft and sent to Extended Review.
16-Dec-2009: No issues from Ext Rvw - sending to Publish.

Attachments
This solution has no attachment
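
Steps 4 and 5 of the Resolution procedure ask that the address passed to -d match the port that plxtool reported in step 3 (for example, port 8,0,0 becomes 8.0.0). The following is a minimal Python sketch, not part of the FAB or of plxtool, that extracts those addresses from saved step 3 output. It assumes the "Found PLX switch port" line format is exactly as shown in this document; the names PORT_LINE, find_switches and SAMPLE are hypothetical. It only lists the candidates per model and leaves the choice of device to the operator.

```python
import re

# Matches the discovery lines shown in step 3, for example:
#   Found PLX switch port: 8,0,0, VID: 10b5, DID: 8533, Model: 8533
# The line format is assumed from this document's sample output only.
PORT_LINE = re.compile(
    r"Found PLX switch port:\s*(\d+,\d+,\d+),"
    r"\s*VID:\s*[0-9a-fA-F]+,\s*DID:\s*[0-9a-fA-F]+,\s*Model:\s*(\w+)"
)


def find_switches(discovery_output):
    """Map each reported model to its port addresses in dotted form."""
    found = {}
    for match in PORT_LINE.finditer(discovery_output):
        port, model = match.groups()
        # Step 3 prints "8,0,0"; steps 4 and 5 pass the same port as "8.0.0".
        found.setdefault(model, []).append(port.replace(",", "."))
    return found


# Sample text copied from step 3 of the procedure.
SAMPLE = """\
last = 255 1 7
Found PLX switch port: 2,0,0, VID: 10b5, DID: 8548, Model: 8548
Found PLX switch port: 4,0,0, VID: 10b5, DID: 8112, Model: 8112
Found PLX switch port: 8,0,0, VID: 10b5, DID: 8533, Model: 8533
Found PLX switch port: 13,0,0, VID: 10b5, DID: 8525, Model: 8525
Found PLX switch port: 14,0,0, VID: 10b5, DID: 8525, Model: 8525
"""

if __name__ == "__main__":
    switches = find_switches(SAMPLE)
    for model in ("8533", "8525"):
        # Print every candidate address; the operator picks the one to update.
        print(model, switches.get(model, []))
```

On a system whose step 3 output differs from the sample, the reported addresses should be used in place of 8.0.0 and 13.0.0.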