Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard)
Sure Solution 1020509.1 : FCO A0304-1: T6300 Blades with down-rev Foureyes chip firmware causing erroneous Failed, Hot Insertion and Removal messages.
PreviouslyPublishedAs 259708
Bug Id <SUNBUG: 6720809>, <SUNBUG: 6695705>, <SUNBUG: 6762599>
Product Sun Blade T6300 Server Module
Date of Resolved Release 06-Jul-2009

Foureyes chip firmware causing erroneous Failed, Hot Insertion and Removal messages (see details below).

Affected X-Options:
 - X5705A / 5714A
 - X5706A / 5715A
 - X5707A / 5716A
 - X5708A / 5717A

Affected Parts:
 - 541-2317-06 (or below) 0 MB, 6-Core, UltraSPARC T1, 1.0GHz Sun Blade T6300 Server Module
 - 541-2318-06 (or below) 0 MB, 8-Core, UltraSPARC T1, 1.0GHz Sun Blade T6300 Server Module
 - 541-2319-06 (or below) 0 MB, 8-Core, UltraSPARC T1, 1.2GHz Sun Blade T6300 Server Module
 - 541-2320-05 (or below) 0 MB, 8-Core, UltraSPARC T1, 1.4GHz Sun Blade T6300 Server Module

Impact
The Sun T6300 Blade (A94) and chassis CMM exhibit erroneous Failed, Hot Insertion and Removal messages. These messages are completely benign and do not affect system performance, functionality or reliability. Some customers may nevertheless be uncomfortable with them, because they create a perception of instability in the blade/chassis product.

Contributing Factors
This issue manifests with the T6300 Blade Module part numbers listed above in combination with Blade sysFW 6.7.1 (or earlier).

The problem is exacerbated by:
 - The number of T6300 blades in a system
 - The position of the T6300 Blades in relation to other blade types
 - The version of CMM hardware and CMM firmware in the system

These messages will appear on a Sun Blade 6000 chassis with a minimum of one T6300 Blade. They become more prevalent as the blade count increases and if other blade types are present, and more so if the T6300 blades are located in slots above those of different blade types (e.g. T6300 in slots 6 and 7 while other blades are in slots 0-5). A Sun Blade 6000 chassis CMM at part number 371-1447-05 (or below) can increase susceptibility to the messaging.
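The slot-position factor described above can be checked mechanically once a chassis inventory is available. The following Python sketch is illustrative only: the `layout` mapping, the `"OTHER"` placeholder type and the function name are assumptions made for this example and are not part of any Sun tool. It flags T6300 blades that sit in a higher-numbered slot than some blade of a different type.

```python
from typing import Dict, List


def t6300_slots_above_other_types(layout: Dict[int, str]) -> List[int]:
    """Return the T6300 slots numbered higher than at least one non-T6300 blade.

    `layout` maps a chassis slot number to a blade type string. The string
    "T6300" marks a T6300 blade; any other string is treated as "another
    blade type". Both conventions are assumptions of this sketch.
    """
    other_slots = [slot for slot, kind in layout.items() if kind != "T6300"]
    if not other_slots:
        # Only T6300 blades are present, so no blade is "above" another type.
        return []
    lowest_other = min(other_slots)
    return sorted(
        slot
        for slot, kind in layout.items()
        if kind == "T6300" and slot > lowest_other
    )


# Hypothetical layout matching the example in the text:
# other blade types in slots 0-5, T6300 blades in slots 6 and 7.
example_layout = {slot: "OTHER" for slot in range(6)}
example_layout.update({6: "T6300", 7: "T6300"})
flagged = t6300_slots_above_other_types(example_layout)
```

For the example layout, `flagged` contains slots 6 and 7. An empty result does not mean the messages cannot occur, since the text above states that a single T6300 blade is enough to trigger them.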
Symptoms
When this issue manifests itself, the following messages appear after the SP has been powered on.

Examples of T6300 blade messages:
    SC Alert: PSU at MP/PS1 has been removed.
    SC Alert: PSU at MP/PS1 has been inserted.
    SC Alert: NEM at MP/NEM0 has been removed.
    SC Alert: PSU at MP/PS0 has been removed.
    SC Alert: PSU at MP/PS0 has been inserted.
    SC Alert: SYS_FAN at MP/FM3/FIN has FAILED.
    SC Alert: SYS_FAN at MP/FM3/FOUT has FAILED.
    SC Alert: SYS_FAN at MP/FM6/FIN has FAILED.
    SC Alert: SYS_FAN at MP/FM6/FOUT has FAILED.
    SC Alert: SYS_FAN at MP/FM7/FIN has FAILED.
    SC Alert: SYS_FAN at MP/FM7/FOUT has FAILED.
    SC Alert: NEM at MP/NEM1 has been removed.
    SC Alert: NEM at MP/NEM1 has been inserted.
    SC Alert: PSU at MP/PS0 has FAILED.
    SC Alert: PSU at MP/PS1 has FAILED.

Examples of chassis CMM events:
    1319 Thu Jan 1 00:01:56 1970 Chassis Action major Hot insertion of /CH/NEM1
    1318 Thu Jan 1 00:01:56 1970 Chassis Action major Hot insertion of /CH/NEM0
    1317 Thu Jan 1 00:01:56 1970 Chassis Action major Hot insertion of /CH/PS1
    1316 Thu Jan 1 00:01:56 1970 Chassis Action major Hot insertion of /CH/PS0
    1315 Thu Jan 1 00:01:56 1970 Chassis Action major Hot insertion of /CH/BL6
    1314 Thu Jan 1 00:01:56 1970 Chassis Action major Hot insertion of /CH/BL5

Root Cause
T6300 blades are being starved of I2C bus bandwidth and cannot reliably poll for NEM and PSU presence. The hardware does not provide direct PSU/NEM presence bits, so ALOM must poll for status across the I2C bus. ALOM interprets an access timeout as "device not present", so when bus access starvation occurs, an erroneous device-removed message is reported.

This issue was addressed in Manufacturing via ECO# WO_39440 as of October 10, 2008 and in Services via GSAP 4795 beginning April 30, 2009.

Corrective Action
Workaround:
In a chassis configuration populated with 1 to 3 T6300 blades, move these blades to the lower-numbered chassis slots and move blades of other types to the higher-numbered slots.
Also, ensure that the CMM is at p/n 371-1447-06 (or above), which provides I2C bus bridge FourEyes chip firmware v1.3. Further ensure that CMM ILOM firmware is at 2.0.3.2 (or later), which reduces I2C bus traffic by utilizing VLAN protocol.

As another temporary T6300 blade workaround, to keep the blade OS /var/adm/messages file from being inundated with false messages, set sys_eventlevel to 0 or 1 in the T6300 ALOM. This suppresses logging of the messages to the OS /var/adm/messages file. The insert/remove messages are "major" messages; level 1 logs critical messages only, and level 0 logs no messages to /var/adm/messages. We recommend level 1 so that critical messages are still logged. Note that this will not stop the messages from being logged on the CMM or the T6300 blade SPs, as there is no way to suppress these.

Resolution:
Hot Swappable: No

Upon failure only, replace as follows:
 . replace 541-2317-06 (or below) with 541-2317-07 (or above)
 . replace 541-2318-06 (or below) with 541-2318-07 (or above)
 . replace 541-2319-06 (or below) with 541-2319-07 (or above)
 . replace 541-2320-05 (or below) with 541-2320-06 (or above)

Affected T6300 blade modules must be replaced as outlined above. The replacement modules include I2C bridge chip firmware v1.4 (or later). This firmware is not field upgradeable. Firmware v1.4 contains the I2C bus Dynamic Arbitration and Dynamic Priority features (originally released in v1.3) that are needed for Round Robin Bus Access. All blades must use this polling mechanism to ensure fair access to the I2C bus and prevent the timeouts that produce the false messaging. In addition to the required motherboard levels above, the System Firmware must also be upgraded to sysfw 6.7.2 (or later), which contains ALOM code to enable the dynamic arbitration/priority feature within the bridge chip. Earlier production T6300 blades use a static priority scheme based upon slot position for I2C bus access.
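To illustrate why a static, slot-based priority scheme can starve some blades while round-robin access does not, the following Python sketch models a shared bus that grants one requester per tick. It is a toy model and not the FourEyes firmware algorithm: the function names, the tick and timeout values, and the rule that the lowest slot number always wins under static priority are all assumptions made for this example.

```python
from typing import Callable, Dict, List

Arbiter = Callable[[List[int], int], int]


def static_priority(requesters: List[int], last_granted: int) -> int:
    # Assumption for this sketch: the lowest slot number always wins.
    return min(requesters)


def round_robin(requesters: List[int], last_granted: int) -> int:
    # Grant the next requesting slot after the one granted last, wrapping around.
    later = [slot for slot in requesters if slot > last_granted]
    return min(later) if later else min(requesters)


def simulate(arbiter: Arbiter, slots: List[int], ticks: int, timeout: int) -> Dict[int, int]:
    """Count poll timeouts per slot when every slot requests the bus on every tick.

    A slot that waits more than `timeout` ticks without a grant records one
    timeout (the analogue of a false "device removed" report) and restarts
    its wait counter.
    """
    waiting = {slot: 0 for slot in slots}
    timeouts = {slot: 0 for slot in slots}
    last_granted = -1
    for _ in range(ticks):
        winner = arbiter(slots, last_granted)
        last_granted = winner
        for slot in slots:
            if slot == winner:
                waiting[slot] = 0
            else:
                waiting[slot] += 1
                if waiting[slot] > timeout:
                    timeouts[slot] += 1
                    waiting[slot] = 0
    return timeouts


static_result = simulate(static_priority, [0, 1, 2, 3], ticks=40, timeout=5)
fair_result = simulate(round_robin, [0, 1, 2, 3], ticks=40, timeout=5)
```

In this model the static scheme lets slot 0 hold the bus indefinitely, so every other slot accumulates timeouts, whereas under round-robin no slot waits longer than three ticks and no timeouts are recorded. Real bus traffic is burstier than this worst-case model, so the sketch shows only the direction of the effect.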
The Sun Blade 6000 CMM should be at Sun p/n 371-1447-06 (or above), and CMM ILOM firmware should be at 2.0.3.2 (or later). Please note that only a very small number of Sun Blade 6000 CMMs at Sun p/n 371-1447-05 (or below) were shipped to the field. If you are affected by this FCO and your Sun Blade 6000 CMM is at Sun p/n 371-1447-05 (or below), replace it with Sun Blade 6000 CMM p/n 371-1447-06 (or above).

To check the CMM, access it as sunservice as shown in the example below.

    telnet dtnts214-248 7031
    Trying 10.6.214.248...
    Connected to dtnts214-248.sfbay.sun.com.
    Escape character is '^]'.
    SUNCMM00144F6B9F76 login: sunservice
    Password:
    Copyright 2008 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
    WARNING: The "sunservice" account is provided solely to allow Sun Services to perform diagnosis and recovery tasks. Customer use of the "sunservice" account may interfere with the correct operation of ILOM and is not supported other than to perform recovery procedures as documented by Sun Microsystems. Normal ILOM operations must be performed using other (non-"sunservice") accounts. Further usage of the "sunservice" account implies your agreement with these terms.
    [(flash)root@SUNCMM00144F6B9F76:~]#i2ct -t --info
    Four-Eyes v1.3
    reg->STATUS@0x02 = 0x05
    reg->BRID@0x06 = 0x02
    reg->NXTBRID@0x07 = 0x00
    reg->LSTBRID@0x08 = 0x00
    reg->FLTRCTL@0x03 = 0x00
    reg->DAID@0x05 = 0xE0
    reg->DACTL@0x04 = 0x03

Important! Defective blades should be returned, with "FCO 304" written on the Defective Material Tag, as soon as possible to avoid any material availability issues.

Identification of Affected Parts (how to):
The following procedure is the only accurate method to identify affected blade(s):
1) Access the SP of the target blade.
2) Type "shownetwork".
3) Set the SP to ssqa mode, type...
    sc> setsc sc_ssqamode true xyz
   (xyz = last nibble from each of the last three bytes of the MAC address)
4) Get the Four Eyes firmware version...
    sc> i2cp 0x20 3 2 0 0 1.
If the command returns a value == 12, the blade is impacted as described above. If the command returns a value == 14, the blade is not affected.
5) Upon finishing the data gathering above, set the sc_ssqamode parameter back to false.
    sc> setsc sc_ssqamode false

As a final note, please refrain from using the showfru or ipmitool fru commands, or even visual inspection, to determine affected blades, as it has been found that fruid dash level information and blade revision labeling can be inaccurate.

Hardware Remediation and Material Availability Details:
All Regions/Timezones were materially ready at the time of publication of this knowledge asset. Check with your Logistics Representative or TZ / Country / Area FCO Manager for more information with regard to material availability and the parts ordering process for this FCO.

References:
BugID: 6720809, 6695705, 6762599
Sun Alert: 248186
ECO: WO_39440
GSAP: 4550, 4795

For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:
For Sun Authorized Service Providers go to:
In addition to the above you may email:

Modification History
Changes made since initial publication.
09-Jul-2009
Internal Contributor/submitter Michael.Tabor@Sun.COM
Internal Eng Responsible Engineer John.Respicio@Sun.COM
Responsible Manager: Randy.Luckenbihl@Sun.COM
Internal Services Knowledge Engineer Joe.Davis@Sun.COM
Internal Eng Business Unit Group SSG ES (Enterprise Systems)

Internal Sun Alert & FAB Admin Info
21-May-2009: Finalized draft and sent to FCO Tiger Team for review.
06-Jul-2009: Material Readiness acquired - sending to Publish.

Attachments
This solution has no attachment