Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure Solution

1000049.1 : On Sun Blade 8000/8000P chassis, the Rear Fan Module may cycle OK on and off repeatedly.
PreviouslyPublishedAs 200065
Product Sun Blade 8000 Modular System
Bug Id <SUNBUG: 6501084>

Impact
When a rear fan module is hot plugged into the system, the fan CRU being inserted may be unable to start up. The system then sees the CRU as alternately present and absent in a repeating cycle. The "OK" LED associated with the fan CRU toggles between green and off, and the fan does not come on-line. This problem is specific to fans made by Delta (black triangle logo) with part # PFC1248DE.

The system may respond by reporting a fan fault condition and increasing the speed of all the other fans in the system to maximum. The affected fan appears as absent in the system inventory. The other fans cycle up and down in speed repeatedly while the affected fan is in this off-line state.

Contributing Factors
The problem is exacerbated in minimally configured systems. Systems with only 1 or 2 blades installed are more susceptible to this problem than a fully loaded system, because small configurations do not allow much airflow through the fans, which allows a large back pressure to build up in the fan plenum area.

The affected fans are made by Delta. These fans have a black triangle logo, which can be seen from the exhaust opening in the fan CRU. The fan part number is PFC1248DE, which can also be read through the grate.

Symptoms
The affected fan CRU LED toggles between green (OK) and off. The fan speeds oscillate between maximum and normal roughly every 10 seconds. The fan CRU is displayed as Absent in the GUI chassis view. The following entries are added to the CMM unified log and to the blade SEL (the latter fills up quickly and then requires an F1 BIOS intervention):

-> show list /CMM/logs/event/list
Targets:
Properties:
Commands:
    show

ID     Date/Time                 Class     Type      Severity
-----  ------------------------  --------  --------  --------
97710  Tue Dec 5 18:41:33 2006   IPMI      Log       critical
       ID = 18ef : 12/05/2006 : 18:41:33 : Entity Presence : /LCMM/RFM1_PRSNT : Device Absent
97709  Tue Dec 5 18:41:27 2006   Chassis   Action    major
       Hot removal of /CH/RFM1
97708  Tue Dec 5 18:41:25 2006   IPMI      Log       critical
       ID = 18ee : 12/05/2006 : 18:41:25 : Fan : /RFM1/FAN1_ERR : Predictive Failure Asserted
97707  Tue Dec 5 18:41:25 2006   IPMI      Log       critical
       ID = 18ed : 12/05/2006 : 18:41:25 : Entity Presence : /LCMM/RFM1_PRSNT : Device Present
97706  Tue Dec 5 18:41:25 2006   Chassis   Action    major
       Hot insertion of /CH/RFM1
97705  Tue Dec 5 18:41:24 2006   Chassis   Action    major
       Hot removal of /CH/RFM1
97704  Tue Dec 5 18:41:22 2006   Chassis   Action    major
       Hot insertion of /CH/RFM1
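The repeated insertion and removal entries in the log excerpt above are the clearest machine-readable sign of this problem. The following is a minimal sketch, not a Sun-provided tool, of how saved log text could be scanned for a module that is flapping. The function names `count_hotplug_events` and `flapping_modules` and the `min_cycles` threshold are assumptions made for illustration; only the "Hot insertion of" and "Hot removal of" message wording comes from the excerpt.

```python
import re
from collections import Counter

# Matches the chassis hot-plug entries as worded in the log excerpt.
# \s+ tolerates the entry being on one line or wrapped over two.
EVENT_RE = re.compile(
    r"Chassis\s+Action\s+major\s+Hot\s+(insertion|removal)\s+of\s+(/CH/\S+)"
)


def count_hotplug_events(log_text):
    """Return {module: Counter} with 'insertion' and 'removal' counts."""
    counts = {}
    for action, module in EVENT_RE.findall(log_text):
        counts.setdefault(module, Counter())[action] += 1
    return counts


def flapping_modules(log_text, min_cycles=2):
    """List modules with at least min_cycles insertion/removal pairs.

    min_cycles is a hypothetical threshold; a single genuine fan
    replacement produces one removal and one insertion.
    """
    return sorted(
        module
        for module, c in count_hotplug_events(log_text).items()
        if min(c["insertion"], c["removal"]) >= min_cycles
    )


# Chassis entries copied from the log excerpt in this document.
SAMPLE = """
97709 Tue Dec 5 18:41:27 2006 Chassis Action major Hot removal of /CH/RFM1
97706 Tue Dec 5 18:41:25 2006 Chassis Action major Hot insertion of /CH/RFM1
97705 Tue Dec 5 18:41:24 2006 Chassis Action major Hot removal of /CH/RFM1
97704 Tue Dec 5 18:41:22 2006 Chassis Action major Hot insertion of /CH/RFM1
"""

print(flapping_modules(SAMPLE))
```

Counting pairs rather than single events keeps one legitimate fan swap from being reported. A real check would probably also limit the count to a short time window.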
Root Cause
When a fan is removed, a large amount of air flows into the vacant fan opening. When a fan is installed, this recirculating air flow is directed through the fan CRU and causes the fans within to start spinning backwards. When the fan power connector finally makes contact, the fan's soft-start circuit tries to start the fan, but the fan can be spinning backwards so quickly that it cannot get up to speed before the start-up times out. This time-out is a locked rotor detection mechanism provided by the fan motor controller to protect the fan motor in case it is mechanically stuck.

When the chassis is lightly loaded (just 1 or 2 blades), there is not much air flow through the fans. This makes the vacuum in the fan plenum higher, so the fan being inserted has more difficulty starting. This has been noted more often on SB8000 P (codename A14) chassis, but can also happen on SB8000 (codename A19) chassis.

The fan is being modified to provide a braking function that slows its reverse rotation before start-up is attempted. This has been proven effective with the samples provided. The updated fan part number is not known at this time.

Resolution
A fan that has start-up trouble of this kind can be helped to start by blocking the reverse air flow through it. Blocking the fan grate with your hand, or with a solid object of similar size and shape to the Rear Fan, should prevent the reverse air flow well enough for the fan to start properly. Power cycling the chassis (stop all blades, stop /CH from CMM, remove all power cords, and re-plug after 1 minute) will also fix the problem.

Modification History
Date: 29-MAR-2007
Date: 08-JUN-2007
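The start-up failure described under Root Cause is a race between the soft-start ramp and the locked rotor time-out. The sketch below illustrates that race with a deliberately simple constant-acceleration model. Every number in it (target speed, acceleration, time-out, reverse speeds) is a made-up illustrative value, not a Delta or Sun specification, and the function names are hypothetical.

```python
def time_to_reach(target_rpm, initial_rpm, accel_rpm_per_s):
    """Seconds needed to ramp from initial_rpm to target_rpm.

    Assumes constant acceleration. A negative initial_rpm means the
    fan is being driven backwards by reverse air flow.
    """
    if accel_rpm_per_s <= 0:
        raise ValueError("acceleration must be positive")
    return max(0.0, (target_rpm - initial_rpm) / accel_rpm_per_s)


def starts_before_timeout(initial_rpm, target_rpm=1000.0,
                          accel_rpm_per_s=1000.0, timeout_s=2.0):
    """True if the fan reaches target_rpm before the time-out trips.

    All default values are illustrative assumptions.
    """
    return time_to_reach(target_rpm, initial_rpm, accel_rpm_per_s) <= timeout_s


# Fan at rest: ramp completes well inside the time-out window.
print(starts_before_timeout(0.0))
# Fan spinning backwards fast: the ramp takes too long and start-up is aborted.
print(starts_before_timeout(-3000.0))
# Reverse spin reduced first (hand over the grate, or a braking function).
print(starts_before_timeout(-500.0))
```

In this model, both the workaround of blocking the grate and the planned braking function act on the same term: they reduce the reverse speed at the moment start-up is attempted, which shortens the ramp.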
Previously Published As 102861

Internal Comments
No FCO is planned. This is a reactive fix on failure, and the problem can be cleared using the procedure outlined above, i.e. covering the fan with your hand when inserting it. The issue is only expected when a customer removes the fan module and reinserts it while the system is live, which should only occur during a fan CRU replacement.

Updated fans that are not affected by this issue are being shipped in new systems. However, a small quantity of affected fans may still be in spares stock. It is recommended that the customer or field engineer check the fan model before inserting a replacement Rear Fan and follow the procedure above to cover it with a hand, or make it a best practice to follow this procedure for all Rear Fan replacements.

Internal Contributor/submitter Oliver.Sharwood@Sun.COM
Internal Eng Business Unit Group NSG (Network Systems Group)
Internal Eng Responsible Engineer scott.bleiweiss@sun.com
Internal Services Knowledge Engineer karen.edwards@sun.com
Internal Kasp FAB Legacy ID 102861

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2007-03-28
Avoidance: Service Procedure
Responsible Manager: Nick Laplaca
Original Admin Info: null

Internal SA-FAB Eng Submission
Field Action Bulletin (FAB) BLANK Submittal Template For Preliminary, Non-Hardware and Hardware FABs

-- After following instructions for drafting a Field Action Bulletin, send this ASCII Text filled-in template to: FAB_Submit@sun.com

The email Subject line should be similar to the following:
Draft Field Action Bulletin / Synopsis ...
---------------------------------------------------------------
Synopsis: Rear Fan Module cycles OK on and off repeatedly

Avoidance:
[ ] Binary [ ] T-Patch [ ] Patch [ ] Firmware [ ] Hardware [ ] Upgrade
[ ] Workaround [ ] Reconfiguration [X] Service Procedure [ ] None
[ ] Hardware Replacement (Must follow SMI FCO Process) *

* Note: Please follow the below link for details on how to submit an FCO:
http://sunwebcollab.central.sun.com/gm/folder-1.11.811470

Implementation:
[X] Reactive (Upon Failure)
[ ] Controlled Proactive (H/W FABs require Customer List and Customer Letter)
[ ] Mandatory (Requires Customer List and Customer Letter)

Product: (Mktg Part Number / Description)
A81 Sun Blade[tm] 8000 Modular System
A82 Sun Blade[tm] 8000 P Modular System

Affected X-Options: (Xoption Part Number / Description)

Affected Parts: (FRU/CRU Part Number / Description)
F541-0385-01 FRU ASSY,REAR FAN MODULE

Issue Description:

Impact
[This should explain the impact to the running system in general and the impact to the affected component in particular; in terms of outage, downtime, loss of availability, loss of data, etc. State the actual impact, for example, the issue causes a system panic, reset, hang etc. If this is a serviceability issue, state how it affects the ability to service or maintain the product.]

The problem is that when a rear fan module is hot plugged into the system, the fan CRU being inserted may not be able to start up. This causes the system to see the CRU as alternately present and absent in a repeating cycle. The "OK" LED associated with the fan CRU will toggle between green and off alternately. The fan will not come on-line. This problem is specific to fans made by Delta (black triangle logo) with part # PFC1248DE.

The system may respond by reporting a fan fault condition and increasing the speed on all the other fans in the system to maximum. The affected fan will appear as absent in the system inventory.
The other fans will cycle up and down in speed repeatedly while the affected fan is in this off-line state.

Contributing Factors
[List anything, such as specific operating environments and/or configurations that may contribute to the issue.]

The problem is exacerbated in minimally configured systems. Systems with only 1 or 2 blades installed are more susceptible to this problem than a fully loaded system. This is because small configs do not allow much airflow through the fans, allowing large back pressure to be created in the fan plenum area.

Symptoms
[Provide exact error messages and where and when the error messages are likely to occur.]

The affected fan CRU LED will be toggling between green (OK) and off. The fan speeds will be oscillating between max and normal roughly every 10 seconds. The fan CRU will display as Absent in the GUI chassis view. The following entries are added to the CMM unified log and to the blade SEL (the latter fills up quickly and then requires an F1 BIOS intervention):

-> show list /CMM/logs/event/list
Targets:
Properties:
Commands:
    show

ID     Date/Time                 Class     Type      Severity
-----  ------------------------  --------  --------  --------
97710  Tue Dec 5 18:41:33 2006   IPMI      Log       critical
       ID = 18ef : 12/05/2006 : 18:41:33 : Entity Presence : /LCMM/RFM1_PRSNT : Device Absent
97709  Tue Dec 5 18:41:27 2006   Chassis   Action    major
       Hot removal of /CH/RFM1
97708  Tue Dec 5 18:41:25 2006   IPMI      Log       critical
       ID = 18ee : 12/05/2006 : 18:41:25 : Fan : /RFM1/FAN1_ERR : Predictive Failure Asserted
97707  Tue Dec 5 18:41:25 2006   IPMI      Log       critical
       ID = 18ed : 12/05/2006 : 18:41:25 : Entity Presence : /LCMM/RFM1_PRSNT : Device Present
97706  Tue Dec 5 18:41:25 2006   Chassis   Action    major
       Hot insertion of /CH/RFM1
97705  Tue Dec 5 18:41:24 2006   Chassis   Action    major
       Hot removal of /CH/RFM1
97704  Tue Dec 5 18:41:22 2006   Chassis   Action    major
       Hot insertion of /CH/RFM1

Root Cause
[This is ultimate cause of the issue and can be provided in engineering/technical terms.]
[Explain how and when the issue was resolved by manufacturing, engineering or by the vendor.]

When a fan is removed, a large amount of air flows into the vacant fan opening. When a fan is installed, this recirculating air flow is directed through the fan CRU and causes the fans within to start spinning backwards. When the fan power connector finally makes contact, the fan's soft-start circuit tries to start the fan, but the fan can be spinning backwards so quickly that it cannot get up to speed before the start-up times out. This time-out is a locked rotor detection mechanism provided by the fan motor controller to protect the fan motor in case it is mechanically stuck.

When the chassis is lightly loaded (just 1 or 2 blades), there is not much air flow through the fans. This makes the vacuum in the fan plenum higher, so the fan being inserted has more difficulty starting. This has been noted more often on SB8000 P (codename A14) chassis, but can also happen on SB8000 (codename A19) chassis.

The fan is being modified to provide a braking function that slows its reverse rotation before start-up is attempted. This has been proven effective with the samples provided. The updated fan part number is not known at this time.

Corrective Action:
Supported Workaround (if available)
Final Resolution
[Provide recommended action for Sun Field personnel to follow in order to implement this fix in the field.]
[A detailed step-by-step procedure for implementing the fix, or high level instructions together with pointers to specific documentation pages.]
[List all relevant product manuals, documentation and URL's which will help to implement the Corrective Action.]

A fan that has start-up trouble of this kind can be helped to start by blocking the reverse air flow through it. If the speed of the other chassis fans is relatively low, then blocking the fan grate with your hand may be sufficient.
If the fan speeds are higher, a better method is to put a piece of paper or cardboard over the fan. The paper will initially be sucked against the fan CRU and will blow off once the fan starts. Power cycling the chassis (stop all blades, stop /CH from CMM, remove all power cords, and re-plug after 1 minute) will also fix the problem.

Identification of Affected Parts (how to):
[State whether the affected components require visual inspection.]
[Explain how the field would identify the "bad" or affected parts from the "good" or fixed parts.]
[Give precise commands, syntax, and sample output.]
[State which Explorer files/directories might also be utilized for determining affected parts/product.]

The affected fans are made by Delta. These fans have a black triangle logo, which can be seen from the exhaust opening in the fan CRU. The fan part number is PFC1248DE, which can also be read through the grate.

Hardware Remediation and Material Availability Details: (For Hardware FABs only)
[List estimated dates seedstock material will be available in each Timezone. This information should be acquired from the Services Logistics representative.]

Comments:
[List anything specific to this asset that isn't already listed above.]

Engineering has designed an "instruction label" which is to be attached to the fan CRU for fans being held as replacements. The instructions on the label indicate that the label should be kept in place until after the fan is installed and has started. Fans that are held as replacement stock will come with these labels attached. That does not, however, solve the problem of a customer removing and replacing an existing fan in their chassis.
References:
* BugID: CR6501084: Delta rear fan modules sometimes experience startup problems
* Escalation ID:
* Sun Alert:
* Pending Patches:
* Resolution Patches:
* Other References:
* Reference Manual:
* Related URL(s):
* ECO:
* GSAP:
* WW Stop Ship:
* Radiance ID:

Contacts:
* Contributor: Oliver Sharwood
* Responsible Engineer: Scott Bleiweiss
* Responsible Manager: Nick Laplaca
* Business Unit Group:
[ ] SSG WGS (Workgroup Systems)
[ ] SSG NSN (Netra Systems and Networking)
[ ] SSG ES (Enterprise Systems)
[ ] SSG SW (Platform Software)
[ ] SSG PNP (Processor)
[X] NSG (Network Systems Group)
[ ] NWS (Network Storage)
[ ] OP/N1 RPE (Operating Platforms/N1 Revenue Product Engin.)
[ ] JPSE (Java Platform Sustaining Engineering)
[ ] JWSSE (Java Web Services Sustaining Engineering)
[ ] USG (User Software Group)
[ ] Other - Please specify

Product_uuid 42c5a02e-c0f1-11da-857a-080020a9ed93|Sun Blade 8000 Modular System

Attachments
This solution has no attachment