Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: FAB (standard) Sure Solution

1001307.1: Power Supply Fan Failures Can Occur Without Notification in Sun Fire 3800, 4800, 4810, and 6800 Systems
Previously Published As: 201768
Product: Sun Fire 3800 Server, Sun Fire 4800 Server, Sun Fire 4810 Server, Sun Fire 6800 Server
Bug Id: <SUNBUG: 6405762>
Part
Impact

When a power supply fan fails on a platform running firmware earlier than 5.20.2, the power supply usually does not heat up enough to reach the threshold at which a warning message is issued. As a result, there is no indication that a power supply fan has failed. When fans fail on additional power supplies, the temperatures of the affected power supplies may rise enough to trigger warning messages, but these messages may appear only minutes before the platform shuts down because of the temperature rise. Affected platforms can therefore shut down with very little warning.

Contributing Factors

Run the "showboards" command on the SC (as shown in the example below) and check the column labeled "Component Type" to see whether the platform contains any of the power supply models listed in the parts affected section of this FAB.

    sc0:SC> showboards
    Slot     Pwr Component Type          State      Status     Domain
    ----     --- --------------          -----      ------     ------
    ...
    PS0      On  A152 Power Supply       -          OK         -
    PS1      On  A152 Power Supply       -          OK         -
    PS2      On  A152 Power Supply       -          OK         -
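A platform is exposed to this issue when it contains an affected power supply model and runs firmware earlier than 5.20.2. The Python sketch below combines both checks on captured SC output. The affected-models set is a hypothetical parameter that you must fill in from this FAB's parts affected section (not reproduced here), the firmware version string is assumed to be obtained from the SC separately, and the row pattern assumes the column layout shown in the example above; other firmware releases may format the table differently.

```python
import re

# Assumes showboards rows laid out like the example above, e.g.
#   "PS0      On  A152 Power Supply       -          OK         -"
PS_ROW = re.compile(r"^(PS\d+)\s+(\S+)\s+(.+?Power Supply)\s")

# Firmware 5.20.2 is the first release with the elevated-temperature warning.
FIRST_FIXED = (5, 20, 2)


def power_supplies(showboards_output: str) -> dict:
    """Map each PSn slot to its Component Type, e.g. {"PS0": "A152 Power Supply"}."""
    supplies = {}
    for line in showboards_output.splitlines():
        m = PS_ROW.match(line.strip())
        if m:
            supplies[m.group(1)] = m.group(3)
    return supplies


def affected_slots(showboards_output: str, affected_models: set) -> list:
    """PSn slots whose Component Type contains one of the affected model names."""
    return sorted(
        slot
        for slot, component in power_supplies(showboards_output).items()
        if any(model in component for model in affected_models)
    )


def parse_version(version: str) -> tuple:
    """Turn "5.19.6" into (5, 19, 6) so releases compare numerically, not as text."""
    return tuple(int(part) for part in version.split("."))


def at_risk(showboards_output: str, affected_models: set, fw_version: str) -> bool:
    """True if affected supplies are present and the firmware predates 5.20.2."""
    return bool(affected_slots(showboards_output, affected_models)) and (
        parse_version(fw_version) < FIRST_FIXED
    )
```

Comparing version tuples rather than strings matters because "5.20.10" sorts before "5.20.2" as text. The sketch treats every 5.19.x release as lacking the warning; if the expected 5.19.x backport ships, the FIRST_FIXED check would need to account for it.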
Symptoms

Once multiple fans have failed, error messages similar to the following may appear, possibly only minutes before the platform shuts down:

    Feb 6 19:32:31 sc0 Platform.SC: WARNING: PS2 temperature is approaching max limit of 78C
    Feb 6 19:32:32 sc0 Platform.SC: PS2 48 VDC 0 Temp. 0 value: 68 Degrees C
    Feb 6 19:32:32 sc0 Platform.SC: Check for abnormal environmental operating conditions.
    Feb 6 19:32:32 sc0 Platform.SC: PS2, sensor status, outside acceptable limits (7,1,0x605020b00030000)

To determine whether power supply fan failures contributed to or caused the shutdown of a platform, visually inspect the power supply fans for any that have stopped spinning or are spinning at a significantly reduced speed.

Root Cause

Unaffected power supplies have a feature that shuts the power supply down when its fan fails; the affected power supplies lack this feature. Firmware 5.20.2 and later now provides an early warning message. This fix is also expected to be backported to 5.19.x, but it is currently not included in 5.19.6.

Extensive testing has shown that the best indicator of a failed power supply fan is one power supply in the platform reporting a temperature at least 10 degrees C higher than the others. When this occurs with the new firmware installed, the following warning messages are issued:

    Feb 6 19:32:31 sc0 Platform.SC: WARNING: PSx temperature is elevated indicating it may have a failed cooling fan.
    Feb 6 19:32:32 sc0 Platform.SC: PSx 48 VDC 0 Temp. 0 value: xx Degrees C
    Feb 6 19:32:32 sc0 Platform.SC: Contact Sun Support Services to check for PSU fan failure.
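The detection rule described above (one supply running at least 10 degrees C hotter than the others) can be sketched in Python as follows. This is an illustration of the rule, not the firmware's implementation: the readings dictionary is hypothetical input gathered however you collect PSU temperatures, and "above the others" is assumed to mean above the hottest of the remaining supplies, since the FAB does not state the exact comparison.

```python
def suspect_fan_failures(temps: dict, margin: int = 10) -> list:
    """Return PSUs reading at least `margin` deg C hotter than every other PSU.

    Assumption: "at least 10 degrees C above the others" is interpreted as
    above the hottest of the remaining supplies. The firmware's exact
    comparison is not documented in this FAB.
    """
    suspects = []
    for name, temp in temps.items():
        others = [t for other, t in temps.items() if other != name]
        # A lone supply has nothing to compare against, so it is never flagged.
        if others and temp - max(others) >= margin:
            suspects.append(name)
    return sorted(suspects)


if __name__ == "__main__":
    # Hypothetical readings in degrees C, not taken from a real platform.
    print(suspect_fan_failures({"PS0": 45, "PS1": 46, "PS2": 58}))  # ['PS2']
```

Comparing against the hottest other supply is the conservative reading; comparing against the mean or median of the others would flag more cases. Under this reading, two supplies that overheat together do not flag each other. As with the firmware warning, a flagged supply only indicates a likely fan failure, and visual verification is still required.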
Resolution

For platforms that contain power supplies affected by this issue:

First, visually inspect the power supply fans and replace any that have stopped spinning or are spinning at a noticeably reduced speed. If visual inspection is not possible, or you are unsure whether a fan is spinning properly, the following observations can be made:

Hold a piece of paper in front of the vent to check for airflow.
Second, upgrade to firmware 5.20.2 (available in patch 114527-03 or later) so that a failed power supply fan produces warning messages before the power supply reaches the over-temperature threshold. Keep in mind that when the new firmware prints the warning, it has no data that positively proves the power supply fan has failed. It has only recognized a condition in which a failed power supply fan is by far the most likely cause. Visual verification of the slowly spinning or stopped fan(s) is still required to determine the root cause.

Third, the new firmware (5.20.2 and later) will detect failing fans; replace the PSUs when a failure is detected by the firmware and/or verified visually. There is no need to replace PSUs that are functioning properly and have not been identified by the firmware.

Previously Published As: 102577

Internal Comments

Related Information
Internal Contributor/submitter: Roy.Stiles@sun.com
Internal Eng Business Unit Group: SSG ES (Enterprise Systems)
Internal Eng Responsible Engineer: Darrell.May@sun.com
Internal Services Knowledge Engineer: Sean.Hassall@sun.com
Internal Resolution Patches: 114527-03
Internal Kasp FAB Legacy ID: 102577

Internal Sun Alert & FAB Admin Info:
Critical Category: Significant
Change Date: 2006-09-05
Avoidance: Firmware
Responsible Manager: David.Re@sun.com
Original Admin Info:
[WF 05-Sep-2006, Sean Hassall: Patch is now available - sending to Joe for approval]
[WF 23-Aug-2006, Sean Hassall: sending to extended review]
[WF 22-Aug-2006, Sean Hassall: made some minor grammar changes]

Internal SA-FAB Eng Submission:
With firmware prior to 5.20.2, when fans fail in power supplies identified in this FAB, there is no notification. This can lead to an unexpected platform outage.

Product_uuid:
29d05214-0a18-11d6-92b2-a111614865b5|Sun Fire 3800 Server
29d3a694-0a18-11d6-92da-df959df44cdd|Sun Fire 4800 Server
29d6f808-0a18-11d6-8aa8-943929fbbdd8|Sun Fire 4810 Server
29da7938-0a18-11d6-8a41-9ed1ad6d6779|Sun Fire 6800 Server

Attachments:
This solution has no attachment