Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1001307.1
Update Date: 2010-09-01
Keywords:

Solution Type  FAB (standard) Sure

Solution 1001307.1: Power Supply Fan Failures Can Occur Without Notification in Sun Fire 3800, 4800, 4810, and 6800 Systems


Related Items
  • Sun Fire 3800 Server
  • Sun Fire 6800 Server
  • Sun Fire 4800 Server
  • Sun Fire 4810 Server
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Controlled Proactive

Previously Published As
201768


Product
Sun Fire 3800 Server
Sun Fire 4800 Server
Sun Fire 4810 Server
Sun Fire 6800 Server

Bug Id
6405762

Part
  • Part No: 300-1529
  • Part Description: AC-48VDC PS - A145E
Part
  • Part No: 300-1460
  • Part Description: AC-48VDC PS - A153
Part
  • Part No: 300-1459
  • Part Description: AC-48VDC PS - A152
Part
  • Part No: 300-1441
  • Part Description: AC-48VDC PS - A145
Xoption
  • Xoption Number: X4303A
  • Xoption Description: A145, AC-48VDC PS
Xoption
  • Xoption Number: X4301A
  • Xoption Description: A153, AC-48VDC PS
Xoption
  • Xoption Number: X4302A
  • Xoption Description: A152, AC-48VDC PS

Impact

When a fan fails in an affected power supply on a platform running firmware earlier than 5.20.2, the power supply's temperature normally does not rise enough to reach the threshold at which a warning message is issued. There is therefore no indication that the fan has failed.

If fans then fail in additional power supplies, the temperatures of those power supplies may rise enough to trigger warning messages, but these messages may appear only minutes before the platform shuts down because of the rising temperature.

As a result, affected platforms will shut down with very little warning.


Contributing Factors

Run the "showboards" command on the SC (as shown in the example below) and check the column labeled "Component Type" to see whether the platform has any of the power supply models listed in the Part section of this FAB.

sc0:SC> showboards
Slot     Pwr Component Type                 State      Status     Domain
----     --- --------------                 -----      ------     ------
...
PS0      On  A152 Power Supply              -          OK         -
PS1      On  A152 Power Supply              -          OK         -
PS2      On  A152 Power Supply              -          OK         -
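For sites that capture showboards output from several platforms, the check above can be scripted. The following Python sketch is a hypothetical helper, not a Sun tool: it assumes the Component Type column reads "<model> Power Supply", as in the A152 example, and takes the affected model names (A145, A145E, A152, A153) from the Part and Xoption lists in this FAB.

```python
from __future__ import annotations

import re

# Models taken from the Part and Xoption lists in this FAB.
AFFECTED_MODELS = {"A145", "A145E", "A152", "A153"}

# Assumption: rows look like "PS0      On  A152 Power Supply   -   OK   -",
# i.e. slot, Pwr state, then "<model> Power Supply" in Component Type.
ROW = re.compile(r"^(PS\d+)\s+\S+\s+(\S+)\s+Power Supply\b")


def affected_power_supplies(showboards_output: str) -> list[tuple[str, str]]:
    """Hypothetical helper: return (slot, model) for each affected PSU row."""
    hits = []
    for line in showboards_output.splitlines():
        match = ROW.match(line)
        if match and match.group(2) in AFFECTED_MODELS:
            hits.append((match.group(1), match.group(2)))
    return hits
```

An empty list means no affected power supply was found in the captured output; header and separator rows never match because they do not start with a PSn slot name.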

 


Symptoms

Once fans have failed in multiple power supplies, error messages similar to the following may appear, potentially only minutes before the platform shuts down:

Feb  6 19:32:31 sc0 Platform.SC: WARNING: PS2 temperature is approaching max limit of 78C
Feb  6 19:32:32 sc0 Platform.SC: PS2 48 VDC 0 Temp. 0 value: 68 Degrees C
Feb  6 19:32:32 sc0 Platform.SC: Check for abnormal environmental operating conditions.
Feb  6 19:32:32 sc0 Platform.SC: PS2, sensor status, outside acceptable limits (7,1,0x605020b00030000)
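When reviewing a captured SC log after an event like this, the per-supply readings and limit warnings can be pulled out mechanically. The Python sketch below is hypothetical; its regular expressions are written only against the sample messages above, and other firmware releases may word them differently.

```python
import re

# Patterns match the sample SC messages in this FAB only (an assumption
# for other firmware releases), e.g.
#   "... Platform.SC: PS2 48 VDC 0 Temp. 0 value: 68 Degrees C"
#   "... WARNING: PS2 temperature is approaching max limit of 78C"
READING = re.compile(r"\b(PS\d+) 48 VDC \d+ Temp\. \d+ value: (\d+) Degrees C")
LIMIT_WARNING = re.compile(
    r"WARNING: (PS\d+) temperature is approaching max limit of (\d+)C"
)


def scan_sc_log(text):
    """Hypothetical helper.

    Returns (readings, warnings): the latest temperature reading per PSU in
    degrees C, and the max limit quoted in each PSU's "approaching" warning.
    """
    readings, warnings = {}, {}
    for line in text.splitlines():
        if (m := READING.search(line)):
            readings[m.group(1)] = int(m.group(2))
        if (m := LIMIT_WARNING.search(line)):
            warnings[m.group(1)] = int(m.group(2))
    return readings, warnings
```

The readings dictionary keeps only the latest value seen for each supply, which makes it easy to compare supplies against each other.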

To determine whether power supply fan failures caused or contributed to a platform shutdown, visually inspect the power supply fans for any that have stopped spinning or are spinning at a significantly reduced speed.


Root Cause

Unaffected power supplies shut down when their fan fails. The affected power supplies lack this feature, so their fan can fail without the power supply shutting down.

Firmware 5.20.2 and later provides an early warning message. This fix is also expected to be backported to 5.19.x, but it is currently not included in 5.19.6.
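Deciding whether a platform already has the early warning comes down to a version comparison. A minimal Python sketch, assuming the firmware version is available as a dotted string such as "5.19.6"; the helper names are hypothetical. Versions are compared numerically, because a plain string comparison would rank a version such as "5.9.0" above "5.20.2".

```python
from __future__ import annotations

# First release that issues the early PSU fan warning, per this FAB.
EARLY_WARNING_RELEASE = (5, 20, 2)


def parse_version(version: str) -> tuple[int, ...]:
    """Hypothetical helper: "5.19.6" -> (5, 19, 6)."""
    return tuple(int(part) for part in version.split("."))


def has_early_psu_fan_warning(version: str) -> bool:
    """True for 5.20.2 and later.

    The FAB says the 5.19.x backport was only expected and is not in
    5.19.6, so every release below 5.20.2 is treated as lacking the fix.
    """
    return parse_version(version) >= EARLY_WARNING_RELEASE
```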

Extensive testing has determined that the best indicator of a failed power supply fan is one power supply in the platform reporting a temperature at least 10 degrees C above the others. When this occurs with the new firmware installed, the following warning messages are issued:

Feb  6 19:32:31 sc0 Platform.SC: WARNING: PSx temperature is elevated indicating it may have a failed cooling fan.
Feb  6 19:32:32 sc0 Platform.SC: PSx 48 VDC 0 Temp. 0 value: xx Degrees C
Feb  6 19:32:32 sc0 Platform.SC: Contact Sun Support Services to check for PSU fan failure.
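The 10-degree indicator described above can be expressed as a simple comparison. The Python sketch below is a hypothetical illustration, not the firmware's implementation: it assumes "above the others" means at least 10 degrees C above the hottest of the remaining supplies, and it takes whatever per-PSU readings you have collected.

```python
from __future__ import annotations

# Indicator from this FAB: one PSU at least 10 degrees C above the others.
DELTA_C = 10.0


def suspect_fan_failures(temps_c: dict[str, float], delta_c: float = DELTA_C) -> list[str]:
    """Hypothetical helper: PSUs at least delta_c above every other PSU.

    Assumption: "above the others" is read as "above the hottest of the
    remaining supplies"; the firmware's actual rule may differ.
    """
    suspects = []
    for name, temp in temps_c.items():
        others = [t for other, t in temps_c.items() if other != name]
        if others and temp - max(others) >= delta_c:
            suspects.append(name)
    return suspects
```

Under this reading at most one supply can be flagged at a time, which matches the "one power supply" wording above. Comparing against the average or median of the other supplies would be an equally plausible reading.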

 


Resolution

For platforms that contain power supplies affected by this issue:

First, visually inspect the fans in the power supplies and replace any power supply whose fan has stopped spinning or is spinning at a noticeably reduced speed. If visual inspection is not possible, or you are unsure whether a fan is spinning properly, hold a piece of paper in front of the power supply vent to check the airflow:

  • 3800 normal PSU fan - Blows air out
  • 3800 PSU fan failure - Sucks air in, or no air movement
  • 4800/6800 normal PSU fan - Sucks air in
  • 4800/6800 PSU fan failure - No air movement
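The airflow observations above amount to a small lookup table. A hypothetical Python sketch; the observation labels "out", "in", and "none" are assumptions of this example, and because the list covers only the 3800 and the 4800/6800, any other platform (including the 4810) or observation is reported as not covered rather than guessed.

```python
# Observations from the list above. Labels "out", "in", "none" are
# assumptions of this sketch; combinations not in the FAB are not guessed.
VERDICTS = {
    ("3800", "out"): "normal",
    ("3800", "in"): "fan failure",
    ("3800", "none"): "fan failure",
    ("4800/6800", "in"): "normal",
    ("4800/6800", "none"): "fan failure",
}


def airflow_verdict(platform: str, airflow: str) -> str:
    """Hypothetical helper: interpret the paper test for one PSU vent."""
    group = "4800/6800" if platform in ("4800", "6800") else platform
    return VERDICTS.get((group, airflow), "not covered by this FAB")
```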

Second, upgrade to firmware 5.20.2 (available in patch 114527-03 or later) so that a failed power supply fan produces warning messages before the power supply reaches the over-temperature threshold.

Keep in mind that when the new firmware prints this warning, it has no data that positively proves the power supply fan has failed. It has only recognized a condition for which a failed power supply fan is by far the most likely cause. Visual confirmation of a slowly spinning or stopped fan is still required to determine the root cause.
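Because the warning only identifies a candidate for visual inspection, a log scan is a convenient way to build the list of power supplies to check. The Python sketch below is hypothetical; its pattern is based on the warning text shown in the Root Cause section, with "PSx" assumed to expand to a slot name such as PS0.

```python
from __future__ import annotations

import re

# Based on the 5.20.2 warning text in this FAB; "PSx" assumed to be PS<digits>.
EARLY_WARNING = re.compile(
    r"WARNING: (PS\d+) temperature is elevated indicating it may have a "
    r"failed cooling fan"
)


def psus_to_inspect(sc_log: str) -> list[str]:
    """Hypothetical helper: PSUs named in early warnings, first-seen order.

    A match only marks a candidate; the fan must still be checked visually.
    """
    found: list[str] = []
    for match in EARLY_WARNING.finditer(sc_log):
        if match.group(1) not in found:
            found.append(match.group(1))
    return found
```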

Third, with firmware 5.20.2 or later installed, replace any PSU whose fan failure is flagged by the firmware and/or verified visually. There is no need to replace PSUs that are functioning properly and have not been identified by the firmware.


Previously Published As
102577
Internal Comments
Related Information
  • Other: SRDB 83819

Internal Contributor/submitter
Roy.Stiles@sun.com

Internal Eng Business Unit Group
SSG ES (Enterprise Systems)

Internal Eng Responsible Engineer
Darrell.May@sun.com

Internal Services Knowledge Engineer
Sean.Hassall@sun.com

Internal Resolution Patches
114527-03

Internal Kasp FAB Legacy ID
102577

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2006-09-05
Avoidance: Firmware
Responsible Manager: David.Re@sun.com
Original Admin Info: [WF 05-Sep-2006, Sean Hassall: Patch is now available - sending to Joe for approval]
[WF 23-Aug-2006, Sean Hassall: sending to extended review]
[WF 22-Aug-2006, Sean Hassall: made some minor grammar changes]

Internal SA-FAB Eng Submission
With firmware prior to 5.20.2, there is no notification when fans fail in the power supplies identified in this FAB. This can lead to an unexpected platform outage.

Product_uuid
29d05214-0a18-11d6-92b2-a111614865b5|Sun Fire 3800 Server
29d3a694-0a18-11d6-92da-df959df44cdd|Sun Fire 4800 Server
29d6f808-0a18-11d6-8aa8-943929fbbdd8|Sun Fire 4810 Server
29da7938-0a18-11d6-8a41-9ed1ad6d6779|Sun Fire 6800 Server

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.