Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution
1020306.1: Limited Number of Sun Fire T2000 and SPARC Enterprise T2000 Servers may Experience a Shutdown with SC Alert "Chassis cover removed"
PreviouslyPublishedAs: 255948
Bug Id: <SUNBUG: 6815610>
Product: Sun Fire T2000 Server
Date of Workaround Release: 27-Mar-2009
Date of Resolved Release: 16-Apr-2009

A limited number of Sun Fire T2000 and SPARC Enterprise T2000 servers may experience a shutdown with the SC Alert "Chassis cover removed" (see below for details).

1. Impact
A limited number of Sun Fire T2000 and SPARC Enterprise T2000 servers may experience a system shutdown after the System Controller (SC) Alert "Chassis cover removed" is displayed on the console, causing system downtime. In addition, this issue may result in unnecessary hardware replacement.

2. Contributing Factors
This issue can occur on the following platforms:
Note: This issue rarely occurs and has only been observed on the above-mentioned T2000 servers.

3. Symptoms
The system will report the following errors on the system console; they are also recorded in the ALOM logs. Output from 'showlogs -v' would be similar to the following:

02:24:25: 0004007c: "System poweron is disabled."
02:24:25: 00040083: "Chassis cover removed."
02:24:25: 0004000e: "SC Request to Power Off Host Immediately."
02:24:26: 0004004f: "Indicator SYS/ACT is now STANDBY BLINK"
02:24:27: 0004007d: "System poweron is enabled."
02:24:31: 00040029: "Host system has shut down."

As shown in the example, the key to identifying this issue is that, in the logs, the line "Chassis cover removed" is followed by the line "SC Request to Power Off Host Immediately". If the "SC Request to Power Off Host Immediately" line is missing, this is a different issue and may indicate a hardware condition with the cover interlock switch.

4. Workaround
There is no workaround. Please see Resolution below.

5. Resolution
This issue is addressed on the following platform:
This Sun Alert notification is being provided to you on an "AS IS" basis. This Sun Alert notification may contain information provided by third parties. The issues described in this Sun Alert notification may or may not impact your system(s). Sun makes no representations, warranties, or guarantees as to the information contained herein. ANY AND ALL WARRANTIES, EXPRESS OR IMPLIED, INCLUDING WITHOUT LIMITATION WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE, OR NON-INFRINGEMENT, ARE HEREBY DISCLAIMED. BY ACCESSING THIS DOCUMENT YOU ACKNOWLEDGE THAT SUN SHALL IN NO EVENT BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, PUNITIVE, OR CONSEQUENTIAL DAMAGES THAT ARISE OUT OF YOUR USE OR FAILURE TO USE THE INFORMATION CONTAINED HEREIN.

This Sun Alert notification contains Sun proprietary and confidential information. It is being provided to you pursuant to the provisions of your agreement to purchase services from Sun, or, if you do not have such an agreement, the Sun.com Terms of Use. This Sun Alert notification may only be used for the purposes contemplated by these agreements.

Copyright 2000-2009 Sun Microsystems, Inc., 4150 Network Circle, Santa Clara, CA 95054 U.S.A. All rights reserved.

Modification History
01-Apr-2009: updated Workaround section
16-Apr-2009: updated Contributing Factors and Resolution sections. Now Resolved.

Internal Eng Responsible Engineer: steve.trullo@sun.com, grant.gredvig@sun.com
Internal Contributor/submitter: Dencho.Kojucharov@sun.com
Internal Services Knowledge Engineer: karen.edwards@sun.com
Internal Eng Business Unit Group: SSG WGS (Workgroup Systems)
Internal Sun Alert & FAB Admin Info: 26-Mar-2009: karen, created based on FAB and request from customer. Sending to sunalert_review today.

Internal Comments (for SAs)
Root Cause
The suspected root cause is an invalid CI (Chassis Intrusion) bit read from the ADM1026, caused either by I2C corruption or by low noise tolerance on the ADM1026 CI pin.
Also, the ALOM shutdown (based on the SystemPowerON check) after a failed read from the ADM1026 should be disabled, because in a genuine chassis intrusion the FPGA will already have turned off power. The power-on check, in conjunction with the root cause (I2C corruption or an over-sensitive ADM1026 CI pin), is what causes the host to power off with the message "SC Request to Power Off Host Immediately".

A firmware patch is being developed to permit up to three retry reads of the ADM1026, with a clear in between to confirm the status. If ALOM still reports a chassis cover problem after three tries, it will display a message but will NOT shut down the box.

Per CR 6815610, the fix will involve four changes:
a) There should be multiple retries for the read; 3 retries seems reasonable.
b) The error message should only be printed after the last failed retry.
c) The message text should change to "Chassis cover interlock open". It needs to be investigated whether this can be done by changing just the text or whether a new event needs to be created.
d) If the fault has cleared by the next monitoring cycle, a fault-cleared message should be printed: "Chassis cover interlock is OK".

See also FAB 254469: http://sunsolve.sun.com/search/document.do?assetkey=1-63-254469-1

Attachments
This solution has no attachment.
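The Symptoms check in section 3 (a "Chassis cover removed" entry followed by "SC Request to Power Off Host Immediately") can be scripted against saved 'showlogs -v' output. The sketch below is a minimal Python illustration, not a Sun tool: the line format it parses is assumed from the example log in section 3, the names parse_events and classify are hypothetical, and "followed by" is interpreted as the very next logged event.

```python
import re

# Assumed format of one 'showlogs -v' event, taken from the example in section 3:
#   02:24:25: 00040083: "Chassis cover removed."
LINE_RE = re.compile(r'^\s*(\d{2}:\d{2}:\d{2}):\s*([0-9a-fA-F]+):\s*"(.*)"\s*$')

COVER_REMOVED = "Chassis cover removed"
POWER_OFF_REQUEST = "SC Request to Power Off Host Immediately"


def parse_events(log_text):
    """Return (time, code, message) tuples for every line that matches LINE_RE."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        if match:
            time, code, message = match.groups()
            # Drop the trailing period so comparisons ignore it.
            events.append((time, code, message.rstrip(".")))
    return events


def classify(log_text):
    """Hypothetical triage helper for the 'Chassis cover removed' alert.

    Returns:
      "this_issue"             cover alert immediately followed by the SC power-off request
      "check_interlock_switch" cover alert without that request (a different issue)
      "no_cover_alert"         no cover alert in the log at all
    """
    messages = [message for _, _, message in parse_events(log_text)]
    saw_cover_alert = False
    for i, message in enumerate(messages):
        if message == COVER_REMOVED:
            saw_cover_alert = True
            if i + 1 < len(messages) and messages[i + 1] == POWER_OFF_REQUEST:
                return "this_issue"
    return "check_interlock_switch" if saw_cover_alert else "no_cover_alert"
```

Because the sketch only reads text, it can be run on a captured console log without touching the SC. A result of "check_interlock_switch" corresponds to the case that the Symptoms section says may indicate a hardware condition with the cover interlock switch.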
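The four changes listed for CR 6815610 can be modelled as a small polling loop with one bit of state. The Python sketch below is a hypothetical illustration of that behaviour, not the actual ALOM firmware: read_ci_bit and clear_ci_bit are invented stand-ins for ADM1026 register access, "3 retries" is read as three reads in total (matching "after 3 tries"), and reporting the open condition only once per fault is an assumption.

```python
from typing import Callable, List

# "3 retries seems reasonable" (CR 6815610); read here as three reads in total.
MAX_READS = 3


class CoverInterlockMonitor:
    """Hypothetical model of the proposed ALOM chassis-intrusion (CI) check.

    read_ci_bit() returns True when the ADM1026 CI bit reports the cover open;
    clear_ci_bit() clears that bit. Both are illustrative stand-ins, not ALOM APIs.
    """

    def __init__(self, read_ci_bit: Callable[[], bool],
                 clear_ci_bit: Callable[[], None]) -> None:
        self.read_ci_bit = read_ci_bit
        self.clear_ci_bit = clear_ci_bit
        self.fault_reported = False
        self.messages: List[str] = []  # console messages emitted so far

    def poll(self) -> None:
        """Run one monitoring cycle. It only logs; it never powers off the host."""
        for attempt in range(MAX_READS):
            if not self.read_ci_bit():
                # Clean read: change (d), report recovery if a fault was shown.
                if self.fault_reported:
                    self.messages.append("Chassis cover interlock is OK")
                    self.fault_reported = False
                return
            if attempt < MAX_READS - 1:
                # Clear between reads to confirm the status (change a).
                self.clear_ci_bit()
        # Every read reported the cover open: changes (b) and (c).
        # Reported once per fault (assumption, not stated in the CR).
        if not self.fault_reported:
            self.messages.append("Chassis cover interlock open")
            self.fault_reported = True
```

The key design point, per the Internal Comments, is that poll() never requests a host power-off: a single spurious CI read is absorbed by the retries, and a genuine chassis intrusion is already handled by the FPGA cutting power.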