Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure Solution
1001149.1 : USIV+ (Panther and Jaguar) System Boards could experience POST Failure if Board Replacement Procedure is not correctly followed
PreviouslyPublishedAs 201540
Product Sun Fire 12K Server, Sun Fire E20K Server, Sun Fire 15K Server, Sun Fire E25K Server
Bug Id <SUNBUG: 6426425>
Part
Impact
If the environmental status monitoring daemon (esmd) does not log the removal/insertion of System Boards, corruption of the System Board's SEEPROM can occur. When this happens, POST fails with:

    FAIL E$Dimm SBx/Px/Ex: Not compatible with US-IV+ processor

Subsequent POST runs report:

    CHS reports E$Dimm SB0/P0/E0 status NOT_GOOD. Treating as blacklisted.

Users are then unable to clear the CHS status via "setchs -s OK -r OK -c SBx/Px/Ex"; the command returns the error:

    ERROR: cannot set status. FRU error 2: I/O error (this component may not exist, may be powered off, or the FRU may be corrupt).
Symptoms
Root Cause
SEEPROM corruption can be triggered in a number of ways. In this case, a board is removed, but a different board is inserted before esmd can notice the removal. Because esmd does not log the remove/insert events, it does not clear the SEEPROM cache in picld or frad, nor any of the pending SEEPROM events for the board. When frad next updates, it writes to the wrong location and corrupts the SEEPROM.

Another failure signature related to this scenario is:

    FAIL Proc SB2/P0: Serial number of CPU (80000228.B850D4EF) doesn't match data in board SEEPROM (0000003E.77240482).

SEEPROM corruption can be further confirmed with the "prtfru" command:

    xc88-sc0:sms-svc:137> prtfru ex15?Label=ex15/EXB/sb15?Label=sb15/CPU/p0?Label=p0/e1?Label=e1/ECACHE
    /frutree/chassis/CP/ex15?Label=ex15/EXB/sb15?Label=sb15/CPU/p0?Label=p0/e1?Label=e1/ECACHE
    SEGMENT: ID
    SEGMENT: FD
    Error processing data in segment "FD": IO error
    SEGMENT: ED
    /Fru_Type: 0A04 (unrecognized value)
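When reviewing saved POST or prtfru output, the failure signatures quoted in this document can be searched for mechanically. The following is a minimal sketch, not a supported tool: the label names and the function find_signatures are hypothetical, and the patterns assume the messages appear exactly in the form quoted here.

```python
import re

# Patterns copied from the failure messages quoted in this document.
# The dictionary keys are hypothetical labels chosen for this sketch only.
SIGNATURES = {
    "ecache_incompatible": re.compile(
        r"FAIL E\$Dimm SB\d+/P\d+/E\d+: Not compatible with US-IV\+ processor"
    ),
    "chs_not_good": re.compile(
        r"CHS reports E\$Dimm SB\d+/P\d+/E\d+ status NOT_GOOD"
    ),
    "cpu_serial_mismatch": re.compile(
        r"FAIL Proc SB\d+/P\d+: Serial number of CPU \(\S+\) "
        r"doesn't match data in board SEEPROM"
    ),
    "prtfru_segment_io_error": re.compile(
        r'Error processing data in segment "\w+": IO error'
    ),
}


def find_signatures(text):
    """Return the sorted labels of every quoted signature found in text."""
    return sorted(
        label for label, pattern in SIGNATURES.items() if pattern.search(text)
    )
```

A match only indicates that the text contains one of the quoted messages; it does not by itself prove that this bug is the cause.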
Workaround
The maintenance procedure below is outlined in the Sun Fire 15K/12K Systems Service Manual and should be followed.

The proper FRU swap procedure should always be used when removing boards from, or inserting boards into, the platform. Service engineers must wait for esmd to log the remove or insert message before taking any additional action. Use "showlogs -F" to monitor platform message events such as the examples below:

    esmd[4326]: [50141 744994421110 NOTICE Cabinet.cc 240] V3CPU at SB17 has been removed.
    esmd[4326]: [50141 776011024237 NOTICE Cabinet.cc 207] V3CPU at SB17 has been inserted.

These messages are logged to the platform message log located in /var/opt/SUNWSMS/adm/platform. ESMD polls for board insertion or removal every 30 seconds, but it may take up to 1.5 minutes before the message is logged (depending upon the SC load).

Note 1: Due to the issue described in bug 6426425, USIV+ boards may have their SEEPROM corrupted if the above procedure is not followed. If the board becomes corrupted and exhibits the failure signature "FAIL E$Dimm SBx/Px/Ex: Not compatible with US-IV+ processor", escalate to the Technology Service Center to request the internal tool that repairs the corrupted SEEPROM container and restores board functionality.

Exception: The failure signature "Serial number of CPU (80000228.B850D4EF) doesn't match data in board SEEPROM" cannot be corrected with the internal tool. The System Board will have to be replaced.

Note 2: None of the following actions taken in the field will correct the SEEPROM corruption; the System Board will require replacement.
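The requirement to wait for the esmd remove and insert messages can be illustrated with a small log check. This is only a sketch under stated assumptions: the regular expression is derived solely from the two example messages quoted in the workaround, real log lines may carry additional prefixes, and the names parse_esmd_event and swap_logged are hypothetical.

```python
import re

# Pattern derived from the two example esmd messages quoted in the workaround,
# for instance:
#   esmd[4326]: [50141 744994421110 NOTICE Cabinet.cc 240] V3CPU at SB17 has been removed.
# Assumption: other esmd releases may format these notices differently.
ESMD_EVENT = re.compile(
    r"esmd\[\d+\]: \[\d+ \d+ NOTICE Cabinet\.cc \d+\] "
    r"(?P<board>\S+) at (?P<slot>SB\d+) has been (?P<action>removed|inserted)\."
)


def parse_esmd_event(line):
    """Return (slot, action) for an esmd insert/remove notice, else None."""
    match = ESMD_EVENT.search(line)
    if match is None:
        return None
    return (match.group("slot"), match.group("action"))


def swap_logged(lines, slot):
    """Return True once a removal and then an insertion are logged for slot."""
    removed = False
    for line in lines:
        event = parse_esmd_event(line)
        if event is None or event[0] != slot:
            continue
        if event[1] == "removed":
            removed = True
        elif removed:
            # An insertion was logged after the removal for this slot.
            return True
    return False
```

The lines could come from the platform message log or from captured "showlogs -F" output; how they are collected is left out of this sketch. A False result means the engineer should keep waiting, since the document notes that logging can take up to 1.5 minutes.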
Resolution
Modification History Date: 23-AUG-2006
Date: 09-OCT-2006
Date: 22-JAN-2007
Previously Published As 102488
Internal Comments
Upon verification of the esmd-logged insertion event, the board may be powered on and POSTed.
Definitions of FD and ED segments
Recommended Section: ReadWrite (Dynamic)
Readable By: All
Writable By: All settings other than Ops/Repair
Lifetime: Forever or Field
Dynamic Data Segment Size Bytes: 2949
Tagged Data Assigned to Segment: ECO_CurrentR, Customer_DataR, InstallationR, Soft_ErrorsR, Status_EventsR
Status: Active
Formater Error Type: Error

Recommended Section: Read Only (Static)
Readable By: Sun Only
Writable By: Ops/Repair
Lifetime: Forever
Dynamic Data Segment Size Bytes: n/a
Tagged Data Assigned to Segment: n/a
Status: Active
Formater Error Type: Error

Related Information
Internal Contributor/submitter Scott.Barnard@sun.com
Internal Eng Business Unit Group SSG ES (Enterprise Systems)
Internal Eng Responsible Engineer Alex.Aftandilian@Sun.COM
Internal Services Knowledge Engineer sean.hassall@sun.com, joe.davis@sun.com
Internal Escalation ID 1-15782131
Internal Kasp FAB Legacy ID 102488
Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date: 2006-06-29
Avoidance: Service Procedure
Responsible Manager: Mary.Vigil@Sun.COM
Original Admin Info:
WF - updated Note 1 and added exception in Corrective Action section per Scott Barnard (1/22/07) - Joe
WF - i added Joe to KE list instead of me - 25-Jul-07 karen
Product_uuid
077fd4c5-df8f-4320-ad69-7d01603a674d|Sun Fire 12K Server
1404a2d3-059a-11d8-84cb-080020a9ed93|Sun Fire E20K Server
29e4659c-0a18-11d6-9fa1-e67bbc033df8|Sun Fire 15K Server
d842dd03-059b-11d8-84cb-080020a9ed93|Sun Fire E25K Server
Attachments
This solution has no attachment