Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure Solution
1021661.1 : J4400 SIM cards randomly failing due to heartbeat timeout.

PreviouslyPublishedAs 273189

Oracle Confidential (PARTNER). Do not distribute to customers.
Reason: FABs available to Partners and Internals only.

Applies to:
Sun Storage 7310 Unified Storage System
Sun Storage 7410 Unified Storage System
Sun Storage J4400 Array
All Platforms
__________

SUNBUG 6803801

Affected Parts:
375-3584 - J4400 SAS Interface Module (SIM)

Symptoms
This SIM failure is indicated by a blue LED on the failed SIM (visible from the rear of the chassis). The failure is also visible in the number of paths associated with a particular JBOD in the BUI "Maintenance->Hardware" view: a JBOD with a failed SIM reports only 1 path instead of the usual 2. The combination of a lit blue LED on the SIM and the missing path in the "Maintenance->Hardware" view is the definitive symptom of this condition. Additionally, the Back view of the JBOD chassis shows the failed SIM as missing.

At the time of failure, the appliance logs an alert such as the following:

The component 'SIM (0|1)' has been removed from chassis 'XYZ'

Impact
A heartbeat timeout causes one of the two SIMs in a JBOD to go offline, indicated by a blue LED on the failed SIM. Once a SIM has failed, the JBOD has only one path available to connect the appliance head to the disks. Re-seating the failed SIM clears this issue.

Changes

Contributing Factors
The products listed above are subject to this issue when running SIM firmware earlier than 3R24. The SIM failure condition is sporadic. Customers with larger configurations tend to see this issue more often than customers with smaller configurations because each additional JBOD adds exposure. Among large configurations, some customers see this problem more often than others. Because manual intervention (re-seating the SIM) is required to clear the failure, customers who do not notice it tend to accumulate failures on multiple JBODs over time.

Cause

Root Cause
The SIM failure is caused by a missed heartbeat signal.
The SIM that detects the heartbeat timeout disables its peer, on the assumption that the peer is hung or otherwise non-functional. See CR 6803801 for more details. Sun engineering has very strong evidence that upgrading the SIM firmware to 3R24 resolves this issue.

Solution

Workaround
Manually re-seat the failed SIM. This may be done while the system is running, but take care not to disturb the cabling to the remaining SIM or to other JBODs in the chain.

Resolution
Firmware 3R24 must be loaded on every attached JBOD SIM to resolve the "Blue Light Special" issue. Firmware 3R24 is bundled with Appliance SW 2010.Q1 and later, and is applied automatically once the Appliance SW is installed.

For installing Sun Storage 7000 Software Update 2010.Q1.1.0 or later, the Release Notes can be found here:
http://wikis.sun.com/display/FishWorks/ak-2010.02.09.1.0+Release+Notes
The release itself is linked from the Software Updates page:
http://wikis.sun.com/display/FishWorks/Software+Updates

Identification of Affected Parts (how to)
As noted in the "Symptoms" section, SIM status is indicated by the number of paths associated with a JBOD chassis. A blue LED on the rear of a SIM indicates a failure.

References
Bug Id: 6803801

For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:
* http://tns.central/fab
In addition to the above, you may email:
* FAB-Manager@sun.com

Internal Contributor/submitter: cliff.thomas@oracle.com
Internal Eng Responsible Engineer: zuheir.totari@oracle.com
Responsible Manager: Renee.Bennett@oracle.com
Internal Services Knowledge Engineer: Joe.Davis@oracle.com
Internal Eng Business Unit Group: NWS (Network Storage)

Internal Sun Alert & FAB Admin Info
20-Nov-2009: Completed draft and sent to Extended Review.
24-Nov-2009: No feedback from Ext Rvw - sending to Publish.
23-Jun-2010: Major rewrite of the Solution section.
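
Illustrative check
The identification rule given under Symptoms and Contributing Factors (a JBOD reporting 1 path instead of 2, and SIM firmware earlier than 3R24) can be expressed as a small audit script. The following is only a sketch: the JbodStatus record, its field names, and the function names are hypothetical, the inventory would have to be collected by hand from the "Maintenance->Hardware" view, and the firmware parser assumes versions follow the digits-R-digits pattern seen in "3R24", which this FAB does not confirm for other releases.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple

# First SIM firmware level reported to resolve the issue, per this FAB.
FIXED_FIRMWARE = "3R24"


def parse_sim_firmware(version: str) -> Tuple[int, int]:
    """Split a SIM firmware string such as '3R24' into (3, 24).

    Assumption: versions look like '<digits>R<digits>'. Anything else is
    rejected rather than guessed at.
    """
    match = re.fullmatch(r"(\d+)R(\d+)", version.strip().upper())
    if match is None:
        raise ValueError(f"unrecognized SIM firmware version: {version!r}")
    return int(match.group(1)), int(match.group(2))


@dataclass
class JbodStatus:
    """Hypothetical record of one JBOD, filled in manually from the BUI."""
    name: str
    path_count: int      # paths shown for the chassis (normally 2)
    sim_firmware: str    # SIM firmware level, e.g. '3R24'


def needs_attention(jbod: JbodStatus) -> List[str]:
    """Return the reasons a JBOD should be looked at (empty list if none)."""
    reasons = []
    if jbod.path_count < 2:
        reasons.append(
            f"{jbod.name}: only {jbod.path_count} path(s); "
            "check for a blue LED and re-seat the failed SIM"
        )
    if parse_sim_firmware(jbod.sim_firmware) < parse_sim_firmware(FIXED_FIRMWARE):
        reasons.append(
            f"{jbod.name}: SIM firmware {jbod.sim_firmware} is earlier than "
            f"{FIXED_FIRMWARE}; update the Appliance SW"
        )
    return reasons


if __name__ == "__main__":
    # Made-up inventory for illustration only.
    inventory = [
        JbodStatus("jbod-a", 2, "3R24"),
        JbodStatus("jbod-b", 1, "3R21"),
    ]
    for jbod in inventory:
        for reason in needs_attention(jbod):
            print(reason)
```

A missing path alone does not prove this particular failure; as stated under Symptoms, the definitive symptom is the missing path together with a lit blue LED on the SIM, so the script output is a prompt for a physical check and not a diagnosis.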

Attachments
This solution has no attachment