Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: FAB (standard)
Sure Solution 1000288.1: LDEV Blockade may occur during Microcode upgrade or PDEV installation, leading to Host System unscheduled outage.
Previously Published As: 200403
Product: Sun StorageTek 9910, Sun StorageTek 9960 System, Sun StorageTek 9970 System, Sun StorageTek 9980 System
Impact
Due to a Microcode bug, an LDEV Blockade may occur during a Microcode upgrade or PDEV installation. This can result in an unscheduled outage of connected Host Systems.

The bug can cause an out-of-synchronization condition between the SVP and the DKC if an SVP-to-DKC communication time-out occurs during a DKU Microcode exchange. If a DKU Microcode exchange is restarted, or additional HDDs are installed, after this condition has occurred, the subsystem can attempt to load the wrong DKU Microcode for the installed HDD model(s). ** This happens only when two or more different HDD models are installed and/or present. ** The subsystem detects this inconsistency during prechecking and begins blocking HDD ports and HDDs. When two or more HDDs in a parity group become blocked, the logical devices (LDEVs) then become blocked.

All Microcode versions other than the fixed versions are affected. Subsystems with only one HDD model type installed are NOT impacted.

The following SIM Reference Codes will be recorded:
DF8xxx/DF9xxx - Drive Port Blockade (port 0/1)
EF1xxx - Drive Blockade
DFAxxx/DFBxxx - LDEV Blockade

As a result of these SIMs and the associated hardware/HDD blocking, some Parity Groups will be in Blocked status, and some host systems may have suffered an unscheduled outage.

The sequence of events during a Microcode update that can lead to the out-of-synchronization condition is as follows:
1. While using the FC Wizard, the status message "SVP-DKC Communication Time Out" is posted (left side of the bottom line of the FC Wizard). At this point, due to the Microcode bug, an out-of-synchronization condition has occurred between the SVP and the DKC.
2. An SVP message is displayed: [SMT2435E - An error occurred when replacing a microprogram. Please check status by using the Maintenance window and check logs by using the Information window.] The Microcode exchange stopped. Status was checked and was found to be normal.
   There were no SIMs or SSBs posted that would indicate an issue.
3. During an attempt to restart the Microcode exchange manually, without using the FC Wizard, an SVP message was displayed: [INS2268E - Exclusive task (Install, Diagnosis, Replace, etc.) is already running on the SVP. Please try this operation after finishing the task.] This SVP message can be displayed several times when attempting to restart the Microcode exchange.
4. Eventually the Microcode exchange restarts without the INS2268E message, and without waiting to confirm that the appropriate sense information (SSB A673) has been produced.
5. The Microcode exchange then appears to continue normally, including loading code to the CHAs/DKAs.
6. "Exchanging DKU microprogram" messages are then observed. At this time, SVP message SMT2435E is displayed again and the DKU Microcode upgrade stops. Status is checked and is found to be normal. There are no SIMs or SSBs posted that indicate an issue.
7. The DKU portion of the Microcode exchange is restarted. Shortly thereafter, HDD ports begin to block, leading to the LDEV blockade situation described above.

Symptoms

Root Cause

Resolution
INTERIM CIRCUMVENTION:
For circumvention until fixed Microcode is installed on a subsystem:
1. Perform a manual (non-FC Wizard) DKU Microcode exchange separately from the other portions of the new Microcode set.
2. Before beginning a DKU Microcode exchange, verify the health of the SVP-to-DKC communication by checking status and version (both "Running" and "FM"). If there are no SVP-DKC communication errors and all MPs display a version for both "Running" and "FM", proceed with the Microcode exchange. If not, troubleshoot by SVP reboot, LAN Check diagnosis, and self-replacement or replacement of the PCB.
3. If a DKU Microcode exchange is performed on a subsystem with more than one HDD model type installed and/or present, and the exchange stops with SVP message SMT2435E or INS2268E or an SVP-DKC Communication Time-out, wait a minimum of 10 minutes TIMES the number of different installed HDD models. (For example, if DKR2D-J72 and DKR2E-72GB HDDs are installed, 10 x 2 = 20 minutes minimum wait time.) Then check for SSB EC=A673 logged with a time stamp after the wait period. (To view SSBs, use SVP -> Information -> Log -> SSB, then click "List" and view the SSB screen.) Only after SSB EC=A673 has been generated should you retry the DKU code exchange or install additional HDDs. The wait is needed because both the SVP and the DKC will detect the time-out, and the out-of-synchronization condition will eventually complete, as evidenced by SSB A673.

(NOTE: The SE9990 subsystem is NOT affected by this Microcode bug, as only one HDD model is currently released for SE9990 installation. In addition, by design, the SE9990 will not block both ports of an HDD.)

If a subsystem is impacted by multiple HDD port blockade failures resulting in an LDEV blockade condition, follow these recovery steps.

RECOVERY:
1. Stop all host system I/O to the subsystem.
2. Turn off the AC BOX Main breakers.
3. Do not unplug any PCBs or jumpers.
4. After waiting 10 seconds, perform a normal subsystem power on.
5. Check subsystem status. Previously blocked parity groups should now be in Correction Access status.
6. If all parity groups are in Normal or Correction Access status, return the subsystem to customer use.
7. Recover failed HDDs by performing HDD self-replacement. Use the "Replace (Inline)" button. When prompted by the SVP message, DO NOT diagnose the device; DO NOT update the microprogram in the device; DO recover the device.

NOTE: There are two parts to this issue:
1. The communication issue between the DKC and SVP during a Microcode upgrade.
2. The blockade of both HDD ports due to the communication problem.

For (9900V) SE9970 / SE9980: Microcode DKCMAIN 21-12-01-00/00 and later versions contain countermeasures for both problems #1 and #2 described above.
For (9900) SE9910 / SE9960: Microcode DKCMAIN 01-19-89-00/00 and later versions contain a countermeasure for problem #2 described above. A countermeasure for problem #1 will be included in a future SE9910 / SE9960 Microcode version.

Modification History
Date: 21-MAR-2007
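The exposure rules scattered through this FAB (two or more HDD models, SE9990 excluded, countermeasure versions per model) can be collected into one pre-upgrade check. The Python sketch below is illustrative only and is not a Sun or SVP tool. It assumes that DKCMAIN version strings such as 21-12-01-00/00 order field by field from left to right, and it takes the conservative reading that the interim circumvention applies while either problem #1 or #2 is still open for the model. The names FIXED_IN, parse_dkcmain, open_problems and circumvention_applies are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Countermeasure versions as listed in this FAB. Problem 1 = SVP-DKC
# communication issue; problem 2 = blockade of both HDD ports.
FIXED_IN: Dict[str, Dict[int, str]] = {
    "SE9970": {1: "21-12-01-00/00", 2: "21-12-01-00/00"},
    "SE9980": {1: "21-12-01-00/00", 2: "21-12-01-00/00"},
    "SE9910": {2: "01-19-89-00/00"},  # problem 1: fix announced as future
    "SE9960": {2: "01-19-89-00/00"},
}


def parse_dkcmain(version: str) -> Tuple[int, ...]:
    """Split 'AA-BB-CC-DD/EE' into integers for ordering.

    Assumption: DKCMAIN versions order field by field, left to right.
    """
    main, _, suffix = version.partition("/")
    return tuple(int(field) for field in main.split("-")) + (int(suffix or "0"),)


def open_problems(model: str, dkcmain: str) -> List[int]:
    """Problem numbers (1, 2) without a countermeasure at this version."""
    fixes = FIXED_IN.get(model, {})  # unknown model: treat both as open
    current = parse_dkcmain(dkcmain)
    return [p for p in (1, 2)
            if p not in fixes or current < parse_dkcmain(fixes[p])]


def circumvention_applies(model: str, dkcmain: str, hdd_models: Set[str]) -> bool:
    """Conservative reading: apply the interim circumvention while either
    problem is open and two or more distinct HDD models are installed."""
    if model == "SE9990":
        return False  # FAB: SE9990 not affected
    if len(hdd_models) < 2:
        return False  # FAB: a single HDD model type is not impacted
    return bool(open_problems(model, dkcmain))
```

For example, an SE9960 at 01-19-89-00/00 with DKR2D-J72 and DKR2E-72GB drives still reports problem #1 as open, so the manual DKU exchange and wait procedure would still apply under this reading.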
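Circumvention step 3 combines an arithmetic rule (10 minutes times the number of distinct HDD models) with a log condition (SSB EC=A673 present before retrying). The following Python sketch shows that gate under stated assumptions. It reads "a time stamp after the wait period" as a time stamp later than the stop time plus the minimum wait. The ssb_log argument is a hypothetical list of (time stamp, error code) pairs transcribed by hand from the SVP SSB screen, not an SVP export format, and the function names are illustrative.

```python
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

MINUTES_PER_HDD_MODEL = 10  # FAB: 10 minutes TIMES number of HDD models


def minimum_wait(hdd_models: Set[str]) -> timedelta:
    """Minimum wait after SMT2435E / INS2268E / SVP-DKC time-out."""
    return timedelta(minutes=MINUTES_PER_HDD_MODEL * len(hdd_models))


def may_retry_dku_exchange(
    stopped_at: datetime,
    hdd_models: Set[str],
    ssb_log: Iterable[Tuple[datetime, str]],
    now: datetime,
) -> bool:
    """True once the wait has elapsed and SSB EC=A673 is logged after it.

    ssb_log: hypothetical (time stamp, error code) pairs copied from the
    SVP SSB screen. "After the wait period" is read here as strictly later
    than stopped_at + minimum_wait (an interpretation, not FAB wording).
    """
    wait_ends = stopped_at + minimum_wait(hdd_models)
    if now < wait_ends:
        return False  # still inside the minimum wait
    return any(code == "A673" and stamp > wait_ends for stamp, code in ssb_log)
```

With the FAB's own example of DKR2D-J72 and DKR2E-72GB drives, minimum_wait returns 20 minutes, and an A673 entry logged before that window closes does not by itself permit a retry.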
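For triage, the SIM Reference Codes listed under Impact and the parity-group rule (two or more blocked HDDs in a parity group block its LDEVs) can be expressed as simple lookups. This Python sketch is a hypothetical aid, not part of the SVP. It assumes SIM codes are six characters whose first three identify the category, exactly as the DF8xxx/DF9xxx, EF1xxx and DFAxxx/DFBxxx patterns suggest; the parity-group IDs and HDD location names in the usage note are made up.

```python
from typing import Dict, Iterable, List, Optional, Set

# SIM Reference Code prefixes listed in this FAB.
SIM_KINDS: Dict[str, str] = {
    "DF8": "Drive Port Blockade (port 0/1)",
    "DF9": "Drive Port Blockade (port 0/1)",
    "EF1": "Drive Blockade",
    "DFA": "LDEV Blockade",
    "DFB": "LDEV Blockade",
}


def classify_sim(ref_code: str) -> Optional[str]:
    """Map a SIM Reference Code to its FAB category, or None if unlisted."""
    return SIM_KINDS.get(ref_code.strip().upper()[:3])


def count_sim_kinds(ref_codes: Iterable[str]) -> Dict[str, int]:
    """Count how many recorded SIMs fall into each FAB category."""
    counts: Dict[str, int] = {}
    for code in ref_codes:
        kind = classify_sim(code)
        if kind is not None:
            counts[kind] = counts.get(kind, 0) + 1
    return counts


def groups_with_blocked_ldevs(
    parity_groups: Dict[str, List[str]], blocked_hdds: Set[str]
) -> List[str]:
    """Parity groups with two or more blocked HDDs (their LDEVs block)."""
    return sorted(
        group for group, hdds in parity_groups.items()
        if sum(hdd in blocked_hdds for hdd in hdds) >= 2
    )
```

For instance, with hypothetical groups "1-1" (R00 to R03) and "1-2" (R10 to R13) and blocked drives R00, R01 and R10, only "1-1" is reported, matching the FAB's statement that a single blocked HDD does not block the group's LDEVs.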
Previously Published As: 100691

Internal Comments

Related Information

Internal Eng Business Unit Group: KE Authors
Internal Services Knowledge Engineer: joe.davis@sun.com (as of 3/21/07)
Internal Kasp FAB Legacy ID: 100691, I1169-1 (FIN)
Internal Sun Alert & FAB Admin Info:
Critical Category: Significant
Change Date: 2005-06-15
Avoidance: Patch
Responsible Manager: null
Original Admin Info: WF - chgd Product fm KASP to SE9910/9960/9970/9980. - Joe 3/21/07
Internal SA-FAB Eng Submission: LDEV Blockade may occur during Microcode upgrade, or PDEV installation, leading to Host System unscheduled outage.
Product_uuid:
2a918ae2-0a18-11d6-834a-c679537eebe7|Sun StorageTek 9910
2a94fb3c-0a18-11d6-90a8-c9c08656284f|Sun StorageTek 9960 System
4ea4b951-9fc9-4f1f-b64e-69572a400fb4|Sun StorageTek 9970 System
c2428fbe-8ab7-41d0-8b6e-ab489823c9d4|Sun StorageTek 9980 System
Attachments: This solution has no attachment