Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1019622.1
Update Date:2010-11-03
Keywords:

Solution Type  FAB (standard) Sure

Solution  1019622.1 :   Possible failure of multiple HDDs of certain type within the same Parity Group in ST9990V, ST9985V, ST9990 and ST9985.  


Related Items
  • Sun Storage 9990V System
  • Sun Storage 9985V System
  • Sun Storage 9990 System
  • Sun Storage 9985 System
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Controlled Proactive

PreviouslyPublishedAs
242466


Product
Sun StorageTek 9990 System
Sun StorageTek 9985 System
Sun StorageTek 9990V System
Sun StorageTek 9985V System

Date of Resolved Release
29-Sep-2008

Multiple HDDs of certain types within the same Parity Group may fail during new installation, maintenance, or upgrade because of a Microcode bug (see details below).

Affected Parts:

Affected HDD models:

DKR2G-K72FC, DKR2G-K146FC, DKR2G-K300FC

Impact

Possibility of multiple HDD failures in the same Parity Group causing loss of access to the LDEVs in the Parity Group (all LDEVs in that PG will be in blocked status).

Contributing Factors

Due to a Microcode bug on the above listed products, there is a possibility that DKR2G-K series (K300FC, K146FC and K72FC) HDD(s) will get blocked.

There is no bug open in Sun for this issue - HDS owns this bug.

Symptoms

DKR2G-K HDD(s) become blocked when one of the following HDD initial reset events occurs on these HDDs and is followed by the specific sequence of write/read accesses detailed in the "Root Cause" section:
  • Power On (possibly multiple HDD Blocked)
  • Parity Group Install (possibly multiple HDDs Blocked)
  • HDD replacement (single HDD blockade)
If the event is a Parity Group install or a Power On, multiple HDDs within the same PG may fail.  If the event is the replacement of a single HDD, only that HDD will be blocked.

Root Cause

Due to a microcode bug, DKR2G-K series HDD(s) may become blocked.  The blockade occurs when an HDD becomes unresponsive after an HDD initial reset is followed by a specific Write/Read access pattern with a specific data length.

Note: An HDD initial reset is performed when the HDD power supply is turned on, when an HDD micro-program exchange occurs, or when an HDD is hot-plugged (upgrade, reconfiguration, replacement).

Condition of Occurrence:

When all the following conditions are met, HDD blockade occurs due to an HDD being unresponsive.

(1) HDD model: DKR2G-K300FC/K146FC/K72FC
(2) Write command is issued right after an HDD initial reset is performed.
(3) A read command issued right after the write command in (2) has a data length longer than 229 hex (553 decimal) sectors and hits the leading part of the data written by that write command. Note that this issue does not occur for other kinds of hit, such as an all partial hit, an intermediate partial hit, or a terminal partial hit.
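The three conditions above can be expressed as a single predicate. The following is a minimal sketch for illustration only: the function and parameter names are hypothetical, and it simply restates the conditions as listed in this document rather than reproducing any HDS or Sun diagnostic tool.

```python
# Hypothetical helper restating conditions (1) to (3); not part of any Sun/HDS tool.
AFFECTED_MODELS = {"DKR2G-K300FC", "DKR2G-K146FC", "DKR2G-K72FC"}

# Condition (3) threshold: 229 hex sectors, i.e. 553 decimal.
READ_LENGTH_THRESHOLD_SECTORS = 0x229


def blockade_conditions_met(model,
                            write_follows_initial_reset,
                            read_length_sectors,
                            read_hits_leading_part_of_write):
    """Return True only when all three conditions of occurrence hold."""
    return (
        model in AFFECTED_MODELS                                   # condition (1)
        and write_follows_initial_reset                            # condition (2)
        and read_length_sectors > READ_LENGTH_THRESHOLD_SECTORS    # condition (3), length
        and read_hits_leading_part_of_write                        # condition (3), leading hit
    )
```

Note that the length comparison is strict: a read of exactly 229 hex sectors does not satisfy condition (3) as it is worded above.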

All impacted systems shipping from Sun Manufacturing beginning on September 23, 2008 contain the new microcode that addresses this issue.

Corrective Action

Workaround:

If multiple HDDs fail in the same parity group due to this bug, DO NOT attempt to self-replace or replace ANY of the affected HDDs.

The first action that should be taken is to perform a Normal Restore procedure of the affected LDEVs.  If the HDD failures are due to this Microcode bug then a Normal Restore procedure should restore the LDEVs on the affected parity group to Correction Access status thereby giving the customer back access to the LDEVs on the parity group.

If for any reason the Normal Restore does not work, again, DO NOT attempt to self-replace or replace ANY of the affected HDDs.  Instead, contact TSC support for additional guidance.

Following the steps in the Resolution section below is recommended. However, this issue can also be avoided by following either item A or item B below:

A. Run the LDEV Verify function every time the system is powered on.
(The verify check can be run at the SVP: select maintenance > ldev tab > select each ECC group.)

- Run the verify on only the first LDEV in each Parity Group, for approximately one minute.
- After one minute has passed, cancel the verify.
- Then move on to verify the next Parity Group.
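As an aid for planning item A on a system with many Parity Groups, the steps can be written out as a checklist. This is only a sketch: the function name, the Parity Group and LDEV identifier formats, and the idea of supplying the configuration as a dictionary are all assumptions made for illustration. The verify itself is still started and cancelled by hand at the SVP.

```python
# Hypothetical checklist generator; it does not talk to the SVP or the array.
def verify_plan(parity_groups, seconds=60):
    """Build one checklist line per Parity Group.

    parity_groups maps a Parity Group name to its LDEVs, listed so that
    the first element is the first LDEV of that group (an assumption
    about how the caller prepares the data).
    """
    steps = []
    for pg, ldevs in parity_groups.items():
        if not ldevs:
            continue  # nothing to verify in an empty group
        first_ldev = ldevs[0]
        steps.append(
            f"Parity Group {pg}: start LDEV Verify on {first_ldev}, "
            f"wait about {seconds} s, cancel the verify, go to next group"
        )
    return steps
```

For example, calling `verify_plan({"1-1": ["00:00:00", "00:00:01"]})` yields a single step naming only the first LDEV of group 1-1.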

B. Run the DCR (Dynamic Cache Residency) function on the first LDEV of each Parity Group.

Note that additional Cache must be installed to use DCR.  Refer to the respective Maintenance Manuals for details.

Resolution:

Please upgrade the DKU micro-program of all DKR2G-K disk drives to version 00-00-AZ or higher as soon as possible. Do this either by upgrading the entire Microcode set to one of the versions listed below, which contain the modified DKU micro-program, or by performing the optional DKU Only micro-program load.

Microcode Sets Containing Fixed DKU Version 00-00-AZ.

For ST9990V and ST9985V:

  60-03-27-00/00-M076 or higher
  60-03-07-00/00-M075

For ST9990 and ST9985:

  50-09-76-00/00-M251 or higher
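To check an installed Microcode level against the list above, the version strings can be parsed and compared field by field. The sketch below makes two assumptions that this document does not confirm: that the numeric fields order versions from left to right, and that the trailing "-Mnnn" suffix can be ignored for comparison. The function names are hypothetical. Because 60-03-07-00/00 is listed without "or higher", only that exact level is accepted below 60-03-27-00/00.

```python
import re

# Assumed layout: five two-digit fields, with an optional "-Mnnn" suffix that is ignored.
_VERSION_RE = re.compile(r"(\d{2})-(\d{2})-(\d{2})-(\d{2})/(\d{2})(?:-M\d+)?")


def parse_microcode(version):
    """Turn '60-03-27-00/00-M076' into the tuple (60, 3, 27, 0, 0)."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized microcode version: {version!r}")
    return tuple(int(field) for field in match.groups())


def listed_as_fixed(version):
    """Return True if the level matches one of the fixed sets listed above."""
    v = parse_microcode(version)
    if v[0] == 60:  # ST9990V and ST9985V
        return v == (60, 3, 7, 0, 0) or v >= (60, 3, 27, 0, 0)
    if v[0] == 50:  # ST9990 and ST9985
        return v >= (50, 9, 76, 0, 0)
    raise ValueError(f"not a 50-xx or 60-xx microcode level: {version!r}")
```

A level for which this returns False is not necessarily exposed, since the DKU micro-program may have been loaded separately; the DKU version on the drives themselves is what matters.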

For instructions on acquiring updated Microcode, refer to How To document 1018586.1, "Sun StorEdge[TM] 9900: Requesting Microcode, Software and License Updates".  This knowledge asset can be accessed at the URL below:

  https://support.us.oracle.com/oip/faces/secure/km/DocumentDisplay.jspx?id=1018586.1

************************************************************************
Optional DKU Only Micro-Program Load

Optionally, you may obtain one of the DKU Only Code Sets listed below through the standard online ordering process (similar to system Microcode ordering) and load just the DKU micro-program into the DKR2G-K HDDs.  Do not load the DKU code from other Microcode sets.

Storage Model          Product Description    Product Code (part number) To Order

ST9990V and ST9985V    DKU 00-00-AZ-H036      MC-USPV-045
ST9990 and ST9985      DKU 00-00-AZ-H053      MC-USP-NSC-079

Note that this can be done only if one of the pre-requisite Microcode sets below is already installed in the subsystem.

For ST9990V and ST9985V:

The following Microcode pre-requisite levels support online DKU Only code loading to DKR2G-K FC drives (per ECN noted). The minimum version is one of:

 a) 60-02-48-00/12 and higher

 b) 60-02-31-00/00 and higher (but not 60-02-48-00/00 or 60-02-48-00/10).

 c) 60-02-27-00/00 and higher, if the system does not have the conditions below:

   - Will be performing DKU exchange to an HDD in which I/O is being executed.
   - All HDD models are affected except SATA.
   - In one backend loop, six or more HDDs are installed and DKU updates
     will be to at least one of these HDDs. 

For ST9990 and ST9985:

50-09-70-00/00 and higher is the minimum Microcode support required to load DKU code into DKR2G-K HDDs.
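The pre-requisite rules above can be sketched as a check on the installed Microcode level. This is an illustration under stated assumptions, not an authoritative tool: the function names are hypothetical, versions are assumed to order numerically field by field, and rules a), b) and c) are read literally as alternatives. Whether the conditions under c) apply is left to the engineer and passed in as a flag, which defaults to the conservative value.

```python
import re

# Assumed layout: five two-digit fields, with an optional "-Mnnn" suffix that is ignored.
_VERSION_RE = re.compile(r"(\d{2})-(\d{2})-(\d{2})-(\d{2})/(\d{2})(?:-M\d+)?")

# Levels explicitly excluded by rule b).
_EXCLUDED_BY_RULE_B = {(60, 2, 48, 0, 0), (60, 2, 48, 0, 10)}


def parse_microcode(version):
    """Turn '60-02-48-00/12' into the tuple (60, 2, 48, 0, 12)."""
    match = _VERSION_RE.fullmatch(version.strip())
    if match is None:
        raise ValueError(f"unrecognized microcode version: {version!r}")
    return tuple(int(field) for field in match.groups())


def dku_only_load_supported(version, rule_c_conditions_apply=True):
    """Apply the pre-requisite rules listed above to one Microcode level."""
    v = parse_microcode(version)
    if v[0] == 50:  # ST9990 and ST9985
        return v >= (50, 9, 70, 0, 0)
    if v[0] == 60:  # ST9990V and ST9985V
        rule_a = v >= (60, 2, 48, 0, 12)
        rule_b = v >= (60, 2, 31, 0, 0) and v not in _EXCLUDED_BY_RULE_B
        rule_c = v >= (60, 2, 27, 0, 0) and not rule_c_conditions_apply
        return rule_a or rule_b or rule_c
    raise ValueError(f"not a 50-xx or 60-xx microcode level: {version!r}")
```

Under this literal reading, a level excluded by rule b) can still qualify through rule c) when the flag is False. If in doubt, confirm against the ECN before loading.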

Comments

For more details, review the "Related URL(s)" listed below.

References:

FAB: 1019433.1
Escalation ID: 66057816
Related URL(s):

  http://se9990.eng/mc/mc.html  - ST9900 Microcode Matrix
  http://sejsc.ebay/alerts_via_alias.html  - Subscribe to ST9900 important alerts
  http://se9990.eng/ecn.html  - ST9900 ECNs and FCBs
  http://pts-storage.west/products/T99x0/documentation.html  - Maintenance Manuals
  http://sccc-storage:5071/cgi-bin/microcode/request.cgi  - ST9900 Microcode CD and DKU code request tool

For information about FAB documents, its release processes, implementation strategies and billing information, go to the following URL:

In addition to the above you may email:


Internal Contributor/submitter
Suresh.Gummanur@Sun.COM

Internal Eng Responsible Engineer
Suresh.Gummanur@Sun.COM Responsible Manager: Tejinder.Singh@Sun.COM

Internal Services Knowledge Engineer
Joe.Davis@Sun.COM

Internal Eng Business Unit Group
NWS (Storage)

Internal Sun Alert & FAB Admin Info
22-Sep-2008: Finalized draft and sent to Extended Review.
24-Sep-2008: Put on hold pending agreement between Services and PTeam on Implementation.
29-Sep-2008: Multiple modifications provided by submitter - sending to Publish.
02-Dec-2009: Corrected Product Name to swoRDFish inconsistency.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.