Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1002938.1
Update Date:2009-11-17
Keywords:

Solution Type  Troubleshooting Sure

Solution  1002938.1 :   Analyzing Internal RAID Disk Failures for AMD Opteron Workstation Systems and X2 Servers  


Related Items
  • Sun Ultra 20 Workstation
  • Sun Ultra 20 M2 Workstation
  • Sun Fire X2200 M2 Server
  • Sun Ultra 40 Workstation
  • Sun Ultra 40 M2 Workstation
  • Sun Fire X2100 M2 Server
  • Sun Fire X2100 Server
Related Categories
  • GCS>Sun Microsystems>Desktops>Workstations

Previously Published As
204041


Description
Document description

Summary:

This document describes how to analyze internal RAID disk failures on Sun AMD Opteron workstations and on Sun Fire X2100, X2100 M2 and X2200 M2 servers.
At this time, RAID configurations using the onboard Nvidia controller are supported only with Microsoft Windows operating systems.
In several cases, the root cause of a degraded array is not a faulty disk but an unclean shutdown or a power outage.
In such cases, it is not necessary to open a Service Request.

You can simply follow the troubleshooting steps below and rebuild the array.

Symptoms:

- On boot-up, messages state that a RAID volume is degraded
- OS logs indicate a drive failure
- A pop-up window or icon reports a degraded volume
- MediaShield shows the RAID status as degraded



Steps to Follow
Please validate that each troubleshooting step below is true for your environment. The steps will provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.

Step 1. Verify it is a RAID

First, check that RAID is enabled and configured.
You can check this from the Media Shield GUI, from the BIOS, or both.

From MediaShield

a) In Windows, start the "Nvidia Media Shield" GUI (nvraidman.exe).
You should find the icon on the desktop or in:

Programs-->nVidia Corporation--> Media Shield

When started, this tool immediately lists the configured RAID arrays and their status (Healthy or Degraded).

From BIOS

a) If the system cannot boot, or if you can restart it, enter the system BIOS.
To do this, press the <F2> key when prompted during workstation power-on.

Select "Advanced" --> IDE Configuration --> Nvidia RAID Setup

or

Select "Integrated devices" --> "NV Raid configuration."

The nVidia RAID function must be enabled.

If it is enabled, below that entry you should also find some "SATA # Channels" entries in the enabled state.
These identify which channels (disks) are available to build RAID volumes.

If these options are not enabled, then no RAID is present (or configured).
Look for InfoDocs about analyzing non-RAID disk failures.

b) If the options in point (a) are enabled, reboot the system again.
This time press <F10> when prompted during boot to enter the Nvidia RAID BIOS.

From here, if a RAID volume exists, the "Array List" menu shows the status of every volume.
In particular, check whether the "Status" is "Healthy" or "Degraded".
If no volume is present, you do not have a RAID, or it is not configured.

Step 2. Verify RAID is degraded

There are several ways to check this:

  • From the Media Shield GUI, check the "Status" column of the associated array.

  • At workstation reboot, during the Nvidia RAID phase, the system automatically
    warns you in red characters if there are degraded arrays.

  • From the Nvidia RAID BIOS. As in Step 1, enter the Nvidia RAID BIOS by pressing <F10>.
    The array list and status are shown there.

See also <Document: 1011590.1> "How to check for Windows platform disk errors and online/offline status"

Step 3. Rebuild the array

To rebuild or resync a degraded array, the preferred method is the Media Shield GUI.
Rebuild is possible only for RAID 1, RAID 0+1 and RAID 5 arrays.

  • Go to Windows and start the Media Shield GUI.

  • If the failed drive is available again and can be reused,
    the Nvidia software may show two copies of the same "Mirroring" array,
    both in the degraded state.
    One is the up-to-date copy (probably in use by Windows itself); the other is not.

  • If this is your case, delete one of the two copies.
    - To help you choose, note that every Windows volume appears twice in "My Computer".
    - Find out which copy is up to date, and back up any data you need.
    - In "Media Shield", right-click the "Mirroring" entry that can be deleted and delete it.
    - The Nvidia delete array wizard starts. Follow the wizard instructions.

  • After the array removal, or if you are replacing the failed disk with a new one, you should have a "free" disk in the list.

  • If the rebuild does not start automatically, right-click "Mirroring" (the array you want to rebuild).

  • From the pop-up menu click on "Rebuild array"

  • The Nvidia rebuild array wizard will start

  • Follow the wizard instructions

  • Click Finish.
    The array rebuild starts after a few seconds, and a small pop-up message appears at the bottom of the screen beside the clock.

  • The time required to rebuild the array depends on several factors, including disk size and activity on the volume.
    It can take hours. The rebuild runs in the background, so you can keep working while it is in progress.
    In any case, the tool shows the progress as a percentage.

  • When the rebuild is finished, a pop-up box says "Rebuild finished".
    An event message is generated as well.

Step 4. What if rebuild fails or one or more disks are unavailable?

If there is a real disk failure, you must open a Service Request and ask for a disk replacement.
Collect information about the disk size and model, and be ready to provide it to the Sun engineer when requested.

See also Steps 5 and 6.
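
The decision flow of Steps 1 to 4 can be summarized as a small sketch. This is only an illustration of the order of checks described above, not a Sun or Nvidia tool: the ArrayState record, its field names and the returned messages are hypothetical, and the "no action needed" result for a healthy array is an assumption implied by the text rather than stated in it.

```python
from dataclasses import dataclass

# RAID levels that this document lists as rebuildable (Step 3).
REBUILDABLE_LEVELS = {"RAID 1", "RAID 0+1", "RAID 5"}


@dataclass
class ArrayState:
    """Hypothetical record of what was observed in Steps 1 to 4."""
    raid_enabled: bool            # Nvidia RAID enabled in the system BIOS
    volume_present: bool          # a volume appears in the "Array List"
    level: str = ""               # for example "RAID 1"
    status: str = ""              # "Healthy" or "Degraded"
    disk_available: bool = True   # failed or replacement disk is visible
    rebuild_failed: bool = False  # a rebuild was attempted and failed


def next_action(state: ArrayState) -> str:
    """Return the next troubleshooting action for the observed state."""
    if not (state.raid_enabled and state.volume_present):
        return "No RAID configured: see the documents on non-RAID disk failures"
    if state.status == "Healthy":
        # Assumption: a healthy array needs no further action.
        return "Array is healthy: no action needed"
    if state.level not in REBUILDABLE_LEVELS:
        return "Rebuild is not possible for this RAID level"
    if state.rebuild_failed or not state.disk_available:
        return "Open a Service Request and provide disk size and model"
    return "Rebuild the array from the Media Shield GUI"
```

For example, next_action(ArrayState(True, True, "RAID 1", "Degraded")) returns the rebuild action, while the same state with disk_available=False leads to a Service Request.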

Step 5. Verify the disk is a Sun supported disk

  • To see details about the disks that make up an Nvidia RAID volume from the OS, you must use Media Shield.
    Without it there is no way to look at the individual drives, because Windows sees and manages only the logical entity.
    In Device Manager you will therefore find something like "NVIDIA Mirror".

  • Alternatively, you can see the disk details by entering the Nvidia BIOS, pressing <F10> during system POST.
    This option is available only when RAID is enabled in the platform BIOS.

For more details refer to the following Technical Instructions:

<Document: 1010055.1> "Identifying Sun supported disks"
<Document: 1008396.1> "How to identify optical and hard disk firmware revisions for checking of known issues"

Step 6. Collect additional system configuration information

Windows offers a couple of useful tools designed to collect system configuration data such as OS version, drivers and events:

Msinfo32 (bundled with Windows) and Microsoft Product Support's Reporting Tools, which must be downloaded from the Microsoft website.

msinfo32:

You can start it manually: [Start]->[Run]->msinfo32
Then export the system information to a text file:
File->Export...
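
If you prefer to script the export instead of using the menu, msinfo32 also accepts a /report switch that writes the same kind of text report to a file. The following is a minimal sketch under that assumption; the function names and the output path are hypothetical, and you should verify the switch on your Windows version before relying on it.

```python
import subprocess
import sys
from typing import List


def build_msinfo32_command(report_path: str) -> List[str]:
    """Return the command line that writes a text report to report_path."""
    return ["msinfo32", "/report", report_path]


def export_system_info(report_path: str) -> None:
    """Run msinfo32 and raise CalledProcessError if it exits with an error."""
    subprocess.run(build_msinfo32_command(report_path), check=True)


if __name__ == "__main__" and sys.platform == "win32":
    # Hypothetical output location; choose any writable path.
    export_system_info(r"C:\temp\sysinfo.txt")
```

The report can take several minutes to generate on a system with many devices and drivers.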

MPS report (Microsoft Product Support's Reporting Tools):

It is a compressed software package that contains one or more scripts and other utilities that you can use
to capture critical system, diagnostic, and configuration information about your system.
Go to http://support.microsoft.com/kb/818742 for a detailed description and download links.

See also "How to collect useful configuration information about my system" section in
<Document: 1007054.1> "How to handle Microsoft Windows Panics on x64 platforms"

Step 7. Open a Service Request

With disk model info and Windows configuration data at hand, you can open a Service Request.

Event log examples

Below is an example from a Sun Ultra 20 M2 Workstation running Windows Server 2003 (w2k3).
These are messages generated in the Windows "System" event log, listed newest first.
You can see NVRAIDSERVICE events about:

- Drive SATA 1.1 failure
- Array in degraded status
- Duplicated array deleted
- Rebuild started and finished

12/19/2007      8:13:13 PM      NVRAIDSERVICE   Information     None    1009    N/A     TESTSYS Array NVIDIA  Mirroring   74.53G rebuild finish.
12/19/2007      7:43:53 PM      NVRAIDSERVICE   Information     None    1008    N/A     TESTSYS Array NVIDIA  Mirroring   74.53G rebuild start.
12/19/2007      7:43:52 PM      NVRAIDSERVICE   Information     None    1000    N/A     TESTSYS New disk HDS728080PLA380 has been added to array NVIDIA  Mirroring   74.53G.
12/19/2007      7:43:52 PM      NVRAIDSERVICE   Information     None    1005    N/A     TESTSYS Array NVIDIA  Mirroring   74.53G found spare disk HDS728080PLA380.
12/19/2007      7:43:46 PM      NVRAIDSERVICE   Warning None    1002    N/A     TESTSYS Array NVIDIA  Mirroring   74.53G has been deleted.
12/19/2007      7:43:46 PM      NVRAIDSERVICE   Information     None    1001    N/A     TESTSYS New disk detected: HDS728080PLA380.
12/19/2007      7:43:46 PM      NVRAIDSERVICE   Warning None    999     N/A     TESTSYS Disk HDS728080PLA380 has been removed from array NVIDIA  Mirroring   74.53G.
12/19/2007      7:33:17 PM      NVRAIDSERVICE   Warning None    1003    N/A     TESTSYS Disk HDS728080PLA380 is gone or has been removed on port SATA 1.1.
12/19/2007      7:33:17 PM      NVRAIDSERVICE   Information     None    1004    N/A     TESTSYS Array NVIDIA  Mirroring   74.53G searching for spare disk.
12/19/2007      7:33:17 PM      NVRAIDSERVICE   Information     None    1001    N/A     TESTSYS New disk detected: HDS728080PLA380.
12/19/2007      7:33:17 PM      NVRAIDSERVICE   Warning None    999     N/A     TESTSYS Disk HDS728080PLA380 has been removed from array NVIDIA  Mirroring   74.53G.
12/19/2007      7:33:16 PM      NVRAIDSERVICE   Error   None    1006    N/A     TESTSYS Access failure: Critical error on disk HDS728080PLA380 (Port: SATA 1.1).
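
When a log export like the one above is long, a short script can help summarize it. The sketch below parses lines in this column layout, counts events by type, and measures the time between the "rebuild start" and "rebuild finish" entries. The regular expression, the function names, and the reading of event IDs 1008 and 1009 as rebuild start and finish are assumptions derived only from this sample, not from Nvidia documentation.

```python
import re
from collections import Counter
from datetime import datetime

# Three lines copied from the example above.
SAMPLE = """\
12/19/2007      8:13:13 PM      NVRAIDSERVICE   Information     None    1009    N/A     TESTSYS Array NVIDIA  Mirroring   74.53G rebuild finish.
12/19/2007      7:43:53 PM      NVRAIDSERVICE   Information     None    1008    N/A     TESTSYS Array NVIDIA  Mirroring   74.53G rebuild start.
12/19/2007      7:33:16 PM      NVRAIDSERVICE   Error   None    1006    N/A     TESTSYS Access failure: Critical error on disk HDS728080PLA380 (Port: SATA 1.1).
"""

# Columns: date, time, source, type, category, event ID, user, computer, text.
LINE_RE = re.compile(
    r"^(?P<date>\d+/\d+/\d+)\s+(?P<time>\d+:\d+:\d+ [AP]M)\s+"
    r"(?P<source>\S+)\s+(?P<type>\S+)\s+(?P<category>\S+)\s+"
    r"(?P<event_id>\d+)\s+(?P<user>\S+)\s+(?P<computer>\S+)\s+(?P<text>.*)$"
)


def parse_events(log_text):
    """Return one dict per log line that matches the expected layout."""
    events = []
    for line in log_text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip blank or unrecognized lines
        event = match.groupdict()
        event["event_id"] = int(event["event_id"])
        event["when"] = datetime.strptime(
            event["date"] + " " + event["time"], "%m/%d/%Y %I:%M:%S %p"
        )
        events.append(event)
    return events


def count_by_type(events):
    """Count events per type (Information, Warning, Error)."""
    return Counter(event["type"] for event in events)


def rebuild_duration(events):
    """Time between the last rebuild finish and the start that preceded it."""
    finishes = [e["when"] for e in events if e["event_id"] == 1009]
    if not finishes:
        return None
    finish = max(finishes)
    starts = [e["when"] for e in events
              if e["event_id"] == 1008 and e["when"] <= finish]
    return finish - max(starts) if starts else None
```

For the sample above, the rebuild of the 74.53G mirror ran from 7:43:53 PM to 8:13:13 PM, so rebuild_duration(parse_events(SAMPLE)) is 29 minutes and 20 seconds.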

NVIDIA Media Shield

For more information and details about NVIDIA Media Shield, please go to the Nvidia website:

http://www.nvidia.com

In particular, for Media Shield features and usage, see the "Media Shield User's Guide".
You can find it at:

http://www.nvidia.com/object/feature_raid.html





Product
Sun Ultra 40 M2 Workstation
Sun Ultra 40 Workstation
Sun Ultra 20 Workstation
Sun Ultra 20 M2 Workstation
Sun Fire X2200 M2 Server
Sun Fire X2100 Server
Sun Fire X2100 M2 Server

Internal Comments
Audited/updated 11/17/09 - James.Carter@Sun.COM, x64 Content Team

This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the "Document Feedback" alias(es) listed below:

tsc-emea-x64@sun.com

nvraidman, mediashield, raid, degraded, windows, nvidia, nforce, x86, x64, normalized
Previously Published As
91564

Change History
Date: 2007-12-31
User Name: 31620
Action: Approved
Comment: Since this was only a minor change, and this doc has been verified before, I re-checked all links again - all ok
Now publishing..
Regards,
Andy
Version: 10
Date: 2007-12-31
User Name: 88029
Action: Approved
Comment: Good to go
Version: 0
Date: 2007-12-31
User Name: 79977
Action: Approved
Comment: Changed title
Changed summary to include "X2100 X2100M2 X2200M2"
Think this is correct syntax
If okay, pass to Andy
cheers
bj
Product_uuid
773868d4-75b4-11db-a4bd-080020a9ed93|Sun Ultra 40 M2 Workstation
7b070932-47f1-11da-ac39-080020a9ed93|Sun Ultra 40 Workstation
372415be-961d-11d9-9adf-080020a9ed93|Sun Ultra 20 Workstation
75a5313c-1d79-11db-a023-080020a9ed93|Sun Ultra 20 M2 Workstation
421f3c0e-dae5-11da-a742-080020a9ed93|Sun Fire X2200 M2 Server
28c0502a-fd60-11d9-a8ca-080020a9ed93|Sun Fire X2100 Server
417b81fb-dae5-11da-a742-080020a9ed93|Sun Fire X2100 M2 Server

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.