Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-75-1008407.1
Update Date: 2011-02-14
Keywords:

Solution Type  Troubleshooting Sure

Solution  1008407.1 :   Analyzing Internal non-RAID Disk Failures for x64 Linux  


Related Items
  • Sun Fire X4200 M2 Server
  • Sun Java Workstation W2100z
  • Sun Ultra 20 Workstation
  • Sun Fire X4440 Server
  • Sun Fire X2200 M2 Server
  • Sun Ultra 20 M2 Workstation
  • Sun Fire X4600 Server
  • Sun Fire X4100 Server
  • Sun Fire X4500 Server
  • Sun Fire X4100 M2 Server
  • Sun Java Workstation W1100z
  • Sun Ultra 40 Workstation
  • Sun Fire X4540 Server
  • Sun Fire X2100 Server
  • Sun Ultra 40 M2 Workstation
  • Sun Fire X4600 M2 Server
  • Sun Fire X4200 Server
  • Sun Fire X2100 M2 Server
Related Categories
  • GCS>Sun Microsystems>Servers>x64 Servers

Previously Published As
211491


Applies to:

Sun Fire X2100 M2 Server
Sun Fire X2100 Server
Sun Fire X2200 M2 Server
Sun Fire X4100 M2 Server
Sun Fire X4100 Server
All Platforms

Purpose

Summary:

This document addresses failures of internal disks in Red Hat and SuSE/Novell x64 platforms. Failures under hardware RAID are not discussed in this document.

Symptoms:

- Disk service LED illuminated
- Disk errors in system messages files
- Disk errors on console
- Disk SMART errors during the boot process

Last Review Date

October 8, 2010

Instructions for the Reader

A Troubleshooting Guide is provided to assist in debugging a specific issue. When possible, diagnostic tools are included in the document to assist in troubleshooting.

Troubleshooting Details

Steps to Follow

Analyzing Internal non-RAID Disk Failures for x64 Linux

Step 1. Verify a supported platform disk and part number:

The following link references a support document that helps identify a disk part number. The document also provides the public web location of the Sun System Handbook, which you can use to confirm that the disk in question is supported for your platform:


Document: 1010055.1 Identifying Sun Supported Platform Disks

Disks that are not listed in a platform's documentation are deemed unsupported. They have not been tested on that platform, so their properties are unknown and they may produce unpredictable errors.

Even if an unsupported disk appears to work correctly, it is recommended to always use supported disks for contracted platforms.

Step 2. Verify disk is not a member of a RAID array:

The following link references a support document that helps identify whether your Linux operating environment is installed on a RAID array:

Document: 1013003.1  How to Identify if a Linux Operating Environment is Installed on a Hardware RAID Controller

Troubleshooting steps differ for platforms that are installed under the control of a RAID management device. This is because disks under RAID control are hidden from the operating environment and are referenced as a pseudo or meta-device.

Step 3. Verify disk firmware revision and known applicable issues:

The following link references a support document that helps identify the disk model number and firmware revision so that you can check for known issues and, if applicable, patch updates:

Document: 1008396.1  How to Identify Optical and Hard Disk Firmware Revisions for Checking of Known Issues

Patches and firmware updates are often available for disks under multiple operating systems.
Checking for known issues and updates before replacing hardware can reduce downtime.
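
As an illustration of the kind of information this step needs, the following minimal Python sketch reads the vendor, model, and firmware revision strings that the Linux kernel exposes through sysfs for SCSI-class disks. It assumes a kernel that provides /sys/block/<disk>/device/vendor, model, and rev; the function name read_disk_identity is hypothetical and is not taken from the referenced document, which remains the authoritative procedure.

```python
import os


def read_disk_identity(disk, sysfs_root="/sys/block"):
    """Return the vendor, model and firmware revision strings for one disk.

    Assumption: the disk is handled by the SCSI layer (SCSI, SAS or SATA),
    so the attributes live under <sysfs_root>/<disk>/device/.
    """
    device_dir = os.path.join(sysfs_root, disk, "device")
    identity = {}
    for attr in ("vendor", "model", "rev"):
        path = os.path.join(device_dir, attr)
        try:
            with open(path) as handle:
                # sysfs pads these strings with spaces and a newline.
                identity[attr] = handle.read().strip()
        except OSError:
            # Attribute not present, for example a non-SCSI block device.
            identity[attr] = None
    return identity
```

Calling read_disk_identity("sda") on a live system returns a dictionary whose "model" and "rev" values can be compared against the known-issue lists described in the document above.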

Step 4. Verify that the disk is online, has not been going offline, and has no physical hardware problem:

The following link references a support document that assists in identifying the online/offline status of directly attached platform disks. This document also discusses the location of Linux error logs and the format in which disk errors should appear:

Document: 1002936.1  How to Check for Linux Platform Disk Errors and Online/Offline Status

Disks that are not directly attached to the platform, for example disks installed in an external storage array, are not discussed in this document.
Storage array disks may have different properties when connected behind an external controller, which changes the error syntax and the tools used for collection and configuration.
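
The online/offline check can be sketched in Python as follows. This is a minimal example, assuming directly attached disks named sd* whose SCSI state is exposed in /sys/block/<disk>/device/state; the function names disk_states and offline_disks are hypothetical, and treating every state other than "running" as needing attention is an assumption made for this sketch.

```python
import os


def disk_states(sysfs_root="/sys/block"):
    """Map each sd* disk name to the contents of its SCSI 'state' attribute."""
    states = {}
    for name in sorted(os.listdir(sysfs_root)):
        if not name.startswith("sd"):
            continue
        path = os.path.join(sysfs_root, name, "device", "state")
        try:
            with open(path) as handle:
                states[name] = handle.read().strip()
        except OSError:
            # The attribute could not be read; report it rather than guess.
            states[name] = "unknown"
    return states


def offline_disks(states):
    """Return the disks whose state is anything other than 'running'."""
    return [name for name, state in sorted(states.items()) if state != "running"]
```

On a live system, offline_disks(disk_states()) returns an empty list when every directly attached disk reports "running". A disk that appears in the list intermittently across repeated checks matches the "going offline" symptom this step looks for.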

Step 5. Detect drive failures and disk replacement - non-RAID disks:

The following link references a troubleshooting document that helps further identify SCSI errors on directly attached platform disks. This document also discusses the location of Linux error logs and the format in which disk errors should appear:

Document: 1007706.1 Troubleshooting Tips for SCSI Disk Errors On Linux Systems

Although this document gives specific examples from Sun Fire V20z and Sun Fire V40z platforms, the errors seen are common among many SCSI and SAS platforms.
Use this document to determine whether your disk is failing or whether the problem requires further investigation by a Sun engineer.
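
To get a quick per-disk summary before reading the log in detail, a sketch such as the following can count disk-related error lines per device. The search patterns are assumptions chosen to match commonly seen kernel messages (for example "I/O error"), not an exhaustive list from the referenced document, and the function name count_disk_errors is hypothetical.

```python
import re

# Assumed patterns for disk-related kernel messages; extend as needed.
DISK_ERROR_PATTERN = re.compile(
    r"(I/O error|medium error|hardware error|SCSI error)", re.IGNORECASE
)
# Captures the whole-disk name (sda) even when a partition (sda1) is logged.
DEVICE_PATTERN = re.compile(r"\b(sd[a-z]+)\d*\b")


def count_disk_errors(lines):
    """Count matching error lines per disk; lines naming no disk go to 'unknown'."""
    counts = {}
    for line in lines:
        if not DISK_ERROR_PATTERN.search(line):
            continue
        match = DEVICE_PATTERN.search(line)
        device = match.group(1) if match else "unknown"
        counts[device] = counts.get(device, 0) + 1
    return counts
```

A typical use, assuming the system log is /var/log/messages, is to open that file and pass the file object to count_disk_errors. A disk with a steadily growing count is a candidate for replacement; a handful of isolated entries may need the further investigation described above.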

Step 6. Detect drive failures and disk replacement - software RAID disks:

The following link references a support document that assists in identifying and replacing failed disks under control of the Linux Volume Manager:

Document: 1006465.1 Red Hat Linux: How To swap disks on a Software RAID and detect drive failures

Disk replacement under the Linux Volume Manager requires additional steps to remove the disk from volume manager control before physical replacement. Steps to introduce the replacement disk are also necessary when under the control of a volume manager.

Step 7. Run Linux information gathering scripts and raise a Sun service request:

The following links reference support documents that assist in gathering information from your platform using Red Hat and Novell/SuSE information gathering tools.

Novell/SuSE Enterprise Linux:

Document: 1010057.1 How to gather information on SuSE Linux Enterprise Systems

Red Hat Enterprise Linux:

Document: 1010058.1 How to Gather Information on Red Hat Enterprise Linux Systems

This step is necessary if the steps above did not resolve your issue and Sun needs to be engaged to continue the diagnosis.

Linux information gathering scripts collect operating system parameters and configuration information from your platform.
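
Choosing between the two documents above depends on which distribution is installed. The following sketch makes that choice by checking for the release files /etc/redhat-release and /etc/SuSE-release. Relying on those two files is an assumption that holds for the Red Hat and SuSE enterprise releases of this era, and the names linux_family and DOCUMENT_FOR_FAMILY are hypothetical.

```python
import os

# Document titles as listed in this step.
DOCUMENT_FOR_FAMILY = {
    "redhat": "1010058.1 How to Gather Information on Red Hat Enterprise Linux Systems",
    "suse": "1010057.1 How to gather information on SuSE Linux Enterprise Systems",
}


def linux_family(root="/"):
    """Return 'redhat', 'suse' or 'unknown' based on the release file present."""
    etc = os.path.join(root, "etc")
    if os.path.exists(os.path.join(etc, "redhat-release")):
        return "redhat"
    if os.path.exists(os.path.join(etc, "SuSE-release")):
        return "suse"
    return "unknown"
```

For example, DOCUMENT_FOR_FAMILY.get(linux_family()) returns the title of the document to follow, or None when neither release file is present.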

At this point, if you have validated that each troubleshooting step above is true for your environment, and the issue still exists, further troubleshooting is required. For additional support contact Sun Support.


Internal Comments
The following is strictly for the use of Sun employees:
This document contains normalized content and is managed by the Domain Lead(s) of the
respective domains. To notify content owners of a knowledge gap contained in this document,
and/or prior to updating this document, please contact the domain engineers that are managing this
document via the "Document Feedback" alias(es) listed below:

Normalization team alias: tsc-emea-x64@sun.com
Domain Lead: anthony.mcnamara@oracle.com

x64, normalized, linux, firmware, RAID, SCSI, disk, error
Previously Published As
91559

Change History
Date: 2007-12-29
User Name: 31620
Action: Approved


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.