Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1002936.1
Update Date:2011-05-26
Keywords:

Solution Type  Technical Instruction Sure

Solution  1002936.1 :   How to Check for Linux Platform Disk Errors and Online/Offline Status  


Related Items
  • Sun Fire X4200 M2 Server
  • Sun Java Workstation W2100z
  • Sun Ultra 20 Workstation
  • Sun Fire X4440 Server
  • Sun Ultra 20 M2 Workstation
  • Sun Fire X2200 M2 Server
  • Sun Fire X4600 Server
  • Sun Fire X4100 Server
  • Sun Fire X4500 Server
  • Sun Ultra 40 Workstation
  • Sun Java Workstation W1100z
  • Sun Fire X4100 M2 Server
  • Sun Fire X4540 Server
  • Sun Fire X4600 M2 Server
  • Sun Fire X2100 M2 Server
  • Sun Fire X4200 Server
  • Sun Fire X2100 Server
  • Sun Ultra 40 M2 Workstation
Related Categories
  • GCS>Sun Microsystems>Servers>x64 Servers

PreviouslyPublishedAs
204039






Description

This document describes how to determine whether a disk in a Linux operating environment is online or offline, and whether it has reported errors.
It does not cover data recovery or storage device replacement, and it does not cover the embedded Linux that runs on the service processor.

Symptoms:

- disk errors.


Steps to Follow

NOTE: Always run disk management commands as root (a user with UID 0).

IDENTIFYING DRIVE AVAILABILITY:

Execute the following command to identify the available disk configuration:

# /bin/more /proc/scsi/scsi

Output will vary depending on platform model and configuration, but will be similar to the following:

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
Vendor: AMI      Model: Virtual CDROM    Rev: 1.00
Type:   CD-ROM                           ANSI SCSI revision: 02
Host: scsi1 Channel: 00 Id: 00 Lun: 00
Vendor: AMI      Model: Virtual Floppy   Rev: 1.00
Type:   Direct-Access                    ANSI SCSI revision: 02
Host: scsi2 Channel: 00 Id: 02 Lun: 00
Vendor: SEAGATE  Model: ST973401LSUN72G  Rev: 0556
Type:   Direct-Access                    ANSI SCSI revision: 03
Host: scsi2 Channel: 00 Id: 03 Lun: 00
Vendor: SEAGATE  Model: ST973401LSUN72G  Rev: 0556
Type:   Direct-Access                    ANSI SCSI revision: 03

The output above details all devices that are currently available to the platform for use.
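The three-line records in this listing can be awkward to scan on a platform with many disks. As a minimal sketch, the shell function below condenses each record to a single line. The function name summarize_scsi is hypothetical, and the parsing assumes the Host/Vendor/Type layout shown in the sample above; other kernel versions may format this file differently.

```shell
# Summarize /proc/scsi/scsi-style text as one line per device.
# Reads the file(s) named as arguments, or standard input if none are given.
# Assumes the Host / Vendor / Type record layout shown in the sample output.
summarize_scsi() {
    awk '
        /^Host:/ {
            # Example: "Host: scsi2 Channel: 00 Id: 02 Lun: 00"
            host = $2; chan = $4; id = $6; lun = $8
        }
        /Vendor:/ {
            # Keep the text between "Model:" and "Rev:" as the model name.
            model = $0
            sub(/.*Model: */, "", model)
            sub(/ *Rev:.*/, "", model)
        }
        /Type:/ {
            # Keep the text between "Type:" and "ANSI" as the device type,
            # then print the completed record.
            type = $0
            sub(/.*Type: */, "", type)
            sub(/ *ANSI.*/, "", type)
            printf "%s %s:%s:%s %s (%s)\n", host, chan, id, lun, type, model
        }
    ' "$@"
}
```

For example, `summarize_scsi /proc/scsi/scsi` prints one line per attached device in the form host, channel:id:lun, type, and model.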

Now execute the following command to identify the available partitions:

# /sbin/fdisk -l

Output will vary depending on platform model and configuration, but will be similar to the following:

Disk /dev/sdb: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1         131     1052226   83  Linux
/dev/sdb2             132        8402    66436807+  83  Linux
/dev/sdb3            8403        8924     4192965   82  Linux swap
Disk /dev/sdc: 73.4 GB, 73407865856 bytes
255 heads, 63 sectors/track, 8924 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1        8924    71681998+  83  Linux

The output of both commands is generated dynamically and reflects only the disks that are currently available. A device that has failed in such a way that it is now offline will not appear in this output.
Some platforms present virtual devices that are used to add storage media at runtime. These devices, which are usually identified as 'Virtual', can be ignored because their presence is not required for this diagnosis.

If one or more expected disks are missing from the output, those disks are offline and unable to respond to the SCSI and fdisk probes.
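The comparison between expected and present disks can be scripted. The sketch below is illustrative only: the function name missing_disks is hypothetical, the list of expected disks must be supplied by the administrator, and the match relies on the "Disk /dev/sdX:" header lines shown in the sample fdisk output.

```shell
# Print each expected disk that has no "Disk <name>:" header in a saved
# "fdisk -l" listing.
# Usage: missing_disks LISTING_FILE /dev/sdb /dev/sdc ...
missing_disks() {
    listing=$1
    shift
    for disk in "$@"; do
        # fdisk prints a header such as "Disk /dev/sdb: 73.4 GB, ..."
        if ! grep -q "^Disk $disk:" "$listing"; then
            echo "$disk"
        fi
    done
}
```

A possible invocation is to save the listing with `/sbin/fdisk -l > /tmp/fdisk.out` and then run `missing_disks /tmp/fdisk.out /dev/sdb /dev/sdc`. Any name printed is a disk that did not respond to the fdisk probe; no output means that all of the listed disks were found.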

IDENTIFYING ERRORS FROM MESSAGES:

Execute the following commands to extract disk-related errors from the system messages files:

# /bin/grep SCSI /var/log/messages*
# /bin/grep 'fs error' /var/log/messages*

Output will vary depending on platform model and configuration, but will be similar to the following:

Dec 12 12:30:00 x4100c kernel: SCSI device sdb: 143374738 512-byte hdwr sectors (73408 MB)
Dec 12 12:30:00 x4100c kernel: SCSI device sdb: drive cache: write through
Dec 12 12:30:01 x4100c kernel: SCSI device sdc: 143374738 512-byte hdwr sectors (73408 MB)
Dec 12 12:30:01 x4100c kernel: SCSI device sdc: drive cache: write through
Dec 12 13:35:00 x4100c kernel: SCSI error : <2 0 3 0> return code = 0x10000
Dec 12 13:35:00 x4100c kernel: EXT2-fs error (device sdc1): read_inode_bitmap: Cannot read inode bitmap - block_group = 422, inode_bitmap = 13828097

This output can be divided into two sections: expected output and exception output.

EXPECTED OUTPUT:

Dec 12 12:30:00 x4100c kernel: SCSI device sdb: 143374738 512-byte hdwr sectors (73408 MB)
Dec 12 12:30:00 x4100c kernel: SCSI device sdb: drive cache: write through
Dec 12 12:30:01 x4100c kernel: SCSI device sdc: 143374738 512-byte hdwr sectors (73408 MB)
Dec 12 12:30:01 x4100c kernel: SCSI device sdc: drive cache: write through

These messages are logged by the platform's hardware discovery during boot. They are not errors, but they are useful because they identify the disks that were available at boot time.
For example, at boot time, disks 'sdb' and 'sdc' were probed and were available for use.
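The list of disks seen at boot can be pulled out of the messages files directly. This is a rough sketch that assumes the "kernel: SCSI device sdX:" message format shown above; other kernel versions may word these messages differently, and the function name boot_disks is hypothetical.

```shell
# Print the unique disk names announced in "kernel: SCSI device sdX:" lines.
# Reads the file(s) named as arguments, or standard input if none are given.
boot_disks() {
    sed -n 's/.*kernel: SCSI device \([a-z][a-z]*\):.*/\1/p' "$@" | sort -u
}
```

Running `boot_disks /var/log/messages*` prints one disk name per line. A disk that appears here but is absent from the fdisk output is a candidate for having gone offline after boot.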

EXCEPTION OUTPUT:

Dec 12 13:35:00 x4100c kernel: SCSI error : <2 0 3 0> return code = 0x10000
Dec 12 13:35:00 x4100c kernel: EXT2-fs error (device sdc1): read_inode_bitmap: Cannot read inode bitmap - block_group = 422, inode_bitmap = 13828097

These messages are errors and are logged when a component is failing or a disk has failed completely.
The first of the two errors reports the SCSI error type and its return code.
The second of the two errors names the device that suffered the SCSI error (sdc1) and describes the resulting failure in human-readable form.
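To tie a SCSI error back to a device in /proc/scsi/scsi, the numbers in angle brackets can be rewritten in the same Host/Channel/Id/Lun style. The sketch below assumes that the four numbers are host, channel, ID and LUN in that order. This matches the sample, where <2 0 3 0> lines up with "Host: scsi2 Channel: 00 Id: 03 Lun: 00", but it should be treated as an assumption for other kernel versions. The function name scsi_error_addresses is hypothetical.

```shell
# Rewrite "SCSI error : <H C I L> return code = 0x..." lines in the
# "Host: scsiH Channel: C Id: I Lun: L" style used by /proc/scsi/scsi.
# Assumes the bracketed numbers are host, channel, id and lun, in that order.
# Lines that do not contain a SCSI error are not printed.
scsi_error_addresses() {
    sed -n 's/.*SCSI error : <\([0-9]*\) \([0-9]*\) \([0-9]*\) \([0-9]*\)> return code = \(0x[0-9a-fA-F]*\).*/Host: scsi\1 Channel: \2 Id: \3 Lun: \4 return code: \5/p' "$@"
}
```

Running `scsi_error_addresses /var/log/messages*` prints one rewritten line per SCSI error, which can then be compared by eye with the Host lines in /proc/scsi/scsi. The numbers are printed without the leading zeros that /proc/scsi/scsi uses.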

The keyword SCSI is used because all storage devices on a modern Linux platform, including IDE/PATA, FC-AL, SAS, SATA, SCSI, and USB devices, emulate SCSI in order to be presented as storage devices. Therefore, most disk error messages reported in the system messages file are prefixed with the word SCSI.



This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a
knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this
document via the "Document Feedback" alias(es) listed below:
tsc-emea-x64@sun.com
Anthony McNamara x64 Global Domain Lead

x64, normalized, linux, RAID
Previously Published As
91488

Change History
Date: 2010-06-07
User Name: brian.jackson@oracle.com
Action: Currency check
Comment: Please review this article and document any changes you made here...

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.