Asset ID: 1-75-1012833.1
Update Date: 2010-08-26
Solution Type: Troubleshooting Sure
Solution 1012833.1: Analyzing Sun StorEdge[TM] 6920 Component Failure Alarms, Notifications, and LEDs
Related Categories: GCS>Sun Microsystems>Storage - Disk>Modular Disk - 6xxx Arrays
Previously Published As: 217614
Description
This document describes how to identify failed or failing components in the array from the symptoms listed below.
Symptoms:
- StorADE Alarms (issued via email, or observed in the User Interface)
- Amber LEDs
- Loss of Access (outage, can't find data, etc.)
- Appearance of filesystem corruption
- Application discovered corruption
- Application failed/services stopped
- Application can't read data
- Bad/Slow performance
- Data Host Messages
- Array Event Log Messages
Please validate that each troubleshooting step below is true for your environment. The steps will provide instructions or a link to a document, for validating the step and taking corrective action as necessary. The steps are ordered in the most appropriate sequence to isolate the issue and identify the proper resolution. Please do not skip a step.
Steps to Follow
A. Verify presence of fault (amber) LEDs on the array
If you are remote and cannot physically inspect the array, skip this step and go to step B.
1. Use the table below to locate and review component fault LEDs.
Component | Component Location | LED Location | Notes
6020 Tray Fault LED | NA | Drive side of array, far right side of tray (above Wrench Symbol) | None
6020 Drive | Drive Slot | Middle LED under drive slot | Wrench Symbol itself lights up
6020 RAID Controller | Below fans, middle card | Next to Fibre Channel connection (left of Wrench Symbol) | None
6020 Loop Cards | Below fans on far left or far right of tray | Above cable connections (left of Wrench Symbol) | None
6020 Power Supply Unit | Housed with fans on the trays | Middle of FRU (below Wrench Symbol) | None
DSP Global Fault LED | DSP Housing | Drive side of array, top of DSP (left of Wrench Symbol) | None
DSP Fan | DSP Housing | Drive side of array, middle left on DSP (below the word FAN) | Fan FRUs are accessed via the cabled side. LED is on drive side
DSP SRC | Drive side of DSP | Left of word Attention | SRCs are located in slots 1-4
DSP SFC | Drive side of DSP | Left of word Attention | SFCs are located in slots 5-6
DSP SIO Combo | Cabled side of DSP | Left side of card, left of Wrench Symbol | SIO Combo Cards are located in slots 1-4, and have a network port
DSP SIO | Cabled side of DSP | No fault LED | SIOs are located in slots 1-4
DSP MIC | Cabled side of DSP | Left side of card, left of the word Attention | MICs are located in slots 5-6. Redund LED is amber, and will normally blink for the slot that is in STANDBY state
SP (v100) | Top of Rack | Cabled side of Rack | Right side of SP
SP (v210) | Top of Rack | Cabled side of Rack | Middle of SP, to the right of Wrench Symbol
SPA Tray | Top of Rack, below SP | No fault LEDs | None
10/100 Hub | Top of Rack, below SPA | No fault LEDs | None
2. Note any fault LEDs that are lit, and continue to Step B.
B. Verify the presence of an alarm on the array.
- Log into the browser interface for your array: https://<my_array_address>:6789
- Click on the Storage Automated Diagnostic Environment link.
- Click on the Alarms tab. If there are no alarms, skip to step D. If there are no alarms AND a fault LED was noted in step A, skip to step E instead.
- Continue to step C for each alarm listed.
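The routing at the end of step B can be written out as a small decision function. This is only an illustrative sketch: the function name and its two parameters are hypothetical, and it assumes that the fault LED condition takes precedence when no alarms are listed.

```python
def next_step_after_alarm_check(alarm_count: int, fault_led_noted: bool) -> str:
    """Return the letter of the step to follow after step B.

    Hypothetical helper: alarm_count is the number of entries on the
    Alarms tab, fault_led_noted is True if step A found a lit fault LED.
    """
    if alarm_count > 0:
        return "C"  # review the details of each listed alarm
    if fault_led_noted:
        return "E"  # fault LED without an alarm: collect data
    return "D"      # no alarms and no fault LED: check the Event Log


if __name__ == "__main__":
    for alarms, led in [(2, False), (0, True), (0, False)]:
        step = next_step_after_alarm_check(alarms, led)
        print(f"alarms={alarms} fault_led={led} -> step {step}")
```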
C. Check the details of the alarm
NOTE: Some alarms indicate that a component needs to be replaced. For alarms that link to a Service Adviser action (see the third item below) suggesting part replacement, it is recommended that you contact Sun Services before taking action.
- Click on the Details link in the alarms table.
- If the Event Code field is listed in the following table, complete the instructions associated with it.
- If the Event Code field is NOT listed below, follow the Recommended Actions provided by the [ Click here to access the Service Adviser ] link. This may require a service call to get a replacement part.
- If the Recommended Actions were followed and did not alleviate your symptoms, go to step D.
Event Code | Instructions
30.12.31 | Please collect alarm details, and go to Step E.
30.20.451 | Snapshots must be deleted to recover. Please review The 6920 Best Practices Guide
30.20.452 | Please follow alarm recommendations, and review The 6920 Best Practices Guide
30.20.457 | Please follow alarm recommendations, and review The 6920 Best Practices Guide
30.34.21 | Please collect alarm details and Solution Extract per alarm Service Action, and go to Step E.
30.36.16 | Please collect alarm details, and go to Step E.
38.5.48 | Please collect alarm details, and go to Step E.
38.20.13 | Please collect alarm details, and go to Step E.
38.20.74 | Please collect alarm details and Solution Extract per alarm Service Action, and go to Step E.
38.20.204 | Contact Sun Support for a disk drive replacement
38.34.71 | Please collect alarm details, and go to Step E.
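For anyone triaging many alarms at once, the step C table can be kept as a simple lookup. The sketch below is illustrative only: the dictionary and function names are hypothetical, the instruction strings paraphrase the table, and the fallback text summarizes the bullets above rather than quoting any product interface.

```python
# Event Codes from the step C table, mapped to the listed instruction.
STEP_C_INSTRUCTIONS = {
    "30.12.31": "Collect alarm details, and go to Step E.",
    "30.20.451": "Snapshots must be deleted to recover. Review the 6920 Best Practices Guide.",
    "30.20.452": "Follow alarm recommendations, and review the 6920 Best Practices Guide.",
    "30.20.457": "Follow alarm recommendations, and review the 6920 Best Practices Guide.",
    "30.34.21": "Collect alarm details and Solution Extract per alarm Service Action, and go to Step E.",
    "30.36.16": "Collect alarm details, and go to Step E.",
    "38.5.48": "Collect alarm details, and go to Step E.",
    "38.20.13": "Collect alarm details, and go to Step E.",
    "38.20.74": "Collect alarm details and Solution Extract per alarm Service Action, and go to Step E.",
    "38.20.204": "Contact Sun Support for a disk drive replacement.",
    "38.34.71": "Collect alarm details, and go to Step E.",
}

# Used when the Event Code is not in the table (summary of the step C bullets).
FALLBACK = (
    "Follow the Recommended Actions from the Service Adviser link; "
    "if symptoms persist, go to Step D."
)


def instruction_for(event_code: str) -> str:
    """Return the step C instruction for an Event Code, or the fallback."""
    return STEP_C_INSTRUCTIONS.get(event_code.strip(), FALLBACK)


if __name__ == "__main__":
    for code in ("38.20.204", "30.20.451", "99.9.9"):
        print(code, "->", instruction_for(code))
```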
D. Verify whether there are Events being logged
- Log into the browser interface for your array: https://<my_array_address>:6789
- Click on the Storage Automated Diagnostic Environment link.
- Click on the Administration Tab.
- Click on the Event Log Tab.
- Verify whether there are any Events logged at or around the time of any other symptoms you may be having.
- For each log entry identified, click on the "Details" link.
- Check the Event Code in the Event Details page, against the table below. Follow the directions associated with the Event Code. If the Event Code is NOT listed in the table below, go to step E.
Event Code | Instructions
38.13.354 | A Diagnostic test has been run and failed on your 6020 tray. Please go to Step E.
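Finding events logged "at or around the time" of a symptom amounts to a time-window filter. The following is a minimal sketch under stated assumptions: the events are assumed to have been copied out of the Event Log by hand into (timestamp, event code) pairs, and the function name and the 30-minute default window are hypothetical choices, not values from the array software.

```python
from datetime import datetime, timedelta


def events_near(events, symptom_time, window=timedelta(minutes=30)):
    """Return the (timestamp, event_code) pairs within `window` of symptom_time.

    `events` is an iterable of (datetime, str) tuples; this structure is an
    assumption for illustration, not the array's own log format.
    """
    return [(ts, code) for ts, code in events if abs(ts - symptom_time) <= window]


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        (datetime(2010, 8, 26, 10, 0), "38.13.354"),
        (datetime(2010, 8, 26, 14, 0), "30.12.31"),
    ]
    for ts, code in events_near(sample, datetime(2010, 8, 26, 10, 20)):
        print(ts.isoformat(), code)
```

Widening the window is reasonable when the array and host clocks are not known to be synchronized.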
E. Additional Data Collection
If, after following the above troubleshooting steps, you have not resolved your potential hardware issue, please collect:
- Detailed description of hardware problem (host messages, array messages, etc.)
- Any Amber LED status.
- Array Solution Extract. Refer to <Document: 1003756.1> How to collect an extractor from a Sun StorEdge[TM] 6920 (2.x and 3.x)
- Whether there is a fault LED present without an alarm.
- Solaris Explorer Collection. Refer to <Document: 1006990.1> Sun[TM] Explorer Implementation Best Practice
- Microsoft Windows[R] Data Collection. Refer to <Document: 1006608.1> Microsoft Windows[R] operating system: How to obtain troubleshooting information for storage issues
- Any other data you deem pertinent.
and contact Sun Support.
Product
Sun StorageTek 6920 System
Sun StorageTek 6920 Maintenance Update 2
Sun StorageTek 6920 Maintenance Update 1
Internal Comments
This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the "Document Feedback" alias(es) listed below:
storage-os-disk-mid-domain@sun.com
Instructions for Sun Support
F. Review LEDs on system
Are the LEDs solid amber? If so, identify the components using the table in step A, then go to Step G. Most amber LEDs will have an alarm associated with them.
Note: A USB flash disk with a solid RED LED is not a fault. Refer to <Document: 1018698.1>: Sun StorEdge[tm] 6320/6920 Service Processor USB flash disk LED is red
G. Review the Alarms logged by the system
- Recommend component replacement based on alarms logged. If multiple FRUs have failed based on the total number of alarms, skip to Step #.
- If the Alarm is listed in the table below, follow the recommended actions.
- If there are no alarms, continue to Step H.
Alarm Code | Action
30.12.31 | Follow Recommended Actions in the Alarm Details. If components cannot be pinged from the SP, reference <Solution: 209097>: Sun StorEdge[TM] 6920 system: Unable to connect to DSP from SP via telnet
30.20.222 | If the MIC is not in a FAULT or OFFLINE state, reference <Document: 1018149.1>: Verify MIC (Management Interface Card) Status on DSP-1000 (Sun StorEdge[TM] 6920) to verify which MIC is Master or Slave, and go to step H.
38.5.48 | This requires a change in the "enable_volslice" parameter in the array called out. This can only be done by logging into the array.
H. Review Event Codes
- If the Event Code is listed in the table below, follow the recommended actions. Otherwise complete the recommended actions described in the Event.
- If there are no Events in the array, continue to Step I
Event Code | Action
None | There are no Event Codes that require special attention beyond normal health check provided by Step H
I. Review 6920 Health
Review the customer-provided extractor for overall health status. A good guideline is to review the contents of <Document: 1005447.1> Sun StorEdge[TM] 6920 System Health Checklist. If no faults are found, continue to Step J.
J. Review of Data Host System Health
Host system health should be reviewed from a 6920-centric perspective. In basic terms, this means that the host data collection should be "mined" for the following information:
* SCSI errors
* Fibre Channel errors
* Path status information (luxadm display for Solaris, sstm for Windows/Other)
Host timestamps should be adjusted to the time on the SP of the 6920. Host problems that correlate to information found in the messages.dsp file, found in this step, could indicate a fault in one or more of the following major components:
- FC Switch
- FC SFP
- FC Host Bus Adapter
- FC Cables between all segments
Refer to troubleshooting document <Document: 1009557.1>: Troubleshooting Fibre Channel Devices from the OS
If no host hardware problems are identified, continue to Step K.
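The host-side review in this step (mine the host messages for SCSI and Fibre Channel errors, then shift them to SP time) can be sketched as follows. Everything here is an assumption for illustration: the function name, the keyword list, the sample messages, and the idea that host log lines have already been parsed into (timestamp, message) pairs. Real host logs vary by operating system and will need their own parsing.

```python
from datetime import datetime, timedelta


def mine_host_messages(entries, sp_minus_host, keywords=("scsi", "fibre channel")):
    """Return matching host messages with timestamps shifted to SP time.

    entries:        iterable of (datetime, str) pairs parsed from host logs
                    (hypothetical structure).
    sp_minus_host:  timedelta equal to SP clock minus host clock.
    keywords:       case-insensitive substrings to search for (assumed list).
    """
    hits = []
    for ts, message in entries:
        lowered = message.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((ts + sp_minus_host, message))
    return hits


if __name__ == "__main__":
    # Made-up sample lines for illustration only.
    host_entries = [
        (datetime(2010, 8, 26, 9, 55), "WARNING: SCSI transport failed"),
        (datetime(2010, 8, 26, 9, 56), "login session opened"),
    ]
    # Assume the SP clock runs 5 minutes ahead of the host clock.
    for ts, message in mine_host_messages(host_entries, timedelta(minutes=5)):
        print(ts.isoformat(), message)
```

The shifted timestamps can then be compared directly against entries in messages.dsp.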
K. Escalation
Escalate to the next level of support providing the following data in a central location:
- Detailed description of hardware problem (host messages, array messages, etc.)
- Any Amber LED status.
- Array Solution Extract. Refer to <Document: 1003756.1> How to collect an extractor from a Sun StorEdge[TM] 6920 (2.x and 3.x)
- Whether there is a fault LED present without an alarm.
- Solaris Explorer Collection. Refer to <Document: 1006990.1> Sun[TM] Explorer Implementation Best Practice
- Microsoft Windows[R] Data Collection. Refer to <Document: 1006608.1> Microsoft Windows[R] operating system: How to obtain troubleshooting information for storage issues
- Results of reviewing the extractor from Step I.
The Knowledge Work Queue for this article is KNO-STO-MIDRANGE_DISK.
6920, system1, unity, LED, Alarm, Event, Health, normalized, Audited
Previously Published As
89103
Change History
Date: 2007-12-10
User Name: 7058
Action: Approved
Comment: Fixed Tmark in title. KE audit.
Version: 6
Date: 2007-12-10
User Name: 7058
Action: Update Started
Comment: Title missing Tmark
Version: 0
Date: 2007-09-11
User Name: 7058
Action: Approved
Comment: Fixed 2 instances of a link that was pointing to 81801 when actually it should have been pointing to 81805.
Minor punctuation fixes.
Spell ck OK.
Other dependent docs are now in final review and will be published soon.
OK to publish.
Version: 5