Document Audience: | INTERNAL |
Document ID: | I0854-3 |
Title: | Invalid Device structure found after disk removal when under Volume Manager 3.2 Control. |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2004-07-27 |
---------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0854-3
Synopsis: Invalid Device structure found after disk removal when under
          Volume Manager 3.2 Control.
Create Date: Jul/27/04
SunAlert: No
Top FIN/FCO Report: Yes
Products Reference: Sun Fire V880/V480, E3500 Servers, A5x00 StorEdge Array
Product Category: Server / SW Admin
Product Affected:
Systems Affected:
-----------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- A30 ALL Sun Fire V880 -
- A37 ALL Sun Fire V480 -
- E3500 ALL Ultra Enterprise 3500 -
X-Options Affected:
-------------------
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
- A5x00 ALL A5x00 Storage Array -
Parts Affected:
Part Number Description Model
----------- ----------- -----
- - -
References:
BugId: 4630477 - bogus device in "vxdisk list" output after replacing
a disk drive.
PatchId: 113201-03: VxVM 3.2: general patch for Solaris 2.6, 7, and 8.
112385-05: VxVM 3.2s9: general patch for Solaris 9.
ESC: 534731 - bogus device in "vxdisk list" output after replacing a
disk drive.
URL: http://sdn.sfbay/cgi-bin/escweb?-I024271?-M534731?-P1
Issue Description:
------------------------------------------------------------------------
| Change History from FIN I0854-2 to FIN I0854-3                      |
| ==============================================                      |
| Date Modified: July 27, 2004                                        |
|                                                                      |
| Updated Sections: Corrective Action                                 |
|                                                                      |
| CORRECTIVE ACTION: Eliminated the temporary workaround and replaced |
|                    it with the permanent patch fix.                 |
|                                                                      |
|                                                                      |
| Change History from FIN I0854-1 on Dec 17, 2002                     |
| ===============================================                     |
| Date Modified: Nov/24/2003                                          |
|                                                                      |
| Updates: CORRECTIVE ACTION                                          |
|                                                                      |
| CORRECTIVE ACTION: Added a procedure for 'disk replacement for Root |
|                    Disk' and 'V880 disk replacement'.               |
|                                                                      |
|                    Added a procedure for manual replacement of a    |
|                    Volume Manager disk with the CLI, for a 'Mirrored|
|                    boot disk' and a 'Standard mirrored data disk'.  |
------------------------------------------------------------------------
After removing an FC-AL disk under VM 3.2 control from the internal
disk subsystem of a V880, V480, or E3500, or from an A5x00 array, an
'Invalid device structure' error message can be seen. This causes the
disk replacement to fail, and a reboot is then needed before the
replacement can proceed. Having to reboot the system nullifies the
advantage of disk hot-swap and causes unnecessary downtime.
The following is an example of removing a disk under VM 3.2 control.
The invalid device that is left behind can be seen in the 'vxdisk list'
output at the end of the example:
# vxdiskadm
Volume Manager Support Operations
Menu: VolumeManager/Disk
1 Add or initialize one or more disks
2 Encapsulate one or more disks
3 Remove a disk
4 Remove a disk for replacement
5 Replace a failed or removed disk
6 Mirror volumes on a disk
7 Move volumes from a disk
8 Enable access to (import) a disk group
9 Remove access to (deport) a disk group
10 Enable (online) a disk device
11 Disable (offline) a disk device
12 Mark a disk as a spare for a disk group
13 Turn off the spare flag on a disk
14 Unrelocate subdisks back to a disk
15 Exclude a disk from hot-relocation use
16 Make a disk available for hot-relocation use
17 Prevent multipathing/Suppress devices from VxVM's view
18 Allow multipathing/Unsuppress devices from VxVM's view
19 List currently suppressed/non-multipathed devices
20 Change the disk naming scheme
21 Get the newly connected/zoned disks in VxVM view
list List disk information
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: 4
Remove a disk for replacement
Menu: VolumeManager/Disk/RemoveForReplace
Use this menu operation to remove a physical disk from a disk group,
while retaining the disk name. This changes the state for the disk
name to a "removed" disk. If there are any initialized disks that are
not part of a disk group, you will be given the option of using one of
these disks as a replacement.
Enter disk name [,list,q,?] list
Disk group: rootdg
DM NAME DEVICE TYPE PRIVLEN PUBLEN STATE
dm disk01 c1t11d0s2 sliced 4711 35358848 -
dm disk02 c1t13d0s2 sliced 4711 35358848 -
Enter disk name [,list,q,?] disk01
The following volumes will lose mirrors as a result of this operation:
vol01
No data on these volumes will be lost.
The following devices are available as replacements:
c1t2d0
Choose one of these disks now, to replace disk01.
Select "none" if you do not wish to select a replacement disk.
Choose a device, or select "none"
[,none,q,?] (default: c1t2d0) none
The requested operation is to remove disk disk01 from disk group
rootdg. The disk name will be kept, along with any volumes using the
disk, allowing replacement of the disk.
Select "Replace a failed or removed disk" from the main menu
when you wish to replace the disk.
Continue with operation? [y,n,q,?] (default: y)
Removal of disk disk01 completed successfully.
Remove another disk? [y,n,q,?] (default: n)
Volume Manager Support Operations
Menu: VolumeManager/Disk
1 Add or initialize one or more disks
2 Encapsulate one or more disks
3 Remove a disk
4 Remove a disk for replacement
5 Replace a failed or removed disk
6 Mirror volumes on a disk
7 Move volumes from a disk
8 Enable access to (import) a disk group
9 Remove access to (deport) a disk group
10 Enable (online) a disk device
11 Disable (offline) a disk device
12 Mark a disk as a spare for a disk group
13 Turn off the spare flag on a disk
14 Unrelocate subdisks back to a disk
15 Exclude a disk from hot-relocation use
16 Make a disk available for hot-relocation use
17 Prevent multipathing/Suppress devices from VxVM's view
18 Allow multipathing/Unsuppress devices from VxVM's view
19 List currently suppressed/non-multipathed devices
20 Change the disk naming scheme
21 Get the newly connected/zoned disks in VxVM view
list List disk information
? Display help about menu
?? Display help about the menuing system
q Exit from menus
Select an operation to perform: q
Goodbye.
# luxadm remove_device /dev/rdsk/c1t11d0s2
WARNING!!! Please ensure that no filesystems are mounted on these device(s).
All data on these devices should have been backed up.
The list of devices being used (either busy or reserved) by the host:
1: Box Name: "dak" slot 9
Please enter 's' or <CR> to Skip the "busy/reserved" device(s) or
'q' to Quit and run the subcommand with the
-F (force) option. [Default: s]:
# luxadm remove_device -F /dev/rdsk/c1t11d0s2 --> Had to Force removal
of device.
WARNING!!! Please ensure that no filesystems are mounted on these device(s).
All data on these devices should have been backed up.
The list of devices which will be removed is:
1: Box Name: "dak" slot 9
Node WWN: 2000002037d9ff50
Device Type:Disk device
Device Paths:
/dev/rdsk/c1t11d0s2
Please verify the above list of devices and then enter 'c' or <CR>
to Continue or 'q' to Quit. [Default: c]:
stopping: Drive in "dak" slot 9....Done
offlining: Drive in "dak" slot 9....Done
Hit <Return> after removing the device(s).
Jun 6 08:51:52 eis-dak-f picld[233]: Device DISK9 removed
Jun 6 08:51:52 eis-dak-f picld[233]: Device DISK9 removed
Drive in Box Name "dak" slot 9
Logical Nodes being removed under /dev/dsk/ and /dev/rdsk:
c1t11d0s0
c1t11d0s1
c1t11d0s2
c1t11d0s3
c1t11d0s4
c1t11d0s5
c1t11d0s6
c1t11d0s7
# format
Searching for disks...done
AVAILABLE DISK SELECTIONS:
0. c1t0d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037f3d422,0
1. c1t1d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037bd2c91,0
2. c1t2d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff6a,0
3. c1t3d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff44,0
4. c1t4d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff4c,0
5. c1t5d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037e6057a,0
6. c1t8d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9fd70,0
7. c1t9d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff5d,0
8. c1t10d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff56,0
9. c1t11d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0
10. c1t12d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037bde2dc,0
11. c1t13d0
/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff46,0
Specify disk (enter its number): 1
selecting c1t1d0
[disk formatted]
FORMAT MENU:
disk - select a disk
type - select (define) a disk type
partition - select (define) a partition table
current - describe the current disk
format - format and analyze the disk
repair - repair a defective sector
label - write label to the disk
analyze - surface analysis
defect - defect list management
backup - search for backup labels
verify - read and display labels
save - save new disk/partition definitions
inquiry - show vendor, product and revision
volname - set 8-character volume name
!<cmd> - execute <cmd>, then return
quit
format> q
# devfsadm -C
# cd /dev/rdsk
# ls -al c1t11*
lrwxrwxrwx 1 root root 74 May 22 08:10 c1t11d0s0 ->
../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:a,raw
lrwxrwxrwx 1 root root 74 May 22 08:10 c1t11d0s1 ->
../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:b,raw
lrwxrwxrwx 1 root root 74 May 22 08:10 c1t11d0s2 ->
../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:c,raw
lrwxrwxrwx 1 root root 74 May 22 08:10 c1t11d0s3 ->
../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:d,raw
lrwxrwxrwx 1 root root 74 May 22 08:10 c1t11d0s4 ->
../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:e,raw
lrwxrwxrwx 1 root root 74 May 22 08:10 c1t11d0s5 ->
../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:f,raw
lrwxrwxrwx 1 root root 74 May 22 08:10 c1t11d0s6 ->
../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:g,raw
lrwxrwxrwx 1 root root 74 May 22 08:10 c1t11d0s7 ->
../../devices/pci@8,600000/SUNW,qlc@2/fp@0,0/ssd@w2100002037d9ff50,0:h,raw
All of the above device links should NOT exist at this point.
# luxadm insert_device
Please hit <RETURN> when finished adding Fibre Channel Enclosure(s)/Device(s):
Jun 6 08:54:49 eis-dak-f picld[233]: Device DISK9 inserted
Jun 6 08:54:49 eis-dak-f picld[233]: Device DISK9 inserted
Waiting for Loop Initialization to complete...
New Logical Nodes under /dev/dsk and /dev/rdsk :
c1t11d0s0
c1t11d0s1
c1t11d0s2
c1t11d0s3
c1t11d0s4
c1t11d0s5
c1t11d0s6
c1t11d0s7
No new enclosure(s) were added!!
# vxdctl enable
Jun 6 08:56:37 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: enabled path 118/0x48
belonging to the dmpnode 239/0x10
Jun 6 08:56:37 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: enabled path 118/0x48
belonging to the dmpnode 239/0x10
Jun 6 08:56:37 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: enabled dmpnode
239/0x10
Jun 6 08:56:37 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: enabled dmpnode
239/0x10
Jun 6 08:56:41 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: disabled path 118/0x208
belonging to the dmpnode 239/0x8
Jun 6 08:56:41 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: disabled path 118/0x208
belonging to the dmpnode 239/0x8
Jun 6 08:56:41 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: disabled dmpnode
239/0x8
Jun 6 08:56:41 eis-dak-f vxdmp: NOTICE: vxvm:vxdmp: disabled dmpnode
239/0x8
# vxdisk list
DEVICE TYPE DISK GROUP STATUS
c1t0d0s2 sliced - - error
c1t1d0s2 sliced - - error
c1t2d0s2 sliced - - online
c1t3d0s2 sliced - - error
c1t4d0s2 sliced - - online
c1t5d0s2 sliced - - error
c1t8d0s2 sliced - - online
c1t9d0s2 sliced - - online
c1t10d0s2 sliced - - online
c1t11d0s2 sliced - - error
c1t11d0s2 sliced - - error --> Invalid Device
c1t12d0s2 sliced - - error
c1t13d0s2 sliced disk02 rootdg online
- - disk01 rootdg removed was:c1t11d0s2
The root cause of the problem is that, if the device is not first
disabled in VxVM, the device node for the pulled drive is left dangling
and is never removed from the device tree. When the replacement drive is
inserted, a new node is created in the device tree, but the dangling node
is not cleaned up, which is why the duplicate "Invalid Device" entry
appears in the 'vxdisk list' output above.
Implementation:
---
| | MANDATORY (Fully Proactive)
---
---
| X | CONTROLLED PROACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
Corrective Action:
The following recommendation is provided as a guideline for authorized
Sun Services Field Representatives who may encounter the above-mentioned
issue.

This issue has now been fixed in patch 113201-03 (for Solaris 2.6, 7,
and 8) and patch 112385-05 (for Solaris 9). The appropriate patch needs
to be applied to avoid the issue.

Once the patch is applied, the workaround described in bug 4630477 should
no longer be required, and the normal disk replacement procedure should be
followed to replace a failed disk.
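As a minimal sketch of applying the fix on Solaris 2.6, 7, or 8 (assuming
the patch has been downloaded and unpacked under /var/spool/patch, an
illustrative path only), the patch level can be checked and the patch
added as follows:

   # showrev -p | grep 113201             --> is the patch already installed?
   # patchadd /var/spool/patch/113201-03  --> apply the patch if it is absent
   # showrev -p | grep 113201             --> confirm revision -03 or later

On Solaris 9, substitute patch 112385-05 in the same steps.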
Comments:
Using the 'vxdisk rm' command in conjunction with 'luxadm -e offline'
allows disk replacement to succeed without having to reboot the system,
maintaining uptime and availability and keeping hot-swap of disks a
viable option. The advantage of this procedure is that disk replacement
succeeds without a reboot, whether the disk was pulled before any VM
action was taken or the steps outlined above were followed. With
reference to SRDB 17003, using vxdiskadm option 11 to offline a disk
through VM works in some cases; in other cases, where the disk has
already been physically pulled, option 11 does not clean up the opens
that are left behind, and the duplicate entries result. A sketch of the
cleanup sequence follows.
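The following is a minimal sketch of that sequence, using the device and
disk names from the example above (disk01 / c1t11d0s2 in rootdg); the
exact names, and the point at which each step is run, will vary from
system to system, and the ordering shown here is illustrative only:

   # vxdiskadm                              --> option 4: remove disk01 for replacement
   # luxadm -e offline /dev/rdsk/c1t11d0s2  --> close the driver path to the pulled drive
   # vxdisk rm c1t11d0s2                    --> remove the stale device-access record
     (physically replace the drive)
   # luxadm insert_device                   --> bring the new drive into the device tree
   # vxdctl enable                          --> VxVM rescan; no duplicate entry remains
   # vxdiskadm                              --> option 5: replace the failed or removed disk

With the patches listed under Corrective Action installed, this manual
cleanup is not expected to be necessary.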
============================================================================
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@sdpsweb.EBay
--------------------------------------------------------------------------