Document Audience: | INTERNAL |
Document ID: | I0547-1 |
Title: | Intermittently, SCSI devices connected to a UDWIS card may |
Copyright Notice: | Copyright © 2005 Sun Microsystems, Inc. All Rights Reserved |
Update Date: | 2000-01-21 |
---------------------------------------------------------------------
- Sun Proprietary/Confidential: Internal Use Only -
---------------------------------------------------------------------
FIELD INFORMATION NOTICE
(For Authorized Distribution by SunService)
FIN #: I0547-1
Synopsis: Intermittently, SCSI devices connected to a UDWIS card may
Create Date: Jan/21/00
Keywords:
Intermittently, SCSI devices connected to a UDWIS card may
Top FIN/FCO Report: Yes
Products Reference: UDWIS fcode
Product Category: Server / System Board;
Product Affected:
Mkt_ID Platform Model Description Serial Number
------ -------- ----- ----------- -------------
Systems Affected
----------------
- A11 ALL Ultra Enterprise 1 -
- A12 ALL Ultra Enterprise 1E -
- A14 ALL Ultra Enterprise 2 -
- E3000 ALL Ultra Enterprise 3000 -
- E3500 ALL Ultra Enterprise 3500 -
- E4000 ALL Ultra Enterprise 4000 -
- E4500 ALL Ultra Enterprise 4500 -
- E5000 ALL Ultra Enterprise 5000 -
- E5500 ALL Ultra Enterprise 5500 -
- E6000 ALL Ultra Enterprise 6000 -
- E6500 ALL Ultra Enterprise 6500 -
- E10000 ALL Ultra Enterprise 10000 -
(See Corrective Action)
X-Options Affected
------------------
- - ALL StorEdge A1000 -
- - ALL Netra st A1000 -
- - ALL StorEdge D1000 -
- - ALL Netra st D1000 -
- - ALL StorEdge A3500 -
- - ALL StorEdge L280 tape library -
- - ALL StorEdge L700 tape library -
- - ALL StorEdge L1000 tape library -
- - ALL StorEdge L1800 tape library -
- - ALL StorEdge L3500 tape library -
- - ALL StorEdge L11000 tape library -
Parts Affected:
Part Number Description Model
----------- ----------- -----
370-2443-01 Differential Ultra/Wide SCSI (UDWIS/S) -
References:
BugId: 4272400 4230719
Esc: 523110 522070 521024 522036 522925 523016 523175
FIN: I0552-1
Issue Description:
Intermittently, SCSI devices connected to a UDWIS card with FCode earlier
than 1.28 may not be usable after a system boot or reboot. This can
prevent a system from booting if the boot device is connected to an
affected UDWIS card.
Alternatively, if a non-boot UDWIS card is affected, the system may boot
but will not have access to any SCSI devices which are connected to
that affected UDWIS card.
These issues can occur irrespective of the type of SCSI device connected
to the UDWIS card and hence can affect Sun and third-party SCSI devices.
This problem is seen on both standalone and clustered systems. Systems
with faster CPUs (like the E10k) are more likely to be affected, so a
previously working system may start to exhibit UDWIS problems after the
CPUs are upgraded to those with a higher clock speed. Systems with a large
number of UDWIS cards are also more likely to see these problems, since
any one of the installed cards could be affected.
This problem is due to issues in FCode versions earlier than 1.28 in
the UDWIS card. During boot, the card fails to send a SCSI bus reset,
or the SCSI bus reset is not held long enough to meet the SCSI
specification, or the card fails to initialize correctly. As a result,
the attached SCSI devices fail to negotiate correctly with the host.
Example 1
---------
In some cases, if the affected UDWIS card does not control the boot device
and the system therefore boots, the 'sd' driver will record a corrupted
SCSI inquiry string in the messages file during the boot for devices attached
to the affected UDWIS card. This example is from an A3x00:
unix: sd2044 at QLGC,isp17: target 4 lun 0
unix: sd2044 is /sbus@5d,0/QLGC,isp@1,10000/sd@4,0
unix: Vendor 'SoEG', product '00****7*********',
(unknown capacity)
You would also see corrupted inquiry strings using format -> inquiry.
For example, with a Seagate ST39103LC 9GB drive, you would see:
Vendor S313CU90
This is every _other_ expected character S(T)3(9)1(0)3(L)C(S)U(N)9(.)0....
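The every-other-character pattern above can be checked mechanically. The
following is a minimal sketch in Python, not a Sun-supplied tool; the
function names are hypothetical, and the expected string is only the
portion of the inquiry data quoted in this FIN.

```python
def every_other_char(expected: str) -> str:
    """Return the characters at even positions (0, 2, 4, ...) of `expected`."""
    return expected[::2]


def looks_decimated(observed: str, expected: str) -> bool:
    """True if `observed` is a non-empty prefix of the every-other-character
    form of `expected`, i.e. it matches the corruption pattern in Example 1."""
    return bool(observed) and every_other_char(expected).startswith(observed)


if __name__ == "__main__":
    # Expected inquiry text as quoted in this FIN (the trailing part is elided there).
    expected = "ST39103LCSUN9.0"
    print(every_other_char(expected))             # S313CU90
    print(looks_decimated("S313CU90", expected))  # True
```

A match only suggests the corruption pattern described here; the FCode
version on the card should still be confirmed as described under Comments.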
Example 2
---------
As an example of the possible effect of this problem, with an A3x00
connected to an affected UDWIS card, a controller path will be offline.
In a cluster, this affects the running node due to I/O loads migrating to
one controller.
Example 3
---------
Another example of the 'missing SCSI bus reset' problem, in a cluster
environment, can be recognized as follows:
In a cluster configuration with A3x00s attached, when one node reboots,
the other (running) node should record the same number of SCSI bus resets
as the number of shared SCSI busses.
For example, if 8 A3x00s are dual-hosted, then the running node should
receive 16 resets like the following, when the other node is booting:
unix: WARNING: /sbus@49,0/QLGC,isp@1,10000 (isp5):
unix: Received unexpected SCSI Reset
If fewer than 16 are received, then expect some A3x00 controllers to
be offline.
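The reset count in Example 3 can be tallied from the messages file of the
running node. The following is a minimal sketch in Python under stated
assumptions: the function names are hypothetical, the two-line message
layout is taken from the example above, and the caller is assumed to pass
only the lines logged while the other node was booting.

```python
import re

RESET_MSG = "Received unexpected SCSI Reset"
ISP_RE = re.compile(r"\((isp\d+)\)")  # matches the "(isp5)" instance name


def count_resets(lines):
    """Return {isp instance: number of unexpected-reset warnings}.

    Assumes each warning is logged as a WARNING line naming the isp
    instance, followed by a line containing RESET_MSG.
    """
    counts = {}
    current = None
    for line in lines:
        match = ISP_RE.search(line)
        if "WARNING" in line and match:
            current = match.group(1)
        elif RESET_MSG in line and current is not None:
            counts[current] = counts.get(current, 0) + 1
            current = None
    return counts


def missing_resets(lines, shared_busses):
    """Return how many resets fewer than `shared_busses` were recorded."""
    total = sum(count_resets(lines).values())
    return max(shared_busses - total, 0)


if __name__ == "__main__":
    sample = [
        "unix: WARNING: /sbus@49,0/QLGC,isp@1,10000 (isp5):",
        "unix: Received unexpected SCSI Reset",
    ]
    # 8 dual-hosted A3x00s give 16 shared busses in the example above.
    print(missing_resets(sample, shared_busses=16))
```

A non-zero result corresponds to the "fewer than 16" case above, where
some A3x00 controllers should be expected to be offline.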
Implementation:
---
| | MANDATORY (Fully Pro-Active)
---
---
| X | CONTROLLED PRO-ACTIVE (per Sun Geo Plan)
---
---
| | REACTIVE (As Required)
---
Corrective Action:
Enterprise Customers and authorized Field Service Representatives may
avoid the above mentioned problems by following the recommendations
as shown below:
If the affected UDWIS card is connected to an A3x00 which is not the
boot device, then a simple workaround is to online the failed controller
through RM6.
- In the recovery guru -> options -> manual recovery -> controller pairs.
With other SCSI devices, the only workaround is to perform a shutdown and
bringup on the affected E10k domain (a reboot will not correct the problem
in all cases), or a shutdown and power cycle on other systems (since,
again, a reboot will not correct the problem in all cases).
**********
This issue is currently being addressed in forthcoming FCO A0163-1 that
will update the UDWIS card to FCode version 1.28.
**********
The recommendation is to evaluate customer configurations to determine if
the forthcoming FCO A0163-1 change will apply. This problem can affect
any devices connected to UDWIS cards including D1000, A1000, A3x00, or
A7000 Sun Storage products, as well as SCSI-attached OEM storage,
especially (but not only) when connected to systems with fast CPUs, like
E10000s.
We also strongly recommend that any mission-critical sites implement
FCO A0163-1 once it is released. FCO A0163-1 is currently in a pending
state and has not yet been released.
Comments:
There are three ways to determine if a particular UDWIS card has the
affected (versions earlier than 1.28) FCode:
a) Physically inspect the card.
The part number sticker on the SBus connector will show '370-2443-01'
for FCode versions prior to 1.28.
b) Use the OBP '.properties' command.
First, find the correct path to the UDWIS card you wish to check (see
example below). Then use the following sequence of commands at the 'ok'
prompt, giving that path as the argument to the 'dev' command:
dev <path to UDWIS card>
.properties
device-end
When '.properties' executes, examine the value of the property "isp-fcode".
If it shows "1.28 99/11/08" then this is the 1.28 FCode which contains the
fixes for the problems described in this FIN. If it shows any earlier
version number, then the problems described in this FIN may be experienced.
Here is an example from Ultra-2 with a UDWIS card with the affected
(pre-1.28) FCode in SBus slot 1:
ok> reset-all
ok> dev /sbus@1f,0/QLGC,isp@1,10000
ok> .properties
scsi-initiator-id 00000007
clock-frequency 03938700
differential
isp-fcode 1.25 96/10/15 <-- FCode earlier than 1.28
device_type scsi
intr 00000003 00000000
interrupts 00000003
wide
fast-20
reg 00000001 00010000 00000450
64-bit-clean
model QLGC,ISP1000U
name QLGC,isp
ok> device-end
ok>
This example shows a UDWIS card with FCode version 1.25. Since this is
earlier than version 1.28, it could experience the problems described in
this FIN.
c) Use the Solaris 'prtconf -vp' command.
In the output from the 'prtconf -vp' command, examine the 'isp-fcode' value
as for the OBP example above.
Example from a system with a UDWIS card with the affected (pre-1.28) FCode:
[lots of other output ...]
Node 0xf007aa94
scsi-initiator-id: 00000007
clock-frequency: 03938700
differential: 00
isp-fcode: '1.25 96/10/15' <-- FCode earlier than 1.28
device_type: 'scsi'
intr: 00000003.00000000
interrupts: 00000003
wide: 00
fast-20: 00
reg: 00000001.00010000.00000450
64-bit-clean: 00
model: 'QLGC,ISP1000U'
name: 'QLGC,isp'
[lots of other output ...]
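The 'isp-fcode' check can also be scripted against saved output. The
following is a minimal sketch in Python, not a Sun-supplied tool: the
function names are hypothetical, and it assumes the version is always
written as "major.minor" with the minor part compared as an integer
(so 1.25 is earlier than 1.28). It accepts both the 'prtconf -vp' layout
and the OBP '.properties' layout shown above.

```python
import re

FIXED_VERSION = (1, 28)  # first FCode version containing the fixes

# Matches "isp-fcode: '1.25 96/10/15'" (prtconf -vp)
# and "isp-fcode 1.25 96/10/15" (OBP .properties).
FCODE_RE = re.compile(r"isp-fcode:?\s+'?(\d+)\.(\d+)")


def fcode_versions(text):
    """Return every isp-fcode version found in `text` as (major, minor)."""
    return [(int(major), int(minor)) for major, minor in FCODE_RE.findall(text)]


def affected_versions(text):
    """Return the versions in `text` that are earlier than 1.28."""
    return [v for v in fcode_versions(text) if v < FIXED_VERSION]


if __name__ == "__main__":
    sample = """
    scsi-initiator-id: 00000007
    isp-fcode: '1.25 96/10/15'
    model: 'QLGC,ISP1000U'
    """
    for major, minor in affected_versions(sample):
        print(f"UDWIS FCode {major}.{minor} is earlier than 1.28")
```

The text passed in would be the saved output of 'prtconf -vp'. The sketch
reports versions only; mapping a version back to a particular card still
requires reading the surrounding node in the output.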
--------------------------------------------------------------------------
Implementation Footnote:
i) In case of MANDATORY FINs, Enterprise Services will attempt to
contact all affected customers to recommend implementation of
the FIN.
ii) For CONTROLLED PROACTIVE FINs, Enterprise Services mission critical
support teams will recommend implementation of the FIN (to their
respective accounts), at the convenience of the customer.
iii) For REACTIVE FINs, Enterprise Services will implement the FIN as the
need arises.
----------------------------------------------------------------------------
All released FINs and FCOs can be accessed using your favorite network
browser as follows:
SunWeb Access:
--------------
* Access the top level URL of http://sdpsweb.ebay/FIN_FCO/
* From there, select the appropriate link to query or browse the FIN and
FCO Homepage collections.
SunSolve Online Access:
-----------------------
* Access the SunSolve Online URL at http://sunsolve.Corp/
* From there, select the appropriate link to browse the FIN or FCO index.
Supporting Documents:
---------------------
* Supporting documents for FIN/FCOs can be found on Edist. Edist can be
accessed internally at the following URL: http://edist.corp/.
* From there, follow the hyperlink path of "Enterprise Services
Documentation" and click on "FIN & FCO attachments", then choose the
appropriate folder, FIN or FCO. This will display supporting
directories/files for FINs or FCOs.
Internet Access:
----------------
* Access the top level URL of https://infoserver.Sun.COM
--------------------------------------------------------------------------
General:
--------
* Send questions or comments to finfco-manager@Sun.COM
---------------------------------------------------------------------------