Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: Sun Alert Sure Solution

1000907.1 : Sun StorEdge T3 and T3+ Arrays (Including SE3900 and SE6900 Series) May Reset and/or Briefly Lose Host Connectivity After Running Continuously For 497 Days
PreviouslyPublishedAs: 201217

Product:
  Sun StorageTek 3900 Series
  Sun StorageTek 6900 Series
  Sun StorageTek T3 Array
  Sun StorageTek T3+ Array

Bug Id: <SUNBUG: 4785593>

Date of Workaround Release: 30-JAN-2003
Date of Resolved Release: 30-MAY-2003

Impact

Sun StorEdge T3 and T3+ arrays (including Sun StorEdge T3+ arrays contained in the Sun StorEdge 3900 and Sun StorEdge 6900 Series) may reset and/or lose host connectivity for 2-3 minutes if they have been running continuously for exactly 497 days and I/O operations are in progress at that time.

Data may become unavailable or may be lost permanently, depending on the system configuration and on how applications react to the arrays resetting or temporarily losing host connectivity.

Contributing Factors

This issue can occur with the following configurations:
This issue will only occur if I/O operations are being executed on the array at the moment it has been running continuously for exactly 497 days. If the array is idle at that time, this issue will not occur.

Note: There is no "uptime" or similar command on the T3/T3+. To identify how long the T3/T3+ has been running, it is necessary to review the T3/T3+ syslog (or remote logging file), or to review any change logs kept by the system administrator, to find the date of the last T3/T3+ boot.

Applying new firmware to a T3/T3+ requires an array reboot. Therefore, T3/T3+ arrays whose firmware has been kept up to date will have been rebooted during the update process and hence are less likely to be impacted by this issue.

Symptoms

1. Messages similar to the following may be seen in the "/var/adm/messages" file:

    unix: ID[SUNWssa.socal.link.5010] socal3: port 0: Fibre Channel is OFFLINE
    unix: WARNING: /sbus@49,0/SUNW,socal@1,0/sf@0,0 (sf6):
    unix: Offline Timeout
    unix: sf6: target 0x2 al_pa 0xe4 offlined
    unix: WARNING: /sbus@49,0/SUNW,socal@1,0/sf@0,0/ssd@w50020f2300007193,0 (ssd3):
    unix: SCSI transport failed: reason 'tran_err': giving up

   The above messages indicate a loss of host connectivity to a T3/T3+ array and may occur for different reasons. If the issue described in this Sun Alert document has occurred, one of the sets of T3/T3+ messages listed below will also be seen.

2. As it restarts, the T3/T3+ syslog may record the following reason for the T3/T3+ resetting:

    ROOT[1]: W: u1ctr Assertion Reset (3000) was initiated at yyyymmdd hhmmss ../../common/bss/qlcf.c line xxxx, Assert(cmd->cmd_deadline != CAM_TIME_INFINITY) => 0 BOOT

3. The T3/T3+ syslog may record messages similar to the following at the time that host connectivity is lost:

    ISR1[1]: N: u1ctr ISP2100[2] Fatal timeout on host 125
    ISR1[1]: N: u1ctr ISP2100[2] Received LIP(f0,f0) async event
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_invalidate_pdb: PDB Invalidate (host id 125)
    FCC0[1]: N: u1ctr Port event received on port 0, abort 0 (id 125)
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_sync_pdb: PDB Sync Initiated (host id 125)
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_update_pdb: PDB Sync Done (host id 125)
    ISR1[1]: N: u1ctr ISP2100[2] PDB Sync Done (host id 125, host WWN 2004020000101a00)
    FCC0[1]: N: u1ctr PDB Changed on port 0 (id 125)
    [...]

   10 seconds later:

    ISR1[1]: N: u1ctr ISP2100[2] Fatal timeout on host 125
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_i_watch_host_port: Debug Code - ISP2100 Hang Detected
    ISR1[1]: N: u1ctr ISP2100[2] interface going offline
    ISR1[1]: N: u1ctr ISP2100[2] qlcf_init_pdb: PDB Initialize
    ISR1[1]: N: u1ctr ISP2100[2] QLCF_I_ABORT_ALL_TM_CMDS: Target-mode Flush Started (lun = 0x0)
    ISR1[1]: N: u1ctr ISP2100[2] interface going online
    [...]
    SIMT[1]: N: u1ctr Initializing host port u1p1 ISP2100 ... firmware status = 7
    [...]
    DUMP[1]: N: u1ctr ISP2100[2] [==>BEG]ISPDEBUGDUMP:
    DUMP[1]: N: u1ctr ISP2100[2] PBIU REGISTERS (OFFSET 00H, 8):
    DUMP[1]: N: u1ctr ISP2100[2] 0000 0001 0002 0003 0004 0005 0006 0007
    DUMP[1]: N: u1ctr ISP2100[2] ---- ---- ---- ---- ---- ---- ---- ----
    [... followed by lines of hex data ...]

Workaround

To prevent the described issue, the Sun StorEdge T3/T3+ array needs to be rebooted before it has been running for 497 days.

Resolution

This issue is addressed in resolution patches 109115-13 and 112276-07 (see References).
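As a planning aid for the workaround and the syslog checks described in this Alert, the following is a minimal Python sketch, not a Sun-supplied tool. It assumes the last boot date has already been found by hand in the T3/T3+ syslog or a change log, since the array has no "uptime" command. The function names, the `safety_margin_days` parameter and the example boot date are hypothetical; only the 497-day figure and the two message substrings are taken from the text above.

```python
from datetime import datetime, timedelta

# Figure taken from this Alert: the issue can strike at 497 days of uptime.
CRITICAL_UPTIME = timedelta(days=497)

# Substrings copied from the T3/T3+ syslog excerpts quoted in this Alert.
SIGNATURES = (
    "Assert(cmd->cmd_deadline != CAM_TIME_INFINITY)",
    "ISP2100 Hang Detected",
)


def reboot_deadline(last_boot, safety_margin_days=30):
    """Return (latest_safe_reboot, critical_time) for a given last-boot time.

    safety_margin_days is a hypothetical planning buffer, not a value
    specified by the Alert.
    """
    critical = last_boot + CRITICAL_UPTIME
    return critical - timedelta(days=safety_margin_days), critical


def matching_signatures(syslog_lines):
    """Return the syslog lines that contain one of the quoted signatures."""
    return [
        line for line in syslog_lines
        if any(sig in line for sig in SIGNATURES)
    ]


if __name__ == "__main__":
    # Hypothetical last-boot date, as read from the syslog or a change log.
    boot = datetime(2002, 1, 1)
    safe, critical = reboot_deadline(boot)
    print("Reboot no later than:", safe.date())
    print("497 days reached on: ", critical.date())
```

For the example boot date of 1 January 2002, the script reports 2003-04-13 as the latest planned reboot date and 2003-05-13 as the day the array reaches 497 days of uptime. A match from `matching_signatures` only suggests that this issue may have occurred; the host-side messages in symptom 1 can have other causes.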
Modification History

Date: 30-MAR-2003
Date: 30-MAY-2003
References

<SUNPATCH: 109115-13>
<SUNPATCH: 112276-07>

Previously Published As: 101168

Internal Comments
Internal Contributor/submitter: younan.sarkis@Sun.COM
Internal Eng Business Unit Group: NWS (Network Storage)
Internal Eng Responsible Engineer: sam.gibson@sun.com, sanjay.jagad@sun.com
Internal Services Knowledge Engineer: olaf.reineke@germany.sun.com
Internal Escalation ID: 542409, 542406, 542934
Internal Resolution Patches: 109115-13, 112276-07
Internal Sun Alert Kasp Legacy ID: 101168, 50381 (Sun Alert)

Internal Sun Alert & FAB Admin Info
Critical Category: Data Loss, Availability ==> Pervasive
Significant Change Date: 2003-01-30, 2003-03-30, 2003-05-30
Avoidance: Patch, Workaround
Responsible Manager: aseem.rastogi@sun.com
Original Admin Info: This document has been imported from KMS Creator and may need adjustment before re-publishing.
This imported document has been reviewed/adjusted by:
Review Name:
Review Date:

Original KMS Creator attributes below:
--- PLEASE DO NOT MAKE ANY CHANGES BELOW THIS LINE! ---
Sun Alert ID: 50381
Synopsis: Sun StoreEdge T3 and T3+ Arrays (Including SE3900 and SE6900 Series) May Reset and/or Briefly Lose Host Connectivity After Running Continuously For 497 Days
Category: Data Loss, Availability
Product: Sun StorEdge T3, Sun StorEdge T3+, Sun StorEdge 3900 Series, Sun StorEdge 6900 Series
BugIDs: 4785593
Avoidance: Patch, Workaround
State: Resolved
Date Released: 30-Jan-2003, 30-Mar-2003, 30-May-2003
Date Closed: 30-May-2003
Date Modified: 30-Mar-2003, 30-May-2003
Escalation IDs: 542409, 542406, 542934
Pending Patches:
Resolution Patches: 109115-13, 112276-07
FIN:
FCO:
Date Submitted: 24-Jan-2003
Submitter: younan.sarkis@Sun.COM
Responsible Engineer: sam.gibson@sun.com, sanjay.jagad@sun.com
Responsible Manager: aseem.rastogi@sun.com
CTE group: CPRE NWS EMEA
Responsible Writer: olaf.reineke@germany.sun.com
Distribution: Public SunSolve

Workflow History:
WF State: Issued, 30-May-2003, Olaf Reineke
WF Note: Re-issued resolved.
WF State: Issued, 30-Jan-2003, Karen Edwards
WF Note: oked by susan, released by karen
WF State: Draft, 30-Jan-2003, Olaf Reineke
WF Note: Sent out for signoff
WF State: Draft, 29-Jan-2003, Olaf Reineke
WF Note: Sent out for technical review and BT review.
WF State: Draft, 28-Jan-2003, Olaf Reineke
WF Note: Article created.
Exported from KMS Creator Sat May 21 08:55:01 2005 GMT, olaf.reineke@sun.com

Internal SA-FAB Eng Submission
Sun StoreEdge T3 and T3+ Arrays (Including SE3900 and SE6900 Series) May Reset and/or Briefly Lose Host Connectivity After Running Continuously For 497 Days

Product_uuid
04ccc2c2-16a1-11d7-9f9a-f83fdd2e2f1b|Sun StorageTek 3900 Series
09a6d778-16f2-11d7-8802-94885c013b6c|Sun StorageTek 6900 Series
2a6d7d50-0a18-11d6-8e0b-f0bd33b24928|Sun StorageTek T3 Array
2a714b10-0a18-11d6-86e2-d56b387d4fbf|Sun StorageTek T3+ Array

References
SUNPATCH:109115-13
SUNPATCH:112276-07

Attachments
This solution has no attachment