Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Solution Type Sun Alert Sure Solution 1000559.1 : SE3310/SE3320/SE3510/SE3511 Storage Arrays May Experience Data Integrity Events
PreviouslyPublishedAs 200705
Bug Id <SUNBUG: 6511494>
Product
  Sun StorageTek 3510 FC Array
  Sun StorageTek 3310 NAS Array
  Sun StorageTek 3320 SCSI Array
  Sun StorageTek 3511 SATA Array
Date of Workaround Release 22-FEB-2007
Date of Resolved Release 20-MAR-2007

Impact
System panics and warning messages on the host Operating System may occur due to a filesystem reading and acting on incorrect data from the disk or a user application reading and acting on incorrect data from the array.

Contributing Factors
This issue can occur on the following platforms:
The above RAID arrays (single or dual controller) with "Write-Back Caching" enabled on RAID 5 LUNs (or on LUNs of other RAID levels when an array disk administration action occurs) can return stale data when the I/O contains writes and reads in a very specific pattern. This pattern has been observed in both QFS and UFS metadata updates, and could be seen in other situations.

Symptoms
Filesystem warnings and panics occur with no indication of an underlying storage issue. For UFS these messages could include:

  "panic: Freeing Free Frag"
  WARNING: /<mount point>: unexpected allocated inode XXXXXX, run fsck(1M) -o f
  WARNING: /<mount point>: unexpected free inode XXXXXX, run fsck(1M) -o f

This list is not exhaustive, and other symptoms of a stale data read might be seen.

Workaround
Disable the "Write-Back Caching" option inside the array using your preferred array administration tool (sccli(1M) or telnet). This workaround can be removed on final resolution.

Use ZFS to detect (and, if configured, correct) the Data Integrity Events.

If you are not using a filesystem, make sure your application has checksums and identity information embedded in its disk data so it can detect Data Integrity Events.

Migrating back to 3.X firmware is a major task and is not recommended.

Resolution
This issue is addressed on the following platforms:
Modification History Date: 20-MAR-2007
Date: 23-MAY-2007
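
The Workaround's advice to embed checksums and identity information in application disk data can be illustrated with a short sketch. This is not part of the Sun Alert and not a Sun-supplied tool: the block layout, the `BLK1` magic value, and the function names `pack_block` and `check_block` are all hypothetical. The point it shows is that a checksum alone is not enough, because a stale block returned by the array is internally consistent; the application also needs an identity (block number) and a generation counter that it tracks itself.

```python
import struct
import zlib

# Hypothetical on-disk layout (an assumption, not from the Sun Alert):
# magic, block number, generation, CRC32, then the payload.
HEADER = struct.Struct(">4sQQI")
MAGIC = b"BLK1"


def pack_block(block_no: int, generation: int, payload: bytes) -> bytes:
    """Prefix the payload with identity fields and a CRC32 covering both."""
    crc = zlib.crc32(struct.pack(">QQ", block_no, generation) + payload)
    return HEADER.pack(MAGIC, block_no, generation, crc) + payload


def check_block(raw: bytes, block_no: int, min_generation: int) -> bytes:
    """Return the payload, or raise ValueError if the block is corrupt,
    belongs to a different location, or is older than the generation
    the caller knows it last wrote."""
    if len(raw) < HEADER.size:
        raise ValueError("short read")
    magic, found_no, found_gen, crc = HEADER.unpack_from(raw)
    payload = raw[HEADER.size:]
    if magic != MAGIC:
        raise ValueError("bad magic")
    if crc != zlib.crc32(struct.pack(">QQ", found_no, found_gen) + payload):
        raise ValueError("checksum mismatch")
    if found_no != block_no:
        raise ValueError("misplaced block")
    if found_gen < min_generation:
        # The checksum is valid, so only the generation reveals stale data.
        raise ValueError("stale block")
    return payload
```

In use, the application would keep the last generation written for each block somewhere that does not depend on the affected LUN (an assumption of this sketch) and pass it as `min_generation` on every read.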
References
<SUNPATCH: 113722-16>
<SUNPATCH: 113730-02>
<SUNPATCH: 113723-17>
<SUNPATCH: 113724-10>

Previously Published As 102815

Internal Comments
sub-CR: 2146925
Comments: this regression was introduced when the firmware 4.X code base was introduced by the software manufacturer.
PTS Reviewer (approved by): James.Evans@Sun.COM
23-May-2007 added, per ENG: this issue is also seen in QFS metadata updates

Internal Contributor/submitter tim.uglow@sun.com
Internal Eng Business Unit Group NWS (Network Storage)
Internal Eng Responsible Engineer tejinder.singh@sun.com
Internal Services Knowledge Engineer karen.edwards@sun.com
Internal Escalation ID 1-21037989, 1-21037591, 1-20986492, 1-20837385, 1-19849835
Internal Resolution Patches 113722-16, 113730-02, 113723-17, 113724-10
Internal Sun Alert Kasp Legacy ID 102815

Internal Sun Alert & FAB Admin Info
Critical Category: Data Loss, Availability ==> Regression
Significant Change Date: 2007-02-22, 2007-03-20
Avoidance: Firmware
Responsible Manager: tejinder.singh@sun.com
Original Admin Info:
[WF 23-May-2007, dave m: request by ENG to include clarification for QFS, just found recently, important to FEs]
[WF 21-Feb-2007, karened: submitted friday 16-Feb, I'm just drafting now and will send to sunalert_review]

Internal SA-FAB Eng Submission
-------- Original Message --------
Subject: Draft Sun Alert: Bug ID 6511494 :3510/3310 with huge I/O and Write Cache Enable can lead to filesystem corruption and panic
Date: Fri, 16 Feb 2007 21:37:06 +0000
From: tim uglow
To: sunalert-submit@Sun.COM
CC: nws-review@Sun.COM, pts-storage-sunalerts@Sun.COM, Tim Uglow - Principal Engineer

Hi

Please find my draft Sun Alert for this minnow issue.

-------------------------------------------------------------------------------------------------------------------
Synopsis: SE3310/SE3320/SE3510/SE3511 Storage arrays can suffer data integrity events.
Category: [X] Data Loss. [X] Availability.
Product: SUN 3310/3320/3510/3511 Raid arrays
BugID: 6511494
Avoidance: [X] Workaround
State: [X] Workaround

1. Impact:
A user application could read and act on incorrect data from the array. A filesystem could read and act on incorrect data from the disk and produce warning messages or panic the host Operating System.

2. Contributing Factors:
This issue can occur on the following platforms:
* Sun StorEdge 3310 (SCSI) Array with firmware version 4.11K/4.13B/4.15F (as delivered in patch 113722-10/113722-11/113722-15)
* Sun StorEdge 3320 (SCSI) Array with firmware version 4.15G (as delivered in patch 113730-01)
* Sun StorEdge 3510 (FC) Array with firmware version 4.11I/4.13C/4.15F (as delivered in patch 113723-10/113723-11/113723-15)
* Sun StorEdge 3511 (FC) Array with firmware version 4.11I/4.13C/4.15F (as delivered in patch 113724-04/113724-05/113724-09)

Sun StorEdge 3310/3320/5310 raid arrays (single or double controller) with "Write Behind Caching" enabled on Raid 5 LUNs (or other raid level LUNs and an array disk administration action occurs), can return stale data when the i/o contains writes and reads in a very specific pattern. This pattern has only been observed in UFS metadata updates but could be seen in other situations.

3. Symptoms:
Filesystem warnings and panics with no indication of an underlying storage issue. For UFS these messages could include:

  "panic: Freeing Free Frag"
  WARNING: /<mount point>: unexpected allocated inode XXXXXX, run fsck(1M) -o f
  WARNING: /<mount point>: unexpected free inode XXXXXX, run fsck(1M) -o f

This list is not exhaustive; other symptoms of stale data read could be seen.

4. Relief/Workaround:
Disable the "write behind" caching option inside the array using your preferred array administration tool (sccli(1M) or telnet). This workaround can be removed on final resolution.
Use ZFS to detect (and correct if configured) the Data Integrity Events.
If not using a filesystem make sure your application has checksums and identity information embedded in its disk data so it can detect Data Integrity Events.
Migrating back to 3.X firmware is a major task and is not recommended.

5. Resolution:
A final resolution is pending urgent completion.

6. Internal Section:
Escalation IDs: 1-21037989 1-21037591 1-20986492 1-20837385 1-19849835
Pending Patches: I'm just getting the proposed patch numbers, but FYI the fix will go into the following firmware versions...
  SE3510 4.15G
  SE3310 4.15G
  SE3511 4.15G
  SE3320 4.15H
  and 4.21 for all array types.
Resolution Patches:
FIN:
FCO:
Submitter: tim.uglow@sun.com
Responsible Engineer: tejinder.singh@sun.com
Responsible Manager: tejinder.singh@sun.com
PTS/Engineering organization: [X] NWS (Network Storage)
Distribution: [X] Public SunSolve
Comments: this regression was introduced when the firmware 4.X code base was introduced by the software manufacturer.
PTS Reviewer (approved by): James.Evans@Sun.COM
--------------------------------------------------------------------------------------------------------------------
thanks
tim

References
SUNPATCH:113722-16
SUNPATCH:113723-17
SUNPATCH:113724-10
SUNPATCH:113730-02

Attachments
This solution has no attachment