Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1267544.1
Update Date: 2011-03-20
Keywords:

Solution Type  FAB (standard) Sure

Solution  1267544.1 :   Older versions of the Service Processor firmware on Sun Storage 7110, 7210, 7310 and 7410 can leak memory.  


Related Items
  • Sun Storage 7410 Unified Storage System
  • Sun Storage 7310 Unified Storage System
  • Sun Storage 7110 Unified Storage System
  • Sun Storage 7210 Unified Storage System
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive




In this Document
  Symptoms
  Changes
  Cause
  Solution


Oracle Confidential (PARTNER). Do not distribute to customers
Reason: FABs available to Internals and Partners only

Applies to:

Sun Storage 7410 Unified Storage System - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 7110 Unified Storage System - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 7210 Unified Storage System - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Sun Storage 7310 Unified Storage System - Version: Not Applicable to Not Applicable   [Release: N/A to N/A]
Information in this document applies to any platform.
__________

SUNBUG 6961121

Symptoms

Systems running older BIOS versions (43 or lower) may show the following symptoms:
  •   Cannot connect to the Service Processor via serial or network.
  •   Service Processor absent from hardware details page in BUI Alert.
  •   Service Processor has stopped responding to requests.
  •   Directories, such as /SYS, missing from the SP interface.
  •   Fans in the server node running continuously at full speed.
  •   Slow throughput to system disks (due to fan vibration).
  •   Timeouts during software upgrade (due to slow system disks caused by fan vibration).
  •   Readzilla devices disappearing.

Messages displayed on the system console:

   WARNING: /pci@0,0/pci10de,cb84@5:
   SATA device detached at port 0 <----
   WARNING: /pci@0,0/pci10de,cb84@5/disk@0,0 (sd0):
   Command failed to complete...Device is gone


Alerts displayed by the appliance management BUI/CLI:

   The disk in slot 'HDD 0' has been removed from chassis 'sus7410-010'

The above alert indicates that the Readzilla installed in slot 0 (HDD 0) has been reported as removed.

If the appliance automatically generates these errors for any of HDD 0 through HDD 5, proceed with the BIOS update procedure.
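As a rough illustration of the triage rule above, the check can be written as a filter over alert text. This is a minimal sketch, assuming the alerts have been copied out as plain text lines worded exactly like the sample alert; the function names are hypothetical and are not part of the appliance software.

```python
import re

# Pattern modelled on the sample alert text above; wording used by other
# appliance releases is an assumption.
ALERT_RE = re.compile(
    r"The disk in slot 'HDD (\d+)' has been removed from chassis '([^']+)'"
)

# Readzilla devices affected by this issue sit in slots HDD 0 through HDD 5.
READZILLA_SLOTS = range(0, 6)


def readzilla_removals(alert_lines):
    """Return (chassis, slot) pairs for removal alerts in Readzilla slots 0-5."""
    hits = []
    for line in alert_lines:
        match = ALERT_RE.search(line)
        if match is None:
            continue  # not a disk-removal alert
        slot = int(match.group(1))
        if slot in READZILLA_SLOTS:
            hits.append((match.group(2), slot))
    return hits


def bios_update_indicated(alert_lines):
    """True if any removal alert refers to a Readzilla slot (HDD 0-5)."""
    return bool(readzilla_removals(alert_lines))


if __name__ == "__main__":
    sample = [
        "The disk in slot 'HDD 0' has been removed from chassis 'sus7410-010'",
        "The disk in slot 'HDD 7' has been removed from chassis 'sus7410-010'",
    ]
    print(readzilla_removals(sample))     # [('sus7410-010', 0)]
    print(bios_update_indicated(sample))  # True
```

Slot 7 is ignored because only slots 0-5 hold Readzilla devices in the affected configuration.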

Impact


Older versions of the Service Processor firmware can leak memory, eventually causing the issues listed in the Symptoms section. In addition, on appliances with Readzilla devices installed in slots 0-5 and running BIOS 0ABMN064, heavy workload can cause loss of connectivity to a Readzilla, which is then detected as a removed device, degrading read performance.

Changes

Contributing Factors

Platforms with Readzilla devices installed in slots 0-5 and running BIOS 0ABMN064 are impacted by the issue described in this document. The failure mode described in CR 6961121 is fixed by updating to BIOS 0ABMN080, as described in CR 6965414, together with an appliance OS upgrade to release 2010.Q1.3.0 or later.
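The contributing factors and the fix can be expressed as two simple checks. This is a minimal Python sketch that assumes the BIOS string always has the form 0ABMNnnn and that release strings follow the YEAR.Qn.MAJOR[.MINOR] pattern used in this document (for example 2010.Q1.3.0); the helper names are hypothetical.

```python
import re


def parse_release(release):
    """Parse '2010.Q1.3.0' (or '2010.Q1.3') into a comparable tuple."""
    match = re.fullmatch(r"(\d{4})\.Q([1-4])\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised release string: {release!r}")
    year, quarter, major, minor = match.groups()
    return (int(year), int(quarter), int(major), int(minor or 0))


def bios_build(bios):
    """Return the numeric build of a 0ABMN BIOS string: '0ABMN080' -> 80."""
    if not bios.startswith("0ABMN"):
        raise ValueError(f"not a 0ABMN BIOS string: {bios!r}")
    return int(bios[len("0ABMN"):])


def readzilla_impacted(bios, readzilla_in_slots_0_to_5):
    """Contributing factors: Readzilla in slots 0-5 and BIOS 0ABMN064."""
    return readzilla_in_slots_0_to_5 and bios == "0ABMN064"


def remediated(bios, release):
    """Fix: BIOS 0ABMN080 plus appliance OS 2010.Q1.3.0 or later.

    Treating any later 0ABMN build as also fixed is an assumption.
    """
    return (bios_build(bios) >= 80
            and parse_release(release) >= parse_release("2010.Q1.3.0"))


if __name__ == "__main__":
    print(readzilla_impacted("0ABMN064", True))  # True
    print(remediated("0ABMN080", "2010.Q1.3"))   # True
```

Both conditions in `remediated` must hold, mirroring the requirement for the BIOS update together with the OS upgrade.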

Cause

Root Cause

BIOS 0ABMN064 modified the PHY settings for slots 0-7 to accommodate the Seagate Dragonfly 2.5" boot disk, which caused the issue described in this document. The fix, BIOS 0ABMN080, restores the original PHY settings on slots 0-5 and applies the settings optimized for the Seagate Dragonfly 2.5" boot disk only to slots 6-7.

Additionally, the nv_sata driver does not handle controller resets correctly and can trigger hardware bugs leading to controller lockup when poor signal integrity results in a device reset.

Solution

Workaround

No workaround is available; see the Resolution section below.

Resolution

Note: The following should only be performed by trained Oracle service personnel.

+ For the Sun Storage 7110:

BIOS 80 must be loaded to resolve the memory leak condition in the SP firmware.

First - Upgrade the Fishworks software to 2010.Q1.3 (or later) release by following the procedure provided in online help of the appliance BUI.

Second - Download the correct BIOS (0ABMN080) and Service Processor firmware (version 2.0.2.16) for the system being serviced.

The Release Notes for installing Sun Storage 7000 Software Update 2010.Q1.3.0 (or later) can be found at:

   http://wikis.sun.com/display/FishWorks/Fishworks

The release itself is linked from the Software Updates page:

   http://wikis.sun.com/display/FishWorks/Software+Updates

+ For the Sun Storage 7210:

BIOS 32 (0ABNF032) must be loaded to resolve the memory leak condition in the SP firmware.

Download the ILOM and BIOS package from the following link:

  http://pts-storage.us.oracle.com/products/AmberRoad/download/0ABNF032-r45117.pkg

+ For the Sun Storage 7310 and 7410:

BIOS 80 must be loaded on the appliance to resolve the memory leak condition and the "Readzilla Removed" issue described in this document. The appliance OS should also be upgraded to 2010.Q1.3.0 (or later).

First - Upgrade the Fishworks software to 2010.Q1.3 (or later) release by following the procedure provided in online help of the appliance BUI.

Second - Download the correct BIOS (0ABMN080) and Service Processor (version 2.0.2.16) firmware for the system being serviced, as follows:

The Release Notes for installing Sun Storage 7000 Software Update 2010.Q1.3.0 (or later) can be found at:

   http://wikis.sun.com/display/FishWorks/Fishworks

The release itself is linked from the Software Updates page:

   http://wikis.sun.com/display/FishWorks/Software+Updates

The link to BIOS 80 can be found at:

   https://stbeehive.oracle.com/teamcollab/library/st/AmberRoadSupport/Public+Documents


+ To upgrade the ILOM and BIOS on the Sun Storage 7110, 7210, 7310 and 7410, do the following:

Connect to the Service Processor via ssh using root credentials. Use this interface to shut down the storage controller with:

"stop /SYS"

Make sure the above command has completed before continuing.

Connect to the Service Processor IP address with a web browser and log in with the root credentials.

Then follow the steps below to upgrade the Service Processor and BIOS on the Sun Storage 7110, 7310 and 7410:

1. Click the Maintenance tab.
2. Firmware Upgrade is the default (and correct) subtab.
3. Click "Enter Upgrade Mode".
4. Confirm this action in the pop-up.
5. Click "Browse" and select the appropriate image from your local filesystem.
6. Click "Upload".
7. Wait for the upload to complete and the verification to succeed.
8. A summary table of the SP firmware and BIOS versions (Existing vs New)
    is displayed. Confirm that "Preserve existing configuration" is
    checked for the SP firmware.
9. Click "Start Upgrade".
10. Confirm this action in the pop-up.
11. Wait for the upgrade to proceed. If the storage controller is still up at
     this point, it will be cleanly shut down.
    Warning! Do not interrupt the update. Leave the browser undisturbed until
     the update is complete.
12. When finished, "Upgrade Complete" is displayed and the SP reboots.


+ Configuring the BIOS Settings:

Note: If the system hangs during boot following the BIOS upgrade, a workaround is to disconnect the SAS cables and reboot to enter the BIOS setup screen. Be certain to reconnect the SAS cables immediately after correcting the BIOS settings.

The SP firmware and BIOS have now been updated to the correct 7000 versions. Each platform requires specific BIOS settings. Boot the storage controller and enter BIOS setup as follows:

   -> start /SYS
   Are you sure you want to start /SYS (y/n)? y
   Starting /SYS
   -> start /SP/console
   Are you sure you want to start /SP/console (y/n)? y
   Serial console started. To stop, type ESC (


When the initial BIOS banner appears, press CONTROL-E a few times; the BIOS Setup menu opens after initialization completes. You can drop back to the SP with ESC (.

Note: ESC ( means Escape followed by an open parenthesis, which on most keyboards is Shift-9.

If you miss the banner, drop back to the SP, reset the storage controller and restart the console:

   Serial console stopped.
   -> reset /SYS
   Are you sure you want to reset /SYS (y/n)? y
   Performing hard reset on /SYS
   -> start /SP/console
   Are you sure you want to start /SP/console (y/n)? y
   Serial console started. To stop, type ESC (


o For Sun Storage 7110:

Disable PCIPnP Option-ROM scanning for slots 1-5
Disable I/O allocation

Use the right arrow key to page over to the "PCIPnP" menu. Use the down arrow to highlight:

   Scanning OPROM on PCI-E Slot1 Enabled

Press return and select "Disabled". This will now appear as:

   Scanning OPROM on PCI-E Slot1 Disabled

Repeat this for slots 2-5 (the last slot is off the bottom of the screen).
The settings should now be:

   Scanning OPROM on PCI-E Slot0 Enabled
   Scanning OPROM on PCI-E Slot1 Disabled
   Scanning OPROM on PCI-E Slot2 Disabled
   Scanning OPROM on PCI-E Slot3 Disabled
   Scanning OPROM on PCI-E Slot4 Disabled
   Scanning OPROM on PCI-E Slot5 Disabled


Just below these OPROM settings is a group of settings that allows I/O allocation to be disabled per slot.

Disable PCI-E slots 1-4. Only slots 0 and 5 should be enabled. It should look like:

   IO Allocation on PCI-E Slot0 Enabled
   IO Allocation on PCI-E Slot1 Disabled
   IO Allocation on PCI-E Slot2 Disabled
   IO Allocation on PCI-E Slot3 Disabled
   IO Allocation on PCI-E Slot4 Disabled
   IO Allocation on PCI-E Slot5 Enabled


On boot, you will see the following warning message from the BIOS:

   Warning: IO resource not allocated

This is an expected message and does not indicate a failure.

Exiting BIOS Setup:

Use the right arrow to page over to "Exit". Press return for the default "Save Changes
and Exit", and press return again to confirm the action in the pop-up.


o For Sun Storage 7210:

Disable PCIPnP Option-ROM scanning for all slots
Disable I/O allocation

Use the right arrow key to page over to "PCIPnP" menu. Use the down arrow to highlight:

   Scanning OPROM on PCI-E Slot0 Enabled

Press return and select "Disabled". This will now appear as:

   Scanning OPROM on PCI-E Slot0 Disabled

Repeat this for slots 1 and 2. You should now have:

   Scanning OPROM on PCI-E Slot0 Disabled
   Scanning OPROM on PCI-E Slot1 Disabled
   Scanning OPROM on PCI-E Slot2 Disabled


Just below these OPROM settings is a group of settings that allows I/O allocation to be disabled per slot. Disable PCI-E slots 0 and 2. Only slot 1 should be enabled. It should look like:

   IO Allocation on PCI-E Slot0 Disabled
   IO Allocation on PCI-E Slot1 Enabled
   IO Allocation on PCI-E Slot2 Disabled

On boot, you will see the following warning message from the BIOS:

   Warning: IO resource not allocated

This is an expected message and does not indicate a failure.

Exiting BIOS Setup:

Use the right arrow to page over to "Exit". Press return for the default "Save Changes
and Exit", and press return again to confirm the action in the pop-up.


o For Sun Storage 7310:

Disable PCIPnP Option-ROM scanning for all slots
Disable I/O allocation
Configure boot drives

Use the right arrow key to page over to the "PCIPnP" menu. Use the down arrow to highlight:

   Scanning OPROM on PCI-E Slot0 Enabled

Press return and select "Disabled", followed by return. This will now appear as:

   Scanning OPROM on PCI-E Slot0 Disabled

Repeat this for slots 1-2. You should now have:

   Scanning OPROM on PCI-E Slot0 Disabled
   Scanning OPROM on PCI-E Slot1 Disabled
   Scanning OPROM on PCI-E Slot2 Disabled


Just below these OPROM settings is a group of settings that allows I/O allocation to be disabled per slot. Disable PCI-E slots 1 and 2. Only slot 0 should be enabled. It should look like:

   IO Allocation on PCI-E Slot0 Enabled
   IO Allocation on PCI-E Slot1 Disabled
   IO Allocation on PCI-E Slot2 Disabled


Next, arrow over to the Boot menu. Select the last item: "Hard Disk Drives" and press return. The list should include only 2 drives (the 2 internal SATA drives) with labels like:

   SATA:11M-<drive model>
   SATA:12M-<drive model>


If this list includes anything else (such as readzilla cache devices with a 'STEC MACH8' string, or JBOD attached drives) you will need to remove them from the list by selecting the boot position and setting it to 'Disabled' for each non-boot drive.
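The boot-list rule (keep only the two internal SATA drives, disable everything else) applies to both the 7310 and the 7410. A minimal Python sketch, assuming the entries have been transcribed from the "Hard Disk Drives" screen and that the internal drives are always labelled with the SATA:11M- and SATA:12M- prefixes shown above; the prefixes and function names are assumptions for illustration.

```python
# Assumed label prefixes for the two internal SATA boot drives, based on the
# example labels above (the model part after the dash varies).
BOOT_PREFIXES = ("SATA:11M-", "SATA:12M-")


def entries_to_disable(hard_disk_drives):
    """Return the boot-list entries that are not internal SATA boot drives."""
    return [entry for entry in hard_disk_drives
            if not entry.startswith(BOOT_PREFIXES)]


def boot_list_ok(hard_disk_drives):
    """True when exactly the two internal SATA drives remain in the list."""
    return (len(hard_disk_drives) == 2
            and not entries_to_disable(hard_disk_drives))


if __name__ == "__main__":
    # Hypothetical transcription: two boot drives plus one Readzilla entry.
    listed = ["SATA:11M-MODEL-A", "SATA:12M-MODEL-A", "STEC MACH8 (Readzilla)"]
    print(entries_to_disable(listed))  # ['STEC MACH8 (Readzilla)']
    print(boot_list_ok(listed))        # False
```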

Exiting BIOS Setup

Once you've removed any Readzilla cache or JBOD drive entries from the "Hard Disk Drives" list, do the following:

- Press ESC to exit the "Hard Disk Drives" menu, then arrow right to the
   "Exit" menu.
- Press return for the default "Save Changes and Exit", and press return again to
  confirm the action in the pop-up.

On boot, you will see the following warning message from the BIOS:

   Warning: IO resource not allocated

This is an expected message and does not indicate a failure.


o For Sun Storage 7410:

Disable PCIPnP Option-ROM scanning for all slots
Disable I/O allocation
Configure boot drives

Use the right arrow key to page over to "PCIPnP" menu. Use the down arrow to highlight:

   Scanning OPROM on PCI-E Slot0 Enabled

Press return and select "Disabled", followed by return. This will now appear as:

   Scanning OPROM on PCI-E Slot0 Disabled

Repeat this for slots 1-5 (the last slot is off the bottom of the screen). You should now have:

   Scanning OPROM on PCI-E Slot0 Disabled
   Scanning OPROM on PCI-E Slot1 Disabled
   Scanning OPROM on PCI-E Slot2 Disabled
   Scanning OPROM on PCI-E Slot3 Disabled
   Scanning OPROM on PCI-E Slot4 Disabled
   Scanning OPROM on PCI-E Slot5 Disabled


o For SAS1 configurations (Connected with J4400 JBODs)

Just below these OPROM settings (they are off the bottom of the screen, so you will need to scroll down) is a group of settings that allows I/O allocation to be disabled per slot. Disable PCI-E slots 0-3, checking that slots 4 and 5 are Enabled. It should look like:

   IO Allocation on PCI-E Slot0 Disabled
   IO Allocation on PCI-E Slot1 Disabled
   IO Allocation on PCI-E Slot2 Disabled
   IO Allocation on PCI-E Slot3 Disabled
   IO Allocation on PCI-E Slot4 Enabled
   IO Allocation on PCI-E Slot5 Enabled


o For SAS2 configurations (Connected with J4410 JBODs)

Just below these OPROM settings (they are off the bottom of the screen, so you will need to scroll down) is a group of settings that allows I/O allocation to be disabled per slot. Disable PCI-E slots 1-4, checking that slots 0 and 5 are Enabled. It should look like:

   IO Allocation on PCI-E Slot0 Enabled
   IO Allocation on PCI-E Slot1 Disabled
   IO Allocation on PCI-E Slot2 Disabled
   IO Allocation on PCI-E Slot3 Disabled
   IO Allocation on PCI-E Slot4 Disabled
   IO Allocation on PCI-E Slot5 Enabled

Next, arrow over to the Boot menu. Select the last item: "Hard Disk Drives" and press return. The list should include only 2 drives (the 2 internal SATA drives) with labels like:

   SATA:11M-<drive model>
   SATA:12M-<drive model>


If this list includes anything else (such as Readzilla cache devices with a 'STEC MACH8' string, or JBOD-attached drives), you will need to remove them from the list by selecting the boot position and setting it to 'Disabled' for each non-boot drive.

Exiting BIOS Setup

Once you've removed any Readzilla cache or JBOD drive entries from the "Hard Disk Drives" list, do the following:

- Press ESC to exit the "Hard Disk Drives" menu, then arrow right to the
  "Exit" menu.
- Press return for the default "Save Changes and Exit", and press return again to
  confirm the action in the pop-up.

On boot, you will see the following warning message from the BIOS:

   Warning: IO resource not allocated

This is an expected message and does not indicate a failure.
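For cross-checking, the expected PCIPnP settings from the platform sections above can be collected in one table. This Python sketch renders the expected screen lines for each platform and lists any expected line missing from a hand-transcribed copy of the BIOS screen; the platform keys (such as "7410-SAS1") and function names are hypothetical labels chosen for the sketch.

```python
# Expected PCIPnP settings transcribed from the per-platform sections above.
# Each entry: (number of PCI-E slots, slots with OPROM scanning Enabled,
#              slots with IO Allocation Enabled).
EXPECTED = {
    "7110": (6, {0}, {0, 5}),
    "7210": (3, set(), {1}),
    "7310": (3, set(), {0}),
    "7410-SAS1": (6, set(), {4, 5}),   # J4400 JBODs
    "7410-SAS2": (6, set(), {0, 5}),   # J4410 JBODs
}


def expected_screen(platform):
    """Render the expected settings in the same wording as the BIOS screen."""
    slots, oprom_on, io_on = EXPECTED[platform]
    lines = []
    for slot in range(slots):
        state = "Enabled" if slot in oprom_on else "Disabled"
        lines.append(f"Scanning OPROM on PCI-E Slot{slot} {state}")
    for slot in range(slots):
        state = "Enabled" if slot in io_on else "Disabled"
        lines.append(f"IO Allocation on PCI-E Slot{slot} {state}")
    return lines


def mismatches(platform, observed):
    """Return expected lines that do not appear in the observed lines.

    Whitespace is normalised so that column alignment on the BIOS screen
    does not cause false mismatches.
    """
    seen = {" ".join(line.split()) for line in observed}
    return [line for line in expected_screen(platform) if line not in seen]


if __name__ == "__main__":
    observed = expected_screen("7310")[:-1]  # pretend the last line differs
    for line in mismatches("7310", observed):
        print("missing:", line)
```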


Identification of Affected Parts (how to):

Connect to the Service Processor via ssh and supply the root credentials. The version displayed by the "version" command should now be 2.0.2.16 for the 7110, 7310 and 7410, and 2.0.2.15 for the 7210.

To display the BIOS version, type "show /SYS/MB/BIOS". Alternatively, start the system with "start /SYS"; the BIOS version is displayed in the initial BIOS banner. The current version for the 7110, 7310 and 7410 is 0ABMN080. The current version for the 7210 is 0ABNF032. Any prior version is susceptible to these issues. The BIOS version can also be checked from the OS environment through the administrative BUI/CLI:

CLI: Log in to the CLI with administrative credentials and type:

   Configuration ==> Version ==> Show

BUI: Log in to the BUI with administrative credentials, then click the
        Sun/Oracle icon.

Note that checking the SP version via other means, such as the administrative BUI, may be unreliable: due to a bug in some releases, version 2.0.2.16 may be displayed as 2.0.2.22.
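The version rules in this section can be summarised as a comparison against the current versions. A minimal Python sketch, assuming the SP version is read with the SP "version" command (not the BUI, per the note above) and that BIOS strings end in a three-digit build number; the 7110 BIOS value is taken from the 7110 Resolution section, and the names are hypothetical. Flagging a BIOS from an unexpected family (different prefix) is a design choice of the sketch.

```python
# Current SP firmware and BIOS versions per platform, as stated in this document.
CURRENT = {
    "7110": ("2.0.2.16", "0ABMN080"),
    "7210": ("2.0.2.15", "0ABNF032"),
    "7310": ("2.0.2.16", "0ABMN080"),
    "7410": ("2.0.2.16", "0ABMN080"),
}


def sp_tuple(version):
    """Turn a dotted SP version such as '2.0.2.16' into (2, 0, 2, 16)."""
    return tuple(int(part) for part in version.split("."))


def bios_parts(bios):
    """Split '0ABMN080' into ('0ABMN', 80); assumes a 3-digit build suffix."""
    return bios[:-3], int(bios[-3:])


def findings(platform, sp_version, bios_version):
    """Return reasons the head looks susceptible (an empty list if current)."""
    want_sp, want_bios = CURRENT[platform]
    reasons = []
    if sp_tuple(sp_version) < sp_tuple(want_sp):
        reasons.append(f"SP firmware {sp_version} is older than {want_sp}")
    prefix, build = bios_parts(bios_version)
    want_prefix, want_build = bios_parts(want_bios)
    if prefix != want_prefix:
        reasons.append(f"BIOS {bios_version} is not the expected {want_prefix} family")
    elif build < want_build:
        reasons.append(f"BIOS {bios_version} is older than {want_bios}")
    return reasons


if __name__ == "__main__":
    print(findings("7410", "2.0.2.16", "0ABMN064"))
    # ['BIOS 0ABMN064 is older than 0ABMN080']
```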


References:

SUNBUG 6961121



For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:

* http://tns.central/fab

In addition to the above you may email:

* FAB-Manager@sun.com


Contacts

Contributor: andy.laker@oracle.com
Responsible Engineer: Keith.Wesolowski@oracle.com
Responsible Manager: Keith.Wesolowski@oracle.com
Business Unit Group: Network Storage


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.