Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1020255.1
Update Date: 2010-11-16
Keywords:

Solution Type  FAB (standard) Sure

Solution  1020255.1 :   Sun Storage 7410 upgrade procedure for BIOS43 to fix cluster card resource issue.  


Related Items
  • Sun Storage 7410 Unified Storage System
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

PreviouslyPublishedAs
254928


Bug Id
<SUNBUG: 6805681>

Date of Resolved Release
16-Mar-2009

Product
Sun Storage 7410 Unified Storage System

7410 upgrade procedure for BIOS43 (see details below).

Affected Parts:

375-3487-xx   PCI Express 8-Port Host Adapter
375-3481-xx   PCI Express Quad Gigabit Ethernet UTP
371-0905-xx   PCI Express Dual Gigabit Ethernet UTP
371-0904-xx   PCI Express Dual Gigabit Ethernet MMF
375-3357-xx   Dual-Channel Ultra320 LVD SCSI PCI Express Adapter
371-3024-xx   Cluster Heartbeat Card

Impact

Certain combinations of PCIe cards installed in the Sun Storage 7410 can cause the Cluster Heartbeat (interconnect) Card (Sun p/n 371-3024) to fail to initialize properly.  This leaves the cluster feature unusable or unconfigurable.  If the cluster is already configured and cards are then added that trigger this problem, the clustering software will fail completely.  If the system is being set up for the first time with a failing card/slot combination, the cluster will not be configurable during or after the setup phase.

Contributing Factors

The failure mode described here depends on the card combination installed in the PCIe slots.  The following configurations have been confirmed to exhibit the failure mode described by CR 6805681 and can be fixed by applying the BIOS update described here:

    Slot0: SAS HBA
    Slot1: SAS HBA
    Slot2: Empty
    Slot3: Quad-Gig NIC
    Slot4: SAS HBA
    Slot5: clustron

    and

    Slot0: SAS HBA
    Slot1: SAS HBA
    Slot2: Quad-Gig NIC
    Slot3: Empty
    Slot4: Dual-Gig NIC
    Slot5: clustron
  
Other configurations may be susceptible to this problem.  Any configuration with at least one Quad-Gig NIC plus additional cards (beyond base-config) is at high risk.
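
As an illustration only, the two confirmed layouts above can be encoded so that a planned slot layout is compared against them.  This is a minimal sketch, not a Sun tool: the function name classify and the result labels are hypothetical, and the "possible" result is a simplification of the risk statement above (it flags any layout containing a Quad-Gig NIC, because this document does not define the base configuration).

```python
# Slot layouts (Slot0..Slot5) confirmed in this document to hit CR 6805681.
CONFIRMED_FAILING = [
    ("SAS HBA", "SAS HBA", "Empty", "Quad-Gig NIC", "SAS HBA", "clustron"),
    ("SAS HBA", "SAS HBA", "Quad-Gig NIC", "Empty", "Dual-Gig NIC", "clustron"),
]


def classify(slots):
    """Classify a 6-slot layout as 'confirmed', 'possible' or 'not-listed'.

    'confirmed'  : exact match with one of the two layouts listed above.
    'possible'   : contains a Quad-Gig NIC (simplified, assumed risk rule).
    'not-listed' : neither of the above; this does not prove the layout is safe.
    """
    layout = tuple(slots)
    if len(layout) != 6:
        raise ValueError("expected exactly 6 slot entries (Slot0..Slot5)")
    if layout in CONFIRMED_FAILING:
        return "confirmed"
    if "Quad-Gig NIC" in layout:
        return "possible"
    return "not-listed"
```

A "not-listed" result only means the layout is not one of the two confirmed cases; the log messages described under Symptoms remain the deciding evidence.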

Symptoms

Message from BIOS when system initially boots:

  Warning: Not enough IO address space for device

The telltale signs of this failure mode are the following messages in the system log:

  clustron: [ID 820254 kern.warning] WARNING: clustron0/1: UART scratch register does not work: expected 0x5a, got 0x00
  clustron: [ID 820254 kern.warning] WARNING: clustron0/1: UART scratch register does not work: expected 0x5a, got 0x00
  clustron: [ID 147989 kern.warning] WARNING: clustron0/0: UART has no 16C650 enhanced mode
  clustron: [ID 147989 kern.warning] WARNING: clustron0/0: UART has no 16C650 enhanced mode

Physical inspection of the clustron card will show that all 3 lights are dark.  If you see these errors, proceed with the BIOS update procedure described here.  If you do not see these messages, your system is not experiencing the IO address allocation problem and does not need this BIOS update.
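
If the system log has been exported to a text file, the check for these messages can be sketched as follows.  This is a hedged example, not part of the appliance software: the function names and the idea of scanning an exported log file are assumptions, and the pattern is built only from the warning text quoted above.

```python
import re

# Matches the two clustron UART warnings quoted in this document.
UART_WARNING = re.compile(
    r"clustron\d+/\d+: UART "
    r"(scratch register does not work|has no 16C650 enhanced mode)"
)


def find_clustron_warnings(log_lines):
    """Return the log lines that contain a clustron UART warning."""
    return [line.rstrip("\n") for line in log_lines if UART_WARNING.search(line)]


def scan_file(path):
    """Scan an exported log file (hypothetical path) for the warnings."""
    with open(path, "r", errors="replace") as handle:
        return find_clustron_warnings(handle)
```

An empty result from scan_file suggests the IO address allocation problem is not present in that log; a non-empty result corresponds to the condition that calls for the BIOS update.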

Root Cause

These messages indicate that the clustron card is unable to initialize its two serial interfaces because the BIOS did not assign them IO address space at boot time.

The fix for this issue (BIOS43) adds an option to the BIOS to disable IO address allocation for each individual PCIe slot.  Only certain cards require IO address space, and disabling the allocation for cards that do not require it conserves the resource.

Corrective Action

Workaround:

Remove extra NICs or SAS HBA cards until the cluster interconnect card initializes properly.

Resolution:

Upon failure, perform the BIOS update outlined below.

What you'll need in order to perform the BIOS update:

  - The service processors for both cluster heads must be configured on the network
  - Access to the network connected to the service processors (SPs)
  - Access to a client machine with an up-to-date browser and Java version
  - The ILOM package from the (Internal Only) URL below:
  http://nsgrelease.sfbay/doradotucana/releases/G12N-SW2.1.2A-rc2/sp_firmware/r42603_47096/

  - Right-click and save the file named:  ILOM-2_0_2_5_r42603-Sun_Fire_X4140_X4240_X4440.pkg
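
Before starting, it can help to confirm from the client machine that both service processors answer on the network.  The sketch below is one possible way to do that and rests on assumptions: it presumes the SPs accept ssh on TCP port 22 and https on TCP port 443 (both access methods are used in the procedure), and the hostnames in the usage comment are placeholders for your own SP names.

```python
import socket


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_sps(hosts, ports=(22, 443)):
    """Return {host: {port: reachable}} for every SP host and port given."""
    return {host: {port: port_open(host, port) for port in ports} for host in hosts}


# Example use (hostnames are placeholders, not real systems):
#   print(check_sps(["head_a-sp", "head_b-sp"]))
```

A False entry means only that the TCP connection failed from this client; it does not say why, so check the SP network configuration and any intervening firewall.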

BIOS Update Procedure:

 0. Power down both head_A and head_B by issuing a 'stop /SYS' command at each
    service processor.  Confirm the host power is off by checking the output of
    'show /SYS' at the SP.
  
 1. Point your client browser to head_A service processor (https://head_a-sp)
  
    Enter root/appliance password to login.
    Navigate to Maintenance -> Enter firmware upgrade mode
    Supply the ILOM package file and allow it to upload.
    choose "preserve SP settings"
    Proceed with the update.

    Warning! Do not interrupt the update.  Leave the browser undisturbed until the
             update is complete.

    The SP will reboot at the end of the update.  You can connect via ssh to the SP
    as soon as it is back online.
  
 2. As part of the upgrade, previous BIOS settings are cleared.  Certain BIOS settings
    must be changed at this point.  Failure to log in to the BIOS menu and change the
    settings described below will result in one or more failure modes, including a
    system hang at boot.  Log back in to the SP as root and start the host.  Be ready
    to enter BIOS setup right away (Esc+2 when connected to the console via SP).
  
    myclient$ ssh -l root appliance-sp
    Password:
  
    Sun(TM) Integrated Lights Out Manager
  
    Version 2.0.2.5
  
    Copyright 2007 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.
  
    -> start SYS
    Are you sure you want to start /SYS (y/n)? y
    Starting /SYS
  
    -> start SP/console
    Are you sure you want to start /SP/console (y/n)? y
  
    <Esc+2>
  
    Once in the BIOS menu, arrow over to the PCIPnP tab.  All 6 "Scanning OPROM
    on PCI-E SlotN" settings must be set to "Disabled".
  
     * Scanning OPROM on PCI-E Slot0  [Disabled]
     * Scanning OPROM on PCI-E Slot1  [Disabled]
     * Scanning OPROM on PCI-E Slot2  [Disabled]
     * Scanning OPROM on PCI-E Slot3  [Disabled]
     * Scanning OPROM on PCI-E Slot4  [Disabled]
     * Scanning OPROM on PCI-E Slot5  [Disabled]
  
    Just below the OPROM settings is a new group of settings (new in BIOS43) which allow
    IO allocation to be disabled per slot.  PCI-E Slots 0 through 3 must be set to
    "Disabled" and PCI-E Slots 4 and 5 must be set to "Enabled".
  
     IO Allocation on PCI-E Slot0   [Disabled]  <<<<< Slot 0 must be Disabled
     IO Allocation on PCI-E Slot1   [Disabled]  <<<<< Slot 1 must be Disabled
     IO Allocation on PCI-E Slot2   [Disabled]  <<<<< Slot 2 must be Disabled
     IO Allocation on PCI-E Slot3   [Disabled]  <<<<< Slot 3 must be Disabled
     IO Allocation on PCI-E Slot4   [Enabled]   <<<<< Slot 4 must be Enabled
     IO Allocation on PCI-E Slot5   [Enabled]   <<<<< Slot 5 must be Enabled
  
    Next, arrow over to the Boot tab.  Select the last item: "Hard Disk Drives".  The list
    should include only 2 drives (the 2 internal SATA drives) with labels like:

     [SATA:11M-<drive model>]
     [SATA:12M-<drive model>]
  
    If this list includes anything else (such as readzilla devices with a 'STEC MACH8'
    string, or JBOD attached drives) you'll need to remove them from the list by selecting
    the boot position and setting it to 'Disabled' for each of the non-boot drives.
  
    If the list is full (with 16 drives) you will not be able to edit the list.  However,
    the change to the OPROM settings above will cause the JBOD drives to disappear from
    the list on the next boot.  You will need to exit and save changes and immediately
    re-enter the BIOS menu on the next boot <Esc+2>.
  
    Once you've removed any readzilla or JBOD drive entries from the "Hard Disk Drives"
    list, save changes and exit the BIOS.  The head can now be fully booted.
  
    You will see the following warning message from the BIOS:

      Warning: IO resource not allocated

    This is an expected message and does not indicate a failure.
  
 3. Allow the first head to boot and confirm that the logs from the most recent boot
    do not contain clustron UART warning messages.
  
 4. Power down head_A (it must remain powered off throughout the upgrade process for head_B).
  
 5. Repeat steps 1-4 to upgrade head_B.
  
    Once head_B is upgraded, power up head_A and resume cluster operation, or begin
    cluster setup if not yet configured.
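
The BIOS settings required in step 2 can be written down as a checklist and compared against what was actually entered on each head.  The sketch below is only an aid for that comparison: it assumes the technician transcribes the values from the BIOS screen by hand into a dictionary, and it does not read anything from the BIOS or the SP.  The names REQUIRED and mismatches are hypothetical.

```python
# Required values from step 2 of the procedure, keyed by the BIOS setting label.
REQUIRED = {}
for slot in range(6):
    REQUIRED[f"Scanning OPROM on PCI-E Slot{slot}"] = "Disabled"
    # IO allocation stays enabled only on Slots 4 and 5.
    REQUIRED[f"IO Allocation on PCI-E Slot{slot}"] = (
        "Enabled" if slot in (4, 5) else "Disabled"
    )


def mismatches(recorded):
    """Return {setting: (expected, recorded)} for each wrong or missing setting.

    `recorded` is a dict of values transcribed by hand from the BIOS menu.
    A missing setting is reported with a recorded value of None.
    """
    return {
        name: (expected, recorded.get(name))
        for name, expected in REQUIRED.items()
        if recorded.get(name) != expected
    }
```

Running mismatches once per head, before saving and exiting the BIOS, gives an empty dictionary when all 12 settings match the values listed in step 2.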

Identification of Affected Parts (how to):

Cluster interconnect cards (Sun P/N 371-3024) normally light up each of the 3 port lights.  The failure mode described causes the card to appear dark (no lights are lit).  Once the upgrade is applied, the cluster card should initialize properly and the lights should turn on at the next boot.

Note to Authorized Service Partners:

Sun Authorized Service Partners may contact Sun Services or their Sun Services Representative to receive FAB related information.

References:

    BugID: 6805681
    Escalation ID: 70667988


For information about FAB documents, its release processes, implementation strategies and billing information, go to the following URL:

In addition to the above you may email:



Internal Contributor/submitter
Michael.Harsch@Sun.COM, Clifford.Thomas@Sun.COM

Internal Eng Responsible Engineer
Will.Harper@Sun.COM Responsible Manager: Renee.Bennett@Sun.COM

Internal Services Knowledge Engineer
Joe.Davis@Sun.COM

Internal Eng Business Unit Group
NWS (Storage)

Internal Sun Alert & FAB Admin Info
12-Mar-2009: Completed draft and sent to Extended Review.
16-Mar-2009: Addressed issues from Ext Rvw - sending to Publish.


Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.