Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type: FAB (standard) Sure Solution
1020255.1: Sun Storage 7410 upgrade procedure for BIOS43 to fix cluster card resource issue.
Previously Published As: 254928
Bug Id: <SUNBUG: 6805681>
Date of Resolved Release: 16-Mar-2009
Product: Sun Storage 7410 Unified Storage System

7410 upgrade procedure for BIOS43 (see details below).

Affected Parts:

375-3487-xx PCI Express 8-Port Host Adapter
375-3481-xx PCI Express Quad Gigabit Ethernet UTP
371-0905-xx PCI Express Dual Gigabit Ethernet UTP
371-0904-xx PCI Express Dual Gigabit Ethernet MMF
375-3357-xx Dual-Channel Ultra320 LVD SCSI PCI Express Adapter
371-3024-xx Cluster Heartbeat Card

Impact

Certain combinations of PCIe cards installed in the Sun Storage 7410 can cause the Cluster Heartbeat (interconnect) Card (Sun p/n 371-3024) to fail to initialize properly. This results in the cluster feature becoming unusable or unconfigurable.

If the cluster is already configured and additional cards are then added in a combination that triggers this problem, the clustering software will fail completely. If the system is being set up for the first time with a failing card/slot combination, the cluster will not be configurable during or after the setup phase.

Contributing Factors

The failure mode described here depends on the card combination installed in the PCIe slots. The following configurations have been confirmed to suffer from the failure mode described by CR 6805681 and can be fixed by applying the BIOS update described here:

Slot0: SAS HBA
Slot1: SAS HBA
Slot2: Empty
Slot3: Quad-Gig NIC
Slot4: SAS HBA
Slot5: clustron

and

Slot0: SAS HBA
Slot1: SAS HBA
Slot2: Quad-Gig NIC
Slot3: Empty
Slot4: Dual-Gig NIC
Slot5: clustron

Other configurations may be susceptible to this problem. Any configuration with at least one Quad-Gig NIC plus additional cards (beyond base-config) is at high risk.
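The risk rule above can be expressed as a small check. This is only a sketch: the function name and card labels are illustrative, and because this document does not define the base configuration, the number of base-config cards is a caller-supplied assumption. It also assumes that "additional cards" means the installed card count exceeds that base count.

```python
# Slot layouts (Slot0..Slot5) confirmed in this document to hit CR 6805681.
# None stands for an empty slot.
CONFIRMED_FAILING = [
    ("SAS HBA", "SAS HBA", None, "Quad-Gig NIC", "SAS HBA", "clustron"),
    ("SAS HBA", "SAS HBA", "Quad-Gig NIC", None, "Dual-Gig NIC", "clustron"),
]


def classify_layout(slots, base_config_count):
    """Classify a 6-slot PCIe layout as 'confirmed', 'high-risk', or 'unknown'.

    base_config_count is an assumption supplied by the caller: the number
    of cards in the base configuration, which this document does not define.
    """
    slots = tuple(slots)
    if len(slots) != 6:
        raise ValueError("expected exactly 6 slots (Slot0..Slot5)")
    if slots in CONFIRMED_FAILING:
        return "confirmed"
    installed = [card for card in slots if card is not None]
    # Rule from the text: at least one Quad-Gig NIC plus cards beyond base-config.
    if "Quad-Gig NIC" in installed and len(installed) > base_config_count:
        return "high-risk"
    # Other configurations may still be susceptible, so never report "safe".
    return "unknown"
```

A result of "unknown" does not mean the layout is safe; the log messages listed under Symptoms remain the deciding evidence.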
Symptoms

Message from BIOS when system initially boots:

Warning: Not enough IO address space for device

The telltale signs of this failure mode are the following messages in the system log:

clustron: [ID 820254 kern.warning] WARNING: clustron0/1: UART scratch register does not work: expected 0x5a, got 0x00
clustron: [ID 820254 kern.warning] WARNING: clustron0/1: UART scratch register does not work: expected 0x5a, got 0x00
clustron: [ID 147989 kern.warning] WARNING: clustron0/0: UART has no 16C650 enhanced mode
clustron: [ID 147989 kern.warning] WARNING: clustron0/0: UART has no 16C650 enhanced mode

Physical inspection of the clustron card will show that all 3 lights are dark.

If you see these errors, proceed with the BIOS update procedure described here. If you do not see these messages, your system is not experiencing the IO address allocation problem and does not need this BIOS update.

Root Cause

These messages indicate that the clustron card cannot initialize its two serial interfaces because the BIOS did not assign them IO address space at boot time.

The fix for this issue (BIOS43) adds an option to the BIOS to disable IO address allocation for each individual PCIe slot. Only certain cards require IO address space, and disabling the allocation for cards that do not require it conserves the resource.

Corrective Action

Workaround: Remove extra NICs or SAS HBA cards until the cluster interconnect card initializes properly.

Resolution: Upon failure, perform the BIOS update outlined below.
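The log check described under Symptoms can be sketched as a simple pattern match. This is a minimal illustration, not a supported tool: the function names are hypothetical, the pattern is derived only from the four sample messages shown above, and how the system log lines are obtained is left to the caller.

```python
import re

# Matches the two clustron UART warnings quoted in this document.
# The message ID and the unit/port numbers are allowed to vary.
UART_WARNING = re.compile(
    r"clustron: \[ID \d+ kern\.warning\] WARNING: clustron\d+/\d+: "
    r"UART (scratch register does not work|has no 16C650 enhanced mode)"
)


def find_clustron_uart_warnings(log_lines):
    """Return the log lines that contain a clustron UART warning."""
    return [line.rstrip("\n") for line in log_lines if UART_WARNING.search(line)]


def needs_bios_update(log_lines):
    """True if any clustron UART warning is present in the given lines."""
    return bool(find_clustron_uart_warnings(log_lines))
```

Because the pattern is searched anywhere in the line, any timestamp or hostname prefix the logger adds in front of the message should not prevent a match.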
What you'll need in order to perform the BIOS update:

- The service processors for both cluster heads must be configured on the network
- Access to the network connected to the service processors (SPs)
- Access to a client machine with an up-to-date browser+java version
- You will need the ILOM package from the below (Internal Only) URL:
  http://nsgrelease.sfbay/doradotucana/releases/G12N-SW2.1.2A-rc2/sp_firmware/r42603_47096/
- Right-click and save the file named: ILOM-2_0_2_5_r42603-Sun_Fire_X4140_X4240_X4440.pkg

BIOS Update Procedure:

0. Power down both head_A and head_B by issuing a 'stop /SYS' command at each service processor. Confirm the host power is off by checking the output of 'show /SYS' at the SP.

1. Point your client browser to the head_A service processor (https://head_a-sp).

Enter the root/appliance password to log in.
Navigate to Maintenance -> Enter firmware upgrade mode.
Supply the ILOM package file and allow it to upload.
Choose "preserve SP settings".
Proceed with the update.

Warning! Do not interrupt the update. Leave the browser undisturbed until the update is complete. The SP will reboot at the end of the update. You can connect via ssh to the SP as soon as it is back online.

2. As part of the upgrade, previous BIOS settings are cleared. Certain BIOS settings must be changed at this point. Failure to log in to the BIOS menu and change the settings described below will result in one or more failure modes, including a system hang at boot.

Log back into the SP as root and start the host. Be ready to enter BIOS setup right away (Esc+2 when connected to the console via SP).

    myclient$ ssh -l root appliance-sp
    Password:

    Sun(TM) Integrated Lights Out Manager
    Version 2.0.2.5
    Copyright 2007 Sun Microsystems, Inc. All rights reserved.
    Use is subject to license terms.

    -> start SYS
    Are you sure you want to start /SYS (y/n)? y
    Starting /SYS

    -> start SP/console
    Are you sure you want to start /SP/console (y/n)? y
    <Esc+2>

Once in the BIOS menu, arrow over to the PCIPnP tab.
All 6 of the "Scanning OPROM on PCI-E SlotN" settings should be set to "Disabled".

    * Scanning OPROM on PCI-E Slot0 [Disabled]
    * Scanning OPROM on PCI-E Slot1 [Disabled]
    * Scanning OPROM on PCI-E Slot2 [Disabled]
    * Scanning OPROM on PCI-E Slot3 [Disabled]
    * Scanning OPROM on PCI-E Slot4 [Disabled]
    * Scanning OPROM on PCI-E Slot5 [Disabled]

Just below the OPROM settings is a new group of settings (new in BIOS_43) which allow IO allocation to be disabled per slot. PCI-E Slots 0 through 3 must be set to "Disabled"; PCI-E Slots 4 and 5 must be set to "Enabled".

    IO Allocation on PCI-E Slot0 [Disabled]  <<<<< Slot 0 must be Disabled
    IO Allocation on PCI-E Slot1 [Disabled]  <<<<< Slot 1 must be Disabled
    IO Allocation on PCI-E Slot2 [Disabled]  <<<<< Slot 2 must be Disabled
    IO Allocation on PCI-E Slot3 [Disabled]  <<<<< Slot 3 must be Disabled
    IO Allocation on PCI-E Slot4 [Enabled]   <<<<< Slot 4 must be Enabled
    IO Allocation on PCI-E Slot5 [Enabled]   <<<<< Slot 5 must be Enabled

Next, arrow over to the Boot tab. Select the last item: "Hard Disk Drives". The list should include only 2 drives (the 2 internal SATA drives) with labels like:

    [SATA:11M-<drive model>]
    [SATA:12M-<drive model>]

If this list includes anything else (such as readzilla devices with a 'STEC MACH8' string, or JBOD-attached drives), you'll need to remove them from the list by selecting the boot position and setting it to 'Disabled' for each of the non-boot drives.

If the list is full (with 16 drives), you will not be able to edit it. However, the change to the OPROM settings above will cause the JBOD drives to disappear from the list on the next boot. In that case, save changes, exit, and immediately re-enter the BIOS menu on the next boot <Esc+2>.

Once you've removed any readzilla or JBOD drive entries from the "Hard Disk Drives" list, save changes and exit the BIOS. The head can now be fully booted. You will see the following warning message from the BIOS:

Warning: IO resource not allocated

This is an expected message and does not indicate a failure.

3. Allow the first head to boot and confirm that the logs from the most recent boot do not contain clustron UART warning messages.

4. Power down head_A (it must remain powered off throughout the upgrade process for head_B).

5. Repeat steps 1-4 to upgrade head_B.

Once head_B is upgraded, power up head_A and resume cluster operation, or begin cluster setup if not yet configured.

Identification of Affected Parts (how to):

Cluster interconnect cards (Sun P/N 371-3024) normally light up each of the 3 port lights. The failure mode described causes the card to appear dark (no lights are lit). Once the upgrade is applied, the cluster card should initialize properly and the lights should turn on at the next boot.

Note to Authorized Service Partners:

Sun Authorized Service Partners may contact Sun Services or their Sun Services Representative to receive FAB related information.

References:

BugID: 6805681
Escalation ID: 70667988

For information about FAB documents, their release processes, implementation strategies and billing information, go to the following URL:

In addition to the above you may email:

Internal Contributor/submitter: Michael.Harsch@Sun.COM, Clifford.Thomas@Sun.COM
Internal Eng Responsible Engineer: Will.Harper@Sun.COM
Responsible Manager: Renee.Bennett@Sun.COM
Internal Services Knowledge Engineer: Joe.Davis@Sun.COM
Internal Eng Business Unit Group: NWS (Storage)

Internal Sun Alert & FAB Admin Info:

12-Mar-2009: Completed draft and sent to Extended Review.
16-Mar-2009: Addressed issues from Ext Rvw - sending to Publish.

Attachments: This solution has no attachment