Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-77-1220873.1
Update Date:2011-06-08
Keywords:

Solution Type  Sun Alert Sure

Solution  1220873.1 :   A Misconfigured Gateway (Network Route) May Corrupt the XSCF Database on Sun SPARC Enterprise M8000/M9000 Servers Running XCP Firmware 1092 or 1093  


Related Items
  • Sun SPARC Enterprise M9000-64 Server
  • Sun SPARC Enterprise M9000-32 Server
  • Sun SPARC Enterprise M8000 Server
Related Categories
  • GCS>Sun Microsystems>Sun Alert>Criteria Category>Availability
  • GCS>Sun Microsystems>Sun Alert>Release Phase>Resolved




In this Document
  Description
  Likelihood of Occurrence
  Possible Symptoms
  Workaround or Resolution
  Patches
  Modification History
  References


Applies to:

Sun SPARC Enterprise M8000 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun SPARC Enterprise M9000-32 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun SPARC Enterprise M9000-64 Server - Version: Not Applicable and later    [Release: N/A and later]
Sun SPARC Sun OS
_________________

Date of Workaround Release: 29-Sep-2010

Date of Resolved Release: 01-Dec-2010
___________________________________

Description


On Sun SPARC Enterprise M8000/M9000 Servers with XCP firmware revision 1092 or 1093, the XSCF will fail to start after an XSCF reboot (applynetwork/rebootxscf) if a gateway is misconfigured, meaning the gateway IP address is not on the same subnet as the IP address of the LAN interface.

During the XSCF reboot sequence, the XSCF (eXtended System Control Facility) database will become corrupted as a consequence of the misconfigured gateway. There will be no immediate impact to the running domains, but if no XSCF is available, the domains cannot reboot.

The XSCF with the corrupted database is the one with the misconfigured gateway(s). The database may be corrupted on either one of the two XSCFs or on both.

Note: In the unlikely event of having both XSCFs corrupted, please contact Oracle Support to guide you through a procedure to remove the trigger of this software issue.

If both XSCFs have incorrect routes leading to XSCF database corruption, no XSCF will be available and a platform power cycle will be required to recover.

Note: As a precaution, preventive measures outlined in this document should also be taken on other XCP versions, even if an upgrade is not planned.

Likelihood of Occurrence


This issue can occur on the following platforms:
  • Sun SPARC Enterprise M8000/M9000 Servers with XCP 1092 or 1093 firmware
To determine the XCP firmware version on one of these systems, do the following:
XSCF> version -c xcp
XCP 1093 output will appear similar to the following:
XSCF#0 (Active )
XCP0 (Reserve): 1093
XCP1 (Current): 1093
XSCF#1 (Standby)
XCP0 (Reserve): 1093
XCP1 (Current): 1093
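
The check above can also be scripted against captured output. The following is a minimal sketch, assuming the 'version -c xcp' output has the same layout as the example and that only the "Current" firmware bank matters for exposure; the function names are hypothetical and not part of any XSCF tooling.

```python
import re

# Firmware revisions named in this alert as affected.
AFFECTED = {"1092", "1093"}

def current_xcp_versions(text):
    """Return {xscf_label: current_version} parsed from `version -c xcp` output."""
    versions = {}
    unit = None
    for line in text.splitlines():
        line = line.strip()
        m = re.match(r"(XSCF#\d+)", line)
        if m:
            unit = m.group(1)  # e.g. "XSCF#0"
            continue
        # Assumption: only the bank marked "(Current)" is the running firmware.
        m = re.match(r"XCP\d+\s*\(Current\)\s*:\s*(\d+)", line)
        if m and unit is not None:
            versions[unit] = m.group(1)
    return versions

def affected_units(text):
    """Return the sorted XSCF labels whose current XCP revision is affected."""
    return sorted(u for u, v in current_xcp_versions(text).items() if v in AFFECTED)

SAMPLE = """\
XSCF#0 (Active )
XCP0 (Reserve): 1093
XCP1 (Current): 1093
XSCF#1 (Standby)
XCP0 (Reserve): 1093
XCP1 (Current): 1093
"""

print(affected_units(SAMPLE))
```

An empty list means no XSCF in the captured output is running an affected revision.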

Notes:

1. This issue is not applicable to the Sun SPARC Enterprise M3000/M4000/M5000 Servers.

2. A cabling problem will not trigger this issue.

3. The XSCF 'showroute' command cannot be used to determine if the gateway is misconfigured.

4. A system is only vulnerable to this issue if an XSCF is rebooted (applynetwork/rebootxscf) and a gateway is misconfigured. To determine if the gateway is misconfigured, perform the 'applynetwork' procedure as described in the Workaround section.

Possible Symptoms


Should the described issue occur, erroneous routes configured on XCP 1091 and lower will produce errors similar to the following on the XSCF console during the XSCF reset sequence:
[output omitted]
execute S11network.sheth0: PHY is Intel LXT972A (1378e2)
ERROR: failed to configure the routeing.
The network system may not work correctly.
[output omitted]
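
A saved XSCF console log can be searched for this message. The following is a minimal sketch, assuming the log is available as plain text; the function name is hypothetical, and the search string keeps the spelling "routeing" exactly as the console prints it.

```python
# Message text copied from the console output above, spelling included.
ROUTE_ERROR = "failed to configure the routeing"

def find_route_errors(console_text):
    """Return (line_number, line) pairs where the routing error appears."""
    hits = []
    for number, line in enumerate(console_text.splitlines(), start=1):
        if ROUTE_ERROR in line:
            hits.append((number, line.strip()))
    return hits

SAMPLE_LOG = """\
execute S11network.sheth0: PHY is Intel LXT972A (1378e2)
ERROR: failed to configure the routeing.
The network system may not work correctly.
"""

for number, line in find_route_errors(SAMPLE_LOG):
    print(f"line {number}: {line}")
```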

Workaround or Resolution


Before installing the affected firmware revisions 1092 or 1093, verify the gateway configuration so that this issue cannot occur at the next reboot.

Once firmware revision 1092 or 1093 is installed, it is critical that any configuration changes (setroute/setnetwork) are carefully checked to ensure that this issue is not triggered at the next reboot.

To avoid the XSCF database corruption, manually verify the network configuration as follows:

1. Log into the Active XSCF as a user with 'platadm' privilege

2. Change one of the network settings (here, by taking an interface down and back up) and run 'applynetwork -n' to display the routes defined in the XSCF database:
XSCF> setnetwork -c down xscf#0-lan#0; setnetwork -c up xscf#0-lan#0; applynetwork -n
Verify that every route's gateway is located on the local network (i.e., same subnet) of its interface. The following example shows one route that is not:
XSCF> setnetwork -c down xscf#0-lan#0; setnetwork -c up xscf#0-lan#0; applynetwork -n
The following network settings will be applied:
xscf#0 hostname  :m8000-xscf0
xscf#1 hostname  :m8000-xscf1
DNS domain name  :domain.com
nameserver       :10.244.240.150

interface        :xscf#0-lan#0
status           :up
IP address       :10.244.128.27                                <===lan interface IP address (subnet)
netmask          :255.255.255.0                                <===netmask defines subnet range
!!  route        :-n 0.0.0.0 -m 0.0.0.0 -g 10.244.128.1
!!  route        :-n 0.0.0.0 -m 0.0.0.0 -g 128.244.128.1       <===bad gateway is here
In the example above, gateway 128.244.128.1 is not reachable on the xscf#0-lan#0 10.244.128.xxx subnet.

3. Delete the bad routes:
XSCF> setroute -c del -n 0.0.0.0 -m 0.0.0.0 -g 128.244.128.1 xscf#0-lan#0
TIP: copy/paste the route from applynetwork output when constructing 'setroute -c del'

4. Apply the changes with 'applynetwork -y', and verify in its output that all routes are reachable on the interface subnet:
XSCF> applynetwork -y

5. Once all routes are re-verified as reachable on the interface subnet (i.e., each gateway is on the same subnet), reboot the XSCF:
XSCF> rebootxscf -y
Once the configuration is verified correct, you may safely upgrade to XCP 1092 or 1093 or safely proceed with XSCF reset.
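
The subnet comparison in step 2 can be automated on saved 'applynetwork -n' output. The following is a minimal sketch using Python's standard ipaddress module. It assumes the output follows the layout of the example above, including the "!!" markers and "<===" annotations shown there. The function name is hypothetical, and the result should be treated as an aid to the manual check, not a replacement for it.

```python
import ipaddress
import re

def find_bad_routes(applynetwork_output):
    """Return (interface, gateway) pairs whose gateway is outside the interface subnet."""
    bad = []
    iface = ip = network = None
    for raw in applynetwork_output.splitlines():
        # Drop the "<===" annotations and "!!" markers seen in the example.
        line = raw.split("<===")[0].replace("!!", "").strip()
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "interface":
            iface, ip, network = value, None, None
        elif key == "IP address":
            ip = value
        elif key == "netmask" and ip:
            # strict=False lets a host address plus netmask define the subnet.
            network = ipaddress.ip_network(f"{ip}/{value}", strict=False)
        elif key == "route" and network is not None:
            m = re.search(r"-g\s+(\S+)", value)
            if m and ipaddress.ip_address(m.group(1)) not in network:
                bad.append((iface, m.group(1)))
    return bad

SAMPLE = """\
interface        :xscf#0-lan#0
status           :up
IP address       :10.244.128.27
netmask          :255.255.255.0
!!  route        :-n 0.0.0.0 -m 0.0.0.0 -g 10.244.128.1
!!  route        :-n 0.0.0.0 -m 0.0.0.0 -g 128.244.128.1
"""

for interface, gateway in find_bad_routes(SAMPLE):
    print(f"{interface}: gateway {gateway} is outside the interface subnet")
```

Each reported pair corresponds to a route that would be removed with 'setroute -c del' as in step 3.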


This issue is addressed in the following release:
  • XCP 1100 firmware for Sun SPARC Enterprise M8000/M9000 Servers
and can be downloaded from the following site:

http://www.oracle.com/technetwork/server-storage/sun-sparc-enterprise/downloads/index.html

Patches

Internal Section:
Please send technical questions to the following email:
sunalert-tech-questions@sun.com
and copy the Responsible Engineer

It's possible to verify the network configuration by collecting an XSCF snapshot and then unpacking this snapshot on the OPL Snapshot analysis Toolset at http://oplpass.us.oracle.com/. The tool has been augmented with automation to verify the network configuration. The tool will provide the command(s) to fix the problem when applicable. Note that the snapshot must be properly unpacked in order to check the configuration.

Once your configuration is verified correct you may safely upgrade to
XCP 1092 or 1093 or safely proceed with XSCF reset.

When the XSCF is unable to start, the following message can be observed
in the XSCF console logs:

execute S60checktestdb[317]: ERR: Database problems detected: (ret=-9010).
Cleaning up
scdb_init_all: -9010, Database verify bad
root: ERROR: Database problems detected: Inconsistency.  Cleaning up
Initiating shutdown
...
XSCF BOOT STOP (recover by NFB-OFF/ON)

Eng Support: There is a special procedure which requires a complete
platform power cycle to remove the condition that triggers this issue.
Replacing XSCF hardware, or any other hardware for that matter,
is not going to solve this issue.
Internal Contributor/Submitter: stephane.dutilleul@oracle.com
Internal Eng Responsible Engineer: alex.aftandilian@oracle.com
Internal Services Knowledge Engineer: david.mariotto@oracle.com
Internal Eng Business Unit Group: Systems Group - OPL
Internal Escalation ID: 73220322, 73374292, 73361718, 73477978, 73501428, 73531790, 73522408

Modification History

29-Sep-2010: Workaround Release
01-Dec-2010: Republish - Issue is now Resolved

References

SUNBUG:6984765

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.