Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-73-1001038.1
Update Date:2010-08-26
Keywords:

Solution Type  FAB (standard) Sure

Solution  1001038.1 :   Executing 'hpost' from the command line can result in a cascading Dstop for Sun Fire 12K/15K domains comprised of split-expanders.  


Related Items
  • Sun Fire 12K Server
  • Sun Fire 15K Server
Related Categories
  • GCS>Sun Microsystems>Sun FAB>Standard>Reactive

PreviouslyPublishedAs
201365


Product
Sun Fire 12K Server
Sun Fire 15K Server


Impact

Executing 'hpost' from the command line can result in a cascading Dstop for Sun Fire 12K/15K domains in split-expander configurations. This could halt active domains.

This issue can occur on any Sun Fire 12K/15K system with domains in a split-expander configuration. The failure scenario involves executing 'hpost' from the command line on a split-expander domain, followed by a "setkeyswitch on" for another domain that shares the split-expander, resulting in a POST failure and a cascading Dstop.

NOTE: The following example is for illustrative purposes only; the failure is not dependent on Expander locations or DomainIds. The key factors are split-expanders, 'hpost', and 'setkeyswitch on'.

 Split-Expander Configuration            Domain Status
 Domain A - EX15                         *On
 Domain B - EX17                         *On
 Domain C - EX14, EX15, EX16, EX17       *Standby mode and running hpost from the command line: hpost -d c -l64
 Domain D - EX16                         *setkeyswitch -d d on
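
To make the sharing relationships in the illustrative table easier to see, the following is a minimal Python sketch that lists which domains share an expander with a given domain. The `DOMAIN_EXPANDERS` mapping and the `domains_sharing_expanders` helper are hypothetical names introduced here for illustration; they are not part of SMS, and the data is simply copied from the table above.

```python
from typing import Dict, List, Set

# Expander assignments copied from the illustrative table above.
DOMAIN_EXPANDERS: Dict[str, Set[str]] = {
    "A": {"EX15"},
    "B": {"EX17"},
    "C": {"EX14", "EX15", "EX16", "EX17"},
    "D": {"EX16"},
}


def domains_sharing_expanders(
    target: str, layout: Dict[str, Set[str]]
) -> Dict[str, List[str]]:
    """Return {other_domain: shared expanders} for domains that share with target."""
    shared: Dict[str, List[str]] = {}
    for domain, expanders in layout.items():
        if domain == target:
            continue
        common = layout[target] & expanders  # expanders used by both domains
        if common:
            shared[domain] = sorted(common)
    return shared


if __name__ == "__main__":
    # Domains exposed if 'hpost' is run by hand on Domain C.
    print(domains_sharing_expanders("C", DOMAIN_EXPANDERS))
    # {'A': ['EX15'], 'B': ['EX17'], 'D': ['EX16']}
```

In this example every other domain shares an expander with Domain C, which is why a failure on Domain C can reach Domains A, B, and D.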

Domain C is in 'standby' mode and running hpost from the command line. A 'setkeyswitch on' is then executed for Domain D. Because hpost is already running on Domain C, this generates a flood of POST log messages reporting:

"Failure creating lockfile /var/opt/SUNWSMS/.lock/hpost.lock.16 ....etc"

Domain C will fail POST during the asic_config_darb() phase for EX16, which is shared with Domain D, followed by EX15 and EX17, resulting in a Dstop for Domain A and Domain B.

The POST log contents of the EX16 failure are:

Performing ASIC config with bus config a/d/r = 333...
Slot0 in domain: 04000
Slot1 in domain: 3C000
EXBs in use: 3BFFF
  asic_config_axq(): WARNING: Configuring Slot1 of in-use AXQ EX16,
but it is not enabled for Slot0
  asic_config_sdi(): WARNING: Configuring Slot1 of in-use SDI EX16/S0,
but it is not enabled for Slot0
  asic_config_darb(): EXB EX16 in use by another domain is not configured
in DARB C0 as shared_exp, can't be changed.
  FAIL EXB EX16: PORT 16 - DARB configuration failure

POST log contents of EX15 and EX17 failures:

      asic_config_axq(): Status of AXQ EX15, in use by another domain for
 Slot SB15 while this code is configuring Slot IO15, is detected to
        have an expander-global Dstop.
        Policy is to FAIL this exp for this domain.
 Err2[25]: D 1E AMX 0-3 hio flow control didn't arrive simultaneously

Domain D will successfully complete POST upon the release of the hpost lock.
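
When reviewing saved POST output for this failure, it can help to search for the three messages quoted above. The following is a minimal Python sketch under stated assumptions: the patterns are taken only from the excerpts in this document, the `SIGNATURES` table and `scan_post_log` function are hypothetical names, and real logs may wrap or word these messages differently.

```python
import re
from typing import Dict, List

# Patterns based only on the log excerpts quoted in this document.
SIGNATURES = {
    "lockfile": re.compile(r"Failure creating lockfile \S*hpost\.lock\.\d+"),
    "darb_failure": re.compile(
        r"FAIL EXB EX\d+: PORT \d+ - DARB configuration failure"
    ),
    "expander_dstop": re.compile(r"expander-global Dstop"),
}


def scan_post_log(text: str) -> Dict[str, List[str]]:
    """Return the lines of text that match each known failure signature."""
    hits: Dict[str, List[str]] = {name: [] for name in SIGNATURES}
    for line in text.splitlines():
        for name, pattern in SIGNATURES.items():
            if pattern.search(line):
                hits[name].append(line.strip())
    return hits


if __name__ == "__main__":
    sample = (
        "  FAIL EXB EX16: PORT 16 - DARB configuration failure\n"
        "        have an expander-global Dstop.\n"
    )
    for name, lines in scan_post_log(sample).items():
        print(name, len(lines))
```

The matching is line by line, so a message that is split across lines in a way not shown in the excerpts above would be missed.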

A 'setkeyswitch' calls enqueueHpost() when it interacts with an expander. This is the source of the "Waiting on exclusive access to EXB(s):" messages seen during 'setkeyswitch'. These messages are generated by the Task Management Daemon (tmd), which provides task management services such as scheduling for SMS. The purpose of this service is to reduce the number of conflicts that can arise during concurrent invocations of the hardware tests and configuration software.

When a 'setkeyswitch' is executed, it contacts 'tmd' to obtain exclusive access to the expander(s). The 'setkeyswitch' calls a powerOnDomain() routine that performs some initial reset/deconfig of the ASICs. Executing 'hpost' from the command line circumvents 'tmd', so 'hpost' does not obtain exclusive access to the expanders it is testing. As a result, the command-line 'hpost' fails during the asic_config_darb phase with "EXB EXxx in use by another domain is not configured in DARB C0 as shared_exp, can't be changed".
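
The effect of bypassing the arbitration described above can be illustrated with a toy model. The following Python sketch is not the tmd interface or its implementation; the `ExpanderArbiter` class and its methods are hypothetical and exist only to show why a task that never registers with the arbiter is invisible to other tasks.

```python
from typing import Dict, Iterable


class ExpanderArbiter:
    """Toy model of exclusive expander access; not the real tmd interface."""

    def __init__(self) -> None:
        self._owner: Dict[str, str] = {}  # expander name -> task holding it

    def try_acquire(self, task: str, expanders: Iterable[str]) -> bool:
        """Grant all requested expanders to task, or none if any is held."""
        wanted = list(expanders)
        if any(exb in self._owner for exb in wanted):
            return False  # a real caller would wait here
        for exb in wanted:
            self._owner[exb] = task
        return True

    def release(self, task: str) -> None:
        """Drop every expander held by task."""
        self._owner = {e: t for e, t in self._owner.items() if t != task}


if __name__ == "__main__":
    arbiter = ExpanderArbiter()

    # Arbitrated path: both requests go through the arbiter, so the
    # second one is held off until the first releases EX16.
    print(arbiter.try_acquire("setkeyswitch C", ["EX14", "EX15", "EX16", "EX17"]))  # True
    print(arbiter.try_acquire("setkeyswitch D", ["EX16"]))  # False
    arbiter.release("setkeyswitch C")

    # Bypass path: a command-line hpost on Domain C never registers, so
    # the arbiter sees EX16 as free and grants it to Domain D at once.
    print(arbiter.try_acquire("setkeyswitch D", ["EX16"]))  # True
```

In the bypass case both tasks end up working on EX16 at the same time, which corresponds to the conflict described in this document.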

Do not execute 'hpost' from the command line. 'hpost' should only be run from within the "setkeyswitch" operations. See the Resolution section below for a workaround when 'hpost' must be run manually.


Symptoms

Resolution

1. Do not execute 'hpost' from the command line for domains in split-expander configurations. 'hpost' should only run within "setkeyswitch" operations.

2. If 'hpost' needs to be manually run in such a configuration to test a FRU, use this procedure:

Example: A level 64 hpost is required to test a replacement FRU

  • Create a test domain using an available DomainId and add the FRU to be tested with the 'addboard' command.
  • Create a domain-specific .postrc file containing the line "level 64" and place it in /etc/opt/SUNWSMS/config/[A-R]/.postrc.
  • The 'setkeyswitch -d [domainID] on' command will read the contents of the .postrc file and execute 'hpost -l64'.
  • When 'hpost' is completed, delete the FRU from the test domain with the 'deleteboard' command, then add the FRU to the desired location with the 'addboard' command. Remove /etc/opt/SUNWSMS/config/[A-R]/.postrc.
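
The .postrc steps in the procedure above can be sketched in Python as follows. This is a minimal sketch, not a supported tool: `write_postrc` and `remove_postrc` are hypothetical helper names, the only file content written is the "level 64" directive given in this document, and the sketch assumes the domain's configuration directory already exists. It refuses to overwrite an existing .postrc so that a site's own settings are not lost.

```python
from pathlib import Path

SMS_CONFIG_DIR = Path("/etc/opt/SUNWSMS/config")  # path given in the procedure
VALID_DOMAIN_IDS = set("ABCDEFGHIJKLMNOPQR")  # the [A-R] range in the procedure


def _postrc_path(domain_id: str, config_dir: Path) -> Path:
    """Build the per-domain .postrc path, rejecting IDs outside A-R."""
    domain = domain_id.upper()
    if domain not in VALID_DOMAIN_IDS:
        raise ValueError(f"domain ID must be one letter A-R, got {domain_id!r}")
    return config_dir / domain / ".postrc"


def write_postrc(
    domain_id: str, level: int = 64, config_dir: Path = SMS_CONFIG_DIR
) -> Path:
    """Create the test domain's .postrc with a single 'level N' line."""
    target = _postrc_path(domain_id, config_dir)
    if target.exists():
        raise FileExistsError(f"{target} already exists; not overwriting")
    target.write_text(f"level {level}\n")
    return target


def remove_postrc(domain_id: str, config_dir: Path = SMS_CONFIG_DIR) -> None:
    """Remove the test domain's .postrc once testing is finished."""
    target = _postrc_path(domain_id, config_dir)
    if target.exists():
        target.unlink()
```

A typical sequence would be to call `write_postrc` for the test domain, run 'setkeyswitch -d [domainID] on' as described above, and call `remove_postrc` after the FRU has been moved to its final location.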

 


Modification History
Date: 19-JUL-2005
  • Updated Corrective Action to say "level 64" instead of "level_64"

Date: 14-JUL-2006
  • Updated doc to remove a sentence about running hpost from the command line for testing FRUs.


Previously Published As
100486
Internal Comments


None.


Internal Eng Business Unit Group
KE Authors

Internal Eng Responsible Engineer
scott.barnard@sun.com

Internal Kasp FAB Legacy ID
100486, I0965-1 (FIN)

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date:
Avoidance: Patch
Responsible Manager: null
Original Admin Info: null

Product_uuid
077fd4c5-df8f-4320-ad69-7d01603a674d|Sun Fire 12K Server
29e4659c-0a18-11d6-9fa1-e67bbc033df8|Sun Fire 15K Server

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.