Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type FAB (standard) Sure

Solution 1001038.1 : Executing 'hpost' from the command line can result in a cascading Dstop for Sun Fire 12K/15K domains comprised of split-expanders.
Previously Published As 201365

Product
Sun Fire 12K Server
Sun Fire 15K Server

Impact
Executing 'hpost' from the command line can result in a cascading Dstop for Sun Fire 12K/15K domains in split-expander configurations, which could halt active domains. This issue can occur on any Sun Fire 12K/15K system with domains in a split-expander configuration.

The failure scenario involves executing 'hpost' from the command line on a split-expander domain, followed by a "setkeyswitch on" for another domain that shares the split-expander. The result is a POST failure and a cascading Dstop.

NOTE: The following example is for illustrative purposes only; the failure is not dependent on Expander locations or DomainIds. The key factors are split-expanders, 'hpost', and 'setkeyswitch on'.

Split-Expander Configuration

    Domain     Expanders                 Status
    Domain A   EX15                      On
    Domain B   EX17                      On
    Domain C   EX14, EX15, EX16, EX17    Standby mode and running hpost from the command line: hpost -d c l64
    Domain D   EX16                      setkeyswitch -d d on

Domain C is in 'standby' mode and running hpost from the command line. A 'setkeyswitch on' is then executed for Domain D. Because hpost is already running on Domain C, this generates a flood of POST logs reporting:

    "Failure creating lockfile /var/opt/SUNWSMS/.lock/hpost.lock.16 ....etc"

Domain C will fail POST during the asic_config_darb() phase for EX16, which is shared by Domain D, followed by EX15 and EX17, resulting in a Dstop for Domain A and Domain B.

The POST log contents of the EX16 failure are:

    Performing ASIC config with bus config a/d/r = 333...
    Slot0 in domain: 04000
    Slot1 in domain: 3C000
    EXBs in use: 3BFFF
    asic_config_axq(): WARNING: Configuring Slot1 of in-use AXQ EX16, but it is not enabled for Slot0
    asic_config_sdi(): WARNING: Configuring Slot1 of in-use SDI EX16/S0, but it is not enabled for Slot0
    asic_config_darb(): EXB EX16 in use by another domain is not configured in DARB C0 as shared_exp, can't be changed.
    FAIL EXB EX16: PORT 16 - DARB configuration failure

POST log contents of EX15 and EX17 failures:

    asic_config_axq(): Status of AXQ EX15, in use by another domain for Slot SB15 while this code is configuring Slot IO15, is detected to have an expander-global Dstop. Policy is to FAIL this exp for this domain.
    Err2[25]: D 1E AMX 0-3 hio flow control didn't arrive simultaneously

Domain D will successfully complete POST upon the release of the hpost lock.

A 'setkeyswitch' calls enqueueHpost() when it interacts with an expander. This is the source of the "Waiting on exclusive access to EXB(s):" messages seen during 'setkeyswitch'. These messages are generated by the Task Management Daemon (tmd), which provides task management services such as scheduling for SMS. The purpose of this service is to reduce the number of conflicts that can arise during concurrent invocations of the hardware tests and configuration software.

When a 'setkeyswitch' is executed, it contacts 'tmd' to obtain exclusive access to the expander(s). The 'setkeyswitch' then calls a powerOnDomain() routine that does some initial reset/deconfig of the ASICs.

Executing 'hpost' from the command line circumvents the Task Management Daemon (tmd), so it does not obtain exclusive access to the expanders it is testing. As a result, the 'hpost' executed from the command line fails during the asic_config_darb phase with "EXB EXxx in use by another domain is not configured in DARB C0 as shared_exp, can't be changed".

Do not execute 'hpost' from the command line. 'hpost' should only be run from within the "setkeyswitch" operations. See the workaround for this case provided below.

Symptoms

Resolution
1. Do not execute 'hpost' from the command line for domains in split-expander configurations. 'hpost' should only run within "setkeyswitch" operations.

2. If 'hpost' needs to be manually run in such a configuration to test a FRU, use this procedure:

Example: A level 64 hpost is required to test a replacement FRU
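
The following sketch is not part of the Sun procedure. It illustrates the split-expander example given under Impact: from a domain-to-expander mapping, it lists which other domains share expanders with a given domain and would therefore be exposed if 'hpost' were run against that domain from the command line. The mapping is copied from the example table; the names DOMAIN_EXPANDERS, shared_expanders, and is_split_expander_domain are hypothetical helpers, and the code does not call any SMS command or API.

```python
# Illustrative only: expander assignments copied from the example table above.
# On a real system this mapping would have to come from the platform
# configuration; how to obtain it is outside the scope of this sketch.
DOMAIN_EXPANDERS = {
    "A": {"EX15"},
    "B": {"EX17"},
    "C": {"EX14", "EX15", "EX16", "EX17"},
    "D": {"EX16"},
}


def shared_expanders(domain, config):
    """Return {other_domain: [expanders shared with `domain`]}.

    A non-empty result means `domain` is in a split-expander
    configuration with the listed domains.
    """
    mine = config[domain]
    shared = {}
    for other, expanders in config.items():
        if other == domain:
            continue
        overlap = mine & expanders  # set intersection
        if overlap:
            shared[other] = sorted(overlap)
    return shared


def is_split_expander_domain(domain, config):
    """True if `domain` shares at least one expander with another domain."""
    return bool(shared_expanders(domain, config))


if __name__ == "__main__":
    for name in sorted(DOMAIN_EXPANDERS):
        print("Domain", name, "shares:", shared_expanders(name, DOMAIN_EXPANDERS))
```

In the example configuration, Domain C shares an expander with each of Domains A, B, and D, which matches the failure sequence described above: EX16 (shared with D) fails first, followed by EX15 (A) and EX17 (B).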

Modification History
Date: 19-JUL-2005
Date: 14-JUL-2006

Previously Published As 100486

Internal Comments
None.

Internal Eng Business Unit Group
KE Authors

Internal Eng Responsible Engineer
scott.barnard@sun.com

Internal Kasp FAB Legacy ID
100486, I0965-1 (FIN)

Internal Sun Alert & FAB Admin Info
Critical Category:
Significant Change Date:
Avoidance: Patch
Responsible Manager: null
Original Admin Info: null

Product_uuid
077fd4c5-df8f-4320-ad69-7d01603a674d|Sun Fire 12K Server
29e4659c-0a18-11d6-9fa1-e67bbc033df8|Sun Fire 15K Server

Attachments
This solution has no attachment