Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1007698.1
Update Date:2009-03-18
Keywords:

Solution Type  Technical Instruction Sure

Solution  1007698.1 :   Examining Red Hat Linux kernel state using Sysrq key combinations  


Related Items
  • Sun Fire X4200 Server
Related Categories
  • GCS>Sun Microsystems>Servers>x64 Servers

PreviouslyPublishedAs
210667


Description
This document is a guide to the information-gathering features available through the alt-sysrq key combination under Red Hat Linux on Sun Fire[TM] X4100, X4200, V20z, V40z, V65x, B100x and B200x platforms.
Each option is listed with an example.


Steps to Follow
The internal state of a kernel based on Unix(R) can provide valuable information on current system state. If a user process, or the kernel, is hanging, then the more information that can be gathered at that point, the greater the chance of a good diagnosis.

Under Solaris[TM] on the SPARC(R) platform there are well-known mechanisms for gathering stack traces, processor states and memory states. Under Linux, this can appear to be more of a black art.

This document describes the information that can be captured, ideally as early as possible, to improve the chances of a good diagnosis.

Comparisons with Sun SPARC systems.

On a Sun SPARC system, the Stop-A key sequence (or a break sent from a serial console) drops the system to the ok prompt. From there, crash dumps can be forced, or register and CPU states can be examined.

Under Linux, this ability is integrated into the kernel and is triggered using alt-sysrq key sequences.

Enabling Sysrq.

The sysrq feature needs to be enabled before it can be used. It is disabled by default on RHEL 3 and 4.

To enable the feature, edit /etc/sysctl.conf and set kernel.sysrq to 1:

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 1
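
Editing /etc/sysctl.conf only affects later boots unless the file is reloaded. The following is a minimal sketch, not part of the original procedure, of how the setting might be checked and applied to the running kernel. The function names sysrq_conf_enabled and apply_sysrq_now are hypothetical helpers introduced here for illustration; applying the setting requires root.

```shell
# Hypothetical helper: succeed if a sysctl.conf-style file sets kernel.sysrq to 1.
# The file path is a parameter so the check can be tried on a copy first.
sysrq_conf_enabled() {
    conf="${1:-/etc/sysctl.conf}"
    grep -Eq '^[[:space:]]*kernel\.sysrq[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$conf"
}

# Hypothetical helper: reload /etc/sysctl.conf into the running kernel (root
# required), then print the live value, which should now be 1.
apply_sysrq_now() {
    sysctl -p /etc/sysctl.conf && cat /proc/sys/kernel/sysrq
}
```

For a one-off change that does not persist across reboots, writing the value directly with echo 1 > /proc/sys/kernel/sysrq has the same immediate effect.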

Forcing sysrq

On X4200/X4100 Servers, connect to the SP console (start /SP/console from the ILOM prompt), press Esc followed by Shift+B to send a break, and then press the key corresponding to the sysrq-command to send.

On V65x Servers, send a break to the console, and then press the key corresponding to the sysrq-command to send.

On V20z and V40z, once connected via the platform console, press ^Ecl0<letter> to send the sysrq-command.

On Blades (B100x, B200x), send a break from the SC console, then press the letter corresponding to the sysrq-command in the serial console session to the blade.

The letter must be typed within 5 seconds of the break being sent. A ? character prints the menu of available options.
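
When the system still responds to a root shell, the same commands can usually be sent without a console break by writing the letter to /proc/sysrq-trigger. The sketch below assumes the running kernel exposes that file; sysrq_send is a hypothetical helper name, and the optional second argument exists only so the function can be tried against an ordinary file.

```shell
# Hypothetical helper: send one SysRq command letter through the proc interface.
# $1 = command letter or loglevel digit (lower-case only)
# $2 = trigger file (defaults to /proc/sysrq-trigger; root is needed to write it)
sysrq_send() {
    key="$1"
    trigger="${2:-/proc/sysrq-trigger}"
    case "$key" in
        [abcdefghijklmnopqrstuvwxyz0123456789])
            printf '%s\n' "$key" > "$trigger"
            ;;
        *)
            echo "sysrq_send: expected one lower-case letter or digit" >&2
            return 1
            ;;
    esac
}
```

For example, sysrq_send m requests the memory report and sysrq_send t requests the task list. This route is of no use on a fully hung system, where the console break sequences above remain the way in.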

List of current (Linux-2.4.21) valid key presses

SysRq : HELP : loglevel0-8 reBoot Crash tErm kIll saK showMem Off showPc unRaw Sync showTasks Unmount shoWcpus

Note: Although the menu above shows the selection key for each option in upper case (and the keys are shown in square brackets below), they must be entered in lower case. The sysrq handler does not accept upper-case characters and will display a menu similar to the one above if one is sent.

The correct keypress is in the square brackets.
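
As an illustration of the rule in the note, the key to type can be derived from a menu word by taking its upper-case letter and lower-casing it. This is only a sketch; menu_key is a hypothetical helper, and the loglevel0-8 entry has no upper-case letter because its keys are the digits themselves.

```shell
# Hypothetical helper: print the lower-case key for one word of the SysRq help menu.
# It keeps only the upper-case letter of the word, then converts it to lower case.
menu_key() {
    printf '%s\n' "$1" | tr -cd 'A-Z\n' | tr 'A-Z' 'a-z'
}

# menu_key reBoot    prints b
# menu_key showMem   prints m
# menu_key shoWcpus  prints w
```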

reBoot - [B] - This will reboot the system

Crash - [C] - This will force a panic of the system by dereferencing a NULL pointer.

If diskdump or netdump are configured (see Technical Instruction 210668) then a crash dump can be forced.

 va64-v20zc-gmp02 login: [halt sent]
SysRq : Crashing the kernel by request
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
printing rip:
ffffffff801f66b0
PML4 8a1c7067 PGD 89f8e067 PMD 0
Oops: 0002
CPU 0
Pid: 0, comm: swapper Not tainted
RIP: 0010:[<ffffffff801f66b0>]{sysrq_handle_crash+0}
RSP: 0018:ffffffff805e6280  EFLAGS: 00010292
RAX: 000000000000001f RBX: ffffffff80445cd0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff80619f18 RDI: 0000000000000063
RBP: 0000000000000000 R08: 000000000000000d R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000063
R13: 0000000000000000 R14: ffffffff80619f18 R15: 0000000000000006
FS:  0000002a969654c0(0000) GS:ffffffff805e1440(0000) knlGS:0000000000000000
CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
 Call Trace: [<ffffffff801f6d12>]{__handle_sysrq_nolock+146}
[<ffffffff801f6c48>]{handle_sysrq+72} [<ffffffff801eedd5>]{receive_chars+485}
[<ffffffff801ef2b6>]{rs_interrupt_single+150} [<ffffffff8011317f>]{handle_IRQ_event+95}
[<ffffffff80113422>]{do_IRQ+274} [<ffffffff8010de20>]{default_idle+0}
[<ffffffff8010de20>]{default_idle+0} [<ffffffff80110807>]{common_interrupt+95}
<EOI> [<ffffffff8011fb45>]{thread_return+0} [<ffffffff8010de3e>]{default_idle+30}
[<ffffffff8010de20>]{default_idle+0} [<ffffffff8010dec9>]{cpu_idle+73} 
 <SNIP>
 CPU frozen: #0#1
CPU#0 is executing diskdump.
start dumping

tErm - [E] - Send Term (sig 15) to all processes except init

kIll - [I] - Send Kill (sig 9) to all processes except init

saK - [K] - Kill all processes on the currently active virtual console. This should give a secure login prompt (i.e. not a user process trying to look like a login prompt).

showMem - [M] - This will dump the following information; the system will continue running.

 SysRq : Show Memory
 Mem-info:
Zone:DMA freepages:     0 min:     0 low:     0 high:     0
Zone:Normal freepages:358380 min:  1246 low:  8923 high: 12889
Zone:HighMem freepages:     0 min:     0 low:     0 high:     0
Zone:DMA freepages:  2529 min:     0 low:     0 high:     0
Zone:Normal freepages:382475 min:  1278 low:  9149 high: 13212
Zone:HighMem freepages:     0 min:     0 low:     0 high:     0
Free pages:      743384 (     0 HighMem)
( Active: 28480/8679, inactive_laundry: 2665, inactive_clean: 0, free: 743384 )
aa:0 ac:0 id:0 il:0 ic:0 fr:0
aa:676 ac:12917 id:7391 il:2262 ic:0 fr:358381
aa:0 ac:0 id:0 il:0 ic:0 fr:0
aa:0 ac:0 id:0 il:0 ic:0 fr:2529
aa:1446 ac:13441 id:1288 il:403 ic:0 fr:382475
aa:0 ac:0 id:0 il:0 ic:0 fr:0
17981*4kB 51522*8kB 28603*16kB 10636*32kB 2040*64kB 123*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 1433524kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
210925 pages of slabcache
82 pages of kernel stacks
123 lowmem pagetables, 115 highmem pagetables
Free swap:       2040244kB
1032047 pages of RAM
746589 free pages
33834 reserved pages
27394 pages shared
0 pages swap cached
Buffer memory:    74448kB
Cache memory:    76640kB
CLEAN: 3301 buffers, 13183 kbyte, 67 used (last=3301), 0 locked, 0 dirty 0 delay
 Red Hat Enterprise Linux AS release 3 (Taroon Update 4)
Kernel 2.4.21-27.ELsmp on an x86_64
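
The line of count*size pairs in the output above lists free blocks by size and ends with their total. As a cross-check on a captured log, the total can be recomputed. This is a sketch only; buddy_total_kb is a hypothetical helper that expects one such line on standard input in the format shown above.

```shell
# Hypothetical helper: sum the "count*sizekB" pairs of one free-block line.
# Fields that are not of that form (the "=" and the printed total) are ignored.
buddy_total_kb() {
    awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^[0-9]+\*[0-9]+kB$/) {
                n = index($i, "*")
                count = substr($i, 1, n - 1)
                size = substr($i, n + 1)
                sub(/kB$/, "", size)
                total += count * size
            }
        }
        printf "%d\n", total
    }'
}
```

Fed the line from the example, this prints 1433524, which agrees with the printed total and with the fr:358381 entry multiplied by the 4 kB page size.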

Off - [O] - Turn the system off (if supported by hardware)

showPc - [P] (example from i386 Xeon) - Shows register state (program counter)

 SysRq : Show Regs
 Pid/TGid: 0/0, comm:              swapper
EIP: 0060:[<c0109129>] CPU: 3
EIP is at default_idle [kernel] 0x29 (2.4.21-27.ELsmp)
ESP: 080b:c01091c2 EFLAGS: 00000246    Not tainted
EAX: 00000000 EBX: c0109100 ECX: c043c680 EDX: c4956000
ESI: c4956000 EDI: c4956000 EBP: c0109100 DS: 0068 ES: 0068 FS: 0000 GS: 0000
CR0: 8005003b CR2: b75f7000 CR3: 062e1f40 CR4: 000006f0
Call Trace:   [<c01091c2>] cpu_idle [kernel] 0x42 (0xc4957fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc4957fcc)

showTasks - [T] - Shows all tasks running with stack traces

SysRq : Show State

                          free                        sibling
task             PC    stack   pid father child younger older
init          S 00000002  2604     1      0     6       2       (NOTLB)
Call Trace:   [<c0123f14>] schedule [kernel] 0x2f4 (0xc61f1ea0)
[<c0134f65>] schedule_timeout [kernel] 0x65 (0xc61f1ee4)
[<c015910c>] __get_free_pages [kernel] 0x1c (0xc61f1eec)
[<c0179071>] __pollwait [kernel] 0x31 (0xc61f1ef0)
[<c0134ef0>] process_timeout [kernel] 0x0 (0xc61f1f04)
[<c017933b>] do_select [kernel] 0x13b (0xc61f1f1c)
[<c01797de>] sys_select [kernel] 0x34e (0xc61f1f60)
 migration/0   S 00000000  5500     2      0             3     1 (L-TLB)
Call Trace:   [<c0123f14>] schedule [kernel] 0x2f4 (0xc4955f68)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955f9c)
[<c0125bfb>] migration_task [kernel] 0x30b (0xc4955fac)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955fc4)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955fe0)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xc4955ff0)

<SNIP>

The output contains the full stack for every process on the system, and lists what each CPU is running.
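
Because the full task list can run to thousands of lines, it often helps to reduce a saved copy to one line per task first. The sketch below assumes the 2.4.21 layout shown above, in which each task header line ends in (NOTLB) or (L-TLB) and the pid is the fifth field. list_tasks is a hypothetical helper, and task names containing spaces would shift the fields.

```shell
# Hypothetical helper: print "name state pid" for every task header line
# in a saved showTasks log (file arguments, or standard input if none given).
list_tasks() {
    awk '$NF == "(NOTLB)" || $NF == "(L-TLB)" { print $1, $2, $5 }' "$@"
}
```

Piping the result through a filter such as awk '$2 == "D"' then lists only tasks in uninterruptible sleep, which are often the first ones to look at when a system appears hung.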

unRaw - [R] - Takes the keyboard out of raw mode (sets it to XLATE)

Sync - [S] - syncs all mounted file systems, flushes all pending writes

Unmount - [U] - Syncs, unmounts and then remounts all file systems read-only.
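
The Sync, Unmount and reBoot options are commonly used together to restart an unresponsive system with less risk to the file systems. The sketch below strings the three keys together through /proc/sysrq-trigger, assuming that file is available and a root shell still works. sysrq_sync_remount_reboot is a hypothetical helper, and the pause length is an arbitrary choice, not a documented requirement.

```shell
# Hypothetical helper: sync, remount read-only, then reboot (keys s, u, b).
# $1 = pause in seconds between steps (default 5)
# $2 = trigger file (default /proc/sysrq-trigger)
# WARNING: with the default trigger file this reboots the machine immediately
# after the last step, without a normal shutdown.
sysrq_sync_remount_reboot() {
    pause="${1:-5}"
    trigger="${2:-/proc/sysrq-trigger}"
    for key in s u b; do
        printf '%s\n' "$key" > "$trigger" || return 1
        sleep "$pause"
    done
}
```

From a console, the equivalent is to send the break sequence for the platform three times, following it with s, then u, then b, and allowing each step to report completion before sending the next.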

shoWcpus - [W] (example from a dual-processor, HT-enabled Xeon)

 SysRq : Show CPUs
CPU2:
c63f5e74 00000002 c01cea1f 00000000 c03b2d34 00000077 00000006 c01cecaa
00000077 c63f5f7c 00000000 00000000 00000000 00000000 c63f5f7c c01cec0d
00000077 c63f5f7c 00000000 00000000 f66d6000 c03ad438 c63f5f1c f7ee1d80
Call Trace:   [<c01cea1f>] sysrq_handle_showcpus [kernel] 0xf (0xc63f5e7c)
[<c01cecaa>] __handle_sysrq_nolock [kernel] 0x7a (0xc63f5e90)
[<c01cec0d>] handle_sysrq [kernel] 0x5d (0xc63f5eb0)
[<c01c5f06>] receive_chars [kernel] 0x1d6 (0xc63f5ed4)
[<c0134933>] update_process_time_intertick [kernel] 0x53 (0xc63f5ef0)
[<c01c64ca>] rs_interrupt_single [kernel] 0x12a (0xc63f5f04)
[<c010dd39>] handle_IRQ_event [kernel] 0x69 (0xc63f5f30)
[<c010df79>] do_IRQ [kernel] 0xb9 (0xc63f5f50)
[<c010dec0>] do_IRQ [kernel] 0x0 (0xc63f5f74)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f5f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f5f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc63f5fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc63f5fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc63f5fcc)
 CPU3:
c4957f64 00000003 c011c91f 00000000 00001f7c c03f2caa c0109100 00000000
c4956000 c4956000 c4956000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0702080b 00000000 00000000 00000000
Call Trace:   [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc4957f6c)
[<c0109100>] default_idle [kernel] 0x0 (0xc4957f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc4957f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc4957fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc4957fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc4957fcc)
 CPU0:
c03f1f88 00000000 c011c91f 00000000 00001fa0 c03f2caa c0109100 c043b280
c03f0000 c03f0000 c03f0000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0002080b 00099800 c0107000 0008e000
Call Trace:   [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc03f1f90)
[<c0109100>] default_idle [kernel] 0x0 (0xc03f1fa0)
[<c0109100>] default_idle [kernel] 0x0 (0xc03f1fb4)
[<c0109129>] default_idle [kernel] 0x29 (0xc03f1fc8)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc03f1fd4)
[<c0107000>] stext [kernel] 0x0 (0xc03f1fe0)
 CPU1:
c63f7f64 00000001 c011c91f 00000000 00001f7c c03f2caa c0109100 c043b280
c63f6000 c63f6000 c63f6000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0102080b 00000000 00000000 00000000
Call Trace:   [<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc63f7f6c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f7f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f7f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc63f7fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc63f7fb0)
[<c01292b3>] call_console_drivers [kernel] 0x63 (0xc63f7fc4)
[<c01295e3>] printk [kernel] 0x153 (0xc63f7ffc)
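
When reviewing a saved copy of this output, a quick first check is whether every CPU produced a section. The sketch below lists the CPU numbers found in a log, assuming each section starts with a line of the form CPUn: as in the example above. reported_cpus is a hypothetical helper.

```shell
# Hypothetical helper: print the CPU numbers that have a "CPUn:" section header
# in a saved showCpus log (file arguments, or standard input if none given).
reported_cpus() {
    sed -n 's/^[[:space:]]*CPU\([0-9][0-9]*\):[[:space:]]*$/\1/p' "$@" | sort -n
}
```

For the example above this prints 0, 1, 2 and 3 on separate lines. The count can be compared with the number of processor entries in /proc/cpuinfo (grep -c '^processor' /proc/cpuinfo) on a healthy boot of the same machine.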


Product
Sun Fire X4200 Server

Hang After Boot, sysrq, hang, panic, netdump, diskdump
Previously Published As
80735

Change History
Date: 2008-01-08
User Name: 95826
Action: Approved
Comment: publishing to allow IBIS migration
Version: 10
Date: 2008-01-08
User Name: 95826
Action: Accept
Comment:
Version: 0

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.