Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition
Solution Type Technical Instruction Sure Solution 1007698.1 : Examining Red Hat Linux kernel state using Sysrq key combinations
PreviouslyPublishedAs 210667

Description

This document is a guide to the information-gathering features available through the alt-sysrq key combination under Red Hat Linux on Sun Fire[TM] X4100, X4200, V20z, V40z, V65x, B100x and B200x platforms. Each option is listed with an example.

Steps to Follow

The internal state of a Unix(R)-based kernel can provide valuable information about the current system state. If a user process or the kernel is hanging, the more information that can be gathered at that point, the greater the chance of a good diagnosis. Under Solaris[TM] on the SPARC(R) platform there are well-known mechanisms for gathering stack traces, processor states and memory states. Under Linux, this can appear to be more of a black art. This document sets out the information that can be captured, ideally as early as possible, to improve the chances of a good diagnosis.

Comparisons with Sun SPARC systems.

On a Sun SPARC system, the Stop-A key sequence (or sending a break from a serial console) drops the system to the ok prompt. From there, crash dumps can be forced, or register/CPU states can be examined. Under Linux, this ability is integrated in the kernel and is triggered using alt-sysrq key sequences.

Enabling Sysrq.

The sysrq feature needs to be enabled before it can be used. It is disabled by default on RHEL 3 and 4. To enable the feature, edit /etc/sysctl.conf and set the value below to 1:

# Controls the System Request debugging functionality of the kernel
kernel.sysrq = 1

Forcing sysrq

On X4200/X4100 Servers, connect to the SP console (start /SP/console from the ILOM prompt), press Esc followed by shift+b to send a break, and then press the key corresponding to the sysrq-command to send.

On V65x Servers, send a break to the console, and then press the key corresponding to the sysrq-command to send.

On V20z and V40z, once connected via the platform console, press ^Ecl0<letter> to send the sysrq-command.
On Blades (B100x, B200x), send a break from the SC console, then press the letter corresponding to the sysrq-command in the serial console session to the blade. The letter must be pressed within 5 seconds of the break being sent. A ? character will print the menu of available options.

List of current (Linux-2.4.21) valid key presses

SysRq : HELP : loglevel0-8 reBoot Crash tErm kIll saK showMem Off showPc unRaw Sync showTasks Unmount shoWcpus

Note: The menu above marks the selecting key for each option with an upper-case letter, and the same keys are shown below in square brackets. They must nevertheless be entered in lower case: the sysrq handler does not accept upper-case characters, and sending one displays something similar to the menu above.

The correct keypress is in the square brackets

reBoot - [B] - This will reboot the system

Crash - [C] - This will force a panic of the system by deliberately dereferencing a NULL pointer. If diskdump or netdump is configured (see Technical Instruction 210668), a crash dump can be forced.
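As a rough illustration of the menu convention described in the Note above, the following Python sketch derives the lower-case key for each menu entry by locating its single capital letter. The names MENU and menu_keys are hypothetical and introduced here only for illustration; the menu string is copied from the Linux-2.4.21 help output quoted above. The loglevel0-8 entry contains no capital letter (it is selected with the digits 0 to 8), so the sketch skips it.

```python
# Menu text as printed by the Linux-2.4.21 sysrq help output quoted above.
MENU = ("loglevel0-8 reBoot Crash tErm kIll saK showMem Off showPc "
        "unRaw Sync showTasks Unmount shoWcpus")


def menu_keys(menu):
    """Map each menu entry to the lower-case key that selects it.

    Assumption: an entry is selected by its single upper-case letter,
    sent in lower case. Entries without exactly one capital letter
    (such as loglevel0-8) are left out of the result.
    """
    keys = {}
    for entry in menu.split():
        capitals = [ch for ch in entry if ch.isupper()]
        if len(capitals) == 1:
            keys[entry] = capitals[0].lower()
    return keys


if __name__ == "__main__":
    for entry, key in menu_keys(MENU).items():
        print(f"{entry}: send '{key}'")
```

Run against the menu above, this yields one key per option, for example t for showTasks and w for shoWcpus. A different kernel version may print a different menu, so the string would need to be replaced with the output of ? on the system in question.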
Example console output:

va64-v20zc-gmp02 login: [halt sent]
SysRq : Crashing the kernel by request
Unable to handle kernel NULL pointer dereference at virtual address 0000000000000000
printing rip:
ffffffff801f66b0
PML4 8a1c7067 PGD 89f8e067 PMD 0
Oops: 0002
CPU 0
Pid: 0, comm: swapper Not tainted
RIP: 0010:[<ffffffff801f66b0>]{sysrq_handle_crash+0}
RSP: 0018:ffffffff805e6280 EFLAGS: 00010292
RAX: 000000000000001f RBX: ffffffff80445cd0 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff80619f18 RDI: 0000000000000063
RBP: 0000000000000000 R08: 000000000000000d R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000001 R12: 0000000000000063
R13: 0000000000000000 R14: ffffffff80619f18 R15: 0000000000000006
FS: 0000002a969654c0(0000) GS:ffffffff805e1440(0000) knlGS:0000000000000000
CS: 0010 DS: 0018 ES: 0018 CR0: 000000008005003b
CR2: 0000000000000000 CR3: 0000000000101000 CR4: 00000000000006e0
Call Trace:
[<ffffffff801f6d12>]{__handle_sysrq_nolock+146}
[<ffffffff801f6c48>]{handle_sysrq+72}
[<ffffffff801eedd5>]{receive_chars+485}
[<ffffffff801ef2b6>]{rs_interrupt_single+150}
[<ffffffff8011317f>]{handle_IRQ_event+95}
[<ffffffff80113422>]{do_IRQ+274}
[<ffffffff8010de20>]{default_idle+0}
[<ffffffff8010de20>]{default_idle+0}
[<ffffffff80110807>]{common_interrupt+95}
<EOI>
[<ffffffff8011fb45>]{thread_return+0}
[<ffffffff8010de3e>]{default_idle+30}
[<ffffffff8010de20>]{default_idle+0}
[<ffffffff8010dec9>]{cpu_idle+73}
<SNIP>
CPU frozen: #0#1
CPU#0 is executing diskdump.
start dumping

tErm - [E] - Send Term (sig 15) to all processes except init

kIll - [I] - Send Kill (sig 9) to all processes except init

saK - [K] - Kill all processes on the currently active virtual console. This should give a secure login prompt (i.e. not a user process trying to look like a login prompt).

showMem - [M] - This will dump the following information; the system will continue running.
SysRq : Show Memory
Mem-info:
Zone:DMA freepages: 0 min: 0 low: 0 high: 0
Zone:Normal freepages:358380 min: 1246 low: 8923 high: 12889
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Zone:DMA freepages: 2529 min: 0 low: 0 high: 0
Zone:Normal freepages:382475 min: 1278 low: 9149 high: 13212
Zone:HighMem freepages: 0 min: 0 low: 0 high: 0
Free pages: 743384 ( 0 HighMem)
( Active: 28480/8679, inactive_laundry: 2665, inactive_clean: 0, free: 743384 )
aa:0 ac:0 id:0 il:0 ic:0 fr:0
aa:676 ac:12917 id:7391 il:2262 ic:0 fr:358381
aa:0 ac:0 id:0 il:0 ic:0 fr:0
aa:0 ac:0 id:0 il:0 ic:0 fr:2529
aa:1446 ac:13441 id:1288 il:403 ic:0 fr:382475
aa:0 ac:0 id:0 il:0 ic:0 fr:0
17981*4kB 51522*8kB 28603*16kB 10636*32kB 2040*64kB 123*128kB 2*256kB 1*512kB 0*1024kB 0*2048kB 1*4096kB = 1433524kB)
Swap cache: add 0, delete 0, find 0/0, race 0+0
210925 pages of slabcache
82 pages of kernel stacks
123 lowmem pagetables, 115 highmem pagetables
Free swap: 2040244kB
1032047 pages of RAM
746589 free pages
33834 reserved pages
27394 pages shared
0 pages swap cached
Buffer memory: 74448kB
Cache memory: 76640kB
CLEAN: 3301 buffers, 13183 kbyte, 67 used (last=3301), 0 locked, 0 dirty 0 delay

Red Hat Enterprise Linux AS release 3 (Taroon Update 4)
Kernel 2.4.21-27.ELsmp on an x86_64

Off - [O] - Turn the system off (if supported by hardware)

showPc - [P] (example from i386 Xeon) - shows register state (program counter)

SysRq : Show Regs
Pid/TGid: 0/0, comm: swapper
EIP: 0060:[<c0109129>] CPU: 3
EIP is at default_idle [kernel] 0x29 (2.4.21-27.ELsmp)
ESP: 080b:c01091c2 EFLAGS: 00000246 Not tainted
EAX: 00000000 EBX: c0109100 ECX: c043c680 EDX: c4956000
ESI: c4956000 EDI: c4956000 EBP: c0109100 DS: 0068 ES: 0068 FS: 0000 GS: 0000
CR0: 8005003b CR2: b75f7000 CR3: 062e1f40 CR4: 000006f0
Call Trace:
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc4957fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc4957fcc)

showTasks -
[T] - shows all tasks running with stack traces

SysRq : Show State
free sibling
task PC stack pid father child younger older
init S 00000002 2604 1 0 6 2 (NOTLB)
Call Trace:
[<c0123f14>] schedule [kernel] 0x2f4 (0xc61f1ea0)
[<c0134f65>] schedule_timeout [kernel] 0x65 (0xc61f1ee4)
[<c015910c>] __get_free_pages [kernel] 0x1c (0xc61f1eec)
[<c0179071>] __pollwait [kernel] 0x31 (0xc61f1ef0)
[<c0134ef0>] process_timeout [kernel] 0x0 (0xc61f1f04)
[<c017933b>] do_select [kernel] 0x13b (0xc61f1f1c)
[<c01797de>] sys_select [kernel] 0x34e (0xc61f1f60)
migration/0 S 00000000 5500 2 0 3 1 (L-TLB)
Call Trace:
[<c0123f14>] schedule [kernel] 0x2f4 (0xc4955f68)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955f9c)
[<c0125bfb>] migration_task [kernel] 0x30b (0xc4955fac)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955fc4)
[<c01258f0>] migration_task [kernel] 0x0 (0xc4955fe0)
[<c01095ad>] kernel_thread_helper [kernel] 0x5 (0xc4955ff0)
<SNIP>

The full output contains the stack for every process on the system, and lists what each CPU is running.

unRaw - [R] - Takes the keyboard out of raw mode (sets it to XLATE)

Sync - [S] - Syncs all mounted file systems, flushing all pending writes

Unmount - [U] - Syncs, unmounts and then remounts all file systems as read-only.

shoWcpus -
[W] (example from dual proc, HT enabled Xeon)

SysRq : Show CPUs
CPU2:
c63f5e74 00000002 c01cea1f 00000000 c03b2d34 00000077 00000006 c01cecaa
00000077 c63f5f7c 00000000 00000000 00000000 00000000 c63f5f7c c01cec0d
00000077 c63f5f7c 00000000 00000000 f66d6000 c03ad438 c63f5f1c f7ee1d80
Call Trace:
[<c01cea1f>] sysrq_handle_showcpus [kernel] 0xf (0xc63f5e7c)
[<c01cecaa>] __handle_sysrq_nolock [kernel] 0x7a (0xc63f5e90)
[<c01cec0d>] handle_sysrq [kernel] 0x5d (0xc63f5eb0)
[<c01c5f06>] receive_chars [kernel] 0x1d6 (0xc63f5ed4)
[<c0134933>] update_process_time_intertick [kernel] 0x53 (0xc63f5ef0)
[<c01c64ca>] rs_interrupt_single [kernel] 0x12a (0xc63f5f04)
[<c010dd39>] handle_IRQ_event [kernel] 0x69 (0xc63f5f30)
[<c010df79>] do_IRQ [kernel] 0xb9 (0xc63f5f50)
[<c010dec0>] do_IRQ [kernel] 0x0 (0xc63f5f74)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f5f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f5f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc63f5fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc63f5fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc63f5fcc)
CPU3:
c4957f64 00000003 c011c91f 00000000 00001f7c c03f2caa c0109100 00000000
c4956000 c4956000 c4956000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0702080b 00000000 00000000 00000000
Call Trace:
[<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc4957f6c)
[<c0109100>] default_idle [kernel] 0x0 (0xc4957f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc4957f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc4957fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc4957fb0)
[<c01295e3>] printk [kernel] 0x153 (0xc4957fcc)
CPU0:
c03f1f88 00000000 c011c91f 00000000 00001fa0 c03f2caa c0109100 c043b280
c03f0000 c03f0000 c03f0000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0002080b 00099800 c0107000 0008e000
Call Trace:
[<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc03f1f90)
[<c0109100>] default_idle [kernel] 0x0 (0xc03f1fa0)
[<c0109100>] default_idle [kernel] 0x0 (0xc03f1fb4)
[<c0109129>] default_idle [kernel] 0x29 (0xc03f1fc8)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc03f1fd4)
[<c0107000>] stext [kernel] 0x0 (0xc03f1fe0)
CPU1:
c63f7f64 00000001 c011c91f 00000000 00001f7c c03f2caa c0109100 c043b280
c63f6000 c63f6000 c63f6000 c0109100 00000000 00000068 00000068 fffffffb
c0109129 00000060 00000246 c01091c2 0102080b 00000000 00000000 00000000
Call Trace:
[<c011c91f>] smp_call_function_interrupt [kernel] 0x2f (0xc63f7f6c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f7f7c)
[<c0109100>] default_idle [kernel] 0x0 (0xc63f7f90)
[<c0109129>] default_idle [kernel] 0x29 (0xc63f7fa4)
[<c01091c2>] cpu_idle [kernel] 0x42 (0xc63f7fb0)
[<c01292b3>] call_console_drivers [kernel] 0x63 (0xc63f7fc4)
[<c01295e3>] printk [kernel] 0x153 (0xc63f7ffc)

Product
Sun Fire X4200 Server

Hang After Boot, sysrq, hang, panic, netdump, diskdump

Previously Published As 80735

Change History
Date: 2008-01-08 User Name: 95826 Action: Approved Comment: publishing to allow IBIS migration Version: 10
Date: 2008-01-08 User Name: 95826 Action: Accept Comment: Version: 0

Attachments
This solution has no attachment