Sun Microsystems, Inc.  Sun System Handbook - ISO 3.4 June 2011 Internal/Partner Edition

Asset ID: 1-71-1007054.1
Update Date:2011-05-31
Keywords:

Solution Type  Technical Instruction Sure

Solution  1007054.1 :   How to handle Microsoft Windows panics on x64 platforms  


Related Items
  • Sun Ultra 25 Workstation
  • Sun Blade X6220 Server Module
  • Sun Blade X8420 Server Module
  • Sun Fire X4200 M2 Server
  • Sun Java Workstation W2100z
  • Sun Ultra 20 Workstation
  • Sun Blade X6450 Server Module
  • Sun Fire V65x Server
  • Sun Fire V60x Server
  • Sun Fire X4440 Server
  • Sun Fire V20z Compute Grid Rack System
  • Sun Ultra 20 M2 Workstation
  • Sun Fire X2200 M2 Server
  • Sun Blade X8450 Server Module
  • Sun Fire X4600 Server
  • Sun Blade X8440 Server Module
  • Sun Fire X4100 Server
  • Sun Fire X4500 Server
  • Sun Blade X6250 Server Module
  • Sun Fire V20z Server
  • Sun Java Workstation W1100z
  • Sun Ultra 40 Workstation
  • Sun Blade X6420 Server Module
  • Sun Fire V40z Server
  • Sun Fire X4100 M2 Server
  • Sun Fire X4540 Server
  • Sun Ultra 40 M2 Workstation
  • Sun Fire X2100 M2 Server
  • Sun Fire X2100 Server
  • Sun Fire X4200 Server
  • Sun Fire X4600 M2 Server
Related Categories
  • GCS>Sun Microsystems>Servers>x64 Servers

PreviouslyPublishedAs
209735


Description
How to handle Microsoft Windows panics on x64 platforms

Summary

This document covers Windows panics (crashes) on Windows NT, 2000, XP, and Server 2003. It contains the key concepts needed to handle system panics.

NOTE: Core dump analysis is out of scope of this document.

For a detailed checklist on how to identify and manage Windows crashes,
see <Document: 1017889.1> "Analyzing System panics on x64 platforms running Microsoft Windows"


To discuss this information further with Oracle experts and industry peers, we encourage you to review, join or start a discussion in the My Oracle Support Community - Sun x86 Systems

Symptoms

Windows panics, Windows crashes, Blue Screen of Death (BSoD)




Steps to Follow
Windows crashes, core dumps, and configuration

Overview

Why does Windows crash?

Windows is capable of running on several x86 platforms.

This requires protection mechanisms built into the OS that let multiple programs run simultaneously without corrupting each other's data or resources.

For this reason, a process should work only on its own resources.
Because processes also need to communicate with hardware devices and other OS components, Windows defines different privilege levels.

The most important are the kernel mode and user mode levels.
Kernel mode is the most privileged level and is reserved for key OS components such as drivers.

All other applications run in user mode.

Problems arise when kernel-mode code, because of a bug, interferes with the resources of a process running in the less privileged user mode or of other code running in kernel mode. Other causes may be:

  • Exceptions (illegal instructions, illegal addresses)

  • Hardware errors

  • Attempts to reference paged-out memory at interrupt level

  • A manually forced crash, for example to investigate a hang.
    This kind of crash is recognizable by bugcheck 0xE2 (MANUALLY_INITIATED_CRASH).

According to Microsoft, more than 70% of panics are caused by third-party drivers.

When corruption happens (and is detected), the OS initiates a protection procedure that stops the entire system to avoid additional corruption.

This procedure also saves data about the fault (the crash dump) for later analysis.

What happens when Windows crashes?

When a crash condition is detected, the KeBugCheckEx function is invoked.
It takes five arguments:

  • A stop code also called bugcheck code

  • Four stop-code defined parameters.

KeBugCheckEx stops the CPUs, paints the famous Blue Screen of Death (BSoD), and, if this can be done safely, dumps the system kernel/memory state into a dump file.

Examples of BSoD in Windows NT can be found at the link below:
http://www.microsoft.com/technet/archive/winntas/tips/techrep/bsod.mspx?mfr=true
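
The stop code and its four parameters are usually the first data recorded from a crash. The sketch below, in Python, shows one way to pull them out of a STOP line copied from the blue screen. It assumes the classic "*** STOP: 0x... (0x..., 0x..., 0x..., 0x...)" text layout; the helper names and the short lookup table are illustrative only and are not part of any Microsoft API.

```python
import re
from typing import NamedTuple, Tuple

# Small illustrative subset of bugcheck names. 0xE2 is the code cited in this
# document; the others are commonly seen codes, listed here for convenience.
KNOWN_BUGCHECKS = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
    0x000000E2: "MANUALLY_INITIATED_CRASH",
}

# Assumed layout of the STOP line: one stop code, then four parameters.
STOP_LINE = re.compile(
    r"STOP:\s*(0x[0-9A-Fa-f]+)\s*"
    r"\(\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*,"
    r"\s*(0x[0-9A-Fa-f]+)\s*,\s*(0x[0-9A-Fa-f]+)\s*\)"
)


class BugCheck(NamedTuple):
    code: int
    name: str
    params: Tuple[int, int, int, int]


def parse_stop_line(text: str) -> BugCheck:
    """Extract the stop code and its four parameters from a STOP line."""
    match = STOP_LINE.search(text)
    if match is None:
        raise ValueError("no STOP line found")
    code, *params = (int(group, 16) for group in match.groups())
    name = KNOWN_BUGCHECKS.get(code, "UNKNOWN")
    return BugCheck(code, name, tuple(params))
```

For example, parse_stop_line("*** STOP: 0x000000E2 (0x00000000, 0x00000000, 0x00000000, 0x00000000)") yields a record whose name is MANUALLY_INITIATED_CRASH, which identifies a manually forced crash.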

How many kinds of dump files are there?

Windows Server 2003, 2000 and XP can create three types of dumps:

  • Complete or full dump:

A full dump contains the entire contents of physical memory, data and executables included, so its size equals the amount of RAM in the machine.
A machine with 1 GB of RAM therefore creates a dump file of approximately 1 GB.
Windows 2000 produces a full dump by default. This file is overwritten each time!

IMPORTANT:
Due to a Windows limitation, complete memory dumps are not available on computers that have 2 GB or more of RAM.

Usually, a smaller kernel dump is enough.

  • Kernel dump:

Its size equals the amount of RAM occupied by the operating system kernel, so the file size varies.
For most purposes, this crash dump is the most useful. This file is overwritten each time!

  • Small or mini dump:

A mini dump is a 64 KB file on 32-bit systems and a 128 KB file on 64-bit systems. It doesn't contain any of the binary or
executable files that are in memory at the time of a system crash.
Mini dumps are therefore of limited value, but they can give a starting point for the analysis.
XP and Server 2003 produce mini dumps by default, one for each crash event, as well as a full dump file.

As noted above, the full dump is overwritten every time.

How to check and configure the dump type and location.

Default settings can be changed via:

Control Panel --> (Performance and Maintenance) --> System --> Advanced --> Startup and Recovery
For XP the default settings are: C:\WINDOWS\MEMORY.DMP and C:\WINDOWS\MINIDUMP\
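
The settings shown in that dialog are also stored in the registry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl, which is handy when checking many machines. The following is a minimal Python sketch, assuming the usual CrashDumpEnabled encoding (0 = none, 1 = complete, 2 = kernel, 3 = small). The function names are hypothetical, and the registry read works only on Windows with a Python interpreter installed.

```python
import sys

CRASH_CONTROL_KEY = r"SYSTEM\CurrentControlSet\Control\CrashControl"

# Assumed encoding of the CrashDumpEnabled registry value.
DUMP_TYPES = {0: "none", 1: "complete", 2: "kernel", 3: "small"}


def describe_crash_control(values: dict) -> dict:
    """Turn raw CrashControl registry values into a readable summary."""
    dump_type = DUMP_TYPES.get(values.get("CrashDumpEnabled"), "unknown")
    return {
        "dump_type": dump_type,
        "dump_file": values.get("DumpFile"),
        "minidump_dir": values.get("MinidumpDir"),
        "overwrite": bool(values.get("Overwrite", 0)),
        "auto_reboot": bool(values.get("AutoReboot", 0)),
    }


def read_crash_control() -> dict:
    """Read all CrashControl values from the local registry (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the registry is only available on Windows")
    import winreg  # standard library module, present only on Windows

    values = {}
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, CRASH_CONTROL_KEY) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            values[name] = data
    return values
```

On a Windows host, describe_crash_control(read_crash_control()) summarizes the configured dump type, dump file, and mini dump directory in one step.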

A complete and detailed description of dump file types and options can be found in the following documents:

Can the system crash without dumping to a file?

Sometimes the dump can fail even if the setting is correct.
Common reasons are:

  • The paging file is not large enough. This often happens on systems that have a lot of memory and are configured to
    collect a full memory dump.

  • Not enough free space to write the dump in the specified location (e.g. C:\WINDOWS\)

  • Spontaneous reboot

  • Crash occurs at the very beginning of the boot process before paging file creation.

  • Components involved in the dumping process got corrupted.

  • System hardware errors
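
The first two reasons in the list can be checked before a crash ever happens. The sketch below is a rough Python illustration, not a supported tool. It assumes the commonly cited Microsoft guidance that a complete dump needs a paging file on the boot volume of at least physical RAM plus 1 MB and that a small dump needs at least 2 MB. Kernel dump sizes vary, so they are not estimated. All function and parameter names are hypothetical.

```python
from typing import List, Optional

KB = 1024
MB = 1024 * KB


def min_pagefile_bytes(dump_type: str, ram_bytes: int) -> Optional[int]:
    """Minimum paging file size for a dump type, or None if not estimable."""
    if dump_type == "complete":
        return ram_bytes + 1 * MB  # assumed rule: RAM + 1 MB
    if dump_type == "small":
        return 2 * MB  # assumed rule: at least 2 MB
    return None  # kernel dumps depend on kernel memory usage


def expected_dump_bytes(dump_type: str, ram_bytes: int,
                        is_64bit: bool = False) -> Optional[int]:
    """Approximate dump file size, using the figures given in this document."""
    if dump_type == "complete":
        return ram_bytes
    if dump_type == "small":
        return 128 * KB if is_64bit else 64 * KB
    return None


def dump_problems(dump_type: str, ram_bytes: int, pagefile_bytes: int,
                  free_disk_bytes: int, is_64bit: bool = False) -> List[str]:
    """List the configuration problems that could prevent a dump."""
    problems = []
    needed = min_pagefile_bytes(dump_type, ram_bytes)
    if needed is not None and pagefile_bytes < needed:
        problems.append("paging file too small")
    size = expected_dump_bytes(dump_type, ram_bytes, is_64bit)
    if size is not None and free_disk_bytes < size:
        problems.append("not enough free disk space")
    return problems
```

For a machine with 1 GB of RAM configured for a complete dump but with only a 512 MB paging file, dump_problems reports "paging file too small", which matches the first reason in the list.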

For a detailed description and possible solutions or workarounds, check the articles below.

How to collect useful configuration information about your system

Windows offers a couple of useful tools designed to collect system configuration data such as OS version, drivers, and events:

msinfo32 (bundled with Windows) and the Microsoft Product Support reporting tools, which must be downloaded from the Microsoft website.

msinfo32:

You can start it manually: [Start]->[Run]->msinfo32
Then export the system information to a text file:
File->Export...

MPS report (Microsoft Product Support's Reporting Tools):

It is a compressed software package that contains one or more scripts and other utilities that you can use
to capture critical system, diagnostic, and configuration information about your system.

Go to http://support.microsoft.com/kb/818742 for a detailed description and download links.

Also check the following documents:

<Document: 1010936.1> "Microsoft Windows and Linux operating systems: How to obtain troubleshooting information"

How to collect platform-specific hardware logs - ipmitool, SEL logs, DMI logs, tdulogs.

If you suspect a hardware problem, you need to collect platform-specific data as well.
The available commands depend on the system you are working on.
Look for the "Related documentation" link after selecting your system on the Sun System Handbook page:

https://support.oracle.com/handbook_private/index.html

The following documents may be useful:

<Document: 1008335.1> "Sun [TM] X64/X86 Guide to System Troubleshooting"

<Document: 1009698.1> "How to collect data from a x86/x64 platform using Intelligent Platform Management Interface (IPMI)"
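
As an illustration of IPMI-based collection, the Python sketch below builds ipmitool command lines for the System Event Log, the sensor data records, and the FRU inventory, then runs them against a service processor over the network. The host, user, and password file are placeholders, and whether the lanplus interface is available depends on the platform. Treat it as a sketch under those assumptions and follow the documents above for the procedure that applies to your system.

```python
import subprocess
from typing import Dict, List

# ipmitool subcommands commonly used when gathering hardware logs.
REPORTS: Dict[str, List[str]] = {
    "sel": ["sel", "elist"],      # System Event Log, extended listing
    "sensors": ["sdr", "elist"],  # sensor data records with readings
    "fru": ["fru", "print"],      # field-replaceable unit inventory
}


def ipmitool_command(host: str, user: str, password_file: str,
                     report: str, interface: str = "lanplus") -> List[str]:
    """Build the ipmitool argument list for one report."""
    if report not in REPORTS:
        raise ValueError(f"unknown report: {report}")
    return [
        "ipmitool", "-I", interface,
        "-H", host, "-U", user, "-f", password_file,
        *REPORTS[report],
    ]


def collect(host: str, user: str, password_file: str) -> Dict[str, str]:
    """Run every report and return its text output (needs ipmitool on PATH)."""
    results = {}
    for report in REPORTS:
        command = ipmitool_command(host, user, password_file, report)
        done = subprocess.run(command, capture_output=True, text=True,
                              check=False)
        results[report] = done.stdout
    return results
```

Keeping the command construction separate from the execution makes it easy to review the exact command line before running it against a production service processor.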

Core dump analysis

As stated at the beginning, Windows crash dump analysis is outside the scope of this document.
You should contact your OS provider for this.





Internal Comments
This document contains normalized content and is managed by the Domain Lead(s) of the respective domains. To notify content owners of a knowledge gap contained in this document, and/or prior to updating this document, please contact the domain engineers that are managing this document via the "Document Feedback" alias(es) listed below:
Normalization team alias: tsc-emea-x64@sun.com

normalized, microsoft, windows, panic, crash, dump, minidump, x86, x64, windbg, BSoD
Previously Published As
91520
20100906: Removed broken link http://support.microsoft.com/kb/274598/

Attachments
This solution has no attachment
  Copyright © 2011 Sun Microsystems, Inc.  All rights reserved.