One of the most useful resources at your disposal when troubleshooting a hang or similar issue is the memory dump file Windows writes out during a blue screen. If a system is hung and you can't get to it physically, pressing Ctrl+ScrollLock, ScrollLock at the keyboard isn't a feasible solution. If the server is an HP server with an iLO (Integrated Lights-Out) card, and you've set a registry key in Windows ahead of time, you can force the system to blue screen, write the memory dump, and restart.
The key to doing this is generating what's called a nonmaskable interrupt, or NMI. Windows has a concept of interrupt request levels, or IRQLs. A higher-IRQL interrupt is always serviced first, preempting any lower-level interrupt that is currently being serviced; holding off the lower-level interrupts in the meantime is called masking them. An NMI is a hardware interrupt that can't be masked, so it has to be serviced immediately. Generally you get an NMI when there's a major hardware fault that prevents the operating system from continuing. This is exactly what Windows sees if we trigger one manually from the iLO.
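To make the masking idea concrete, here's a toy model in Python. It is only a sketch of the rule described above, not how the Windows kernel is implemented; the `Interrupt` class, the `is_serviced_now` function, and the numeric levels are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interrupt:
    """Hypothetical stand-in for a hardware interrupt."""
    name: str
    level: int             # toy priority level; higher means more urgent
    maskable: bool = True  # an NMI is modeled with maskable=False


def is_serviced_now(interrupt: Interrupt, current_level: int) -> bool:
    """Return True if the interrupt preempts whatever is running now.

    A maskable interrupt is held off unless its level is above the level
    the processor is currently running at. A nonmaskable interrupt is
    never held off.
    """
    if not interrupt.maskable:
        return True
    return interrupt.level > current_level


if __name__ == "__main__":
    busy_level = 10  # pretend the CPU is busy servicing a level-10 interrupt
    for irq in (
        Interrupt("disk", level=5),
        Interrupt("clock", level=13),
        Interrupt("nmi", level=0, maskable=False),
    ):
        print(irq.name, is_serviced_now(irq, busy_level))
```

In this model the disk interrupt waits, the clock interrupt preempts, and the NMI gets through regardless of what the processor is doing, which is why it is still useful on a box that looks completely hung.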
This post covers working with an iLO2. If you're using an iLO1, visit http://briandesmond.com/blog/forcing-a-blue-screen-via-ilo/.
The first step to getting this functionality working is setting the registry value outlined in KB 927069. Don't mind the part of the article that says it only applies to Windows 2000 Server; this works on 2000 and newer. Here's the registry info:
Path: HKLM\System\CurrentControlSet\Control\CrashControl
Value: NMICrashDump
Data: 1
Type: REG_DWORD
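If you need to set this on more than one server, it can be handy to have the value in a .reg file. The following is a minimal Python sketch that builds the text of such a file from the values listed above. The function name `build_reg_file` is my own, and `HKLM` is written out as `HKEY_LOCAL_MACHINE` because that is the form .reg files use.

```python
# Values taken from the registry info above.
KEY_PATH = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"
VALUE_NAME = "NMICrashDump"
VALUE_DATA = 1


def build_reg_file(key_path: str = KEY_PATH,
                   name: str = VALUE_NAME,
                   data: int = VALUE_DATA) -> str:
    """Return the text of a .reg file that sets one REG_DWORD value."""
    lines = [
        "Windows Registry Editor Version 5.00",  # standard .reg header
        "",
        f"[{key_path}]",
        f'"{name}"=dword:{data:08x}',  # DWORDs are written as 8 hex digits
        "",
    ]
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_reg_file())
```

Save the output to a file (for example `nmi.reg`, a name picked here purely for illustration), import it on the server with regedit, and then reboot as described below.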
You'll need to reboot for the change to take effect. If you don't reboot after making this change, you'll see the effect illustrated here.
These are the steps to crash the box. I shot these screens on a DL365 G5, which is recent hardware with an iLO version 2. If you are using an older server with an iLO version 1, see the iLO1 post linked above.
1. Log in to the iLO and then proceed to the "Diagnostics" link in the left-hand navigation:
2. Click the "Generate NMI to System" button:
3. You will get a warning dialog to make sure you're really certain this is what you want to happen. Remember, doing this will HALT your system!
4. The iLO will write a status message to the status bar in IE:
5. At this point Windows will crash with a 0x80 (NMI_HARDWARE_FAILURE) bugcheck and reboot, assuming your machine is configured to automatically reboot after a blue screen. With any luck, the memory dump will help in troubleshooting the problem at hand.
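Once the server is back up, it's worth confirming that a dump was actually written. Here's a small Python sketch that checks for the file and reports its size and timestamp. It assumes the dump is in the default location, `%SystemRoot%\MEMORY.DMP`; if you've changed the dump path in the Startup and Recovery settings, adjust accordingly. The `dump_info` helper is a name I made up for this example.

```python
import os
from datetime import datetime


def dump_info(path):
    """Return (size_in_bytes, modified_time) for the file, or None if absent."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return st.st_size, datetime.fromtimestamp(st.st_mtime)


if __name__ == "__main__":
    # Assumed default dump location; not something the steps above configure.
    system_root = os.environ.get("SystemRoot", r"C:\Windows")
    path = os.path.join(system_root, "MEMORY.DMP")
    info = dump_info(path)
    if info is None:
        print(f"No dump found at {path}")
    else:
        size, modified = info
        print(f"{path}: {size} bytes, last written {modified}")
```

A timestamp that matches the time you generated the NMI is a good sign the dump came from this crash and not an older one.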
After rebooting, my test server stopped at a Press F1 prompt to tell me there was a critical error prior to this power-up. I assume there is a BIOS setting somewhere to disable this although I don't know where it is offhand: