When you're troubleshooting a hang or a similar problem, one of the most useful resources at your disposal is the memory dump that Windows writes during a blue screen. If a system is hung and you can't get to it physically, the keyboard-initiated crash (holding the right Ctrl key and pressing Scroll Lock twice) isn't an option. If the server is an HP server with an iLO (Integrated Lights-Out) card, and you've set a registry value in Windows ahead of time, you can force the system to blue screen, write the memory dump, and restart.

The key to doing this is generating what's called a nonmaskable interrupt, or NMI. Windows prioritizes interrupts using interrupt request levels, or IRQLs. While a processor is running at a given IRQL, interrupts at that level or lower are masked, meaning they're held off until the IRQL drops, and a higher-IRQL interrupt preempts whatever lower-level work is in progress. An NMI, as the name says, can't be masked: the processor services it immediately, regardless of the current IRQL. Hardware normally raises an NMI when there's a fault serious enough that the operating system can't safely continue, and that's exactly how Windows treats an NMI we trigger manually from the iLO.
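To make the masking rule concrete, here's a toy Python model. It's a sketch of the concept only, not how the Windows kernel or HAL actually dispatches interrupts. The class and method names are hypothetical; the four IRQL constants use the real x64 values for PASSIVE_LEVEL, APC_LEVEL, DISPATCH_LEVEL and HIGH_LEVEL, and the device interrupt level used in the example is made up.

```python
from dataclasses import dataclass, field

# Real x64 IRQL values; the levels in between are omitted for simplicity.
PASSIVE_LEVEL = 0
APC_LEVEL = 1
DISPATCH_LEVEL = 2
HIGH_LEVEL = 15


@dataclass
class ToyProcessor:
    """Toy model of IRQL masking (hypothetical; not the real Windows implementation)."""
    irql: int = PASSIVE_LEVEL
    pending: list = field(default_factory=list)   # masked interrupts: (name, level)
    serviced: list = field(default_factory=list)  # names, in the order they ran

    def raise_irql(self, level: int) -> None:
        self.irql = level

    def lower_irql(self, level: int) -> None:
        self.irql = level
        # Deliver any masked interrupt whose level is now above the current IRQL.
        still_masked = []
        for name, lvl in self.pending:
            if lvl > self.irql:
                self.serviced.append(name)
            else:
                still_masked.append((name, lvl))
        self.pending = still_masked

    def interrupt(self, name: str, level: int) -> bool:
        """Maskable interrupt: return True if it was serviced immediately."""
        if level > self.irql:
            self.serviced.append(name)
            return True
        # At or below the current IRQL, so it's masked until the IRQL drops.
        self.pending.append((name, level))
        return False

    def nmi(self, name: str) -> None:
        # Nonmaskable: serviced immediately regardless of the current IRQL.
        self.serviced.append(name)


if __name__ == "__main__":
    cpu = ToyProcessor()
    cpu.raise_irql(HIGH_LEVEL)
    cpu.interrupt("device", 5)    # masked at HIGH_LEVEL
    cpu.nmi("iLO NMI")            # serviced anyway
    cpu.lower_irql(PASSIVE_LEVEL) # the masked device interrupt is delivered now
    print(cpu.serviced)
```

Raising to HIGH_LEVEL holds off the device interrupt, but the NMI is serviced at once; the device interrupt only runs after the IRQL drops back to PASSIVE_LEVEL.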

This post covers working with an iLO2. If you're using an iLO1, visit http://briandesmond.com/blog/forcing-a-blue-screen-via-ilo/.

The first step is to set the registry value described in KB 927069. Ignore the note saying it applies only to Windows 2000 Server; the setting works on Windows 2000 and newer. Here's the registry information:

Path: HKLM\System\CurrentControlSet\Control\CrashControl
Value: NMICrashDump
Data: 1
Type: REG_DWORD
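If you need to apply this setting to several servers, one option is to generate a .reg file and import it with regedit. The Python sketch below assumes the standard "Windows Registry Editor Version 5.00" text format, where the key must be written with the full HKEY_LOCAL_MACHINE hive name rather than HKLM. The function names are hypothetical, and saving as UTF-16 with a BOM is an assumption based on the encoding regedit uses for its own exports.

```python
# Hypothetical helpers for producing a .reg file that sets REG_DWORD values.
HEADER = "Windows Registry Editor Version 5.00"
CRASH_CONTROL_KEY = r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\CrashControl"


def render_reg_file(key: str, dwords: dict) -> str:
    """Return .reg file text that sets each name in `dwords` as a REG_DWORD under `key`."""
    lines = [HEADER, "", f"[{key}]"]
    for name, data in dwords.items():
        if not 0 <= data <= 0xFFFFFFFF:
            raise ValueError(f"{name}: a REG_DWORD must fit in 32 bits")
        # REG_DWORD data is written as eight hex digits after the "dword:" prefix.
        lines.append(f'"{name}"=dword:{data:08x}')
    # Use CRLF line endings, as regedit does.
    return "\r\n".join(lines) + "\r\n"


def write_reg_file(path: str, text: str) -> None:
    """Save the text as UTF-16 with a byte-order mark."""
    # newline="" stops Python from translating the CRLFs we already wrote.
    with open(path, "w", encoding="utf-16", newline="") as handle:
        handle.write(text)


if __name__ == "__main__":
    print(render_reg_file(CRASH_CONTROL_KEY, {"NMICrashDump": 1}))
```

Importing the resulting file (by double-clicking it or running regedit with the file as an argument) sets the same value as the manual steps above; the reboot is still required either way.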

You'll need to reboot for the change to take effect. If you don't reboot after making this change, you'll get the nontraditional blue screen described in the note after step 5 instead of a normal bugcheck.

Note: If HP's Automatic Server Recovery (ASR) feature is enabled on the server and you need a complete memory dump, you'll need to turn it off, as it can interfere with this process. ASR is a BIOS setting, and I don't have the steps to change it readily available. If there's demand (leave a comment), I can track them down.

Here are the steps to crash the box. I worked through them on a DL365 G5, which is recent hardware with an iLO2.

1. Log in to the iLO and click the "Diagnostics" link in the left-hand navigation.

2. Click the "Generate NMI to System" button.

Warning: I can't guarantee that every version of the iLO firmware displays a confirmation prompt when you click this button. Generating an NMI will HALT your system. Don't click this button just to see what happens!

3. A warning dialog asks you to confirm that this is really what you want. Remember, this will HALT your system!


4. The iLO writes a status message to the status bar in Internet Explorer.


5. Windows then crashes with bugcheck 0x80 (NMI_HARDWARE_FAILURE), writes the memory dump, and reboots, assuming it's configured to restart automatically after a blue screen. With luck, the memory dump will help you troubleshoot the problem at hand.
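Once the server is back up, it's worth confirming that a fresh dump was actually written before you start digging into it. The Python sketch below assumes the default location for complete and kernel dumps, %SystemRoot%\MEMORY.DMP (the DumpFile value under the same CrashControl key can point elsewhere). The function name is hypothetical, and it only checks that the file exists, isn't empty, and was modified at or after a timestamp you supply, such as the time you clicked the button.

```python
import os
import time
from pathlib import Path
from typing import Optional

# Default dump path; %SystemRoot% is expanded on Windows.
DEFAULT_DUMP = r"%SystemRoot%\MEMORY.DMP"


def find_fresh_dump(dump_file: str = DEFAULT_DUMP, since: float = 0.0) -> Optional[Path]:
    """Return the dump's path if it exists, is non-empty and was modified at or after `since`."""
    # On Windows, os.path.expandvars understands the %VAR% syntax.
    path = Path(os.path.expandvars(dump_file))
    if not path.is_file():
        return None
    info = path.stat()
    if info.st_size == 0 or info.st_mtime < since:
        return None
    return path


if __name__ == "__main__":
    # Example: look for a dump written within the last hour.
    dump = find_fresh_dump(since=time.time() - 3600)
    print(dump if dump else "No fresh memory dump found")
```

A file that exists but predates the crash usually means the dump wasn't written this time, so it's worth checking before opening an old dump by mistake.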

Note: If you get a nontraditional blue screen instead of a normal bugcheck, you either entered the registry setting described earlier incorrectly or didn't reboot after setting it.

After rebooting, my test server stopped at a "Press F1" prompt reporting that a critical error had occurred before this power-up. I assume there's a BIOS setting somewhere to disable this prompt, although I don't know where it is offhand.

Note: This capability is present in the Dell DRAC3 cards. I spoke with Dell and they advised me that this functionality is not available in newer generation DRAC cards.