SCOM titan Dujon Walsham was one of the bold contestants who took on the crazy challenge of building a community-requested SCOM management pack (MP) in just 24 hours during HackaSCOM.
He was given the difficult task of creating a SCOM management pack that already exists, but with much more flexibility and detail. Dujon’s challenge was to build an MP that alerts on unexpected shutdowns such as blue screens of death (BSOD).
Although this BSOD management pack has been done many times on a simple level, Dujon wanted to add more context around why the unexpected shutdown happened to make it more insightful.
Most of the MPs already available tackle a hung server after the event has occurred. The interesting part to Dujon was how to tackle the proactive side by enriching the process. He wanted to build a management pack that helps users figure out why a server hung, even if they can’t easily catch it beforehand to prevent it happening.
Some due diligence is possible here by collecting information from both before and after the event. So, Dujon’s plan was to get the error codes and bug check codes and break them down so users can understand why the server hung.
Dujon launched into the HackaSCOM MP build, though it wasn’t without some challenges along the way.
He built a BSOD (blue screen of death) management pack with the aim of adding more context to the outages.
Inside the SCOM management pack are a few monitors. One is a bug check monitor that gets the code for the error that caused the blue screen of death. However, once the machine comes back up, that event is in the past, so it isn’t always caught by monitoring and remains unseen. Dujon wanted to resurface this information.
He then built a secondary monitor that works alongside a rule in the background, which runs an enhanced error script he wrote in PowerShell. The script looks for the newest event that happened within the same time frame in which the rule runs. That interval is set to about 60 seconds, so you don’t have to wait too long for the result to show up.
This rule finds the past blue screen of death event, parses the information in that event log entry to extract the actual bug check code, and compares it against a library of codes held in the PowerShell script to find a match. Once a match is found, a new event log entry is created, and the additional monitor in the MP picks that up to show it in the SCOM Console.
In addition, Dujon added a couple of diagnostic tasks. Usually, when a machine goes down, the cause is a faulty driver or the latest software install. The diagnostic tasks therefore run when the monitor is triggered, to discover which drivers and software were the last to be installed before the machine went down. They have also been added as agent tasks, so that some due diligence can be done before a server hangs by checking which software and drivers are installed.
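The parse-and-match step described above can be sketched roughly as follows. This is an illustration in Python rather than Dujon's PowerShell script, which the article does not show. The function names, the small code library, and the assumed message format ("The bugcheck was: 0x...", as typically written by the Windows BugCheck event source) are assumptions made for the example.

```python
import re

# Illustrative subset of Windows bug check codes (assumption: the real
# script's library is larger and is not shown in the article).
BUG_CHECK_LIBRARY = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x0000003B: "SYSTEM_SERVICE_EXCEPTION",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

# Assumed message shape: "... The bugcheck was: 0x0000007e (0x..., ...)."
BUG_CHECK_PATTERN = re.compile(r"bugcheck was:\s*(0x[0-9a-fA-F]+)", re.IGNORECASE)


def extract_bug_check_code(message):
    """Return the bug check code found in an event message, or None."""
    match = BUG_CHECK_PATTERN.search(message)
    return int(match.group(1), 16) if match else None


def describe_bug_check(message):
    """Look the extracted code up in the library and format a summary line."""
    code = extract_bug_check_code(message)
    if code is None:
        return None
    name = BUG_CHECK_LIBRARY.get(code, "UNKNOWN_BUG_CHECK")
    return f"0x{code:08X} {name}"
```

In the MP itself, the resulting summary would be written to a new event log entry for the second monitor to pick up. That step is left out here because it depends on the Windows event log.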
Figure: The task list of all the latest software and drivers installed.
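The idea behind the diagnostic tasks, listing whatever was installed most recently before the crash, can be illustrated with a short sketch. It is written in Python for illustration only. The function name, the `(name, install_time)` tuple layout, and the default limit of five items are hypothetical, and how the real tasks collect driver and software data is not described in the article.

```python
from datetime import datetime


def latest_installs_before(items, crash_time, limit=5):
    """Return the most recent installs at or before crash_time, newest first.

    items is a list of (name, install_time) tuples (hypothetical layout).
    """
    # Keep only items installed no later than the crash.
    earlier = [item for item in items if item[1] <= crash_time]
    # Newest install first, since the latest change is the likeliest suspect.
    earlier.sort(key=lambda item: item[1], reverse=True)
    return earlier[:limit]


# Made-up sample inventory for illustration.
sample_inventory = [
    ("Storage driver", datetime(2023, 1, 1, 9, 0)),
    ("Backup agent", datetime(2023, 1, 5, 14, 30)),
    ("Monitoring tool", datetime(2023, 1, 10, 8, 15)),
]
suspects = latest_installs_before(sample_inventory, datetime(2023, 1, 6), limit=2)
```

With the sample data, only items installed before the crash time are kept, so anything installed after the outage is not flagged as a suspect.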
The judges were impressed with the final result and had very few questions as the build was clear and the output was solid.
Stoyan Chalakov said, “I really liked the flexibility that Dujon showed here in order to get the event from the past. Great job!”
With just 24 hours to build a complete SCOM management pack, Dujon Walsham delivered an impressive final product that will be loved by the community. This BSOD MP goes much deeper than others already available, providing richer context for troubleshooting the cause of a blue screen of death.