t-zwck

Restart monitoring of OpsMgr environment


Problem Description:

 

There may be legitimate situations where a customer needs to reset many health monitors at once. For example, after a network outage a significant number of alerts may have been generated and the health state of various items may have become unhealthy. Another case is an incorrect approach to Maintenance Mode, which may cause a similar outcome, especially when manual-reset monitors, or alerts generated without the “auto-resolve” feature, are present on the instances involved in maintenance.

 

To address this type of situation, the bulk of the alerts from the outage needs to be closed (which can be done with a PowerShell script). The health state of multiple systems also needs to be reset, but there is no viable way to do this in bulk, so manual intervention is needed.

 

The proposal was that it should be possible to select multiple servers and force their health back to green. Specifically, the health model for those instances would be walked and each monitor that is not Healthy would be reset. This would “restart” the environment to green, so that only real issues would resurface as alerts recurred and the states were updated.

 

 

 

Analyzing proposal:

 

It is already possible to accomplish this proposal using SDK tasks. It is even possible to speed up the recognition of real issues by submitting an additional “recalculate” state task for a given instance. This task forces a recalculation of what the state of that instance should be at the time of execution by using on-demand detection, assuming such detection is defined for the monitor types used to monitor that instance.
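
To make the difference between the two tasks concrete, here is a minimal toy model in Python. It is not the SDK and not the attached solution; the `Monitor` class, its `on_demand_probe` field, and the monitor names are hypothetical stand-ins. Reset forces a monitor to Healthy without checking anything, while recalculate re-evaluates the state only when on-demand detection is available.

```python
from dataclasses import dataclass
from typing import Callable, Optional

HEALTHY = "Healthy"
CRITICAL = "Critical"


@dataclass
class Monitor:
    name: str
    state: str
    # Hypothetical stand-in for on-demand detection: a probe that returns the
    # current state, or None when the monitor type defines no such detection.
    on_demand_probe: Optional[Callable[[], str]] = None

    def reset(self) -> None:
        """Force the monitor back to Healthy without checking anything."""
        self.state = HEALTHY

    def recalculate(self) -> bool:
        """Re-evaluate the state now; return False when that is not possible."""
        if self.on_demand_probe is None:
            return False
        self.state = self.on_demand_probe()
        return True


monitors = [
    Monitor("stale-after-outage", CRITICAL, on_demand_probe=lambda: HEALTHY),
    Monitor("disk-still-full", CRITICAL, on_demand_probe=lambda: CRITICAL),
    Monitor("manual-reset-only", CRITICAL),
]

# Reset first, then ask for a recalculation so real issues resurface right away.
for monitor in monitors:
    monitor.reset()
    monitor.recalculate()

states = {monitor.name: monitor.state for monitor in monitors}
```

In this sketch the monitor with a real, ongoing problem goes critical again immediately, whereas the one without on-demand detection stays green until its regular detection fires.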

 

My approach to implementing this proposal was a little different from the one stated above. I do not look for every unhealthy monitor; instead, I crawl the relationship tree of the selected instance recursively, adding each instance that contributes to its overall health while making sure each instance is present only once. The result of a reset request against each of those instances affects the health state of all other instances that depend on its state, either directly or indirectly.
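
The crawl described above can be sketched roughly as follows. This is an illustration in Python rather than the code from the attached solution; the `contributors_of` mapping and the instance names are made up for the example, and a real implementation would query the relationships through the SDK instead.

```python
def collect_contributors(root, contributors_of):
    """Return root plus every instance contributing to its health, each once."""
    seen = set()
    ordered = []

    def visit(instance):
        if instance in seen:
            return  # already collected through another path
        seen.add(instance)
        ordered.append(instance)
        for child in contributors_of.get(instance, ()):
            visit(child)

    visit(root)
    return ordered


# Hypothetical relationship tree: the OS and the database engine both
# depend on the same logical disk, so it is reachable through two paths.
contributors_of = {
    "computer": ["os", "db-engine"],
    "os": ["disk-c"],
    "db-engine": ["disk-c", "db-1"],
}

instances = collect_contributors("computer", contributors_of)
```

The `seen` set is what keeps an instance from being reset twice when it is reachable through more than one path, and it also stops the recursion if the relationships ever form a loop.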

 

Note:

The following post contains a video that tries to describe the difference between the Reset and Recalculate tasks. It also touches on what “on-demand” detection means. Please contact me through the comments if I should try to provide an additional or different explanation of those monitor features.

 

 

 

Solution:

Attached you can find the source code for my solution as well as installers for deploying the already built binaries. I provide two types of integration with our operations console.

 

The first is a task associated with the managed entity “Microsoft.SystemCenter.ComputerGroup”. It becomes available once the installation of “RestartMonitoringSetup” for the particular SKU succeeds. The following screenshot shows how the task is used:

 

[Screenshot not included]

 

The second possible integration uses the fact that the console is able to act like a browser. Deployment is performed by RestartMonitoringWebSetup and consists of creating a Web application and importing an MP. The Web application allows a regular web browser to act as the tool that triggers the requested restart action. The MP associated with this approach contains the following Web view to allow integration with the console:

 

[Screenshot not included]

 

Choosing the group option “restarts” monitoring for all instances contained within all selected groups. Such an operation may become rather resource-consuming because, as I hinted above, the instance space is crawled and all necessary instances (contributing directly or indirectly) are asked to reset and then recalculate their state.
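
To give a feel for why the group option can get expensive, here is a small Python sketch. It is only an illustration under assumed data: the `members_of` and `contributors_of` mappings, the group names, and the "two tasks per instance" count are hypothetical and not taken from the attached solution.

```python
def instances_to_restart(selected_groups, members_of, contributors_of):
    """Collect, once each, every instance affected by restarting the groups."""
    seen = set()
    pending = [m for group in selected_groups for m in members_of.get(group, ())]
    while pending:
        instance = pending.pop()
        if instance in seen:
            continue  # shared between groups or reachable through two paths
        seen.add(instance)
        pending.extend(contributors_of.get(instance, ()))
    return seen


# Hypothetical groups; "web-2" is a member of both.
members_of = {
    "web-servers": ["web-1", "web-2"],
    "db-servers": ["db-1", "web-2"],
}
contributors_of = {
    "web-1": ["web-1/os", "web-1/iis"],
    "web-2": ["web-2/os"],
    "db-1": ["db-1/os", "db-1/sql"],
    "db-1/sql": ["db-1/sql/db-a"],
}

targets = instances_to_restart(["web-servers", "db-servers"], members_of, contributors_of)
# Assumption for this sketch: one reset plus one recalculate request per instance.
task_count = 2 * len(targets)
```

Even in this tiny example, three distinct group members expand to nine instances, so the number of requests grows with the depth of the health model rather than with the number of servers selected.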

 


 

The option to restart monitoring for instances that have an active alert performs a similar operation to the one for groups. The only difference is that the likelihood of having many instances contributing to the overall health state is smaller than it is with a group (or multiple groups, for that matter).

 


 

 

FILE TO DOWNLOAD

 

[Attachment not included]


Thanks for posting.

Does the tool work with SCE2010?

And which of the MSI packages do I need to install?
