t-zwck

Alert Storm Recognition (possible rule misconfiguration)


Alert storm mitigation at a glance:

 

 

To be clear, we are not trying to solve the generic data storm problem; that is a vNext scenario. We only addressed the case where a possibly “rogue” alert-generating rule floods the operational DB and/or raises too many notifications.

 

The settings that recognize this problem apply per agent (across all targeted instances) and per individual management group (in a multi-homed scenario the registry holds separate settings for each group). The default throttle settings are 50/60/10: if one rule generates more than 50 alerts within 60 seconds, that rule is suspended (its alert generation is disabled) for 10 minutes.
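To make the 50/60/10 arithmetic concrete, here is a minimal Python sketch of per-rule throttling. It is only a toy model and not the OpsMgr runtime: the class name `RuleThrottle`, the sliding-window counting, and the choice to drop the alert that trips the threshold are all assumptions made for illustration.

```python
from collections import deque


class RuleThrottle:
    """Toy model of per-rule alert throttling (NOT the OpsMgr implementation)."""

    def __init__(self, alert_count=50, count_interval_s=60, suspend_interval_s=600):
        self.alert_count = alert_count
        self.count_interval_s = count_interval_s
        self.suspend_interval_s = suspend_interval_s
        self._recent = deque()        # timestamps of alerts inside the window
        self._suspended_until = None  # time at which the suspension ends

    def allow(self, now):
        """Return True if an alert raised at time `now` (seconds) may be generated."""
        if self._suspended_until is not None:
            if now < self._suspended_until:
                return False  # rule is still suspended
            # Suspension is over: start counting from scratch.
            self._suspended_until = None
            self._recent.clear()
        # Assumption: a sliding window; drop timestamps older than the interval.
        while self._recent and now - self._recent[0] >= self.count_interval_s:
            self._recent.popleft()
        self._recent.append(now)
        if len(self._recent) > self.alert_count:
            # More than `alert_count` alerts within the interval: suspend the rule.
            self._suspended_until = now + self.suspend_interval_s
            self._recent.clear()
            return False
        return True
```

With the defaults, a rule firing once per second gets its first 50 alerts through, is suspended on the 51st, and can generate alerts again 600 seconds later.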

 

The option to customize the threshold values still exists … Customization will not work in one very special deployment scenario: an OpsMgr 2007 R2 agent multi-homed to at least one management group monitored by an OpsMgr 2007 SP1 server. The reason is that such an agent is forced to use the SP1 management packs, which lack the new configuration required for threshold customization. For the runtime to recognize customized values, the health service must be restarted!

 

When the runtime recognizes that a possible storm is happening, it raises event 5399. A screenshot of the English version of this event followed in the original post.

 

 


 

 

OpsMgr 2007 R2 health monitoring recognizes this event and raises an alert to notify the operator about the problem. The alert must be closed manually once corrective action has been taken or the conditions causing the possible storm have been mitigated.

 

As an example of customized threshold values, consider 15/30/5: 15 alerts within 30 seconds will cause a suspension of 5 minutes (300 seconds). The customization is done in the registry. One must create “Alert Count”, “Alert Count Interval” and “Alert Suspend Interval” under “HKLM\System\CurrentControlSet\Services\HealthService\Parameters\Management Groups”.
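As a rough illustration of the registry customization, the Python sketch below builds the text of a `.reg` file for the three values. Several details are assumptions that the post does not state: that the values are DWORDs, that the suspend interval is stored in seconds (suggested by the "300 seconds" remark), and that an optional per-group subkey may be needed. The function name `throttle_reg_file` and the `group` parameter are hypothetical. Verify the real value types on a test agent before importing anything.

```python
# Key named in the post, written in the long form that .reg files require.
BASE_KEY = (
    r"HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services"
    r"\HealthService\Parameters\Management Groups"
)


def throttle_reg_file(alert_count, count_interval_s, suspend_interval_s, group=None):
    """Return .reg file text for the three throttle values.

    Assumptions (not confirmed by the post): DWORD values, intervals in
    seconds, and an optional management group subkey passed as `group`.
    """
    key = BASE_KEY if group is None else BASE_KEY + "\\" + group
    values = {
        "Alert Count": alert_count,
        "Alert Count Interval": count_interval_s,
        "Alert Suspend Interval": suspend_interval_s,
    }
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key}]"]
    for name, value in values.items():
        # .reg files encode DWORDs as eight hexadecimal digits.
        lines.append(f'"{name}"=dword:{value:08x}')
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # The 15/30/5 example: 15 alerts, 30 seconds, 5 minutes = 300 seconds.
    print(throttle_reg_file(15, 30, 300))
```

Whatever way the values are written, the health service must be restarted before the runtime picks them up.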

 

 
