Today I am going to talk about tweaking the Active Directory (AD) SCOM Management Pack. The primary reason I tweaked it was this alert:
AD Replication is occurring slowly
The rule was generating multiple alerts. In fact, the number of alerts was so high that the rule was repeatedly unloaded, and the Operations Manager view also showed this alert:
A rule has generated 50 alerts in the last 60 seconds. Usually, when a rule generates this many alerts, it is because the rule definition is misconfigured. Please examine the rule for errors. In order to avoid excessive load, this rule will be temporarily suspended until…
It turned out that in my case the maximum replication cycle was 180 minutes. The AD replication latency alert is triggered when the measured latency is more than three times IntersiteExpectedMaxLatency (default = 15 minutes), so with default settings an alert is triggered whenever latency is above 45 minutes.
Since our expected latency was well above 45 minutes, multiple alerts were generated.
After I changed the value, the alert went away 🙂
We must override the value of IntersiteExpectedMaxLatency not only in the “AD Replication is occurring slowly” rule that raised the alert, but in all of these rules:
AD Replication is occurring slowly (there are three rules with this name)
One or more domain controllers may not be replicating (there are three rules with this name)
DC has failed to synchronize naming context with its replication partner (there are three rules with this name)
All of the replication partners failed to replicate.
AD Replication Performance Collection – Metric Replication Latency
AD Replication Performance Collection – Metric Replication Latency:Minimum
AD Replication Performance Collection – Metric Replication Latency:Maximum
AD Replication Performance Collection – Metric Replication Latency:Average
So, to override one value we must change it on 14 rules per AD version (if we monitor a mixed AD environment, we must override it for 2000, 2003 and 2008).
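Just to double-check the arithmetic, here is a tiny Python tally of the rules listed above. The dictionary is typed in by hand from that list (it is not read from the management pack), and the three AD versions are the ones a mixed environment would need.

```python
# Rules whose IntersiteExpectedMaxLatency must be overridden, per AD version.
# Typed in by hand from the list above; the value is how many rules share
# each friendly name.
RULES_PER_VERSION = {
    "AD Replication is occurring slowly": 3,
    "One or more domain controllers may not be replicating": 3,
    "DC has failed to synchronize naming context with its replication partner": 3,
    "All of the replication partners failed to replicate": 1,
    "AD Replication Performance Collection – Metric Replication Latency": 1,
    "AD Replication Performance Collection – Metric Replication Latency:Minimum": 1,
    "AD Replication Performance Collection – Metric Replication Latency:Maximum": 1,
    "AD Replication Performance Collection – Metric Replication Latency:Average": 1,
}

AD_VERSIONS = ["2000", "2003", "2008"]  # a mixed environment

per_version = sum(RULES_PER_VERSION.values())
total = per_version * len(AD_VERSIONS)
print(f"{per_version} overrides per AD version, {total} in a mixed environment")
```

In a mixed environment that is 42 overrides for a single threshold, which is why it is worth being systematic about it.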
Another interesting thing I found with the AD MP is that it produced almost 100,000 events in the AD Event view. After a quick glance I saw a laaarge number of event ID 17 events, which in short record the status of the ‘AD Monitor Trusts’ script. It turns out that by default this script runs every 5 minutes on each domain controller, which is:
12 per hour x 24 hours x 7 days of retention = 2,016 events per domain controller.
And if you have, say, 40 DCs in your environment, that is 80,640 events that SCOM holds in its database. Since this event has no value in my case, I overrode (disabled) the collecting rule “AD Monitor Trusts Event 2” for all instances.
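The same back-of-the-envelope calculation in Python, assuming one collected event per script run per DC and the 7-day retention mentioned above. `collected_events` is just an illustrative helper, not anything from SCOM.

```python
def collected_events(interval_minutes: int, retention_days: int, dc_count: int) -> int:
    """Events kept in the database by a rule that collects one event
    per script run on each domain controller."""
    runs_per_day = 24 * 60 // interval_minutes   # 288 runs/day at 5 minutes
    return runs_per_day * retention_days * dc_count

per_dc = collected_events(interval_minutes=5, retention_days=7, dc_count=1)
fleet = collected_events(interval_minutes=5, retention_days=7, dc_count=40)
print(per_dc, fleet)  # 2016 80640
```

Plugging in your own DC count and retention gives a quick feel for how much a "harmless" collection rule actually stores.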
If you want to dive a bit deeper, read on for the long version 🙂
Once upon a time the AD MP was created… 🙂 Later on I imported it and got multiple “AD Replication is occurring slowly” alerts. In fact, the number of alerts was so high that the rule triggering them was unloaded. So I started troubleshooting…
First I checked which rule was generating these alerts: “AD Replication is occurring slowly” was causing them. So I exported the AD MP and started digging into the code. I found out that the SCOM name for this rule was:
Wait, what, it ends with XXX? No, actually there were three rules with the same friendly name. The XML code revealed that the names are actually:
But all three rules had the same friendly name: AD Replication is occurring slowly.
OK, this is strange. I decided to dig a bit more and look at the script. How hard can it be…
The rule uses AD_Replication_Monitoring.DataSource, and it seemed that the script checking replication was named AD_Replication_Monitoring.vbs.
Let’s look into it… Oh, but the script has over 3,000 lines… 😦
I decided to Bing/Google it and found this old but great post by Jimmy Harper. On his blog I found out that there are 14 rules per DC version that need to be configured for the override to take effect.
Jimmy’s blog also links to the even older AD Management Pack Technical Documentation. I started reading this document and found out that the script generates an alert when:
- The value in adminDesc on a monitoring object is older than ObjectUpdateThreshold (which is one day by default).
- Intrasite replication latency for a monitoring object is greater than three times the IntraSiteMaxExpectedLatency threshold.
- Intersite replication latency for a monitoring object is greater than three times the InterSiteMaxExpectedLatency threshold.
The script calculates replication latency as the difference between the whenChanged and adminDesc values.
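To make the three conditions concrete, here is a rough Python model of them, based only on the description above. It is not the logic of AD_Replication_Monitoring.vbs itself: `replication_alerts` is a hypothetical helper, adminDesc is assumed to have already been parsed into the timestamp the script injected (its real stored format is not covered here), and only the one-day ObjectUpdateThreshold default comes from the documentation; the expected latency has to be passed in.

```python
from datetime import datetime, timedelta

def replication_alerts(admin_desc: datetime, when_changed: datetime, now: datetime,
                       intersite: bool, expected_max_latency_min: float,
                       object_update_threshold: timedelta = timedelta(days=1)) -> list[str]:
    """Return the reasons an alert would be raised for one monitoring object.

    admin_desc: assumed to be the injection time written into adminDesc.
    when_changed: the object's whenChanged value as seen on the DC being checked.
    """
    reasons = []
    # Condition 1: the injected value itself is stale.
    if now - admin_desc > object_update_threshold:
        reasons.append("adminDesc older than ObjectUpdateThreshold")
    # Conditions 2 and 3: latency = whenChanged - adminDesc, compared with
    # three times the intersite or intrasite expected maximum latency.
    latency_min = (when_changed - admin_desc).total_seconds() / 60
    if latency_min > 3 * expected_max_latency_min:
        scope = "Intersite" if intersite else "Intrasite"
        reasons.append(f"{scope} latency {latency_min:.0f} min > 3 x {expected_max_latency_min}")
    return reasons

# Made-up example: the change arrives 70 minutes after it was injected.
injected = datetime(2024, 1, 1, 10, 0)
seen = injected + timedelta(minutes=70)
print(replication_alerts(injected, seen, seen, intersite=True, expected_max_latency_min=15))
# ['Intersite latency 70 min > 3 x 15']
print(replication_alerts(injected, seen, seen, intersite=True, expected_max_latency_min=60))
# []
```

With the default of 15 minutes, a 70-minute latency already crosses the 45-minute line, while an override of 60 keeps it quiet.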
The default value of IntersiteExpectedMaxLatency is 15 minutes.
Once again, here is how I understand these values:
IntersiteExpectedMaxLatency => Expected latency between sites (an alert is triggered if latency exceeds 3x this value)
IntrasiteExpectedMaxLatency => Expected latency within a site (an alert is triggered if latency exceeds 3x this value)
ChangeInjectionFrequency => How often the adminDesc value is changed. By default it is changed every 6th time the script runs. The script runs every 15 minutes by default, so the value is changed every 90 minutes.
ObjectUpdateThreshold => If the adminDesc value is older than this threshold (24 hours by default), an alert is generated.
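The defaults above fit together like this; a small Python sketch that only multiplies the values listed above (the constant names are illustrative, not override parameter names beyond the ones already mentioned).

```python
# Default AD MP timings as described above, in minutes.
SCRIPT_INTERVAL_MIN = 15               # how often the replication script runs
CHANGE_INJECTION_FREQUENCY = 6         # adminDesc is rewritten every 6th run
INTERSITE_EXPECTED_MAX_LATENCY = 15    # IntersiteExpectedMaxLatency default
OBJECT_UPDATE_THRESHOLD_MIN = 24 * 60  # ObjectUpdateThreshold default (24 hours)

injection_interval = SCRIPT_INTERVAL_MIN * CHANGE_INJECTION_FREQUENCY
intersite_alert_threshold = 3 * INTERSITE_EXPECTED_MAX_LATENCY

print(f"adminDesc rewritten every {injection_interval} min")
print(f"intersite alert above {intersite_alert_threshold} min")
print(f"stale-object alert after {OBJECT_UPDATE_THRESHOLD_MIN} min")
```

So with defaults, a new test change is injected every 90 minutes, but it has to reach the other sites within 45 minutes to avoid an alert.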
I checked these values on a certain domain controller and found that the difference was more than 45 minutes. To check it, open the OpsMgrLatencyMonitors container. In it you will see that a container is created for each domain controller. You can open the DC containers and compare the whenChanged and adminDesc values.
In my case there were multiple occasions when replication latency went over 45 minutes, so I set the value according to our replication intervals. For example, if the maximum replication interval in your environment is 180 minutes, you would set IntersiteExpectedMaxLatency to 180/3 = 60.
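A minimal Python sketch of that rule of thumb, assuming (as described above) that the alert fires only when latency is strictly greater than 3 x IntersiteExpectedMaxLatency. Rounding up keeps the threshold at or above your longest interval. `intersite_override` is a hypothetical helper name, and you may well want extra headroom on top of this minimum.

```python
import math

def intersite_override(max_replication_interval_min: int) -> int:
    """Smallest whole-minute IntersiteExpectedMaxLatency whose 3x alert
    threshold is at least the longest expected replication interval."""
    return math.ceil(max_replication_interval_min / 3)

for interval in (45, 180):
    value = intersite_override(interval)
    print(f"max interval {interval} min -> override {value} (alerts above {3 * value} min)")
```

For the default 180-minute site link interval this gives 60, matching the example above; a 45-minute interval maps back to the default of 15.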
Let’s also check the rule that collects event ID 17. Here are the interesting sections of the AD_Monitor_Trusts_Pass_through_2 rule XML:
<Value>AD Monitor Trusts</Value>
So the rule named AD_Monitor_Trusts_Pass_through_2 collects every event written to the Operations Manager log that has:
Event ID 17
The “AD Monitor Trusts” string in its event data.
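The filter boils down to a simple predicate. Here it is as a Python sketch; the `Event` class is a hypothetical stand-in, not a SCOM or Windows API type, the sample events are made up, and the match on the event data is assumed to be a substring match, since the excerpt above does not show which operator the rule uses.

```python
from dataclasses import dataclass

@dataclass
class Event:
    # Hypothetical minimal event record for illustration only.
    log_name: str
    event_id: int
    data: str

def matches_monitor_trusts_rule(event: Event) -> bool:
    """Operations Manager log, event ID 17, and 'AD Monitor Trusts'
    somewhere in the event data (substring match assumed)."""
    return (event.log_name == "Operations Manager"
            and event.event_id == 17
            and "AD Monitor Trusts" in event.data)

# Made-up sample events.
events = [
    Event("Operations Manager", 17, "AD Monitor Trusts (sample text)"),
    Event("Operations Manager", 17, "some other script (sample text)"),
    Event("Application", 17, "AD Monitor Trusts (sample text)"),
]
print([matches_monitor_trusts_rule(e) for e in events])  # [True, False, False]
```

Since every run of the trusts script on every DC satisfies both conditions, the rule ends up storing one event per run, which is where the volume comes from.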
I added the whole XML for this rule at the end of this post.
I could not find a detailed description of how replication monitoring works in the newest AD MP version. Since I do not want alerts that only cause noise, I decided to increase the threshold. Even if I went a bit too high for our environment, that is OK: I still got a “replication is occurring slowly” alert when one of our sites was down for maintenance. This is what I want from this rule, and I am happy 🙂
If I am not mistaken, the default intersite replication interval is 180 minutes. So I cannot understand why the latency threshold is set to 3×15 = 45 minutes. If that is the case, we should always get these alerts with default SCOM and AD values. Maybe there is something I am missing… If someone has a better idea, let me know and I will gladly update this post.
As for the event collection rule for ID 17 and “AD Monitor Trusts”: this event was causing so much noise that I decided to turn it off. In fact, on a few occasions it crashed the SCOM console when I searched within the AD Event view. Now I can actually focus on events that add value.
I always try to understand why something is being collected, alerted on, etc. in SCOM. If I cannot see any added value, these events will be ignored anyway, so why collect them in the first place? I also try to set thresholds so that when an alert or event is raised, I will actually take some action, or it will add value when troubleshooting.
Sometimes it is best to set a threshold a bit less conservatively; otherwise, when a real problem arises (for example with replication), you might just say “Oh, it is that stupid replication alert” and ignore it…
Hope this helps you!
AD_Monitor_Trusts_Pass_through_2 Rule XML:
<Rule ID="AD_Monitor_Trusts_Pass_through_2" Enabled="onStandardMonitoring" Target="AD2008Core!Microsoft.Windows.Server.2008.AD.DomainControllerRole" ConfirmDelivery="false" Remotable="false" Priority="Normal" DiscardLevel="100">
<DataSource ID="EventDS" TypeID="Windows!Microsoft.Windows.EventProvider">
<Value>AD Monitor Trusts</Value>
<WriteAction ID="WriteToDB" TypeID="SC!Microsoft.SystemCenter.CollectEvent" />
<WriteAction ID="WriteToDW" TypeID="SCDW!Microsoft.SystemCenter.DataWarehouse.PublishEventData" />