Last month, I provided some context for how I feel about Active Directory Monitoring and Domain Controller Monitoring. I wrote that monitoring solutions should not treat Domain Controllers as mere ‘application servers’ or ‘nodes’, as many Active Directory Monitoring solutions, like SolarWinds’, do.
However, organizations may have varying requirements for potential Domain Controller Monitoring solutions. Some organizations already have certain functionality as part of another solution. Others accept certain risks when it comes to (some of) their Domain Controllers.
As a follow-up, I decided to put together a checklist of the functionality a great Domain Controller Monitoring solution should offer, and to explain why each piece of functionality is essential to make sure that Domain Controllers meet the organization’s confidentiality, integrity and availability (CIA) needs. The solution should check each of these monitoring areas against a baseline:
Monitoring the Domain Controllers’ core services
Any respectable Domain Controller Monitoring solution should monitor the status of the services that any Domain Controller requires to run. These include:
- Active Directory Domain Services
- Active Directory Web Services
- DFS Replication
- DHCP Client
- DNS Client
- DNS Server
- Intersite Messaging
- Kerberos Key Distribution Center
- Remote Procedure Call (RPC)
- Windows Event Log
- Windows Time
- Workstation
When one or more of the above services stop, a notification should be sent. The DS Role Service is an interesting service in this regard. Active Directory admins can choose to stop and disable this service and change its permissions so that only members of the Enterprise Admins security group can demote Domain Controllers. A good Domain Controller Monitoring solution should be able to detect and properly display this configuration as part of the Domain Controller baseline. In this case, an alert when this particular service is started would be a great addition. Having a graph that displays the status of the Domain Controllers' core services over time is a plus.
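To make the baseline idea concrete, here is a minimal sketch in Python. It assumes some collector (not shown, and hypothetical) already reports each service's state per Domain Controller. Only a subset of the core services is included, and the DS Role Service is expected to be *stopped*, so starting it raises an alert just like stopping the DNS Server does.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical baseline: service display name -> expected state.
# Only a subset of the core services is shown; the DS Role Service is
# expected to be stopped on Domain Controllers where admins disabled it.
BASELINE = {
    "Active Directory Domain Services": "running",
    "DNS Server": "running",
    "Kerberos Key Distribution Center": "running",
    "Windows Time": "running",
    "DS Role Service": "stopped",
}


@dataclass
class ServiceAlert:
    dc: str
    service: str
    expected: str
    observed: str


def check_services(dc: str, observed: dict[str, str],
                   baseline: dict[str, str] = BASELINE) -> list[ServiceAlert]:
    """Return one alert per service whose observed state deviates from the baseline."""
    alerts = []
    for service, expected in baseline.items():
        # A service the collector did not report at all counts as 'missing'.
        state = observed.get(service, "missing")
        if state != expected:
            alerts.append(ServiceAlert(dc, service, expected, state))
    return alerts


# Example: DNS Server stopped and the DS Role Service was started.
observed = {
    "Active Directory Domain Services": "running",
    "DNS Server": "stopped",
    "Kerberos Key Distribution Center": "running",
    "Windows Time": "running",
    "DS Role Service": "running",
}
for alert in check_services("DC01", observed):
    print(f"{alert.dc}: {alert.service} is {alert.observed}, expected {alert.expected}")
```

Because the expected state is part of the baseline rather than hard-coded as "running", the same check covers services that should stay stopped.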
Notifications through email should be a basic requirement. However, attackers may delete or modify public DNS records, such as MX records, and email notifications may not be delivered in these situations. Today, monitoring solutions should offer multiple notification methods, for instance text messages and webhooks.
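A minimal sketch of multi-channel delivery, assuming each channel (email, text message, webhook) is wrapped in a function that raises an exception when delivery fails. The channel functions below are hypothetical stand-ins, not a real notification API.

```python
from __future__ import annotations

from typing import Callable

# A channel is any callable that takes a message and raises on failure.
Channel = Callable[[str], None]


def notify(message: str, channels: dict[str, Channel]) -> list[str]:
    """Send the message through every channel; return the names that succeeded."""
    delivered = []
    for name, send in channels.items():
        try:
            send(message)
            delivered.append(name)
        except Exception:
            # One failing channel (e.g. email when MX records were tampered
            # with) must not stop delivery through the other channels.
            continue
    return delivered
```

Sending through all channels, instead of falling back only after a detected failure, avoids depending on the failing channel to report its own failure.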
Monitoring generic metrics
Domain Controllers provide lots of metrics. Basic metrics can be compared against the performance baseline for the Domain Controller to detect anomalous performance behavior. Removing any bottlenecks may lead to higher Active Directory performance.
- Processor
Processor utilization across all CPUs and cores is important to monitor. Under normal circumstances, Domain Controllers should not show high processor utilization. Situations where you may expect high processor utilization include applying Windows Updates, building indices, performing anti-malware scans, and performing backups and/or restores. When creating the baseline for processor utilization, take special care with the Domain Controller holding the PDC Emulator FSMO role, as it may display higher overall processor utilization. Because the FSMO role can be transferred to another Domain Controller, good Domain Controller monitoring solutions have logic to detect which Domain Controller currently holds the role and apply the role-specific baseline. In large environments, the Domain Controller holding the PDC Emulator FSMO role may be overburdened. The Processor Queue Length provides information on the threads that are waiting for the processors. If the queue is long at the same time processor utilization is high, the processor is a bottleneck and may hinder the replication of password changes, Group Policy settings and reliable time.
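Role-aware baselining can be sketched as follows. The thresholds are hypothetical placeholders (real values should come from each environment's own baseline). Role detection itself is assumed to happen elsewhere and is passed in as `holds_pdc_emulator`. The "more than two queued threads per logical processor" limit is a common rule of thumb, not an official threshold.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class CpuBaseline:
    max_utilization_pct: float  # % Processor Time across all cores
    max_queue_per_cpu: float    # Processor Queue Length divided by logical processors


# Hypothetical thresholds: derive real values from your own baseline data.
BASELINES = {
    "default": CpuBaseline(max_utilization_pct=40.0, max_queue_per_cpu=2.0),
    "pdc_emulator": CpuBaseline(max_utilization_pct=60.0, max_queue_per_cpu=2.0),
}


def evaluate_cpu(utilization_pct: float, queue_length: int,
                 logical_cpus: int, holds_pdc_emulator: bool) -> list[str]:
    """Compare one CPU sample with the role-specific baseline."""
    baseline = BASELINES["pdc_emulator" if holds_pdc_emulator else "default"]
    findings = []
    busy = utilization_pct > baseline.max_utilization_pct
    if busy:
        findings.append("processor utilization above baseline")
    # A long queue at the same time as high utilization points at a CPU bottleneck.
    if busy and queue_length / logical_cpus > baseline.max_queue_per_cpu:
        findings.append("processor bottleneck")
    return findings
```

The same 50% sample is normal for the PDC Emulator but above baseline for any other Domain Controller, which is why the role has to be re-detected after every FSMO transfer.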
- Memory
When a Domain Controller reads memory pages from disk and/or writes them to disk, it is 'memory swapping'. Each Domain Controller tries to cache the entire Active Directory database in memory so it can perform its tasks without needing IO to the (slower) disk(s). A Domain Controller that is memory swapping cannot offer the best performance to end-users and applications. Memory swapping can occur when applying Windows Updates, building indices, performing anti-malware scans, and performing backups and/or restores, but it should not happen all the time. If it does, add memory to the Domain Controller. Even if you do not have a memory bottleneck just yet, available memory should be monitored as a base monitoring area. Good Domain Controller monitoring solutions are able to display the available memory in a graph over time, so trends can be discovered and proactively remediated. Great solutions can filter on the Active Directory-specific process (lsass) and specifically report on sudden memory increases.
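Reporting "sudden memory increases" for lsass needs a definition of *sudden*. One simple, assumed definition is a sample that exceeds a multiple of the median of the preceding samples. The input could be, for instance, periodic readings of the `Process(lsass)\Private Bytes` performance counter. The window size and factor below are hypothetical defaults.

```python
from __future__ import annotations

from statistics import median


def sudden_increases(samples: list[float], window: int = 12, factor: float = 1.5) -> list[int]:
    """Return indices of samples that exceed `factor` times the median of the
    previous `window` samples (e.g. lsass memory usage in MB)."""
    hits = []
    for i in range(window, len(samples)):
        reference = median(samples[i - window:i])
        if reference > 0 and samples[i] > factor * reference:
            hits.append(i)
    return hits
```

Using the median rather than the mean keeps a single spike from inflating the reference for the samples that follow it.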
- Disk(s)
Besides memory swapping, disk performance impacts Domain Controllers' performance in other ways, too. Slow disks can be discovered through the (average) disk queue length: when the disk queue length is long, the disk is trying to catch up on read and/or write requests. The disks' idle time, tracked over time, shows whether a Domain Controller uses its disks in bursts or whether the disks are busy full-time.
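Here is a sketch of how idle time and queue length could be combined. It assumes samples of the `PhysicalDisk\% Idle Time` and `PhysicalDisk\Avg. Disk Queue Length` counters have already been collected. The "busy" cut-off, the queue limit and the full-time share are hypothetical values to tune against your own baseline.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class DiskSample:
    idle_pct: float   # PhysicalDisk\% Idle Time
    avg_queue: float  # PhysicalDisk\Avg. Disk Queue Length


def classify_disk(samples: list[DiskSample], busy_below_idle_pct: float = 20.0,
                  queue_limit: float = 2.0, full_time_share: float = 0.8) -> tuple[str, bool]:
    """Return (usage pattern, backlog seen) for a series of disk samples."""
    if not samples:
        raise ValueError("need at least one sample")
    busy_share = sum(s.idle_pct < busy_below_idle_pct for s in samples) / len(samples)
    if busy_share >= full_time_share:
        pattern = "busy full-time"
    elif busy_share > 0:
        pattern = "bursts"
    else:
        pattern = "quiet"
    # A queue above the limit means the disk is catching up on requests.
    backlog = any(s.avg_queue > queue_limit for s in samples)
    return pattern, backlog
```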
- Network interface(s)
When a Domain Controller processes a lot of Active Directory queries, it may send and/or receive large amounts of data over the network, in addition to showing high processor utilization. Network congestion may ultimately lead to people no longer being able to sign in. Avoid this situation by monitoring the throughput and comparing it to the maximum available throughput. Here, too, graphs of historic network activity may lead to trend discoveries and proactive remediation.
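Comparing throughput with the maximum available throughput is mostly a unit conversion. Throughput counters such as `Network Interface\Bytes Total/sec` report bytes per second, while link speed (`Network Interface\Current Bandwidth`) is reported in bits per second. The 70% congestion threshold in this sketch is a hypothetical choice.

```python
from __future__ import annotations


def utilization_pct(bytes_total_per_sec: float, current_bandwidth_bps: float) -> float:
    """Convert a bytes/sec throughput sample into a percentage of link capacity (bits/sec)."""
    if current_bandwidth_bps <= 0:
        raise ValueError("link bandwidth must be positive")
    return 100.0 * bytes_total_per_sec * 8 / current_bandwidth_bps


def congestion_share(samples: list[float], current_bandwidth_bps: float,
                     threshold_pct: float = 70.0) -> float:
    """Fraction of samples in which the interface ran above threshold_pct of its capacity."""
    if not samples:
        return 0.0
    over = sum(1 for s in samples if utilization_pct(s, current_bandwidth_bps) > threshold_pct)
    return over / len(samples)
```

Tracking the share of congested samples per day gives a trend line that is easier to act on than individual peaks.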
Monitoring Active Directory-specific metrics
On top of these generic performance metrics, a good Domain Controller Monitoring solution supports Active Directory-specific metrics.
- Authentications
The numbers of Kerberos authentications and NTLM authentications per second provide information on how heavily a Domain Controller is used. A graph of Kerberos authentications vs. NTLM authentications shows how far the organization has come in leaving NTLM behind, and any good Domain Controller monitoring solution should offer one. Great solutions are able to distinguish NTLMv1 from NTLMv2 authentications, drill down into KDC AS requests and KDC TGS requests for Kerberos (useful when changing ticket lifetimes), and provide information on the Kerberos encryption types used. Synthetic authentications may also add value by measuring authentication performance.
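The "how far are we in leaving NTLM behind" graph boils down to the Kerberos share of all authentications per period. Here is a minimal sketch, assuming daily averages of the `Security System-Wide Statistics` counters `Kerberos Authentications` and `NTLM Authentications` are already available. The day labels are illustrative.

```python
from __future__ import annotations


def kerberos_share(kerberos_per_sec: float, ntlm_per_sec: float) -> float | None:
    """Percentage of authentications that used Kerberos; None when there was no traffic."""
    total = kerberos_per_sec + ntlm_per_sec
    if total == 0:
        return None
    return 100.0 * kerberos_per_sec / total


def daily_shares(samples: list[tuple[str, float, float]]) -> dict[str, float | None]:
    """samples: (day, average Kerberos auths/sec, average NTLM auths/sec)."""
    return {day: kerberos_share(k, n) for day, k, n in samples}
```

Returning `None` for days without traffic avoids plotting a misleading 0% Kerberos share.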
- LDAP applications
Two typical metrics for monitoring LDAP performance are LDAP searches/sec and LDAP client sessions. These metrics show how much use a Domain Controller is getting from applications, and they should be fairly uniform across all Domain Controllers. If they are not, it may mean that applications target specific Domain Controllers by hostname or IP address instead of the domain name, that Domain Controllers in certain Active Directory sites are getting piled on with LDAP traffic, or that LDAP is no longer functioning on a Domain Controller and clients are failing over to other Domain Controllers. Synthetic LDAP queries may also add value by measuring the performance of application authentications.
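Checking for "fairly uniform" can be done by comparing each Domain Controller with the median of its peers. This sketch assumes LDAP searches/sec values have already been collected per Domain Controller. The 50% tolerance is a hypothetical choice, and in multi-site environments it may make more sense to compare within each site.

```python
from __future__ import annotations

from statistics import median


def ldap_outliers(searches_per_sec: dict[str, float], tolerance: float = 0.5) -> dict[str, str]:
    """Flag DCs whose LDAP searches/sec deviate more than `tolerance` (a fraction)
    from the median of all DCs."""
    if not searches_per_sec:
        return {}
    mid = median(searches_per_sec.values())
    if mid == 0:
        return {}
    flagged = {}
    for dc, value in searches_per_sec.items():
        deviation = (value - mid) / mid
        if deviation > tolerance:
            flagged[dc] = "unusually high"  # possibly targeted by hostname/IP or piled on
        elif deviation < -tolerance:
            flagged[dc] = "unusually low"   # possibly LDAP failing, clients failing over
    return flagged
```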
- Replication
To monitor replication, the network traffic of the directory replication agent (DRA) can be monitored, as these traffic flows indicate the amount of replication data flowing between Domain Controllers inside their Active Directory site and (compressed) between Active Directory sites. Sudden changes in these metrics indicate a replication topology change or significant changes in Active Directory. Great Domain Controller monitoring solutions use synthetic replications to measure replication performance, and may also be able to interpret the output of built-in tools like repadmin.exe and nltest.exe.
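A synthetic replication check can be sketched like this. A canary change is written on one Domain Controller, and the first time each other Domain Controller shows that change is recorded. How the canary is written and observed (which object or attribute) is left out and would be an implementation choice. The latency limit should differ per site link, because inter-site replication follows the site link schedule.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def replication_report(written_at: datetime, first_seen: dict[str, datetime | None],
                       limit: timedelta) -> dict[str, str]:
    """Classify each DC by how long a synthetic (canary) change took to reach it."""
    report = {}
    for dc, seen in first_seen.items():
        if seen is None:
            report[dc] = "not replicated"
        elif seen - written_at > limit:
            report[dc] = f"late ({(seen - written_at).total_seconds():.0f}s)"
        else:
            report[dc] = "ok"
    return report
```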
Monitoring Active Directory logs
The event logs on Domain Controllers provide a wealth of information on Active Directory and Domain Controller health. Good Domain Controller monitoring solutions check for replication errors and sudden increases in errors in the Active Directory-specific logs. Great solutions, however, are able to provide a graph of Active Directory database whitespace over time, based on the daily events in the log.
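Once the whitespace value has been extracted from each day's event (the parsing step is not shown and depends on the event format), a least-squares slope turns the graph into a single trend number that can be alerted on. This is a minimal sketch; the input is assumed to be one value per day, in megabytes.

```python
from __future__ import annotations


def whitespace_slope_mb_per_day(daily_free_mb: list[float]) -> float:
    """Least-squares slope of daily whitespace values (one value per day)."""
    n = len(daily_free_mb)
    if n < 2:
        raise ValueError("need at least two daily values")
    mean_x = (n - 1) / 2
    mean_y = sum(daily_free_mb) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(daily_free_mb))
    den = sum((x - mean_x) ** 2 for x in range(n))
    return num / den
```

A steadily positive slope means whitespace keeps growing; a sudden jump compared with the historic slope is worth a closer look.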
Domain Controller registry (changes)
Much of a Domain Controller's behavior is controlled by registry settings in HKLM:\SYSTEM\CurrentControlSet\Control\Lsa and in the KDC, NTDS and Netlogon keys underneath HKLM:\SYSTEM\CurrentControlSet\Services\. There is also a lot of information to be gained from these registry locations, for instance when the Domain Controller was last restored from backup, or whether it was successfully cloned.
Being able to monitor changes to these registry keys while the Domain Controller runs, but especially while it starts, is essential to pinpoint changes to Domain Controller configurations. Great Domain Controller monitoring solutions provide this information and notify admins when there is a significant change.
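Detecting registry changes comes down to diffing a snapshot against a baseline snapshot. This sketch assumes snapshots are flattened into "key path\value name" → data dictionaries by some collector. On Windows, Python's `winreg` module could read them, but that part is not shown. The example value names are real Lsa settings, used purely as illustration.

```python
from __future__ import annotations


def diff_snapshots(baseline: dict[str, object], current: dict[str, object]) -> dict[str, list[str]]:
    """Compare two flattened registry snapshots (path -> data)."""
    return {
        "added": sorted(k for k in current if k not in baseline),
        "removed": sorted(k for k in baseline if k not in current),
        "changed": sorted(k for k in baseline if k in current and baseline[k] != current[k]),
    }


LSA = r"HKLM\SYSTEM\CurrentControlSet\Control\Lsa"
baseline = {LSA + r"\LmCompatibilityLevel": 5, LSA + r"\RunAsPPL": 1}
current = {LSA + r"\LmCompatibilityLevel": 2, LSA + r"\NoLmHash": 1}
print(diff_snapshots(baseline, current))
```

Taking one snapshot right after boot and comparing it with the last snapshot before shutdown is one way to catch changes that only take effect while the Domain Controller starts.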
Network port monitoring
When you monitor for network congestion (see above), you can take the next step and monitor the availability of services on certain network ports. We all know that LDAP uses TCP port 389 and LDAPS uses TCP port 636. Good Domain Controller monitoring solutions monitor these ports, as well as the other common Domain Controller network ports. It's not that hard. Great monitoring solutions query each port to determine whether the right service is actually listening, and regularly perform these checks from all Domain Controllers to all Domain Controllers. That way, potential attackers can be stopped in their tracks and changes in firewall rules can be detected and remediated fast.
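The "good" level of port monitoring is a TCP reachability check, sketched below with Python's standard `socket` module. The port list covers common Domain Controller TCP ports and is not exhaustive (for instance, the dynamic RPC range is not included). A successful connect only proves *something* is listening. The "great" level would additionally speak the protocol, for example by sending an LDAP query to port 389.

```python
from __future__ import annotations

import socket

# Common Domain Controller TCP ports (not exhaustive).
DC_TCP_PORTS = {
    53: "DNS", 88: "Kerberos", 135: "RPC Endpoint Mapper", 389: "LDAP",
    445: "SMB", 464: "Kerberos password change", 636: "LDAPS",
    3268: "Global Catalog", 3269: "Global Catalog over SSL", 9389: "AD Web Services",
}


def check_ports(host: str, ports: list[int], timeout: float = 2.0) -> dict[int, bool]:
    """Return port -> True when a TCP connection to host:port succeeds."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Refused, timed out or unreachable: treat the port as unavailable.
            results[port] = False
    return results
```

Running `check_ports(dc, list(DC_TCP_PORTS))` from every Domain Controller against every other Domain Controller produces the full matrix, where a single changed cell usually points at a firewall rule change.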
DNS Server and DNS Record monitoring
To locate Domain Controllers, domain-joined devices use DNS. Domain Controllers register SRV records in DNS for this purpose. The netlogon.dns file on each Domain Controller specifies the records it should register. By monitoring each Domain Controller's DNS Server configuration, the availability of the configured DNS Servers and the records the Domain Controller registers, situations where Domain Controllers are accidentally multi-homed, isolated or otherwise borked in the DNS arena can be detected and remediated fast. Great Domain Controller monitoring solutions know which SRV records each Domain Controller should register, based on the location of its domain in the forest and the Domain Controller's FSMO roles, and can report any deviations.
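Here is a sketch of deriving the expected SRV record names from the domain, forest, site and roles, and then comparing them with what is actually registered. The list is deliberately partial (the netlogon.dns file remains the authoritative list per Domain Controller). The domain and site names in the usage example are hypothetical.

```python
from __future__ import annotations


def expected_srv_records(domain: str, forest: str, site: str,
                         is_pdc: bool, is_gc: bool) -> set[str]:
    """A partial set of SRV record names a DC registers, based on its domain and roles."""
    records = {
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.{site}._sites.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._udp.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_kpasswd._tcp.{domain}",
        f"_kpasswd._udp.{domain}",
    }
    if is_pdc:
        records.add(f"_ldap._tcp.pdc._msdcs.{domain}")
    if is_gc:
        # Global Catalog records live under the forest root name.
        records |= {f"_gc._tcp.{forest}", f"_ldap._tcp.gc._msdcs.{forest}"}
    return records


def srv_deviations(expected: set[str], registered: set[str]) -> dict[str, list[str]]:
    return {"missing": sorted(expected - registered),
            "unexpected": sorted(registered - expected)}


expected = expected_srv_records("child.contoso.com", "contoso.com", "Amsterdam",
                                is_pdc=True, is_gc=False)
```

Because `registered` only needs to be a set of names, it can come from DNS queries against every configured DNS Server, which also surfaces servers that disagree with each other.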
Domain Controller Backup verification
Monitoring is merely the first part of an organization's disaster recovery strategy: it helps avoid cascading events that would eventually lead to a disaster. Backing up Domain Controllers is another big disaster recovery measure, and good Domain Controller monitoring solutions need to be able to report on these backups over time. Great solutions might even integrate with backup solutions to provide insights. Veeam's SureBackup feature comes to mind here, as it allows backups to be checked for consistency. Feeding this information back into a single Domain Controller monitoring console provides excellent insight into the status of Domain Controller backups. (However, further steps are required to ensure complete Active Directory forest restores.)
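Reporting on backups over time starts with flagging Domain Controllers whose last successful backup is too old. In this sketch, the timestamps are assumed to come from the backup product or, for instance, from `repadmin /showbackup`. The check also refuses a `max_age` at or beyond the tombstone lifetime (commonly 180 days), because a backup older than the tombstone lifetime should not be used for a restore.

```python
from __future__ import annotations

from datetime import datetime, timedelta


def stale_backups(last_backup: dict[str, datetime | None], now: datetime,
                  max_age: timedelta,
                  tombstone_lifetime: timedelta = timedelta(days=180)) -> list[str]:
    """Return DCs without a successful backup or whose last backup is older than max_age."""
    if max_age >= tombstone_lifetime:
        raise ValueError("max_age should stay well below the tombstone lifetime")
    return sorted(dc for dc, taken in last_backup.items()
                  if taken is None or now - taken > max_age)
```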
Domain Controller drivers and firmware
Drivers and firmware are essential for Domain Controllers to make full use of the available (virtual) hardware. For virtual Domain Controllers on VMware, for instance, installing the right virtual network interface and the most recent stable VMware Tools is essential for performance. With recent Virtualization-based Security (VBS) investments, it is also a good idea to monitor the firmware versions of TPM chips and other security-related hardware. Any changes should be reported, and a good Domain Controller monitoring solution offers this functionality.
Time synchronization monitoring
Another networking aspect that Domain Controllers are involved in is accurate time. By default, Domain Controllers offer a time hierarchy that domain-joined hosts use to obtain accurate time. The Domain Controller holding the PDC Emulator FSMO role in the forest root domain is the only Domain Controller that synchronizes time from a reliable outside time source, and it functions at the peak of the Active Directory time hierarchy. By monitoring time and time differences between Domain Controllers, you can avoid situations where skewed clocks cause 'last write wins' conflict resolution to overrule some other admin's or application's changes.
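Here is a minimal sketch of evaluating clock offsets relative to the forest root PDC Emulator. The offsets in seconds are assumed to be collected elsewhere, for instance with `w32tm /stripchart /computer:<dc> /dataonly /samples:1`. The critical threshold of 300 seconds mirrors Kerberos' default maximum tolerance for clock skew (5 minutes). The 1-second warning threshold is a hypothetical choice.

```python
from __future__ import annotations


def skew_status(offsets_s: dict[str, float], warn_s: float = 1.0,
                critical_s: float = 300.0) -> dict[str, str]:
    """Classify each DC's clock offset (seconds, relative to the PDC Emulator)."""
    status = {}
    for dc, offset in offsets_s.items():
        magnitude = abs(offset)  # clocks running ahead or behind are equally bad
        if magnitude >= critical_s:
            status[dc] = "critical"
        elif magnitude >= warn_s:
            status[dc] = "warning"
        else:
            status[dc] = "ok"
    return status
```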
There are differences between good and great Domain Controller monitoring solutions. Use the above list to determine whether advertised monitoring solutions offer the functionality your Active Directory admins need to perform their jobs.