Replication considerations for Domain Controllers running on VMware vSphere

This entry is part 4 of 6 in the series Virtualizing Domain Controllers on vSphere


Active Directory utilizes a multi-master replication model. It’s great that each Domain Controller provides read and write access to the Active Directory database, but it comes with a big drawback: Domain Controllers need to be in sync to provide consistent data to clients, regardless of which Domain Controller a client talks to. A big question to ask when virtualizing Domain Controllers is:

What’s the impact of virtualizing Domain Controllers on VMware vSphere in terms of replication?

Let’s find out…

 

Active Directory replication

Active Directory utilizes a multi-master replication model. This means that changes (called ‘writes’) to the database can originate from every Domain Controller.

Note:
Read-only Domain Controllers are special, as they refer write operations to (read/write) Domain Controllers.

However, some writes are special. For instance, schema update operations, which target the schema partition, happen only on the Schema Master or, to be exact, on the Domain Controller holding the Schema Master Flexible Single Master Operations (FSMO) role. Another example is password changes. A password can be changed on every Domain Controller, but the change is replicated to the Domain Controller holding the PDC Emulator (PDCe) FSMO role and then replicated out from there.

In multi-domain environments, replication is extended through the infrastructure master and Global Catalog servers.

 

Components of Active Directory replication

There are a couple of main components of Active Directory replication:

  1. The Directory Service Agent GUID
  2. The InvocationID
  3. The Update Sequence Number
  4. Timestamps

The Directory Service Agent GUID

The Directory Service Agent (DSA) globally-unique identifier (GUID) is unique to a Domain Controller. This value is created during promotion of the Windows Server installation to Domain Controller and persists over the life of the Domain Controller. The DSA GUID is used with USNs and is, therefore, useful to track the Domain Controller on which the update originated.

The InvocationID

The InvocationID is used by the replication partners of a Domain Controller to identify that Domain Controller’s instance of the Active Directory database. That’s right! Domain Controllers don’t have replication partnerships based on hostnames or IP addresses. These can be changed, and (when performed properly) this will have no negative impact on Active Directory replication.

Unlike the DSA GUID, the InvocationID can change over time. For instance, when a Domain Controller is properly restored from a backup, the InvocationID is reset. Its replication partners then treat it as a new replication partner, which allows the restored Domain Controller to replicate back in any changes that originated from it but were lost in the restore.

Update Sequence Numbers

Update Sequence Numbers (USNs) can be seen as Domain Controllers’ internal logical clocks. Every time writes occur to the Active Directory database on a Domain Controller, the Domain Controller increases its USN by the number of writes.

Domain Controllers can have different USNs. This is logical; a Domain Controller that has been around for a longer period of time might have seen a lot of password changes for a user, resulting in separate writes, whereas a relatively new Domain Controller might not yet have seen any password changes for the user and would only have written the object once.

Every Domain Controller keeps records of the last-seen USN with the InvocationIDs of its replication partners. This information is stored in the high watermark table, as depicted below:

Active Directory Replication explained, part 1

Timestamps

As mentioned in the previous blogpost on Managing Active Directory Time Synchronization on VMware vSphere, the time stamp is only used when two conflicting writes in Active Directory cross each other during replication. In that case, it makes the last write win.

If you experience replication problems for objects, check the time stamps: The version of the object plus the originating time and the originating DSA GUID will show you which Domain Controller to check first.
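To make the ordering concrete, here is a minimal sketch of how such a comparison could be modeled. It assumes the tie-breakers are applied in the order version, originating time, originating DSA GUID. The `Stamp` class, the `winner` function and all sample values are hypothetical illustrations, not Active Directory APIs.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Stamp:
    """Simplified replication metadata for one write (hypothetical model)."""
    version: int                # incremented on every originating write
    originating_time: datetime  # when the originating write happened
    originating_dsa_guid: str   # DSA GUID of the DC where the write originated


def winner(a: Stamp, b: Stamp) -> Stamp:
    """Pick the winning write: highest version, then latest time, then highest GUID."""
    return max(
        a, b,
        key=lambda s: (s.version, s.originating_time, s.originating_dsa_guid),
    )


# Two writes with the same version that crossed each other during replication.
dc1_write = Stamp(3, datetime(2024, 1, 1, 12, 0, 5), "1111aaaa")
dc2_write = Stamp(3, datetime(2024, 1, 1, 12, 0, 9), "2222bbbb")
result = winner(dc1_write, dc2_write)  # the later write on DC2 wins
```

When troubleshooting, the losing write points you to the Domain Controller whose change was overwritten, and the winning write to the one whose clock or version you want to check first.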

 

Replication types

There are two replication types:

  1. Intra-site replication, within Active Directory sites, and;
  2. Inter-site replication, between Active Directory sites.

Intra-site replication

Intra-site replication works with change notifications. Building upon the previous picture, when 12 objects are created on DC1, its USN is upped by 12, going from 400 to 412:

Active Directory Replication explained, part 2

Now, DC1 will notify its replication partners that writes were made and that these changes may be replicated. DC2 performs replication, performs 12 writes in the database too, and updates its high watermark table with the last seen USN for the InvocationID for DC1, as depicted below:

Active Directory Replication explained, part 3

Now, when we create an object on DC2, its USN is upped after the write and a change notification is sent. DC1 performs replication and its USN is upped by the same number during this replication. The USN is updated in the high watermark table for the InvocationID of DC2 on DC1, too:

Active Directory Replication explained, part 4

As DC1 knows the write it performed in its database is for an object that didn’t originate from itself, it will not send a change notification to DC2. Yes, there is a gap that remains between USNs on Domain Controllers and USNs in high watermark tables on its replication partners. That’s fine.
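The walkthrough above can be replayed with a small simulation. This is a simplified sketch, not how Active Directory is implemented: the `DomainController` class and its methods are hypothetical, and the starting USN of 200 for DC2 is an assumption, as only DC1's values (400 and 412) are given in the text.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class DomainController:
    name: str
    usn: int
    invocation_id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # Originating writes as (local USN, object name) pairs.
    originating_writes: list = field(default_factory=list)
    # High watermark table: last-seen USN per partner InvocationID.
    high_watermark: dict = field(default_factory=dict)

    def create_objects(self, names):
        """Originating writes: every write ups the local USN by one."""
        for name in names:
            self.usn += 1
            self.originating_writes.append((self.usn, name))

    def replicate_from(self, partner):
        """Pull the partner's originating writes we have not seen yet."""
        seen = self.high_watermark.get(partner.invocation_id, 0)
        new = [w for w in partner.originating_writes if w[0] > seen]
        # Replicated writes are local writes too, so the local USN rises,
        # but they are not originating writes and trigger no notification.
        self.usn += len(new)
        self.high_watermark[partner.invocation_id] = partner.usn
        return len(new)


dc1 = DomainController("DC1", usn=400)
dc2 = DomainController("DC2", usn=200)  # assumed starting value
dc1.high_watermark[dc2.invocation_id] = 200
dc2.high_watermark[dc1.invocation_id] = 400

dc1.create_objects([f"user{i}" for i in range(12)])  # DC1: 400 -> 412
dc2.replicate_from(dc1)                              # DC2: 200 -> 212
dc2.create_objects(["printer1"])                     # DC2: 212 -> 213
dc1.replicate_from(dc2)                              # DC1: 412 -> 413
```

At the end, DC1 is at USN 413 while DC2 still records 412 for DC1's InvocationID. That is the harmless gap described above.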

Inter-site Replication

Active Directory sites govern access and replication. They can be used to define locations of high interconnectivity. In organizations with multiple locations and low (available) bandwidth between these locations, authentication traffic doesn’t have to travel across the low bandwidth connections, but stays within the location. Active Directory site links connect Active Directory sites.

Note:
Connections with bandwidth below 10Mbit/second and unreliable connections are considered reasons to create Active Directory sites.

Inter-site replication is different from intra-site replication. It doesn’t use change notifications to initiate replication (unless you enable the Inter-site Change Notification feature). Instead, it uses a replication schedule.

Another big difference is the role of the Bridgehead Server: it is the only Domain Controller in a site that takes care of replication to a Domain Controller on the other side of an Active Directory site link.

 

Challenges with Active Directory Replication

The virtualization platform hosting virtualized Domain Controllers offers new functionality to admins, including easy snapshots. When using these features with virtualized Domain Controllers, two challenges typically emerge:

  1. USN Rollbacks
  2. Lingering Objects

USN Rollbacks

Update Sequence Numbers (USNs) assume linearity of time. With non-virtualized Domain Controllers, popular disk imaging products caused a lot of problems for Active Directory admins. That’s because when you reimage a Domain Controller to a previous state, without telling Active Directory, you reset the USN to a previous state. As the USN is logged in the high watermark table by its replication partners, they’ll know something is wrong:

USN Rollbacks

Starting with Windows Server 2003 Service Pack 1, or when you install KB875495, the replication partners of the improperly reset Domain Controller will:

  1. Stop replicating to the improperly reset Domain Controller
  2. Log Event ID 2095

This prevents writes inside the USN ‘bubble’ that will not replicate out from the improperly reset Domain Controller.
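The check a replication partner performs can be sketched in a few lines. This is a deliberately simplified model under the assumption that a partner only compares the InvocationID and USN it last recorded with the values the Domain Controller now presents. The function name and return values are hypothetical, and the real detection logic is more involved.

```python
def check_partner(known_invocation_id: str, last_seen_usn: int,
                  presented_invocation_id: str, presented_usn: int) -> str:
    """Classify a partner based on what we recorded in the high watermark table."""
    if presented_invocation_id != known_invocation_id:
        # Properly restored: a new InvocationID means a new database instance,
        # so the partner is treated as a new replication partner.
        return "new-instance"
    if presented_usn < last_seen_usn:
        # Same database instance, but its logical clock went backwards.
        return "usn-rollback"
    return "ok"


# DC2 last recorded USN 412 for DC1's InvocationID "inv-a".
healthy = check_partner("inv-a", 412, "inv-a", 415)
reverted = check_partner("inv-a", 412, "inv-a", 400)   # snapshot revert
restored = check_partner("inv-a", 412, "inv-b", 400)   # AD-aware restore
```

The last two calls present the same USN of 400. The only difference is the InvocationID, which is why a proper restore resets it.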

Lingering Objects

Lingering objects are another challenge. Reverting to a previous state is the cause of this problem, too. However, in the case of lingering objects, the previous state is a state that is beyond Active Directory’s tombstone lifetime.

Up till this point in this blogpost, we’ve discussed creating objects. Lingering Objects have to do with deleting objects. Contrary to what you may expect, when an object is deleted on a Domain Controller, the object is not deleted. Instead, it is marked as deleted. This allows all replication partners of the Domain Controller to replicate this change; it is tombstoned. After the tombstone lifetime has expired, the object is actually deleted (and its Distinguished Name Tag and Security Identifier cannot be reused).

Note:
When the Active Directory Recycle Bin is enabled, a separate state for objects is introduced, allowing for a recycle lifetime period, identical in length to and preceding the tombstone lifetime period.

The tombstone lifetime is enforced per Domain Controller through the Garbage Collection Process. This process, which runs every 12 hours, is responsible for ‘cleaning up’ deleted objects, as shown below:

The Garbage Collection Process
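As a rough illustration of what one garbage collection pass does, consider the sketch below. It assumes a tombstone lifetime of 180 days and models the directory as a plain dictionary. The `collect_garbage` function and the sample objects are hypothetical and ignore the Active Directory Recycle Bin.

```python
from datetime import datetime, timedelta


def collect_garbage(objects, now, tombstone_lifetime=timedelta(days=180)):
    """Return the objects that survive one garbage collection pass.

    `objects` maps an object name to the time it was tombstoned,
    or to None when the object is live.
    """
    return {
        name: deleted_at
        for name, deleted_at in objects.items()
        if deleted_at is None or now - deleted_at < tombstone_lifetime
    }


now = datetime(2024, 7, 1)
directory = {
    "alice": None,                   # live object
    "bob": datetime(2024, 6, 1),     # tombstoned 30 days ago, still kept
    "carol": datetime(2023, 12, 1),  # tombstoned 213 days ago, past the lifetime
}
remaining = collect_garbage(directory, now)
```

After the pass, no trace of `carol` is left on this Domain Controller. A replication partner that still holds a live copy of `carol` would therefore hold an object that nobody can tell it to delete.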

When a Domain Controller is improperly restored to a point in time exceeding the tombstone lifetime, a deleted object may reappear and remain, because the deletion was already fully processed by its replication partners.

However, the replication partners will not have knowledge of the object and do not have the logic to update the object. This might lead to integrity problems where a person may or may not be able to log on using a user object, depending on the Domain Controller that is used as the logon server:

Lingering Object

It gets awkward when the lingering object is the user object for a domain admin that was escorted off the premises, but suddenly regains administrative privileges in the environment…

 

Recommendations

Design for replication

Active Directory site links allow for directing replication traffic. Schedules and cost allow for further customization and minimizing of replication traffic between Active Directory sites.

Designing the Active Directory sites and site links from the start offers the best results. Processes in which Active Directory admins are kept up-to-date by networking admins on changes in the networking infrastructure aid in keeping the replication topology in the best shape.

Apply the defaults

Many people talk of the ‘tyranny of the defaults’, but in the case of Active Directory replication, the default settings provide a robust mechanism for keeping Domain Controllers in synchronization.

Make sure to keep the following default settings:

  1. The Bridge all site links option allows replication to survive the complete demise of an Active Directory site in a fully-routed networking environment.
  2. The Knowledge Consistency Checker (KCC) and Inter-site Topology Generator (ISTG) allow for automatically generated and automatically updated replication partnerships between Domain Controllers and for automatically assigned Bridgehead Servers.
  3. Strict Replication Consistency prohibits replication of Active Directory objects beyond the Tombstone Lifetime.

Take advantage of new features

Make sure to make the following changes if the Active Directory environment has been running Windows Server 2003 Domain Controllers in the past:

  1. Set the Tombstone Lifetime period value to 180 days, as it might still be interpreted as 60 days, allowing for a timeframe in which to introduce Lingering Objects.
  2. Migrate SYSVOL replication from NTFRS to DFS-R.
  3. Enable the Active Directory Recycle Bin feature.

Monitor Active Directory replication

By monitoring Active Directory replication, replication problems can be identified fast and effortlessly. Historical data might prove instrumental in pinpointing root causes of replication problems and moments in time when Lingering Object creation happened. You can use the following tools:

  1. Repadmin.exe (built-in)
  2. Active Directory Replication Status Tool

Don’t revert Domain Controllers to snapshots

Use Active Directory-aware Disaster Recovery solutions, both to make Active Directory-aware backups of Domain Controllers and to restore Domain Controllers properly. Upon restore, these solutions will:

  1. Invalidate the RID Pool for the Domain Controller
  2. Create a new InvocationID for the Domain Controller, effectively proposing it as a new replication partner to its former replication partners
  3. Perform an initial replication to guarantee Active Directory object integrity

Don’t restore Domain Controllers beyond the Tombstone Lifetime

When you restore Domain Controllers beyond the tombstone lifetime period, lingering objects may be introduced. By default, the tombstone lifetime period is 60 days for environments originally set up with Windows Server 2003 R2 or older, and 180 days for environments originally set up with Windows Server 2008 or newer.
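A simple guard in a restore runbook could look like the sketch below. It only compares the age of a backup with the tombstone lifetime. The `backup_is_restorable` function is hypothetical, and the tombstone lifetime must be taken from your own environment rather than assumed.

```python
from datetime import date, timedelta


def backup_is_restorable(backup_date: date, today: date,
                         tombstone_lifetime_days: int) -> bool:
    """A backup is only safe to restore while it is younger than the tombstone lifetime."""
    return today - backup_date < timedelta(days=tombstone_lifetime_days)


today = date(2024, 7, 1)
backup = today - timedelta(days=100)  # a 100-day-old backup

safe_with_180 = backup_is_restorable(backup, today, 180)  # within 180 days
safe_with_60 = backup_is_restorable(backup, today, 60)    # beyond 60 days
```

The same backup is fine in an environment with a 180-day tombstone lifetime, but would introduce lingering objects in an environment that still uses 60 days.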

Get rid of lingering objects

Use the Lingering Object Liquidator (LoL) to discover and remove lingering objects.

 

Concluding

This blogpost now features a pretty good primer on Active Directory replication. It’s not the most fun stuff to read, but it helps in explaining the replication considerations for Domain Controllers running on VMware vSphere to avoid USN rollbacks and lingering objects.

Further reading

Active Directory Replication Concepts
Download Active Directory Replication Status Tool
Download Lingering Object Liquidator (LoL)
2498185 How to diagnose Active Directory replication failures
