One of the benefits of virtualizing machines is the built-in resiliency of the underlying virtualization platform. In many vSphere environments consisting of multiple datacenters, this resiliency is expanded with Site Recovery Manager.
There are, however, some things you’ll want to know about using Site Recovery Manager in combination with virtualized Domain Controllers. As usual, not every configuration option necessarily adds resilience.
About Site Recovery Manager
VMware Site Recovery Manager (SRM) is a business continuity and disaster recovery solution that helps you to plan, test, and run the recovery of virtual machines between a protected vCenter Server site and a recovery vCenter Server site. It features automated orchestration of fail-over and fail-back of virtual machines to optimize availability.
For large environments, SRM offers policy-driven automation to protect thousands of virtual machines easily using centralized recovery plans managed from the vSphere Web Client.
SRM is a separately licensed and versioned VMware product, available in Standard and Enterprise editions. Both editions are fully featured, but the Standard edition is limited to 75 virtual machines per site.
Before you get any fancy ideas about SRM’ing your virtualized Domain Controllers to AWS, note that the VMware documentation on Site Recovery Manager and Active Directory Domain Controllers is really clear:
Active Directory provides its own replication technology and restore mode.
Do not use Site Recovery Manager to protect Active Directory domain controllers. Use the Active Directory replication technology and restore mode technologies to handle disaster recovery situations.
However, I feel there are three viable ways to use VMware Site Recovery Manager (SRM) with virtualized Domain Controllers:
- Recovery of FSMO Role Masters during Disaster Recovery
- Using a Domain Controller from the primary site during Disaster Recovery testing
- Cloning a Recovery Site Domain Controller during Disaster Recovery testing
1. Recovery of FSMO Role Masters during Disaster Recovery
In this configuration, the recovery site contains production Domain Controllers, but the Domain Controllers holding the Flexible Single Master Operations (FSMO) roles are all located at the primary site:
The Domain Controllers holding the FSMO roles are then protected and included in the SRM recovery plan. The recovery plan is configured to recover the Domain Controller holding the RID Master role first. This gives that Domain Controller sufficient time to start performing its role.
Recovery in this scenario is relatively easy: the Domain Controller holding the RID Master role starts without problems, because it can talk to the existing Domain Controllers at the recovery site. It is then ready to provide RID pools to the other Domain Controllers that are being recovered.
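The ordering described above can be sketched in a few lines of Python. This is only an illustration of the sequencing logic, not SRM's API: the `DomainController` class, the `recovery_order` function and the DC names are all hypothetical, and in SRM itself you would express the order through the recovery plan's own priority settings.

```python
from dataclasses import dataclass

# The five FSMO roles in Active Directory.
FSMO_ROLES = {
    "Schema Master",
    "Domain Naming Master",
    "RID Master",
    "PDC Emulator",
    "Infrastructure Master",
}


@dataclass(frozen=True)
class DomainController:
    """Hypothetical description of a protected Domain Controller."""
    name: str
    roles: frozenset = frozenset()


def recovery_order(dcs):
    """Return DC names: RID Master holder first, other FSMO holders next, the rest last."""
    def rank(dc):
        if "RID Master" in dc.roles:
            return 0
        if dc.roles & FSMO_ROLES:
            return 1
        return 2

    # sorted() is stable, so DCs with the same rank keep their input order.
    return [dc.name for dc in sorted(dcs, key=rank)]


# Hypothetical example inventory; the names are placeholders.
protected = [
    DomainController("DC-A", frozenset({"PDC Emulator", "Infrastructure Master"})),
    DomainController("DC-B", frozenset({"RID Master"})),
    DomainController("DC-C", frozenset({"Schema Master", "Domain Naming Master"})),
]
print(recovery_order(protected))  # ['DC-B', 'DC-A', 'DC-C']
```

A simple rank-based sort keeps the intent readable: the RID Master holder always comes up first, and everything else follows in the order you listed it.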
2. Using a Domain Controller from the primary site during Disaster Recovery testing
In this configuration, DC1 (located in the primary site) is protected, along with other workloads. Although there is a DC2 in the recovery site, there are no protected workloads in the recovery site, so we need a Domain Controller from the primary site to maintain the availability of two Domain Controllers during an outage:
One recovery plan is created to cover all protected workloads.
This is a straightforward configuration. During a test failover, DC1 is recovered as well, and the other workloads can communicate with it. No special configuration is required for this scenario, and there is no impact on Active Directory, because any changes made on DC1 during the test failover are discarded and not written to production DC1.
3. Cloning a Recovery Site Domain Controller during Disaster Recovery testing
When an organization doesn’t want to include Domain Controllers in Recovery Plans, because there is already a Domain Controller (DC2) in the recovery site, this scenario provides a good alternative for Disaster Recovery testing.
In this configuration, the Recovery Plan workflow includes a call-out to clone DC2 before proceeding with recovery of the protected workloads:
Once the clone operation is completed, the clone is connected to a special “Recovery Network” that has been previously provisioned. Recovery of the rest of the workloads in the Recovery Plan then proceeds as normal to the designated recovery network.
The recovery network that DC2 is cloned to must not be reachable from the production network, to avoid the disaster recovery test polluting the information in Active Directory.
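To make the sequence concrete, here is a minimal Python sketch of the test workflow: clone, connect the clone to the recovery network, then recover the workloads, with a guard that refuses to continue when the recovery network is known to be reachable from production. It models the steps as plain strings and does not call SRM or vCenter. The `DrTestPlan` class, the `build_test_steps` function, the `routable_from_production` inventory and the workload names are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DrTestPlan:
    """Hypothetical description of a disaster recovery test."""
    source_dc: str            # recovery-site Domain Controller to clone, e.g. "DC2"
    recovery_network: str     # previously provisioned, isolated network
    workloads: list = field(default_factory=list)
    # Assumed inventory of networks that production can route to.
    routable_from_production: set = field(default_factory=set)


def build_test_steps(plan):
    """Return the ordered test steps, or raise if the recovery network is not isolated."""
    if plan.recovery_network in plan.routable_from_production:
        raise ValueError(
            f"{plan.recovery_network} is reachable from production; "
            "refusing to place a Domain Controller clone on it"
        )
    clone = f"{plan.source_dc}-clone"
    # The clone call-out runs before any protected workload is recovered.
    steps = [
        f"clone {plan.source_dc} as {clone}",
        f"connect {clone} to {plan.recovery_network}",
    ]
    steps += [f"recover {w} to {plan.recovery_network}" for w in plan.workloads]
    return steps


# Hypothetical usage; the workload names are placeholders.
plan = DrTestPlan("DC2", "Recovery Network", ["APP1", "SQL1"])
for step in build_test_steps(plan):
    print(step)
```

Putting the isolation check at the very start means a misconfigured network stops the test before a second copy of DC2 ever exists.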
Above, three distinct Recovery Plans are mentioned that allow organizations to use Site Recovery Manager (SRM) with Active Directory; one scenario for a true disaster recovery failover, and two scenarios to test disaster recovery failovers.
Using these outlines for Recovery Plans that involve virtualized Domain Controllers will greatly benefit the availability of Active Directory in Disaster Recovery scenarios.