How not to create redundancy in your Exchange

When I was at a client the other day I encountered the following:

[Figure: Drawing 1, diagram of the client's Exchange environment]

As you can see, the Exchange environment itself already contains a single point of failure: the Exchange-01 server, which serves solely as the client access and hub transport server. The two database servers, however, are both made highly available through the failover clustering feature introduced in Windows Server 2008. This in itself is a good idea. Besides giving you redundancy across your database hosts, it also lets you run multiple redundant database copies on both servers in a database availability group. You can even reboot one of them in the middle of production, for instance to replace some compromised certificates. A reboot during production should prompt clients to restart Outlook, but hey, your Exchange is safe and up to date again.

It is a bad idea, however, to install this failover cluster on top of a VMware cluster that does its own failover. The problem arises when an actual failover needs to take place. In a perfect world (where you wouldn’t even need failover, since your servers would never break) failover would happen automatically if one server stopped functioning for whatever reason. In the case of my client, something very interesting happened instead.

First, the database is transferred to the other Exchange server. All is well. At the same time, however, VMware steps in and fails the Exchange server over to another host (or whatever it is that VMware does to keep guests alive) and restores the system to its previous state. So the server that went down comes back with its database connection, while the Windows failover cluster has already transferred database access to the other Exchange database server. With both servers wanting to access the database, neither is able to, and that’s when your Exchange database failover cluster fails. This usually results in a lot of people calling the helpdesk to ask why they can’t access their mail.
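The sequence above can be sketched as a toy model. This is only an illustration of the scenario as I describe it here, not a model of how VMware HA or Windows failover clustering are actually implemented. The node names `DB-A` and `DB-B`, the `Database` class and the rule "usable only with exactly one owner" are all assumptions made up for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    name: str
    believes_active: bool = False  # does this node think it owns the database?


@dataclass
class Database:
    claimants: set = field(default_factory=set)

    def accessible(self) -> bool:
        # Toy rule (assumption): the database is usable only with exactly one owner.
        return len(self.claimants) == 1


def guest_cluster_failover(db: Database, failed: Node, standby: Node) -> None:
    """Guest-level clustering: hand the database to the standby node."""
    db.claimants.discard(failed.name)
    standby.believes_active = True
    db.claimants.add(standby.name)


def hypervisor_restart(db: Database, failed: Node) -> None:
    """Hypervisor-level failover: bring the failed guest back in its previous state."""
    # Nothing told the restored guest that it lost its role, so it claims the database again.
    if failed.believes_active:
        db.claimants.add(failed.name)


def simulate(stack_hypervisor_failover: bool) -> bool:
    """Return True if the database is still accessible after the failure."""
    node_a = Node("DB-A", believes_active=True)  # hypothetical active node
    node_b = Node("DB-B")                        # hypothetical standby node
    db = Database(claimants={node_a.name})

    # DB-A fails; the guest cluster reacts first.
    guest_cluster_failover(db, failed=node_a, standby=node_b)

    # Optionally the hypervisor reacts as well, unaware of the guest cluster.
    if stack_hypervisor_failover:
        hypervisor_restart(db, failed=node_a)

    return db.accessible()


if __name__ == "__main__":
    print("guest cluster only:", simulate(False))
    print("guest cluster plus hypervisor failover:", simulate(True))
```

With only one mechanism the database ends up with a single owner; with both stacked, two nodes claim it and, in this toy model, nobody gets in.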

This is not because either VMware or failover clustering is a poor feature; it is because someone implemented a solution without proper testing.

So if you want to make Exchange redundant, use only one method, not two or more stacked on top of each other, or it will come around to byte you like an attack dog.