Mailbox Replication Service unexpectedly quits when moving mailboxes from another Exchange server
I’ve just completed a transition from Exchange 2003 directly to Exchange 2010. This is a supported scenario, but I did run into some trouble. Luckily, I also have a solution.
The installation of the Exchange 2010 server and the upgrade of several parts of the Exchange 2003 environment went without any hiccups. The project was actually ahead of schedule, and just when you think you can go home early…
In Exchange 2010, a mailbox move is now called a “local move request”. The Exchange Mailbox Replication Service then handles the move from one database to another, and the source database can be on Exchange 2003, 2007 or 2010. There are some cool benefits to this new approach, but I digress.
Shortly after I requested a move of all Exchange 2003 mailboxes to the new Exchange 2010 server, the moves seemed to stall. I could see this with the following Exchange Management Shell command:
Get-MoveRequest | ft Alias,Status,PercentComplete
At around 20-25%, the percentage suddenly dropped back to 0%, and after that the move did not resume (which it should have).
Some digging in Event Viewer turned up this message:
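A single listing makes that drop easy to miss. One way to catch it is to take two snapshots with Get-MoveRequestStatistics, which reports DisplayName and PercentComplete per move, a few minutes apart and compare them. The function below is a hypothetical helper, not an Exchange cmdlet, and keying on DisplayName assumes display names are unique among the mailboxes being moved; treat it as a sketch.

```powershell
# Hypothetical helper (not an Exchange cmdlet): compares two snapshots of
# move statistics and returns the moves whose PercentComplete went DOWN,
# which is the symptom seen here after every Mailbox Replication crash.
function Find-RegressedMove {
    param(
        [object[]] $Before,
        [object[]] $After
    )
    # Remember the earlier percentage per mailbox. Keying on DisplayName is
    # an assumption: display names must be unique within the batch.
    $earlier = @{}
    foreach ($stat in $Before) {
        $earlier[$stat.DisplayName] = [int]$stat.PercentComplete
    }
    # Emit one object per mailbox whose progress moved backwards.
    foreach ($stat in $After) {
        $name = $stat.DisplayName
        if ($earlier.ContainsKey($name) -and [int]$stat.PercentComplete -lt $earlier[$name]) {
            New-Object PSObject -Property @{
                DisplayName = $name
                Was         = $earlier[$name]
                Now         = [int]$stat.PercentComplete
            }
        }
    }
}
```

In the Exchange Management Shell, the snapshots could be taken with $first = Get-MoveRequest | Get-MoveRequestStatistics, then a few minutes later $second the same way, followed by Find-RegressedMove -Before $first -After $second. Any row it returns is a move that went backwards, which in this case pointed to another crash of the service.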
Source: Service Control Manager
Event ID: 7031
The Microsoft Exchange Mailbox Replication service terminated unexpectedly. It has done this 1 time(s). The following corrective action will be taken in 5000 milliseconds: Restart the service.
Sure enough, the Mailbox Replication service restarted after five seconds, but after a while it crashed again, and kept doing so. There were no other errors or warnings related to this issue. I decided to increase the logging level of the Replication Service with this cmdlet:
Get-EventLogLevel -Identity "MSExchange Mailbox Replication\*" | Set-EventLogLevel -Level Expert
Note: set it back with -Level Lowest once you no longer need it.
The expert logging gave me more insight into what was happening:
Event ID: 9660
User JOSH (/o=Test/ou=First Administrative Group/cn=Recipients/cn=JOSH) failed to log on because their mailbox is in the process of being moved.
Source: MSExchange Mailbox Replication
Event ID: 1101
Mailbox move for ‘Test.nu/Test/DOCENT/JOSH’ (b0de20a6-72ee-47ef-8123-123e123e123e) encountered a transient failure. The operation will be retried (1 out of 60). Error code: -2147467259 MapiExceptionMailboxInTransit: Unable to open message store. (hr=0x80004005, ec=1292)
The above message appears several times. It is logged after the replication service has crashed and is probably caused by the same lock that is placed on a mailbox during a normal move. Not really surprising. It is nice to see that the replication service is willing to retry up to 60 times.
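The error code and the hr value in event 1101 are the same number written two ways: -2147467259 is the signed 32-bit form of 0x80004005 (2^32 - 2147467259 = 2147500037 = 0x80004005), which is the generic E_FAIL HRESULT. The specific reason is therefore in ec=1292 and the MapiExceptionMailboxInTransit name, not in the HRESULT. A small conversion sketch; the function name is made up for illustration.

```powershell
# Hypothetical helper: formats the signed 32-bit "Error code" from event 1101
# in the hexadecimal HRESULT notation used by the "hr=0x..." part.
# The .NET X8 format prints the two's-complement bit pattern of a negative Int32.
function ConvertTo-HResultHex {
    param([Parameter(Mandatory = $true)] [int] $Code)
    '0x{0:X8}' -f $Code
}

ConvertTo-HResultHex -Code -2147467259   # 0x80004005 (E_FAIL)
```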
Source: MSExchange Mailbox Replication
Event ID: 1104
Mailbox Replication service started initial seeding stage for ‘Test.nu/Test/DOCENT/JOSH(b0de20a6-72ee-47ef-8123-123e123e123e). Total number of messages in mailbox: 854 (103.7 MB (108,744,421 bytes)).
The move then continues, but very soon the replication service crashes again and the whole cycle starts from scratch (events not shown). There were no further clues as to what was happening. Some mailboxes moved over perfectly; others failed at first but went over without a hitch on a later attempt. I did get the feeling that the more mailboxes were moving at the same time, the more likely the service was to crash, but because of time constraints I did not investigate this further.
There was no antivirus or antispam software interfering, and the servers were in the same forest, the same site and the same subnet. Both servers were virtualized on vSphere. Yes, for the Exchange 2003 server that is not a supported configuration, but the problem was on the Exchange 2010 server, for which it is supported.
Even so, the virtual machines ran on different hosts and the VMware Tools were up to date. The Windows and Exchange servers were fully patched as well, including Exchange 2010 Update Rollup 1.
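To put some numbers on that feeling, the 7031 entries shown earlier can be pulled from the System log and counted per hour, then compared with how many moves were running at the time. This is a sketch under stated assumptions: the helper name is hypothetical, and it matches on the English message text quoted above, which will differ on a localized Windows installation.

```powershell
# Hypothetical helper: keeps only Service Control Manager event 7031 entries
# that mention the Mailbox Replication service, and counts them per hour.
function Get-MrsCrashPerHour {
    param([Parameter(ValueFromPipeline = $true)] $Entry)
    begin { $crashes = @() }
    process {
        # Assumes the English message text; adjust for localized systems.
        if ($Entry.EventID -eq 7031 -and
            $Entry.Message -like '*Mailbox Replication service terminated unexpectedly*') {
            $crashes += $Entry
        }
    }
    end {
        # Bucket by hour, e.g. "2010-03-01 10:00", in chronological order.
        $crashes |
            Group-Object -Property { '{0:yyyy-MM-dd HH}:00' -f $_.TimeGenerated } |
            Sort-Object Name |
            Select-Object Name, Count
    }
}
```

On the Exchange 2010 server it could be fed with Get-EventLog -LogName System -Source "Service Control Manager" -After (Get-Date).AddDays(-1) | Get-MrsCrashPerHour. Hours with many crashes can then be set against the move activity in that period.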
At this point I decided to call Microsoft Support. After checks with ADSI Edit, disabling firewalls, changing the NIC type and more, Microsoft advised running an integrity check on the Exchange 2003 mailbox store. As I had run out of ideas, this looked like a good suggestion. So I started ISINTEG on the Exchange 2003 server with:
exchsrvr\bin\isinteg -s <servername> -fix -test alltests
After about half an hour, this summary appeared:
. . . . . SUMMARY . . . . .
Total number of tests : 21
Total number of warnings : 222
Total number of errors : 0
Total number of fixes : 541
Total time : 0h:33m:26s
Yep, that database was bad…
After that the mailbox move requests completed without any relevant problems.
(I did get a lot of the errors described in KB940012, which left those move requests in the CompletedWithWarning state, but they are not a problem.)
My troubleshooting focused on the new Exchange 2010 installation, because its Mailbox Replication Service repeatedly quit unexpectedly. I should also have taken the source server into consideration (something about assumptions…).
Although the problem was ultimately caused by a corrupted Exchange 2003 mailbox store, I hope that Microsoft will make the Mailbox Replication Service somewhat more robust, or at least have it log relevant error messages. It would have saved me quite a lot of troubleshooting time.
Anyway, if you are not familiar with the Exchange environment and its history, it is a good idea to also check the integrity of the source Exchange database before moving mailboxes. You can do this with ESEUTIL and ISINTEG:
ESEUTIL /G <databasename.edb>
ISINTEG -s <servername> -verbose -test alltests
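When several stores have to be checked before a migration, the totals from each ISINTEG summary are quicker to compare as data than by eye. The sketch below parses the "Total number of ..." lines, using the summary from this migration as input. The function name is hypothetical, and in practice the lines would come from wherever you saved the ISINTEG output (for example Get-Content on that file).

```powershell
# Hypothetical helper: turns the "Total number of ..." lines of an ISINTEG
# summary into a hashtable with the keys tests, warnings, errors and fixes.
function ConvertFrom-IsintegSummary {
    param([string[]] $Line)
    $totals = @{}
    foreach ($text in $Line) {
        if ($text -match '^\s*Total number of (tests|warnings|errors|fixes)\s*:\s*(\d+)') {
            $totals[$Matches[1]] = [int]$Matches[2]
        }
    }
    $totals
}

# The summary from the run above (lines that do not match are ignored).
$summary = ConvertFrom-IsintegSummary -Line @(
    '. . . . . SUMMARY . . . . .',
    'Total number of tests : 21',
    'Total number of warnings : 222',
    'Total number of errors : 0',
    'Total number of fixes : 541',
    'Total time : 0h:33m:26s'
)

# Anything flagged or fixed is a reason to look at this store before moving mailboxes.
if ($summary['errors'] -gt 0 -or $summary['warnings'] -gt 0 -or $summary['fixes'] -gt 0) {
    'Store needs attention before moving mailboxes'
}
```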
/me makes mental note 😉