Exchange Recovery: Failed DAG Member in Exchange Server 2010

In this tutorial I will demonstrate a recovery scenario for a failed Mailbox server that is a member of an Exchange 2010 Database Availability Group.  In this scenario the DAG has two members, EX1 and EX2.  EX2 has suffered a serious hardware failure and needs to be recovered.

With server EX2 down, each mailbox database in the DAG has failed over to EX1 and shows the following status information.

[PS] C:\>Get-MailboxDatabaseCopyStatus -Identity "Mailbox Database 01"

Name                                          Status          CopyQueue
                                                              Length
----                                          ------          ---------
Mailbox Database 01\EX1                       Mounted         0
Mailbox Database 01\EX2                       ServiceDown     0

The Exchange recovery process begins by reinstalling Windows Server 2008 R2 on the new server.

Installing Windows Server 2008 R2

Because this Exchange recovery is for a member of an Exchange 2010 DAG, the server must be installed with the Enterprise edition of Windows Server 2008 R2.

Exchange 2010 DAG members require the Enterprise edition of Windows Server

After Windows Server 2008 R2 has finished installing, log on to the server and complete the following tasks:

  • Configure the time zone settings
  • Configure the Automatic Update settings
  • Configure the server with the same TCP/IP configuration as the previous server
  • Configure the server with the same name as the previous server (in this case EX2)
  • Join the server to the Active Directory domain

The next step is to install the Exchange 2010 prerequisites for the Mailbox server role.  From an elevated PowerShell prompt, run the following commands.

Import-Module ServerManager

Add-WindowsFeature NET-Framework,RSAT-ADDS,Web-Server,Web-Basic-Auth,Web-Windows-Auth,Web-Metabase,Web-Net-Ext,Web-Lgcy-Mgmt-Console,WAS-Process-Model,RSAT-Web-Server -Restart

After the server has restarted, we also need to install the Exchange Server 2010 SP1 hotfixes for Windows Server 2008 R2. These updates require another restart of the server.

Before installing Exchange Server 2010 on the server being recovered, we first need to remove it from the DAG. On another Exchange 2010 server, open the Exchange Management Shell and run the following commands.

First, determine which mailbox databases the server was hosting a copy of, the activation preferences, and any replay lag that was configured. In this example server EX2 hosted copies of Mailbox Database 01 and Mailbox Database 02.

[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX2, EX1}
ActivationPreference : {[EX2, 1], [EX1, 2]}
ReplayLagTimes       : {[EX2, 00:00:00], [EX1, 00:00:00]}
TruncationLagTimes   : {[EX2, 00:00:00], [EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1, EX2}
ActivationPreference : {[EX1, 1], [EX2, 2]}
ReplayLagTimes       : {[EX1, 00:00:00], [EX2, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00], [EX2, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}
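
Because these values are needed again when the copies are re-added, it may help to save them before removing anything. A minimal sketch, assuming a C:\Temp folder exists on the server where the Exchange Management Shell is running (the path and file name are only illustrative):

```powershell
# Save the current copy layout, activation preferences, and lag times
# to a text file for reference later in the recovery.
# C:\Temp\dag-copy-settings.txt is a hypothetical path; use any writable location.
Get-MailboxDatabase |
    Format-List Name, Servers, Activ*, *Lag* |
    Out-File -FilePath C:\Temp\dag-copy-settings.txt
```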

Next, remove the failed server from each of the mailbox databases that it held a copy of.

[PS] C:\>Remove-MailboxDatabaseCopy "Mailbox Database 01\EX2"

[PS] C:\>Remove-MailboxDatabaseCopy "Mailbox Database 02\EX2"

Warnings will appear because the failed Exchange server EX2 can’t be contacted. However, the change can be confirmed by repeating the earlier command.

[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}

Next, remove the failed server from the Database Availability Group. Run the following command in the Exchange Management Shell.

[PS] C:\>Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2

Note: in some DAG topologies this action will fail with the error “A quorum of cluster nodes was not present to form a cluster”. If that error occurs, use the solution in this article: Unable to Remove Failed Server from DAG Membership in Exchange Server 2010
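
Microsoft's documentation for Remove-DatabaseAvailabilityGroupServer also describes a ConfigurationOnly switch for a member that is offline and cannot be brought back online. The sketch below follows that documented approach, which is not necessarily the solution in the linked article. It assumes the DAG is named DAG as in this example, and that the second command is run on a surviving DAG member where the FailoverClusters module is available. Whether the eviction succeeds will depend on the state of the cluster.

```powershell
# Remove EX2 from the DAG object in Active Directory only,
# without trying to contact the failed server or the cluster.
Remove-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2 -ConfigurationOnly

# With -ConfigurationOnly the cluster itself is not updated, so the failed
# node still has to be evicted from the underlying failover cluster.
# Run this on a surviving DAG member (EX1 in this scenario).
Import-Module FailoverClusters
Remove-ClusterNode -Name EX2 -Force
```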

When you are ready to proceed with the Exchange 2010 install, open a command prompt and run the following command from the directory that contains the Exchange setup files.

setup /m:recoverserver
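
If Exchange was originally installed somewhere other than the default location, setup needs to be told where to put the binaries, otherwise the recovered server will use the default folder. A hedged sketch using the TargetDir switch of the Exchange 2010 unattended setup; the D:\Exchange path is purely hypothetical and should be replaced with the folder the failed server actually used:

```powershell
# Recover the server into a non-default installation folder.
# D:\Exchange is a hypothetical path; it should match the folder the
# original EX2 installation used.
.\Setup.com /m:RecoverServer /TargetDir:"D:\Exchange"
```

Any data volumes that the old server used for database and log paths should also exist on the rebuilt server before running setup.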

When setup has completed and the server has been rebooted, add the recovered server back into the Database Availability Group.

[PS] C:\>Add-DatabaseAvailabilityGroupServer -Identity DAG -MailboxServer EX2

Then, taking note of the replay and truncation lag times and the activation preferences identified earlier, re-add the mailbox database copies to the recovered server. This process can take a long time, depending on the size of the mailbox databases that need to be reseeded.

[PS] C:\>Add-MailboxDatabaseCopy -Identity "Mailbox Database 01" -MailboxServer EX2
[PS] C:\>Add-MailboxDatabaseCopy -Identity "Mailbox Database 02" -MailboxServer EX2 -ActivationPreference 1
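
In this scenario all lag times were zero, so no lag parameters are needed. If a copy had been configured with a lag, it could be restored at the same time the copy is added. The values below are hypothetical and only illustrate the ReplayLagTime and TruncationLagTime parameters of Add-MailboxDatabaseCopy:

```powershell
# Hypothetical example: re-add a copy that previously had a one-day replay lag
# and a two-hour truncation lag. Time spans use the format d.hh:mm:ss.
Add-MailboxDatabaseCopy -Identity "Mailbox Database 01" -MailboxServer EX2 `
    -ReplayLagTime 1.00:00:00 -TruncationLagTime 0.02:00:00 -ActivationPreference 2
```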

You can now verify that the databases have the same settings that were identified earlier.

[PS] C:\>Get-MailboxDatabase | fl name, servers, activ*, *lag*

Name                 : Mailbox Database 02
Servers              : {EX2, EX1}
ActivationPreference : {[EX2, 1], [EX1, 2]}
ReplayLagTimes       : {[EX2, 00:00:00], [EX1, 00:00:00]}
TruncationLagTimes   : {[EX2, 00:00:00], [EX1, 00:00:00]}

Name                 : Mailbox Database 01
Servers              : {EX1, EX2}
ActivationPreference : {[EX1, 1], [EX2, 2]}
ReplayLagTimes       : {[EX1, 00:00:00], [EX2, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00], [EX2, 00:00:00]}

Name                 : Archive Mailboxes
Servers              : {EX1}
ActivationPreference : {[EX1, 1]}
ReplayLagTimes       : {[EX1, 00:00:00]}
TruncationLagTimes   : {[EX1, 00:00:00]}
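
The output above confirms the configuration, but not that the new copies are healthy. A short sketch of some additional checks that could be run once seeding has finished; the final command is optional and assumes you want Mailbox Database 02 active on EX2 again, in line with its activation preference:

```powershell
# Copy status on the recovered server; passive copies should report Healthy
# once seeding has finished and the copy queue has drained.
Get-MailboxDatabaseCopyStatus -Server EX2

# Run the built-in replication health checks against the recovered member.
Test-ReplicationHealth -Identity EX2

# Optional: Mailbox Database 02 prefers EX2 (activation preference 1) but is
# still mounted on EX1 after the failure, so it can be moved back if desired.
Move-ActiveMailboxDatabase "Mailbox Database 02" -ActivateOnServer EX2
```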

The failed DAG member has now been recovered and the Exchange 2010 Database Availability Group is back to normal operation.

About Paul Cunningham

Paul is a Microsoft Exchange Server MVP and publisher of Exchange Server Pro. He also holds several Microsoft certifications including for Exchange Server 2007, 2010 and 2013.

Comments

  1. Thanks for this article as it saved me lots of work.

  2. Nithyanandham.s says:

    It’s really a good article, thanks a lot. It helped me a lot while recovering the failed server in the DAG. Keep posting.

  3. Great article, really helped me out, had to reinstall our broken DAG member.

  4. Thanks a lot … one of my DAG members suddenly failed … I had to recover it and this article was a lifesaver.
    BTW keep up the good work … this site rocks

  5. Matthias Koller says:

    Hi all

    This is an excellent article, thanks to Paul.
    But some steps are unclear to me, because we have a different design:
    - All servers are Exchange 2010
    - 3 Mailbox servers (all members of one DAG): installed on VMware ESX
    - 2 servers with the CAS and Hub Transport roles (both are members of one CAS array): installed on VMware ESX
    So as you can see, all our Exchange servers are installed on VMware ESX, which is why I think the recovery of a DAG member would be different.
    I know that MS advises against DAGs on VMware. But it is our design now, and I am not able to change it for now.

    The first steps for recovery are logical.
    - …
    - Remove the failed server from each DB.
    - Remove the failed server from the DAG.

    Is there a practicable way to recover from a VMware snapshot, or do I have to rebuild the whole server first?
    The snapshot recovery is easy, but then how do I remove the DB copies from that recovered server completely?

    Any answer is highly appreciated, since I could not find any reliable answers on this on the Internet.

    Kind regards

    • Snapshots are not supported. You should not take a snapshot or recover from snapshot for Exchange servers.

      Whether your servers are virtualized or not makes no difference to the recovery process for a completely failed DAG member, except that you could deploy the new VM from a template rather than manually reinstalling the OS, I guess.

  6. Hi Paul,

    My server is with SP1. When do we have to install SP1 on the servers? I am recovering two DAG members with only the Mailbox role. I have a separate HUB/CAS server configured.

    Thanks,

  7. Hi Paul:

    Maybe this is an obvious question, but there is something that is not clear enough to me. I’m recovering a failed DAG member and did exactly what the article says, with one small difference: I still have the old files in the DB and log volumes. I’m facing some problems adding the database copies on the recovered server, so my question is: do I need to delete the old files (I just moved them to another volume on the same server)? … Sorry for my English, I’m still learning :)

    • Yes you need to re-add the database copies to the recovered server. As far as the DAG is concerned those copies were removed when that DAG member was removed. The existing files on your volumes will cause it to fail to add the new copy, and should be removed/moved out of the way first.

  8. Hi Paul,

    Great article.
    Question: when left with one surviving DAG member, how would you safely remove this server from the DAG and ensure all databases get mounted on it so that it becomes a standalone mailbox server?

  9. Hi Paul,

    I had to demonstrate disaster recovery of Exchange 2010 at my company's cold disaster site. I performed a point-in-time recovery to recover all Exchange servers: one HUB/CAS server and two Mailbox servers in a DAG. All went fine and I was able to access my blank mailbox. I faced a couple of issues which are not mentioned in the above steps, or maybe they do not apply if you are recovering one DAG member. Please correct me if I am wrong.

    1. After recovering the server from scratch, your admin account/service account is missing permissions on the local server. You have to add your admin account/service account and other Exchange groups to the appropriate local groups before you go further in reconfiguring your Exchange server. For example, the Exchange Server and Exchange Server Services groups were missing from the local groups.
    2. If you installed Exchange in a customized folder, such as on the D:\ drive instead of the default installation folder, this recovery will not select the customized folder, so the rebuilt server will not match your company's standard if you follow one.
    3. All drives that existed on your old Mailbox server must be added to the new server before you run setup /m:recoverserver, otherwise setup will fail.

    I faced all of these issues in my recovery procedure. Please let me know whether these are practical issues one can face during recovery.
    Also let me know if this is the right approach for a cold disaster site.
