Exchange Server 2013 Database Availability Groups

The high availability feature for Exchange Server 2013 Mailbox servers is the Database Availability Group. Exchange 2013 Database Availability Groups (DAGs) are very similar to Exchange 2010 DAGs, but also deliver a series of improvements and new features for customers. In this series of articles we will walk through an overview of Database Availability Group concepts, demonstrate how to deploy a new Database Availability Group, and explore some of the operational tasks associated with running and maintaining a DAG.

Overview of Exchange Server 2013 Database Availability Groups

A Database Availability Group consists of up to 16 Exchange 2013 Mailbox servers, and optionally one or more additional non-Exchange servers that may be required to act as a File Share Witness (more on this shortly). The Mailbox servers within a DAG are capable of hosting a copy of a mailbox database from another DAG member, up to the Exchange 2013 limit of 100 mailbox databases per server (that includes both active and passive database copies). A simple example of a Database Availability Group would be as follows.

Exchange 2013 Database Availability Group Simple Example
A simple example of an Exchange 2013 Database Availability Group

In the example above the server EXMB1 hosts the active copy of database DB1, and the other DAG members EXMB2 and EXMB3 host passive copies of the database. The DAG members work together to maintain the availability of the mailbox database. If the server that hosts the active database copy experiences a problem, for example a hardware failure, one of the remaining DAG members is able (under the right conditions) to make its copy of the database active so clients are still able to connect to their mailbox data.

Exchange 2013 DAG member down
DAG member EXMB1 has failed causing database to become active on EXMB2
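
To make the failover idea concrete, here is a minimal Python sketch of the scenario shown above. It is only a model for illustration: the `Database` class and `fail_over` function are hypothetical names, and the logic simply activates the first healthy passive copy. The real selection logic in Exchange weighs far more than this (copy health, replay lag and so on), which is what "under the right conditions" refers to.

```python
from dataclasses import dataclass, field

MAX_DAG_MEMBERS = 16  # DAG member limit stated in the article


@dataclass
class Database:
    name: str
    active_on: str                                   # server hosting the active copy
    copies: list[str] = field(default_factory=list)  # every server hosting a copy


def fail_over(db: Database, failed_server: str, healthy: set[str]) -> bool:
    """Activate a passive copy if the active server has failed.

    Returns True if the database is active on a healthy server afterwards.
    Simplified assumption: the first healthy copy in the list wins.
    """
    if db.active_on != failed_server:
        return db.active_on in healthy
    for server in db.copies:
        if server != failed_server and server in healthy:
            db.active_on = server
            return True
    return False  # no healthy passive copy, the database stays down


members = ["EXMB1", "EXMB2", "EXMB3"]
assert len(members) <= MAX_DAG_MEMBERS

db1 = Database(name="DB1", active_on="EXMB1", copies=["EXMB1", "EXMB2", "EXMB3"])
still_available = fail_over(db1, failed_server="EXMB1", healthy={"EXMB2", "EXMB3"})
print(db1.active_on, still_available)
```

With EXMB1 failed, the sketch moves the active copy of DB1 to EXMB2, matching the diagram. A database with no healthy passive copy would simply stay offline.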

A Mailbox server that is a member of a DAG can host a mixture of active and passive database copies for which it participates in replication. Whether a given database is active or passive on a particular DAG member is independent of the active/passive status of other databases that are also hosted on that DAG member.

Exchange 2013 multiple databases in a DAG
Multiple databases within an Exchange 2013 DAG

In the above example a DAG with three members and three mailbox databases is shown with the active database copies evenly distributed across the available DAG members.
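
As a rough illustration of what "evenly distributed" means, the following Python sketch assigns active copies to DAG members in round-robin order. The function name is hypothetical and the database names DB2 and DB3 are assumed for the example; Exchange does not expose anything with this name, and in practice the distribution is driven by activation preferences rather than a simple loop.

```python
def distribute_active_copies(databases, members):
    """Assign each database's active copy to a member in round-robin order."""
    return {db: members[i % len(members)] for i, db in enumerate(databases)}


# Assumed names: three databases across the three members from the example.
layout = distribute_active_copies(
    ["DB1", "DB2", "DB3"],
    ["EXMB1", "EXMB2", "EXMB3"],
)
for db, server in layout.items():
    print(f"{db}: active on {server}")
```

Each member ends up hosting one active copy, while still holding passive copies of the other two databases.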

Continuous Replication in Exchange Server 2013 Database Availability Groups

Each DAG member hosting a copy of a given mailbox database participates in a process of continuous replication to keep the copies consistent. Database replication occurs between Exchange Server 2013 DAG members using two different methods:

File Mode replication – each transaction log is fully written (a 1MB log file) and then copied from the DAG member hosting the active database copy to each DAG member that hosts a passive copy of that database.

The other DAG members then replay the transaction log file into their own passive copy of the database to update it. File mode replication has an obvious downside in that a transaction log that hasn’t already been copied to the other DAG members may be lost if the DAG member hosting the active database copy becomes unavailable. Although there are other recovery mechanisms to minimise the impact of this scenario, this is a reason why file mode replication is used only during the initial seeding of a database copy.

After seeding is complete the database switches automatically to block mode replication.

Block mode replication – each database transaction is written to the log buffer on the active server and also sent to the log buffer of the DAG members hosting passive copies of the database. When the log buffer becomes full, each member of the DAG is then able to build its own transaction log file for replay into its passive database copy. Block mode replication has advantages compared to file mode replication when there is a failure in the DAG, because less transaction log data is likely to be lost.
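
The difference in exposure between the two modes can be sketched with some simple arithmetic. This Python example is an idealised model, not a description of the replication engine: the function names and the `in_flight` parameter are assumptions made for illustration. It compares how many bytes of log data exist only on the active server at the moment of a failure.

```python
LOG_FILE_BYTES = 1024 * 1024  # 1 MB transaction log file, as described above


def unreplicated_bytes_file_mode(total_written: int) -> int:
    """Bytes sitting in the current, not yet closed log file.

    In file mode only complete log files are copied, so anything in the
    partially written log exists only on the active server.
    """
    return total_written % LOG_FILE_BYTES


def unreplicated_bytes_block_mode(total_written: int, in_flight: int = 0) -> int:
    """Bytes not yet sent to the passive copies' log buffers.

    Idealised: every write is shipped as it reaches the log buffer, so only
    writes still in flight (a hypothetical parameter) are exposed.
    """
    return min(in_flight, total_written)


written = 3 * LOG_FILE_BYTES + 700_000  # three full logs plus a partial one
print(unreplicated_bytes_file_mode(written))                   # 700000
print(unreplicated_bytes_block_mode(written, in_flight=4096))  # 4096
```

In this model file mode leaves the whole partial log at risk, while block mode leaves only whatever had not yet been sent.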

Quorum for Exchange Server 2013 Database Availability Groups

An Exchange 2013 DAG utilizes Windows Failover Clustering and the quorum model. This underlying cluster is managed automatically for you by Exchange, so you don’t need to worry about it much other than to be aware of how quorum works. If the concept of quorum is new to you just think of it as a voting process in which a majority of voting members must be present to make a decision. The decision in the case of a DAG is basically whether the DAG should be online or offline. Because a majority of votes is required for quorum there are two different quorum models used depending on how many DAG members you have. For a DAG with an odd number of members the Node Majority quorum mode is used.

Exchange 2013 DAG quorum example
Impact of failures in Exchange 2013 DAG using Node Majority quorum mode

In the above example a three member DAG is able to maintain quorum during a single server failure, but quorum is lost when two servers are unavailable. For a DAG with an even number of members the Node and File Share Majority quorum mode is used. This mode involves an additional server referred to as the File Share Witness. It is typically another Exchange server located in the same site as the DAG members.

Exchange 2013 DAG quorum example
Impact of failures in Exchange 2013 DAG using Node and File Share Majority quorum mode

In the above example a four member DAG is using an additional server as the File Share Witness (FSW). The DAG is able to maintain quorum with up to two server failures, but quorum is lost when three servers are down.
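
The voting arithmetic behind both examples can be written down in a few lines. The following Python sketch is a simplified model with a hypothetical function name; it assumes static votes (no dynamic quorum) and treats the File Share Witness as one extra vote whenever the member count is even.

```python
def has_quorum(members: int, failed_members: int, witness_available: bool = True) -> bool:
    """Return True if the surviving voters form a majority.

    Odd member count: Node Majority, only DAG members vote.
    Even member count: Node and File Share Majority, the witness adds a vote.
    """
    uses_witness = members % 2 == 0
    total_votes = members + (1 if uses_witness else 0)
    votes_present = members - failed_members
    if uses_witness and witness_available:
        votes_present += 1
    # A majority means strictly more than half of all possible votes.
    return votes_present > total_votes // 2


# Three member DAG (Node Majority)
print(has_quorum(3, failed_members=1))  # True
print(has_quorum(3, failed_members=2))  # False

# Four member DAG plus File Share Witness
print(has_quorum(4, failed_members=2))  # True
print(has_quorum(4, failed_members=3))  # False
```

The same function can be used to reason about other scenarios, for example a four member DAG that loses two members and the witness at the same time: `has_quorum(4, 2, witness_available=False)` returns False, because only two of five votes remain.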

DAGs deployed on Windows Server 2012 can be more resilient to multiple node failures thanks to a new feature called dynamic quorum. For more information see Improving Resilience of Exchange Server 2013 Database Availability Groups with Windows Server 2012 Cluster Dynamic Quorum

Database Availability Networks

A DAG network is a collection of one or more IP subnets that the DAG members are connected to and that carry client traffic, replication traffic, or both.

Exchange 2013 DAG with a single network

Every DAG has one network for client traffic, and then it can also optionally have a number of networks dedicated to replication traffic.

Exchange 2013 DAG with multiple networks

Dedicated replication networks can help reduce bandwidth utilization on the client-facing network which may prevent network-related performance issues for the clients.
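
Since a DAG network is essentially a named group of subnets, it can help to think of it as a lookup from an IP address to a network. The Python sketch below illustrates that idea only; the network names and subnets are hypothetical values chosen for the example, and this is not how Exchange itself stores or evaluates DAG networks.

```python
import ipaddress

# Hypothetical DAG networks and subnets; real values depend on your environment.
dag_networks = {
    "MapiDagNetwork": [ipaddress.ip_network("192.168.0.0/24")],
    "ReplicationDagNetwork01": [ipaddress.ip_network("10.1.100.0/24")],
}


def find_dag_network(address: str):
    """Return the name of the DAG network whose subnets contain the address."""
    ip = ipaddress.ip_address(address)
    for name, subnets in dag_networks.items():
        if any(ip in subnet for subnet in subnets):
            return name
    return None  # the address is not on any known DAG network


print(find_dag_network("192.168.0.51"))  # MapiDagNetwork
print(find_dag_network("10.1.100.2"))    # ReplicationDagNetwork01
print(find_dag_network("172.16.5.9"))    # None
```

An adapter whose address falls outside every expected subnet is the kind of thing that leads to the misconfigured networks mentioned below.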

Exchange Server 2013 will attempt to auto-configure DAG networks but may not be able to if the network adapter configurations are not correct. For more info see Misconfigured Subnets Appear in Exchange Server 2013 DAG Network

High Availability and Site Resilience

Exchange Server 2013 Database Availability Groups can be deployed to provide both high availability and site resilience. A DAG deployed for high availability will typically exist within a single Active Directory Site, or datacenter.

Exchange 2013 DAG High Availability
Exchange 2013 DAG in a single datacenter

A DAG deployed for site resilience will span multiple datacenters. The objectives of a Database Availability Group deployed for site resilience are usually to provide availability of mailbox services after the complete failure of the primary datacenter. In other words, a true disaster.

Exchange 2013 DAG in multiple datacenters

As such there are a lot more technical and business considerations for a site resilient Database Availability Group. There is also less automation and more administrator attention required for a full site failover scenario. For the purposes of this article series we’ll be focusing on Database Availability Groups deployed within a single datacenter for high availability.

Installing an Exchange Server 2013 Database Availability Group

The next article in this series will begin demonstrating the deployment of a Database Availability Group in Exchange Server 2013.

Comments

  1. AB says

    Hi Paul, as Exchange DAG utilizes Windows Failover Clustering and the quorum model, how well does the new Windows 2012 feature “Failover Clustering Dynamic Quorum” translate into managing an Exchange DAG? Does that mean an Exchange DAG can now survive even if the number of votes goes below the minimum?

  2. Santhosh Sivaraman says

    Hi Paul,

    If I am installing Exchange Server 2013 across my two different data centers (different AD sites) as described below, will the data center failover happen automatically?

    Primary Datacenter
    ***********************
    Two collocated CAS & Mailbox Servers in a single DAG named DAG-1
    Barracuda HW Load Balancer for load balancing Client Access traffic.

    Secondary Datacenter
    **************************
    Single Collocated CAS & Mailbox Server as a part of Primary Data Center DAG “DAG-1”.

    Your advice is required on this.

    Thanks,
    Santhosh

  3. JR says

    Hi Paul, I noted your statement that the FSW must be in the same site as a DAG. I’ve also read another article stating that if you’re going to have site resilience across 2 locations, you can place the FSW in a 3rd site to provide quorum in case either site fails… Is that accurate?

    • says

      I can’t see where I said it must be in the same site. I did say it is *typically* within the same site.

      Exchange 2013 supports FSW in a third location for multi-site DAGs but that introduces additional considerations around ensuring quorum in the various network failure scenarios.

  4. kiko lopez says

    Are we able to utilize the 2010 DAG with 2013? Or do we need to create a different one? How about the cluster IP for load balancing we use in 2010 Hub/CAS servers? Need to figure this out before I break something….

    Thank You

  5. Fredrik says

    Hi, what is a normal failover time? ( We are running 4 multirole servers and HLB )
    If we do a database switchover the outlook is disconnected for about 4 secs.
    If we do a server restart so all databases currently mounted on that server mounts up on other servers the outlook client is disconnected for about 2 minutes.

    We can see that the database our test client is hosted on gets mounted after 5-6 seconds on another server but it takes roughly another 2 minutes before the outlook clients get connected again.

    If we test to restart outlook after 30 sec, it takes about one more minute before its connected again.

    Even though we proxy the request (using hostfile to pin-point a connection to srv04) and initiate a serverfailure on srv01 so the databases mounts on srv02, the clients takes about 2 minutes to re-connect to the “database” since it’s still trying via srv04.

    hlb is running ssl offload and on our exchange servers we have “allow ssl offload” checked. we have tried without this setup but no luck there.

    any hints? comments?

    • says

      The client connection is proxied through the Client Access server role. You haven’t described where your Client Access servers sit in your topology, but I assume you have installed multi-role servers and are using a hardware load balancer, meaning the clients are connecting to the load balancer, which then connects to one of the available pool of Client Access servers.

      So you have multiple points to consider here.

      1) the database failover/switchover speed. You are reading logs that say the failover takes just a few seconds. That is fine, but I would hope that you are doing a managed switchover of the server using maintenance mode, not just restarting it and letting the DAG failover.

      2) the Client Access servers. If you’ve got multi-role servers and you’re restarting the server that a connection happens to be going through, then some end user impact is probably going to be natural.

      3) The load balancer. If you are not draining the Client Access server out of the pool before restarting it, then it’s possible the load balancer is taking those few minutes to time out for the restarted server before it sends client connections to a different Client Access server (depending on how you’ve configured health monitoring in your server pool).

      So to summarise, if you’re just restarting the server and hoping for a seamless client experience then I think your expectations are a bit off. However if you are managing maintenance mode properly, and removing the server from the load balancer pool properly, then I would expect things to run smoother than that.

      • Fredrik says

        Hi, thx for replying. Yes correct, one site with multirole servers and a HLB.
        The customer want to initiate a server failure by powering down the server. thats why this is not a normal controlled “switch” over.

        The HLB removes the server if it cannot access the port 443 (health check) for 15 sec. then its out of that pool for 360 sec. then a new health check is performed.

        So this 2 minutes might be normal if we initiate a server reboot? you said “a bit off” is that kind of too long?

  6. says

    Dear Brother,

    We just did Exchange Migration from 2007 to 2013. In 2013 we have two Mail Box Servers and Two Client Access Server. And we configured high availability between two MB and TWO CA servers.

    But Now we are facing two issues with MB Servers

    1) Both mailbox servers are restarting frequently in every one or two days
    2) During mailbox server 1 offline, outlook disconnected from exchange (no high availability)

    Can you please advice me to resolve this issue as fast as possible.

    Thank you,
    Mohamed Shuaib

  7. Andre says

    Hi Paul,

    Couldn’t find an appropriate place for the following question so I thought why not post it on the first topic.
    I have the exact same setup ( 2 exchange servers running cas and mailbox roles and 1 dc which I use as witness ). Also, just like you I have 2 networks, 1 for the mapi and 1 for the heartbeat (replication). However, I never specified which network it should use for replication… Where can I see which network it uses and where can I configure it? This is I think the last missing piece of the puzzle for me so your help will be much appreciated.
    In 1 sentence: where to configure exchange DAG to use the replication network and NOT the mapi network?
    Ps.: I gave the cluster an ip address of the mapi network as advised, is this where I went wrong?

  8. Rob Derbyshire says

    We currently have 4 x Exchange Server 2013 SP1 Mailbox Servers forming in 1 DAG, all 4 are on Windows Server 2012.
    For many reasons, we are considering building our Exchange Server 2013 Infrastructure up from Scratch on Windows Server 2012 R2
    We want to try and do this now as we have only migrated about 10% of our Mailboxes, so doing it now would minimize the impact to the Business.
    My question is, that to make the process as simple as possible, we would like to introduce 4 new Windows Server 2012 R2 Servers (built with either E2013 SP1 or CU5) to the existing DAG, instead of creating a new DAG. This would only be temporary whilst database copies are created and then the Server 2012 members would then be removed from the DAG. Is this possible? I understand that it might not be a supported platform, but could it be used to serve its purpose on a temporary basis

    • says

      DAGs use an underlying Windows Failover Cluster. All members of a cluster must be running the same version of Windows. It is not possible to mix versions of Windows in a cluster. Therefore, it is not possible to mix versions of Windows in a DAG.

      You will need to deploy a new, separate DAG to achieve what you are trying to achieve.

  9. Gilberto Silva says

    Please , I´d like know if is possible have 4 mailbox server with 5 databases and enable DAG for Exchange Server 2013 STD or I need buy Exchange Server 2013 Enterprise?

    Thank´s

    Gilberto Silva.

  10. Natesh says

    Hi Paul,

    I’m looking for more details on “Exchange 2013 DAG in multiple datacenters in a single AD site across the WAN”.

    what are the network consideration to Build DAG across WAN between US and India?
    I have 2 MBX at US and 2 MBX in India with 2 nic’s each.
    What type of networks to be configured between both the DC?

    Do we need MPLS with data?
    VPN?
    P2P link between both the DC?

    Many thanks in advance.

    Regards
    Natesh M

  11. Edwin says

    Does anybody know if in the case of 2 datacenters on the exchange 2013 passive mailbox servers the CAS role also needs/should/recommended to be installed?

    I can’t find an answer anywhere.

    • says

      The recommendation is to deploy multi-role servers, so your servers would have both roles installed anyway if you follow that best practice.

      If you decide to split your roles, then you’d still need to deploy CAS in the second site to have a fully HA/site resilient solution. Without CAS the clients can’t connect to their mailboxes.

  12. Alex Driver says

    Hi Paul,

    I have two mailbox databases that replicate between two servers.
    Would you recommend having all mailboxes on a single database or should I split the total number of mailboxes evenly on the two databases?
    i.e.

    Name Active on server Server with copies Number of mailboxes
    Database 1 Server 1 Server 1, Server 2 50
    Database 2 Server 2 Server 2, Server 1 50

    or

    Name Active on server Server with copies Number of mailboxes
    Database 1 Server 1 Server 1, Server 2 100
    Database 2 Server 2 Server 2, Server 1 0

    Thanks Alex

    • says

      Depends on the size of the mailboxes. Smaller databases are easier to manage, backup, recover, etc etc.

      Standard Edition of Exchange gives you up to 5 mailbox databases. You can use all 5 or use fewer, whichever suits your overall data sizes.

      • Alex Driver says

        Hi Paul,

        Thanks for getting back to me.
        We are talking about 150 mailboxes, average size 1GB.
        In this scenario would you recommend putting them all on one database, or spread them across two?

        Thanks.
        Alex

  13. Laurent says

    Hi Paul,
    I’m currently building up Exchange 2013 environments and I use the principle of building blocks. Each building block is made of 4 servers (4 copies of each DB) and can host 4000 mailboxes.
    My current customer would require 4 building blocks (+- 16000 mailboxes) .
    I wonder if it would be better to create a DAG for each building block (so 4 DAGs in total) or to have only one DAG for all of them?
    Thanks
    Laurent

  14. Kenn Thomsen says

    Hi, in a DAG cluster where we contemplate placing server 2 in a different datacenter connected via MPLS, can you downsize the HW requirements for server 2, e.g. reduced RAM and lower performing storage? Currently we are running Exch. 2010 but could upgrade to 2013 if one or the other is better at supporting a differentiated model in performance (in case of disaster, more capacity can be provided to the virtual environment in datacenter 2).

  15. Ali says

    Hi all, Thanks for this great forum. I have question please if someone can help.
    I have a DAG with 2 servers ex1 , ex2 ( Mail , Hub And CAS role ) each and a witness with a HT role.
    The witness is located in a domain controller backup. ( DCB)

    When ever I turned off the witness (DCB ) , All my clients will be disconnected. while clients are pointed to ex1. ALL EX1 , EX2 and DCB Are win 2008 r2 VM and all in one site with one domain.

    I have tried to change the witness to another server but the problem is still exist. ( I can see the new witness directory and server in the cluster failover in both ex2 , ex2 ).

    1. suggestions why this happened and how to prevent it.
    2. Any suggestions for a good software CAS array solution ( I don’t want to buy a hardware load balancer as it will cost $$ ).

    Thanks and regards.

    Ali

  16. Jose Perez says

    Hi, Paul I have a question for you.
    We have two sites with 2 mailbox servers per site one dag per site and running exchange 2010,
    we want to migrate to Exchange 2013 , 2 mailbox servers per site one dag and the witness server on a third site , what will happen with the DAG if third site is down?
    Thanks,
    Jose Perez

    • says

      Your question is actually already answered in the article above. You’ve got 4 DAG members plus an FSW. Consider quorum in any failure scenario and the answer should be clear.

  17. Brooks Barnes says

    I am in the process of moving over to a 2013 SP1 environment from 2010. In 2010, because of some backup jobs, I had to increase the crosssubnetdelay and threshold to prevent a failover when my backup job was closing out. We use DAG for a DR scenario for the most part so the passive copies always stay that way unless we are in a maintenance window or something bad has happened. Anyway, my question is, is there somewhere to adjust timeout settings for failover in a 2013 “IP-less DAG”? Since there’s no DNS registration for the IP-less DAG, I get a RPC error if I try to modify the cluster settings within windows (not exchange).

  18. Rick says

    Hello Paul,

    If I want HA with CAS and a MBX DAG…..Can I do it with only 2 MBX/CAS servers or do I have to install 4 servers….2 with only CAS roles and 2 with only MBX roles?

    Thank you very much.

  19. Rick says

    Ok, after researching…this is what I´ve come up with….

    Split DNS with Round Robin for CAS balancing
    2 servers MBX/CAS in DAG
    1 FS Witness
    1 SAN for active Database
    1 SAN for Passive Database
    All over Windows 2012 R2

    How I am going, you think this will do? :)

    I have to present this to my boss…I really wanna buy your book, but time is eating me up.

    Many thanks in advance.

    Rick

    • says

      You don’t need SAN for Exchange, local disk/DAS will do just fine for most deployments and can be a lot cheaper and easier to deploy.

      I can’t comment on whether that solution “will do” for your environment since I know nothing about your environment or business requirements. What you describe is a pretty basic and common HA deployment though.

      • Rick says

        Excellent,

        The environment is very simple…

        1 multirole exchange 2007 with databases in a SAN in 1 AD site
        2 DC in that same site “A”
        1 DC in another site “B”
        1 DC in Site “C”

        And they want a solution with HA…yes you r right….is very basic, but is what we need…..as far a SAN goes….we have the two of them to use…..that why i´ve came up with that database redundancy.

        Thanks a lot for your help Paul. :)
