Exchange Server 2013 Database Availability Groups

The high availability feature for Exchange Server 2013 Mailbox servers is the Database Availability Group. Exchange 2013 Database Availability Groups (DAGs) are very similar to Exchange 2010 DAGs, but also deliver a series of improvements and new features for customers. In this series of articles we will walk through an overview of Database Availability Group concepts, demonstrate how to deploy a new Database Availability Group, and explore some of the operational tasks associated with running and maintaining a DAG.

Overview of Exchange Server 2013 Database Availability Groups

A Database Availability Group consists of up to 16 Exchange 2013 Mailbox servers, and optionally one or more additional non-Exchange servers that may be required to act as a File Share Witness (more on this shortly). The Mailbox servers within a DAG are capable of hosting a copy of a mailbox database from another DAG member, up to the Exchange 2013 limit of 100 mailbox databases per server (that includes both active and passive database copies). A simple example of a Database Availability Group would be as follows.

Exchange 2013 Database Availability Group Simple Example

A simple example of an Exchange 2013 Database Availability Group

In the example above the server EXMB1 hosts the active copy of database DB1, and the other DAG members EXMB2 and EXMB3 host passive copies of the database. The DAG members work together to maintain the availability of the mailbox database. If the server that hosts the active database copy experiences a problem, for example a hardware failure, one of the remaining DAG members is able (under the right conditions) to make its copy of the database active so clients are still able to connect to their mailbox data.

Exchange 2013 DAG member down

DAG member EXMB1 has failed causing database to become active on EXMB2
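
The failover shown above can be sketched in a few lines of Python. This is only a toy model, not how Exchange itself selects a copy: the MailboxDatabase class and the fail_over function are hypothetical names, and the rule "activate the first healthy passive copy in list order" is an assumption made to keep the example short.

```python
from dataclasses import dataclass, field


@dataclass
class MailboxDatabase:
    """One mailbox database and the DAG members that hold a copy of it."""
    name: str
    active_on: str                                   # server hosting the active copy
    passive_on: list = field(default_factory=list)   # servers hosting passive copies


def fail_over(database, failed_server, healthy_servers):
    """Activate a passive copy if the server hosting the active copy has failed.

    Assumption for illustration: the first healthy passive copy in list
    order is activated. Returns the server hosting the active copy
    afterwards, or None if no healthy copy is available.
    """
    if database.active_on != failed_server:
        # The failed server only held a passive copy, so nothing changes.
        return database.active_on
    for candidate in database.passive_on:
        if candidate in healthy_servers:
            # Swap roles: the candidate becomes active, the failed server
            # is kept in the list as a (currently unavailable) passive copy.
            database.passive_on.remove(candidate)
            database.passive_on.append(failed_server)
            database.active_on = candidate
            return candidate
    return None


# The scenario from the figure: EXMB1 fails while hosting the active copy of DB1.
db1 = MailboxDatabase("DB1", active_on="EXMB1", passive_on=["EXMB2", "EXMB3"])
new_active = fail_over(db1, failed_server="EXMB1", healthy_servers={"EXMB2", "EXMB3"})
print(new_active)  # EXMB2
```

The "under the right conditions" caveat in the text is reduced here to a simple membership test on healthy_servers. A real DAG weighs more than that before activating a copy.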

A Mailbox server that is a member of a DAG can host a mixture of active and passive database copies for which it participates in replication. Whether a given database is active or passive on a particular DAG member is independent of the active/passive status of other databases that are also hosted on that DAG member.

Exchange 2013 multiple databases in a DAG

Multiple databases within an Exchange 2013 DAG

In the above example a DAG with three members and three mailbox databases is shown with the active database copies evenly distributed across the available DAG members.
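
To make the idea of mixed active and passive copies concrete, here is a minimal Python sketch that counts copies per server for a layout like the one in the figure. The layout dictionary is an assumption (the figure may place the passive copies differently), and summarize and check_limits are hypothetical helper names. The two limits are the ones quoted earlier in this article.

```python
from collections import Counter

MAX_DAG_MEMBERS = 16          # DAG member limit quoted in this article
MAX_COPIES_PER_SERVER = 100   # per-server limit, active and passive copies combined

# Assumed layout: database -> (server with the active copy, servers with passive copies)
layout = {
    "DB1": ("EXMB1", ["EXMB2", "EXMB3"]),
    "DB2": ("EXMB2", ["EXMB1", "EXMB3"]),
    "DB3": ("EXMB3", ["EXMB1", "EXMB2"]),
}


def summarize(layout):
    """Return (active copies per server, total copies per server)."""
    active = Counter()
    total = Counter()
    for active_server, passive_servers in layout.values():
        active[active_server] += 1
        total[active_server] += 1
        for server in passive_servers:
            total[server] += 1
    return active, total


def check_limits(total):
    """Return a list of human-readable limit violations (empty if none)."""
    problems = []
    if len(total) > MAX_DAG_MEMBERS:
        problems.append(f"{len(total)} members exceeds the limit of {MAX_DAG_MEMBERS}")
    for server, count in sorted(total.items()):
        if count > MAX_COPIES_PER_SERVER:
            problems.append(f"{server} hosts {count} copies, limit is {MAX_COPIES_PER_SERVER}")
    return problems


active, total = summarize(layout)
for server in sorted(total):
    print(server, "active:", active[server], "passive:", total[server] - active[server])
print(check_limits(total))
```

Each server in this layout ends up with one active and two passive copies, which is the even distribution described above.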

Continuous Replication in Exchange Server 2013 Database Availability Groups

Each DAG member hosting a copy of a given mailbox database participates in a process of continuous replication to keep the copies consistent. Database replication occurs between Exchange Server 2013 DAG members using two different methods:

File mode replication – each transaction log is fully written (a 1MB log file) and then copied from the DAG member hosting the active database copy to each DAG member that hosts a passive copy of that database.

The other DAG members then replay the transaction log file into their own passive copy of the database to update it. File mode replication has an obvious downside in that a transaction log that hasn’t already been copied to the other DAG members may be lost if the DAG member hosting the active database copy becomes unavailable. Although there are other recovery mechanisms to minimise the impact of this scenario, this is a reason why file mode replication is used only during the initial seeding of a database copy.

After seeding is complete the database switches automatically to block mode replication.

Block mode replication – as each database transaction is written to the log buffer on the active server, it is also sent to the log buffer of the DAG members hosting passive copies of the database. When the log buffer becomes full, each of those DAG members is able to build its own transaction log file for replay into its passive database copy. Block mode replication has an advantage over file mode replication when there is a failure in the DAG, because less transaction log data is likely to be lost.
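
A rough way to see why block mode tends to lose less data is to compare how many bytes exist only on the active copy at the moment of a failure. The Python sketch below is a deliberately simplified model, not a description of the real replication pipeline. It assumes a completed log file is copied the instant it is closed, that in block mode at most the most recent write is still in flight, and that writes pack neatly into 1MB logs. Both function names are hypothetical.

```python
LOG_FILE_BYTES = 1024 * 1024  # 1MB transaction log file, as described above


def unreplicated_bytes_file_mode(write_sizes):
    """Bytes at risk when only complete log files are shipped.

    Assumption: a full log is copied the moment it is closed, so only the
    partially filled current log exists solely on the active copy.
    """
    return sum(write_sizes) % LOG_FILE_BYTES


def unreplicated_bytes_block_mode(write_sizes):
    """Bytes at risk when every write is also sent to the passive copies.

    Assumption: at most the most recent write has not yet arrived.
    """
    return write_sizes[-1] if write_sizes else 0


# 300 writes of 4KB each: one full log has shipped, a second is partly filled.
writes = [4096] * 300
print(unreplicated_bytes_file_mode(writes))   # 180224
print(unreplicated_bytes_block_mode(writes))  # 4096
```

In this model file mode has about 176KB at risk, compared with a single 4KB write in block mode. The exact figures depend entirely on the assumptions above.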

Quorum for Exchange Server 2013 Database Availability Groups

An Exchange 2013 DAG utilizes Windows Failover Clustering and the quorum model. This underlying cluster is managed automatically for you by Exchange, so you don’t need to worry about it much other than to be aware of how quorum works. If the concept of quorum is new to you, just think of it as a voting process in which a majority of voting members must be present to make a decision. The decision in the case of a DAG is basically whether the DAG should be online or offline. Because a majority of votes is required for quorum, there are two different quorum models used depending on how many DAG members you have. For a DAG with an odd number of members the Node Majority quorum mode is used.

Exchange 2013 DAG quorum example

Impact of failures in Exchange 2013 DAG using Node Majority quorum mode

In the above example a three member DAG is able to maintain quorum during a single server failure, but quorum is lost when two servers are unavailable. For a DAG with an even number of members the Node and File Share Majority quorum mode is used. This mode involves an additional server referred to as the File Share Witness. It is typically another Exchange server located in the same site as the DAG members.

Exchange 2013 DAG quorum example

Impact of failures in Exchange 2013 DAG using Node and File Share Majority quorum mode

In the above example a four member DAG is using an additional server as the File Share Witness (FSW). The DAG is able to maintain quorum with up to two server failures, but quorum is lost when three servers are down.
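
The voting arithmetic behind both examples can be expressed as a short Python sketch. It assumes the simple static model described in this article (one vote per DAG member, plus one vote for the File Share Witness when the member count is even) and ignores features such as dynamic quorum. The function names are hypothetical.

```python
def quorum_votes(member_count):
    """Total votes: one per DAG member, plus the File Share Witness
    when the number of members is even."""
    uses_witness = member_count % 2 == 0
    return member_count + (1 if uses_witness else 0)


def has_quorum(member_count, failed_members, witness_available=True):
    """True if the surviving voters still form a majority of all votes."""
    total = quorum_votes(member_count)
    needed = total // 2 + 1                 # strict majority of all votes
    online = member_count - failed_members
    if member_count % 2 == 0 and witness_available:
        online += 1                         # the witness contributes its vote
    return online >= needed


# Three member DAG (Node Majority): 3 votes, 2 needed.
print(has_quorum(3, failed_members=1))  # True
print(has_quorum(3, failed_members=2))  # False

# Four member DAG plus witness (Node and File Share Majority): 5 votes, 3 needed.
print(has_quorum(4, failed_members=2))  # True
print(has_quorum(4, failed_members=3))  # False
```

Note that in the four member case the result with two failed servers depends on the witness: has_quorum(4, 2, witness_available=False) returns False, because only two of the five votes remain.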

DAGs deployed on Windows Server 2012 can be more resilient to multiple node failures thanks to a new feature called dynamic quorum. For more information see Improving Resilience of Exchange Server 2013 Database Availability Groups with Windows Server 2012 Cluster Dynamic Quorum.

Database Availability Networks

A DAG network is a collection of one or more IP subnets that the DAG members are connected to and that are used for client and replication traffic.

Exchange 2013 DAG with a single network

Every DAG has one network for client traffic, and then it can also optionally have a number of networks dedicated to replication traffic.

Exchange 2013 DAG with multiple networks

Dedicated replication networks can help reduce bandwidth utilization on the client-facing network which may prevent network-related performance issues for the clients.
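
Since a DAG network is just a named collection of subnets, working out which network an adapter belongs to is a subnet membership check. The following Python sketch illustrates the idea with the standard ipaddress module. The network names, subnets and addresses are invented for illustration and do not come from a real deployment, and network_for is a hypothetical helper.

```python
import ipaddress

# Hypothetical DAG networks: one client-facing network, one dedicated to replication.
dag_networks = {
    "MapiDagNetwork": {"subnets": ["192.168.0.0/24"], "replication_only": False},
    "ReplicationDagNetwork01": {"subnets": ["10.1.100.0/24"], "replication_only": True},
}


def network_for(address, networks):
    """Return the name of the DAG network whose subnets contain the address,
    or None if the address does not fall in any configured subnet."""
    ip = ipaddress.ip_address(address)
    for name, network in networks.items():
        if any(ip in ipaddress.ip_network(subnet) for subnet in network["subnets"]):
            return name
    return None


# Two adapters on one DAG member, plus an address that matches nothing.
print(network_for("192.168.0.21", dag_networks))  # MapiDagNetwork
print(network_for("10.1.100.21", dag_networks))   # ReplicationDagNetwork01
print(network_for("172.16.0.5", dag_networks))    # None
```

An address that maps to no network, as in the last call, is the kind of mismatch worth checking for when DAG networks do not look the way you expect.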

Exchange Server 2013 will attempt to auto-configure DAG networks but may not be able to if the network adapter configurations are not correct. For more info see Misconfigured Subnets Appear in Exchange Server 2013 DAG Network.

High Availability and Site Resilience

Exchange Server 2013 Database Availability Groups can be deployed to provide both high availability and site resilience. A DAG deployed for high availability will typically exist within a single Active Directory Site, or datacenter.

Exchange 2013 DAG High Availability

Exchange 2013 DAG in a single datacenter

A DAG deployed for site resilience will span multiple datacenters. The objectives of a Database Availability Group deployed for site resilience are usually to provide availability of mailbox services after the complete failure of the primary datacenter. In other words, a true disaster.

Exchange 2013 DAG in multiple datacenters

As such there are a lot more technical and business considerations for a site resilient Database Availability Group. There is also less automation and more administrator attention required for a full site failover scenario. For the purposes of this article series we’ll be focusing on Database Availability Groups deployed within a single datacenter for high availability.

Installing an Exchange Server 2013 Database Availability Group

The next article in this series will begin demonstrating the deployment of a Database Availability Group in Exchange Server 2013.

About Paul Cunningham

Paul is a Microsoft Exchange Server MVP and publisher of Exchange Server Pro. He also holds several Microsoft certifications including for Exchange Server 2007, 2010 and 2013. Find Paul on Twitter, LinkedIn or Google+, or get in touch for consulting/support engagements.

Comments

  1. Article is cool & Thanks!

    Do you want to elaborate more on below line….?

    - After seeding is complete the database switches automatically to block mode replication.

  2. Hi Paul, as Exchange DAG utilizes Windows Failover Clustering and the quorum model, how well does the new Windows 2012 feature “Failover Clustering Dynamic Quorum” translate into managing an Exchange DAG? Does that mean an Exchange DAG can now survive even if the number of votes goes below the minimum?

  3. Santhosh Sivaraman says:

    Hi Paul,

    If I am installing Exchange Server 2013 across my two different data center (diff AD Site) as mentioned below, does the data center failover will happens automatically.

    Primary Datacenter
    ***********************
    Two collocated CAS& Mailbox Servers in Single DAG named DAG-1
    Baracuda HW Load Balancer for Load Balacncing Client Access Traffic.

    Secondary Datacenter
    **************************
    Single Collocated CAS & Mailbox Server as a part of Primary Data Center DAG “DAG-1”.

    Your advice is requierd on this.

    Thanks,
    Santhosh

  4. Hi Paul, I noted your statement that the FSW must be in the same site as a DAG. I’ve also read another article stating if your going to have site resilience across 2 locations, you can place the FSW in a 3rd site to provide quorum in case either site fails… Is that accurate?

    • I can’t see where I said it must be in the same site. I did say it is *typically* within the same site.

      Exchange 2013 supports FSW in a third location for multi-site DAGs but that introduces additional considerations around ensuring quorum in the various network failure scenarios.

  5. kiko lopez says:

    Are we able to utilize the 2010 DAG with 2013? or do we need to create a different one? How about the cluster IP for load balancing we use in 2010 Hub/CAS servers? Need to figure this our before I break something….

    Thank You

  6. Hi, what is a normal failover time? ( We are running 4 multirole servers and HLB )
    If we do a database switchover the outlook is disconnected for about 4 secs.
    If we do a server restart so all databases currently mounted on that server mounts up on other servers the outlook client is disconnected for about 2 minutes.

    We can see that the database our test client is hosted on gets mounted after 5-6 seconds on another serer but it takes roughly about another 2 minutes before the outlook clients gets connected again.

    If we test to restart outlook after 30 sec, it takes about one more minute before its connected again.

    Even though we proxy the request (using hostfile to pin-point a connection to srv04) and initiate a serverfailure on srv01 so the databases mounts on srv02, the clients takes about 2 minutes to re-connect to the “database” since it’s still trying via srv04.

    hlb is running ssl offload and on our exchange servers we have “allow ssl offload” checked. we have tried without this setup but no luck there.

    any hints? comments?

    • The client connection is proxied through the Client Access server role. You haven’t described where your Client Access servers sit in your topology, but I assume you have installed multi-role servers and are using a hardware load balancer, meaning the clients are connecting to the load balancer, which then connects to one of the available pool of Client Access servers.

      So you have multiple points to consider here.

      1) the database failover/switchover speed. You are reading logs that say the failover takes just a few seconds. That is fine, but I would hope that you are doing a managed switchover of the server using maintenance mode, not just restarting it and letting the DAG failover.

      2) the Client Access servers. If you’ve got multi-role servers and you’re restarting the server that a connection happens to be going through, then some end user impact is probably going to be natural.

      3) The load balancer. If you are not draining the Client Access server out of the pool before restarting it, then it’s possible the load balancer is taking those few minutes to time out for the restarted server before it sends client connections to a different Client Access server (depending on how you’ve configured health monitoring in your server pool).

      So to summarise, if you’re just restarting the server and hoping for a seamless client experience then I think your expectations are a bit off. However if you are managing maintenance mode properly, and removing the server from the load balancer pool properly, then I would expect things to run smoother than that.

      • Hi, thx for replying. Yes correct, one site with multirole servers and a HLB.
        The customer want to initiate a server failure by powering down the server. thats why this is not a normal controlled “switch” over.

        The HLB removes the server if it cannot access the port 443 (health check) for 15 sec. then its out of that pool for 360 sec. then a new health check is performed.

        So this 2 minutes might be normal if we initiate a server reboot? you said “a bit off” is that kind of too long?

  7. Adam Aladdin says:

    I have client for 3398 mailboxes for how many mailboxes DAG I need to propose to them?

    • I think what you’re asking me is how you should plan the database layout for that customer. For that you should use the Exchange 2013 server sizing calculator that Microsoft provides.

  8. Dear Brother,

    We just did Exchange Migration from 2007 to 2013. In 2013 we have two Mail Box Servers and Two Client Access Server. And we configured high availability between two MB and TWO CA servers.

    But Now we are facing two issues with MB Servers

    1) Both mailbox servers are restarting frequently in every one or two days
    2) During mailbox server 1 offline, outlook disconnected from exchange (no high availability)

    Can you please advice me to resolve this issue as fast as possible.

    Thank you,
    Mohamed Shuaib

  9. Ex2013 cu2 gave max 100DB per server not 50DB as previous.

  10. Hi Paul,

    Couldn’t find an appropriate place for the following question so I thought why not post it on the first topic.
    I have the exact same setup ( 2 exchange servers running cas and mailbox roles and 1 dc which I use as witness ). Also, i do have just like you 2 networks, 1 for the mapi and 1 for the heartbeat (replication). However, I never specified which network it should use for replication… Where can I see which network it uses and where can I configure it.. This is I think the last missing piece of the puzle for me so your help will be much appreciated.
    In 1 sentence: where to configure exchange DAG to use the replication network and NOT the mapi network?
    Ps.: I gave the cluster an ip address of the mapi network as adviced, is this where I went wrong?

  11. Rob Derbyshire says:

    We currently have 4 x Exchange Server 2013 SP1 Mailbox Servers forming in 1 DAG, all 4 are on Windows Server 2012.
    For many reasons, we are considering building our Exchange Server 2013 Infrastructure up from Scratch on Windows Server 2012 R2
    We want to try and do this now as we have only migrated about 10% of our Mailboxes, so doing it now would minimize the impact to the Business.
    My question is, that to make the process as simple as possible, we would like to introduce 4 new Windows Server 2012 R2 Servers (built with either E2013 SP1 or CU5) to the existing DAG, instead of creating a new DAG. This would only be temporary whilst database copies are created and then the Server 2012 members would then be removed from the DAG. Is this possible? I understand that it might not be a supported platform, but could it be used to serve its purpose on a temporary basis

    • DAGs use an underlying Windows Failover Cluster. All members of a cluster must be running the same version of Windows. It is not possible to mix versions of Windows in a cluster. Therefore, it is not possible to mix versions of Windows in a DAG.

      You will need to deploy a new, separate DAG to achieve what you are trying to achieve.

  12. Gilberto Silva says:

    Please , I´d like know if is possible have 4 mailbox server with 5 databases and enable DAG for Exchange Server 2013 STD or I need buy Exchange Server 2013 Enterprise?

    Thank´s

    Gilberto Silva.
