Improving Resilience of Exchange Server 2013 Database Availability Groups with Windows Server 2012 Cluster Dynamic Quorum

Exchange Server 2013 can be installed on either Windows Server 2008 R2 or Windows Server 2012.

Some organizations may choose Windows Server 2008 R2 because it is their standard server build and keeps the Exchange servers consistent with the rest of their fleet. However, that choice means missing out on the new features of Windows Server 2012.

One of those new features is a cluster quorum management option known as dynamic quorum.

As TechNet explains:

When this option is enabled, the cluster dynamically manages the vote assignment to nodes, based on the state of each node. Votes are automatically removed from nodes that leave active cluster membership, and a vote is automatically assigned when a node rejoins the cluster.

With dynamic quorum management, it is also possible for a cluster to run on the last surviving cluster node. By dynamically adjusting the quorum majority requirement, the cluster can sustain sequential node shutdowns to a single node.

In an Exchange context, dynamic quorum can make database availability groups more resilient to multiple node failures.

To demonstrate this, here is what happens to an Exchange Server 2010 DAG when it suffers multiple node failures.

To begin with, the DAG is healthy and all nodes and resources are online.

[PS] C:\>Get-Cluster | Get-ClusterNode

Name           State
----           -----
ho-ex2010-mb1     Up
ho-ex2010-mb2     Up

[PS] C:\>Get-MailboxDatabase | Test-MAPIConnectivity

MailboxServer      Database           Result    Error
-------------      --------           ------    -----
HO-EX2010-MB1      MB-HO-01           Success
HO-EX2010-MB1      MB-HO-02           Success
BR-EX2010-MB       MB-BR-01           Success
HO-EX2010-MB1      MB-HO-03           Success
HO-EX2010-PF       MB-HO-Archive      Success

Next, I take down the file share witness, and then a short time later one of the DAG members as well.

[PS] C:\>Get-Cluster | Get-ClusterNode

Name           State
----           -----
ho-ex2010-mb1     Up
ho-ex2010-mb2   Down

After a few moments the cluster determines that quorum has been lost, and the remaining node stops as well.

Log Name: System
Source: Microsoft-Windows-FailoverClustering
Date: 5/27/2013 8:12:22 PM
Event ID: 1177
Task Category: Quorum Manager
Level: Critical
Keywords:
User: SYSTEM
Computer: HO-EX2010-MB1.exchangeserverpro.net
Description:
The Cluster service is shutting down because quorum was lost. This could be due to the loss of network connectivity between some or all nodes in the cluster, or a failover of the witness disk.

The entire cluster is now down, taking the mailbox databases with it, even though a single DAG member was still online.

[PS] C:\>Get-MailboxDatabase | Test-MAPIConnectivity

MailboxServer      Database           Result    Error
-------------      --------           ------    -----
HO-EX2010-MB1      MB-HO-01           *FAILURE* Database is dismounted.
HO-EX2010-MB1      MB-HO-02           *FAILURE* Database is dismounted.
BR-EX2010-MB       MB-BR-01           Success
HO-EX2010-MB1      MB-HO-03           *FAILURE* Database is dismounted.
HO-EX2010-PF       MB-HO-Archive      Success
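
The arithmetic behind that outage is simple to sketch. The Python snippet below is a toy model rather than anything the cluster service runs: the `has_quorum` helper is hypothetical, and it assumes the two-member DAG counts three votes (two nodes plus the file share witness) under a fixed majority rule.

```python
def has_quorum(votes_online, votes_total):
    """Fixed majority rule: more than half of all configured votes must be online."""
    return votes_online > votes_total // 2


# Assumption: two DAG members plus a file share witness give three configured votes.
TOTAL_VOTES = 3

print(has_quorum(3, TOTAL_VOTES))  # every voter online
print(has_quorum(2, TOTAL_VOTES))  # witness offline, both members up
print(has_quorum(1, TOTAL_VOTES))  # witness and one member offline
```

With only one of three configured votes still online, the fixed rule cannot be satisfied, which is consistent with event 1177 above.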

Now let’s take a look at what happens to an Exchange 2013 DAG running on Windows Server 2012 with dynamic quorum enabled (which is the default setting). This Exchange 2013 DAG in my lab happens to have more Mailbox servers as members than my Exchange 2010 DAG, but that does not impact the demonstration.

Again, to begin with, the cluster resources are all healthy and online.

[PS] C:\>Get-MailboxDatabase | Test-MAPIConnectivity

MailboxServer      Database           Result    Error
-------------      --------           ------    -----
E15MB1             Mailbox Database 1 Success
E15MB1             Mailbox Database 2 Success

[PS] C:\>Get-Cluster | Get-ClusterNode | Select Name,DynamicWeight,NodeWeight,Id,State

Name   DynamicWeight NodeWeight Id State
----   ------------- ---------- -- -----
E15MB1             1          1 1     Up
E15MB2             1          1 2     Up
E15MB3             1          1 3     Up

Each node currently has 1 vote (shown as DynamicWeight in the output above). Two of the three votes (a majority) are required to achieve quorum, and the cluster has all three.

First I shut down one of the DAG members, then take another look at the nodes.

[PS] C:\>Get-Cluster | Get-ClusterNode | Select Name,DynamicWeight,NodeWeight,Id,State

Name   DynamicWeight NodeWeight Id State
----   ------------- ---------- -- -----
E15MB1             0          1 1     Up
E15MB2             1          1 2     Up
E15MB3             0          1 3   Down

Dynamic quorum kicks in. The node that went down loses its vote, and the cluster also removes the vote from one of the two remaining nodes, which avoids a tied vote. Now only one node has a vote, and quorum is maintained.
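
A rough way to picture the vote adjustment in that output is the Python sketch below. It is a simplified model, not the cluster's actual algorithm: `rebalance` and `votes_needed` are hypothetical helpers, the rule of dropping one vote whenever an even number of nodes is running generalizes from the two-node case shown here, and the choice of which node gives up its vote is arbitrary in the sketch.

```python
def rebalance(nodes_up):
    """Return a {node: vote} map for the running nodes (toy model)."""
    weights = {name: 1 for name in nodes_up}
    # Assumption: with an even number of running nodes, one vote is
    # removed so that the remaining votes cannot split evenly.
    if nodes_up and len(nodes_up) % 2 == 0:
        # Arbitrary pick for illustration; the real cluster decides this.
        weights[nodes_up[0]] = 0
    return weights


def votes_needed(weights):
    """Strict majority of the votes currently assigned."""
    return sum(weights.values()) // 2 + 1


all_up = rebalance(["E15MB1", "E15MB2", "E15MB3"])
two_up = rebalance(["E15MB1", "E15MB2"])  # E15MB3 has gone down

print(all_up, votes_needed(all_up))
print(two_up, votes_needed(two_up))
```

In the model, three running nodes need two votes, while two running nodes end up with a single vote between them, so the node holding it satisfies the majority on its own.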

If this were a Windows Server 2008 R2 cluster, quorum would also be maintained at this point. The difference is in what happens on the next node failure.

Next, I take down another DAG member. With one DAG member remaining, the Exchange 2010 cluster and its databases went offline. The Exchange 2013 DAG, however, stays online thanks to dynamic quorum.

[PS] C:\>Get-Cluster | Get-ClusterNode | Select Name,DynamicWeight,NodeWeight,Id,State

Name   DynamicWeight NodeWeight Id State
----   ------------- ---------- -- -----
E15MB1             1          1 1     Up
E15MB2             0          1 2   Down
E15MB3             0          1 3   Down

[PS] C:\>Get-MailboxDatabase | Test-MAPIConnectivity

MailboxServer      Database           Result    Error
-------------      --------           ------    -----
E15MB1             Mailbox Database 1 Success
E15MB1             Mailbox Database 2 Success

While this is only a simple demonstration, it does show the potential of dynamic quorum for making Exchange 2013 database availability groups more resilient.

Other failure scenarios can still take the DAG offline (e.g., multiple simultaneous server failures), but with the right cluster design and sound operational procedures for managing the cluster you can achieve a good outcome.
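
The difference between sequential and simultaneous failures can be illustrated with a small Python model. This is a sketch under stated assumptions, not a reproduction of the cluster's logic: `survives` is a hypothetical function, it assumes one vote per running node with one vote dropped when the count is even, and it assumes the surviving nodes hold (or take over) the remaining votes, as happened in the lab output above.

```python
def survives(node_count, failures):
    """Return True if quorum holds through every batch of node failures.

    failures is a list of batch sizes; nodes in one batch fail at the same time.
    """
    up = node_count
    for batch in failures:
        # Votes in play before this batch: one per running node, minus one
        # when the count is even (assumed tie-avoidance rule).
        votes = up if up % 2 == 1 else up - 1
        up -= batch
        if up <= 0:
            return False
        # Assumption (best case): survivors hold as many votes as they can.
        surviving_votes = min(votes, up)
        # Quorum needs a strict majority of the votes that were in play.
        if surviving_votes * 2 <= votes:
            return False
    return True


print(survives(3, [1, 1]))  # two nodes lost one after the other
print(survives(3, [2]))     # two nodes lost at the same moment
```

Under these assumptions the three-node cluster rides out two failures in sequence, because the votes are recalculated in between, but not the same two failures at once. The sketch does not model which specific node holds the last vote.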

About Paul Cunningham

Paul is a Microsoft Exchange Server MVP and publisher of Exchange Server Pro. He also holds several Microsoft certifications including for Exchange Server 2007, 2010 and 2013.

Comments

  1. itworkedinthelab says:

    Thanks Paul
    really interesting
    I missed that server 2012 feature somehow:)

  2. Hi Paul,
    Great article. I knew I wasn’t losing my mind when testing this in my Exchange 2013/Windows Server 2012 POC. Everything and everyone was telling me I should be losing quorum when testing the “last man standing” concept. I can now confidently explain why we don’t, and why we can sustain multiple node failures in a 4 member DAG with 1 FSW!

    Ryan

  3. nithyanandham says:

    Hi Paul,
    Please clarify some of my doubts regarding dynamic quorum configuration in an Exchange 2013 DAG.

    In my lab environment I have two Mailbox servers in an Exchange 2013 DAG, and I have placed the file share witness on my CAS server. Both Mailbox servers and the CAS server are in the same AD site.

    To test dynamic quorum, I first shut down the file share witness and then, two minutes later, shut down the Mailbox server that holds the passive database copies.

    After that I checked the server holding the active database copies. It showed all the databases as mounted, and I felt happy about the role of dynamic quorum. But after some time, all of a sudden, it showed all the databases as dismounted. I don’t know what I did wrong?

  4. Is the use of the dynamic quorum model officially supported? I cannot find any kind of statement about this in the Exchange 2013 TechNet library.

    • Good question. Yes, it is supported, though I can’t find a specific mention of that on TechNet. So here is the next best thing: the EXL322 session slides from TechEd Australia 2013 spell out the situation with DAGs and dynamic quorum clearly.

      http://video.ch9.ms/sessions/teched/au/2013/EXL322.pptx

      “- Dynamic quorum does not change quorum requirements for DAGs
      - Dynamic quorum does work with DAGs
      - All internal DAG testing is performed with dynamic quorum enabled
      - Dynamic quorum is enabled in Office 365 for DAG members on Windows Server 2012
      - Exchange is not dynamic quorum-aware

      Exchange team guidance on dynamic quorum:
      - Leave it enabled for majority of DAG members
      - Don’t factor it into availability plans

      The advantage is that, in some cases where 2008 R2 would have lost quorum, 2012 can maintain quorum; this only applies to a few cases, and should not be relied upon when planning a DAG”
