It’s Friday again and time for another set of tips. It’s 5 Tips for Friday! This week I want to give you some best practices for managing and administering Availability Groups.
- Don’t skip past quorum: How always on your Always On* solution is often depends on how you have configured quorum. Quorum is often overlooked, but is a very critical component in your overall solution. If you don’t get quorum right, you are headed for some late nights.
Study up on quorum at Alan Hirt’s (@SQLHA website: http://sqlha.com/category/quorum/.
- Need help? Use the right terminology: Always On is an umbrella term (marketing term) that covers a group of features. It is not a feature in itself. When you ask for help on #SQLHelp or a forum or some other medium, it can be very confusing when you use generic terms like “Always On” or “cluster”. To get the best answer, you need to provide the best information.
Terms you should use:
WSFC: Windows Server Failover Clustering. This is the Windows cluster upon which both FCIs and AGs are built.
FCI: Failover Clustering Instance. This is the SQL Server cluster that is the actual feature often referred to as a cluster that protects at the instance level and traditionally uses shared storage.
AG: Availability Group. This is the SQL Server feature that provides high availability at the database level and does not use shared storage.
- Know where you are: When you create SQL jobs for databases in an AG, you need to be aware that the job will likely need to exist on all nodes, and the safest way to handle that is by including a check to see that the job is on the right server you want it to run on (i.e., primary or secondary). You can easily verify that for a database by querying the AG DMVs.
Declare @databaseName sysname = N'Database name'; SELECT IsPrimary = Case ags.primary_replica When @@SERVERNAME Then 'primary' Else 'secondary' End FROM sys.databases AS d INNER JOIN sys.availability_replicas AS ar ON d.replica_id = ar.replica_id INNER JOIN sys.dm_hadr_availability_group_states AS ags ON ar.group_id = ags.group_id WHERE d.name = @DatabaseName;
- Watch that lag: A quick way to see how far behind the secondary partners for a database are is to query the DMVs on the primary node.
Declare @databaseName sysname = N'Database name'; WITH ReplicaDbInfo AS(SELECT ar.replica_server_name AS serverName, hars.role_desc AS roleType, hdrs.last_commit_time AS lastCommitTime FROM sys.dm_hadr_database_replica_states AS hdrs INNER JOIN sys.databases AS d ON hdrs.database_id = d.database_id INNER JOIN sys.availability_replicas AS ar ON hdrs.replica_id = ar.replica_id INNER JOIN sys.dm_hadr_availability_replica_states AS hars ON hdrs.replica_id = hars.replica_id WHERE d.name = @databaseName), commitTime AS(SELECT s.serverName As SecondaryServer, p.lastCommitTime AS primaryCommitTime, s.lastCommitTime AS secondaryCommitTime FROM ReplicaDbInfo AS p CROSS JOIN ReplicaDbInfo AS s WHERE p.roleType = 'PRIMARY' And s.roleType = 'SECONDARY' ) SELECT SecondaryServer, Lag_ms = ABS(DATEDIFF(ms, secondaryCommitTime, primaryCommitTime)) FROM commitTime;
- There are no timeouts in this dojo, is there?: No sensei! If you find that you experience disconnections of your AGs (timeouts) due to network blips or latency, you can make your cluster network configuration more resilient. My team makes the following adjustments to the WSFC cluster (via PowerShell) to make it more resilient:
$cluster = get-cluster $cluster.CrossSubnetDelay = 4000 $cluster.CrossSubnetThreshold = 10 $cluster.SameSubnetDelay = 2000 $cluster.SameSubnetThreshold = 10 $cluster.RouteHistoryLength = 20 get-cluster | fl *Subnet*, RouteHistoryLength
* AlwaysOn is now Always On as of SQL Server 2016 (I think). Though I won’t be mad at you if you use either one.