Thursday, October 31, 2019

Availability Groups: What to do without MultiSubnetFailover = True

I received an email from a client who was having issues with third-party applications connecting to their three-subnet SQL Server Availability Group. After an exchange with Microsoft Support, they discovered that the applications weren't specifying MultiSubnetFailover = True in their connection strings. Because RegisterAllProvidersIP = 1 in the cluster, all three listener IPs were registered in DNS, and connections were randomly experiencing high latency: over time, client-side DNS queries had roughly a 66% chance (two in three) of returning the wrong IP for the listener.

They set RegisterAllProvidersIP = 0, which fixed the connectivity latency problem for now, but before you take that as advice, keep reading. I'm not sure whether the application connection strings can't or won't be changed to include MultiSubnetFailover = True. Presumably that comes down to vendor limitations, vendor reliance on old data providers, or both. Here was my guidance regarding Microsoft's recommendation to set RegisterAllProvidersIP = 0 and lower the HostRecordTTL to 120 (seconds):

I guess this is good quick-fix guidance, but the problem really is on the application side. Assuming those connection strings or providers can’t change, sure. But you lose the ability to fail over fast, and you will have increased load on your DNS servers.
With RegisterAllProvidersIP disabled on the client access point (the cluster network name resource), only one IP address is registered in DNS for the listener: the IP of the listener in the primary replica's subnet. When RegisterAllProvidersIP is enabled, the listener IPs for all sites are registered simultaneously. Connection strings using MultiSubnetFailover = True will try all IPs in parallel and use the one that responds, providing the fastest possible transition after an availability group failover. That's desirable.
With RegisterAllProvidersIP = 1, the caveat remains that any connection strings that aren't using or can't use MultiSubnetFailover will have connection problems. So RegisterAllProvidersIP should be enabled only when MultiSubnetFailover = True can be used in all application connection strings.
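
To make the difference concrete, here is a minimal Python sketch of the two client behaviors. It is a simulation only, not how any data provider is actually implemented: the IP addresses, the fake_connect stand-in, and the 0.2 second timeout are all made-up assumptions, and a real TCP connect timeout is far longer.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

# Hypothetical listener IPs, one per subnet. Only the IP in the
# primary replica's subnet is online at any given time.
LISTENER_IPS = ["10.1.0.50", "10.2.0.50", "10.3.0.50"]
ONLINE_IP = "10.2.0.50"


def fake_connect(ip, timeout=0.2):
    """Stand-in for a TCP connect: the online IP answers at once,
    an offline IP blocks for the whole timeout and then fails."""
    if ip == ONLINE_IP:
        return ip
    time.sleep(timeout)
    raise TimeoutError(f"{ip} did not respond")


def connect_sequential(ips, connect=fake_connect):
    """Without MultiSubnetFailover: try each IP in turn,
    paying one timeout for every offline IP tried first."""
    for ip in ips:
        try:
            return connect(ip)
        except TimeoutError:
            continue
    raise ConnectionError("no listener IP responded")


def connect_parallel(ips, connect=fake_connect):
    """MultiSubnetFailover-style: try every IP at once and
    keep the first one that responds."""
    pool = ThreadPoolExecutor(max_workers=len(ips))
    try:
        futures = [pool.submit(connect, ip) for ip in ips]
        for future in as_completed(futures):
            try:
                return future.result()
            except TimeoutError:
                continue
        raise ConnectionError("no listener IP responded")
    finally:
        # Do not wait for the attempts that are still timing out.
        pool.shutdown(wait=False)


if __name__ == "__main__":
    for name, fn in [("sequential", connect_sequential),
                     ("parallel", connect_parallel)]:
        start = time.monotonic()
        ip = fn(LISTENER_IPS)
        print(f"{name}: connected to {ip} in {time.monotonic() - start:.2f}s")
```

In this simulation the sequential client happens to be handed an offline IP first and waits out the timeout before reaching the online one, while the parallel client connects almost immediately regardless of the order DNS returned.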

With RegisterAllProvidersIP = 0, failovers may be problematic. HostRecordTTL = 120 is two minutes, so an Availability Group failover can result in an outage of up to two minutes, after which the client OS will query DNS for new information. With RegisterAllProvidersIP = 1, failovers can be more or less instantaneous, or at least not hampered by DNS.

Also, by lowering the HostRecordTTL to 120 seconds, traffic to and constant load on your DNS server(s) will increase tenfold, as the default is 1200 seconds (20 minutes). What Microsoft has recommended is a short-term solution, but it’s obviously not scalable. It’s not getting to the real source of the problem: the connection strings.
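
The arithmetic behind those two numbers can be sketched in a few lines of Python. This is a back-of-the-envelope model under a simplifying assumption of mine: each client stays busy and re-queries DNS as soon as its cached listener record expires. The function name is made up for illustration.

```python
DEFAULT_TTL_SECONDS = 1200  # default HostRecordTTL (20 minutes)
LOWERED_TTL_SECONDS = 120   # the value recommended in this case


def lookups_per_hour(ttl_seconds):
    """Upper bound on DNS lookups per hour for one busy client that
    re-queries each time its cached record expires (an assumption)."""
    return 3600 / ttl_seconds


# 3600 / 120 = 30 lookups per hour versus 3600 / 1200 = 3 per hour.
load_factor = lookups_per_hour(LOWERED_TTL_SECONDS) / lookups_per_hour(DEFAULT_TTL_SECONDS)

# With RegisterAllProvidersIP = 0, a client can keep using the stale IP
# until its cached record expires, so the worst case is one full TTL.
worst_case_stale_seconds = LOWERED_TTL_SECONDS

if __name__ == "__main__":
    print(f"DNS load factor per client: {load_factor:g}x")
    print(f"Worst-case stale DNS window after failover: {worst_case_stale_seconds}s")
```

Lowering the TTL further shortens the stale window but raises the lookup rate in the same proportion, which is why this is a trade-off rather than a fix.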

It would be much better if we got the third-party applications to change their connection strings to use MultiSubnetFailover = True, and then you could set RegisterAllProvidersIP = 1. The option has been around since 2012, so it should be available if they are using any modern data provider. If the third-party app is still using OLE DB, there is a new OLE DB driver, MSOLEDBSQL, released in 2018, that replaces the previous Microsoft OLE DB implementations and supports MultiSubnetFailover. If they’re using ODBC, the newer ODBC Driver 17 supports MultiSubnetFailover. If the vendor apps are still using something old like SQLNCLI10 (from 2008), or something non-Microsoft, check for newer versions. More info here: https://docs.microsoft.com/en-us/sql/database-engine/availability-groups/windows/always-on-client-connectivity-sql-server?view=sql-server-ver15
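
For reference, this is roughly what the change looks like for an application using ODBC Driver 17, sketched in Python. The listener name, database name, and helper function are hypothetical, and I'm assuming Windows authentication. Note that the ODBC driver spells the value Yes, whereas .NET connection strings use True.

```python
def build_connection_string(listener, database, port=1433):
    """Build an ODBC connection string that targets an availability
    group listener with MultiSubnetFailover enabled.
    Hypothetical helper, for illustration only."""
    parts = {
        "Driver": "{ODBC Driver 17 for SQL Server}",
        "Server": f"tcp:{listener},{port}",
        "Database": database,
        "Trusted_Connection": "Yes",   # assumes Windows authentication
        "MultiSubnetFailover": "Yes",  # ODBC uses Yes/No rather than True/False
    }
    return ";".join(f"{key}={value}" for key, value in parts.items()) + ";"


if __name__ == "__main__":
    # "ag-listener.example.com" and "SalesDb" are made-up names.
    conn_str = build_connection_string("ag-listener.example.com", "SalesDb")
    print(conn_str)
    # With the third-party pyodbc package installed, the string could be
    # passed along as: pyodbc.connect(conn_str)
```

The only part that matters here is the MultiSubnetFailover keyword; everything else in the string stays whatever the application already uses.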
