Tuesday, March 05, 2019

Backup of an Azure SQL DB downloaded outside of Azure

From a dev colleague: “Asking for a client... do you know how to get a backup of an Azure SQL DB downloaded outside of Azure?”

Short answer - You can’t…

Medium answer - … and you don’t really need to for anything related to DR or HA. Backups are automagically handled in Azure, Azure SQL DB georeplication can be set up in a few button clicks. Instead of thinking of Azure SQL DB as a traditional SQL database in a server, think of it as a platform, consider spinning up a stand-alone copy of the production Azure SQL DB in a cheaper tier to set as pre-production. It’d be a better, truer dev/test/qa environment as a result.

Long answers – Okay, so you really want to get the Azure SQL DB to an on-prem SQL Server instance. Sigh.

  1. You can manually script it out into CREATE and INSERT T-SQL statements – schema and data – with various tools, including a compare with SSDT, or an officially supported free crossplatform scripter that is Python-based. Keep in mind this might be a huge file and a lengthy manual process with a large database.
  1. You could also use the “Export” button on the Azure portal for the Azure SQL DB to create a BACPAC on an Azure storage account. The file is locked with a username/password which you specify at the time of export.  It will take a few minutes, even for a small database. Once it's done, you can download the .bacpac file from the blob location with Azure Storage Explorer or something similar, and then import the .bacpac. Keep in mind that this might also be a huge file, and a lengthy manual process with a large database.



Friday, February 01, 2019

"A connection timeout has occurred on a previously established connection to availability replica"

I previously spent some time troubleshooting this issue at one client, and then having encountered it twice more this year, I figured I'd include it in a blog post. Yep, the fix, which is delivered in a CU for SQL 2012, 2014, or 2016 does fix the issue.
Message 35201: A connection timeout has occurred while attempting to establish a connection to availability replica 'replicaname' with id [availability_group_id]. Either a networking or firewall issue exists, or the endpoint address provided for the replica is not the database mirroring endpoint of the host server instance. 
Message 35206: A connection timeout has occurred on a previously established connection to availability replica 'replicaname' with id [availability_group_id]. Either a networking or a firewall issue exists or the availability replica has transitioned to the resolving role.
If you are troubleshooting the above errors, make sure you are on one of these versions (or later)*
  • SQL Server 2016 RTM CU5 or SP1 CU1
  • SQL Server 2014 SP2 CU4
  • SQL Server 2012 SP3 CU7 
*This patch was out before SQL Server 2017 was released, SQL 2017 is not susceptible.

This issue was very problematic because as databases stopped synchronizing, the log files on the primary replica continued to grow. This would eventually create an outage once the volume filled to capacity. Despite all our best efforts, like the KB article says, there's no fix other than rebooting the secondary or removing/recreating the replica. Obviously rebooting/removing the secondary replica doesn't necessarily impact production, but it does impact high availability.

I have encountered this error in multiple environments, once in an Availability Group with 50+ databases, and another also with just 3 databases, one of which had constant high-transactional volume. According to the KB article, "This problem might occur only on very powerful computers and when SQL Server is very busy. For example, in one scenario, this problem occurred on a very busy system with 24 cores."

Tuesday, December 18, 2018

Actual Emails: What is a "soft delete"?

What's a "soft delete", and why should DBAs be aware of such behavior in tables?

So, a DELETE statement is a “hard” delete. The data is gone.

However, it is a common practice to build into tables a set of auditing fields…

For example:
CREATE TABLE dbo.whatever(
Id int identity(1,1) not null primary key,
WhateverInt int not null,
…
Createddate datetimeoffset(0) not null,
Createdby varchar(250) not null,
Modifieddate datetimeoffset(0) null,
Modifiedby varchar(250) not null,
IsActive bit not null CONSTRAINT DF_whatever_IsActive
DEFAULT (1)
)
The IsActive field, or anything similarly named like IsArchived or IsDeleted, is a common strategy to “soft” delete something and remove it from resultsets, but still retain history. All the queries on the table would then use WHERE IsActive = 1 to make sure they only viewed Active data, for example.

So, if you UPDATE a record to IsActive = 0 instead of DELETEing it, it disappears from queries and reports and screens, but it’s still there in case it’s useful. There may be some queries that intentionally want to query IsActive = 0 for historical or auditing purposes. 

Often, a filtered nonclustered index (introduced in SQL 2008) can be used to match the WHERE IsActive = 1, and then the index is smaller as a result and allows SQL Server to dramatically increase performance here. This is really only effective if a majority of the data has actually been soft-deleted.

For example: 
CREATE NONCLUSTERED INDEX IDX_NC_whatever_WhateverInt
ON dbo.whatever (WhateverInt) 
WHERE IsActive = 1
We had one client with an IsArchived flag (or similar) on a table with tens of millions of rows. Only 1% of the data was “active” and waiting to be archived. Their queries specified WHERE IsArchived=0, but still performed very poorly – lots of data still to be accessed. We added and modified some nonclustered indexes as filtered indexes to account for IsArchived = 0, and not only did this greatly reduce the size the indexes, but the query result time went from minutes to instantaneous. Huge difference. We could also have added similarly-filtered nonclustered columnstore indexes.


DBAs need to understand table design and recognize "soft delete" keys and the associated business logic in applications, because the Missing Indexes feature and other common query tuning tools won't include them in recommendations for filtering.

Furthermore, we can even enforce filtered uniqueness using a unique nonclustered index. In this way, we could make sure that there is only one active record for a given key set, but allow many inactive records for the same key set!