Thursday, May 28, 2020

SOLVED! Untrusted SAS certs after SQL startup

If you've tried doing Backup to URL with SQL Server using a Shared Access Signature (SAS) certificate and received this error:

Error: 18204, Severity: 16, State: 1.BackupDiskFile::CreateMedia: Backup device 'https://account.blob.core.windows.net/container/folder/msdb_log_202005170801.trn' failed to create. Operating system error 50(The request is not supported.).Cannot open backup device 'https://account.blob.core.windows.net/container/folder/msdb_log_202005170801.trn'. Operating system error 50(The request is not supported.). [SQLSTATE 42000] (Error 3201) BACKUP LOG is terminating abnormally. [SQLSTATE 42000] (Error 3013). NOTE: The step was retried the requested number of times (1) without succeeding. The step failed. 


You may have received the same error I encountered.

This error popped up only after startup of SQL Server. To resolve the problem, we'd recreate the SAS key, using the same cert in the same script, and the backups would start working again. This affected all types of SQL database backups.

The issue was not our syntax (note the options, the syntax here is standard, but you do need to provide MAXTRANSFERSIZE = 4194304 and BLOCKSIZE = 65536 to backup to Azure blob storage.)
BACKUP DATABASE DB_name
TO URL = 'https://account.blob.core.windows.net/container/server-name/msdb_log_202005170801.trn'
WITH COMPRESSION, CHECKSUM, FORMAT, MAXTRANSFERSIZE = 4194304, BLOCKSIZE = 65536;


The issue was not the SAS key itself, which we tried creating via Storage Explorer, with PowerShell, with various expiration dates, with shared policies, etc.

Rather, the issue was with our SAS-based cert. Here was our syntax:
CREATE CREDENTIAL [https://account.blob.core.windows.net/container/server-name]
WITH IDENTITY='Shared Access Signature'
, SECRET='sv=2019-02-02&xxxxx';
Again, the syntax was correct in the sense that backups would start working again right away. But after a reboot, the cert would not longer be available to SQL. 

On the Azure Storage side, the logged failure was:

AuthenticationErrorDetail:Signed expiry time [Fri, 01 May 2020 13:30:52 GMT] has to be after signed start time [Tue, 12 May 2020 22:30:57 GMT]
Why would the exact same SAS key expire after reboot, but then start working again when it was dropped/created? And why wouldn't ALTER'ing the key work?

Our ticket with Microsoft moved through the storage team, the SQL team, and eventually the SQL engine engineering team, who did a complete stack dump to catch the error.

The issue turned out to be related to the hyphen in our credential creation above, note I included "server-name" because our server names have hyphens in them. Turns out, this is unnecessary and caused SQL to choke on trusting the certificate after startup. Here's the explanation from Microsoft support after analysis of the debugger dump:
When you create the credential, the data as we expected is stored in cache and is also persisted to disk but since the data is available in cache, we directly access with full URL name and it works. But after the SQL Server service is restarted and since the credential wont be available in cache, we try to query it from metadata and the way we do it there seems to be an issue on how we fragment the URL and we incorrectly look for container with foldername causing the issue. This is also why Alter wont work and only DROP\CREATE works since it’s in cache again.
We changed the credential to omit the server-specific folder name in the credential. Note that we had originally included the full path to the backup folder because I had read a little too much into this portion of the documentation: THE CREDENTIAL NAME argument requires that the name match the container path." 

This credential creation worked, and kept working after SQL startup.
CREATE CREDENTIAL [https://account.blob.core.windows.net/container] WITH IDENTITY='Shared Access Signature'
, SECRET='sv=2019-02-02&xxxxx';

Note the credential name ends with the container path but doesn't include the server name-specific subfolder, and still does not end with a '/'. 

The support escalation engineer marked the ticket as a bug and filed a defect with the engineering team, so we may have a solution in place for this soon. The end.




Oh, also, when we were first starting to use Backup to URL, we got this on some of our older instances of SQL Server:
"Backup/Restore to URL device error: Error while decoding the storage key."
Problem was trying to use an SAS key on SQL Server prior to 2016. Not supported! Gotta use the old syntax and a certificate for the  storage account identity and access key. 

No comments: