Tuesday, December 18, 2018

Actual Emails: What is a "soft delete"?

What's a "soft delete", and why should DBAs be aware of such behavior in tables?

So, a DELETE statement is a “hard” delete. The data is gone.

However, it is a common practice to build into tables a set of auditing fields…

For example:
CREATE TABLE dbo.whatever(
Id int identity(1,1) not null primary key,
WhateverInt int not null,
…
Createddate datetimeoffset(0) not null,
Createdby varchar(250) not null,
Modifieddate datetimeoffset(0) null,
Modifiedby varchar(250) not null,
IsActive bit not null CONSTRAINT DF_whatever_IsActive
DEFAULT (1)
)
The IsActive field, or anything similarly named like IsArchived or IsDeleted, is a common strategy to “soft” delete something and remove it from resultsets, but still retain history. All the queries on the table would then use WHERE IsActive = 1 to make sure they only viewed Active data, for example.

So, if you UPDATE a record to IsActive = 0 instead of DELETEing it, it disappears from queries and reports and screens, but it’s still there in case it’s useful. There may be some queries that intentionally want to query IsActive = 0 for historical or auditing purposes. 

Often, a filtered nonclustered index (introduced in SQL 2008) can be used to match the WHERE IsActive = 1, and then the index is smaller as a result and allows SQL Server to dramatically increase performance here. This is really only effective if a majority of the data has actually been soft-deleted.

For example: 
CREATE NONCLUSTERED INDEX IDX_NC_whatever_WhateverInt
ON dbo.whatever (WhateverInt) 
WHERE IsActive = 1
We had one client with an IsArchived flag (or similar) on a table with tens of millions of rows. Only 1% of the data was “active” and waiting to be archived. Their queries specified WHERE IsArchived=0, but still performed very poorly – lots of data still to be accessed. We added and modified some nonclustered indexes as filtered indexes to account for IsArchived = 0, and not only did this greatly reduce the size the indexes, but the query result time went from minutes to instantaneous. Huge difference. We could also have added similarly-filtered nonclustered columnstore indexes.


DBAs need to understand table design and recognize "soft delete" keys and the associated business logic in applications, because the Missing Indexes feature and other common query tuning tools won't include them in recommendations for filtering.

Furthermore, we can even enforce filtered uniqueness using a unique nonclustered index. In this way, we could make sure that there is only one active record for a given key set, but allow many inactive records for the same key set!