Thursday, April 24, 2014

Using a Filtered Index to Enforce Filtered Uniqueness

One of the advantages and uses of Filtered Indexes (introduced with SQL Server 2008) is to enforce filtered uniqueness. This has some design implications, so make sure you understand what you're trying to accomplish before relying on it.

In the example below, we have a compound natural key of the two columns key1 and key2. The column bit1 is not a member of the natural key, but it does tell us something about how the row is used. Perhaps bit1 indicates that the record is Active (1) or Inactive (0), Confirmed (1) or Unconfirmed (0), or Not Deleted (1) or Deleted (0).

In any case, we only want to enforce uniqueness when bit1 = 1, which implies two things:
  1. Rows are regularly filtered for use where bit1 = 1.
  2. We don't care whether there are duplicate rows where bit1 = 0.
In this way, you could "deactivate" (in business terms) a record by setting bit1 = 0 without violating the natural key's uniqueness on (key1, key2).
if object_id('dbo.tabletestdups') is not null drop table dbo.tabletestdups
go
create table dbo.tabletestdups 
( key1 int not null
, key2 int not null
, bit1 bit not null
)
go
create unique nonclustered index idx_nc_u_f_tabletestdups_key1_key2_bit1 on dbo.tabletestdups (key1, key2, bit1) --unfiltered: uniqueness is enforced on the full (key1, key2, bit1) combination
go
insert into dbo.tabletestdups (key1, key2, bit1) values (1,1,1) --succeeds
insert into dbo.tabletestdups (key1, key2, bit1) values (2,2,1) --succeeds
insert into dbo.tabletestdups (key1, key2, bit1) values (3,3,1) --succeeds
insert into dbo.tabletestdups (key1, key2, bit1) values (1,1,0) --succeeds
insert into dbo.tabletestdups (key1, key2, bit1) values (2,2,0) --succeeds
insert into dbo.tabletestdups (key1, key2, bit1) values (3,3,0) --succeeds
go
insert into dbo.tabletestdups (key1, key2, bit1) values (1,1,1) --fails: duplicate of the existing (1,1,1) row
go
insert into dbo.tabletestdups (key1, key2, bit1) values (1,1,0) --fails: duplicate of the existing (1,1,0) row
go
drop index idx_nc_u_f_tabletestdups_key1_key2_bit1 on dbo.tabletestdups 
go
create unique nonclustered index idx_nc_u_f_tabletestdups_key1_key2_bit1 on dbo.tabletestdups (key1, key2, bit1) WHERE bit1 = 1 --Note the important WHERE clause here at the end of the index.
go
insert into dbo.tabletestdups (key1, key2, bit1) values (1,1,1) --fails: (1,1,1) already exists, and rows with bit1 = 1 fall within the filter
go
insert into dbo.tabletestdups (key1, key2, bit1) values (1,1,0) --succeeds because the unique constraint only enforces bit1 = 1.
go
select * from dbo.tabletestdups

key1        key2        bit1
1           1           1
2           2           1
3           3           1
1           1           0
2           2           0
3           3           0
1           1           0
Note that rows 4 and 7 contain the duplicate combination key1 = 1, key2 = 1, bit1 = 0, while the earlier attempts to insert a duplicate combination of key1 = 1, key2 = 1, bit1 = 1 failed.
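A related use of the same technique is enforcing uniqueness only among non-NULL values, which a plain unique index or constraint can't do (SQL Server treats a second NULL as a duplicate). Here's a minimal sketch, assuming a hypothetical dbo.employees table with an optional national_id column:

create table dbo.employees
( employee_id int not null primary key
, national_id varchar(20) null
)
go
create unique nonclustered index idx_nc_u_f_employees_national_id on dbo.employees (national_id) where national_id is not null
go
insert into dbo.employees (employee_id, national_id) values (1, 'A-100') --succeeds
insert into dbo.employees (employee_id, national_id) values (2, null) --succeeds
insert into dbo.employees (employee_id, national_id) values (3, null) --succeeds, NULLs fall outside the filter
insert into dbo.employees (employee_id, national_id) values (4, 'A-100') --fails, duplicate non-NULL value
go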

Filtered indexes also provide some powerful tuning and storage advantages for commonly-accessed search predicates, with some caveats. Here is an excellent article explaining the caveats: http://blogs.msdn.com/b/timchapman/archive/2012/08/27/the-drawback-of-using-filtered-indexes.aspx
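For example, here's a minimal sketch of that tuning use case, assuming a hypothetical dbo.orders table where queries overwhelmingly target unshipped orders:

create table dbo.orders
( order_id int identity(1,1) not null primary key
, is_shipped bit not null
, order_date datetime not null
)
go
create nonclustered index idx_nc_f_orders_order_date on dbo.orders (order_date) where is_shipped = 0 --indexes only the unshipped rows, so it stays small
go
select order_id, order_date from dbo.orders where is_shipped = 0 and order_date >= '20140101'
go

One caveat from the linked article is worth repeating here: a query written with a parameter or variable (where is_shipped = @shipped) may not use the filtered index, because the optimizer has to produce a plan that is safe for any parameter value.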

Wednesday, April 23, 2014

Real heavyweights: Float vs Decimal, the Thrilla in Precision

This is a followup to a previous post where I likened the SQL Server float data type to Muhammad Ali. Specifically, I wanted to put the phrase "negligible data storage increase" to the test across different numeric data types.

In short, I will dismiss the notion that float provides a realistic advantage in the storage of large numbers. (Also, I had a blog title idea too catchy to pass up.)

I'll avoid recapping my earlier post in full; it covered the dangerous consequences of rounding with the float data type. The float data type is still popular with developers and database designers because of its supposed advantages in data storage size. (Some code-first modelers also use float as a default. Also dangerous. And lazy.)

Too often, developers and database designers ignore the large potential for inconsistent rounding with the float (and real) data types, whereas a decimal(p,s) data type is not only consistent in storage, it communicates transparent logical intent to the development team. With the decimal data type, there are no surprises with truncation of digits to the right of the decimal point.
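To illustrate the kind of rounding inconsistency I mean, here's a quick sketch that accumulates the value 0.1 ten times in a float and in a decimal. Because 0.1 has no exact binary representation, I'd expect the float total to fall just short of 1, even if a client tool rounds the display:

declare @f float = 0, @d decimal(19,3) = 0, @i int = 0
while @i < 10
begin
 set @f = @f + 0.1 --each addition carries a tiny binary representation error
 set @d = @d + 0.1 --exact decimal arithmetic
 set @i = @i + 1
end
select float_total = @f
 , decimal_total = @d --exactly 1.000
 , float_equals_one = case when @f = 1.0 then 'yes' else 'no' end --'no'
go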

Here's my test lab for a number with 3 digits to the right of the decimal point.

create table realtest (real1 float(24)) --float(24) is a synonym for real
create table floattest (float53 float)--float(53) is the default
create table decimal9test (decimal9 decimal(9,3))
create table decimal19test (decimal19 decimal(19,3))
go
insert into realtest (real1) Values (123456.123)
insert into floattest (float53) Values (123456.123)
insert into decimal9test (decimal9) Values (123456.123)
insert into decimal19test (decimal19) Values (123456.123)
go
insert into realtest (real1) select real1 -1. from realtest
insert into floattest (float53) select float53 -1. from floattest
insert into decimal9test (decimal9) select decimal9 -1. from decimal9test
insert into decimal19test (decimal19) select decimal19 -1. from decimal19test
go 21 --run the doubling batch 21 times: 2^21 = 2,097,152 rows in each table
select tablename, row_count = SUM(row_count), sizemb = SUM(sizemb) from (
select SizeMb = (p.in_row_reserved_page_count*8.)/1024.,
row_count = p.row_count,
tablename = o.name, indexname = i.name
from sys.dm_db_partition_stats p
inner join sys.objects o on p.object_id = o.object_id
inner join sys.indexes i on i.object_id = o.object_id and i.index_id = p.index_id
where o.is_ms_shipped = 0
) x
where tablename in ('floattest', 'realtest', 'decimal9test', 'decimal19test')
group by tablename
order by sizemb desc
go
select top 1 * from realtest
select top 1 * from floattest
select top 1 * from decimal9test
select top 1 * from decimal19test
go

First, the resultset showing the storage of 2+ million rows in each single-column table.

tablename       row_count    sizemb
realtest        2097152      26.382812
floattest       2097152      34.445312
decimal9test    2097152      28.382812
decimal19test   2097152      36.507812

Here are the values that were actually stored; notice the important differences.

real1        123456.1
float53      123456.123
decimal9     123456.123
decimal19    123456.123

You can see that the ~10mb saved by the real (or float(24)) data type isn't much good because of the catastrophic loss of precision. And even with 2 million rows, float has only saved us roughly 2mb over decimal(19,3).

Now, let's run the same test with the value 123456.1234567 instead. (The decimal9test table is dropped from this round because decimal(9,x) allows only nine digits of total precision, not enough to hold 123456.1234567.)

create table realtest (real1 float(24) )
create table floattest(float53 float) --float(53) is the default
create table decimal19test(decimal19 decimal(19,7) )
...
tablename       row_count    sizemb
realtest        2097152      26.382812
floattest       2097152      34.445312
decimal19test   2097152      36.507812

real1        123456.1
float53      123456.1234567
decimal19    123456.1234567

Note again that real is wholly unacceptable with its obvious loss of precision, while float and a properly-sized decimal data type both store the value intact, and without a significant difference in storage (less than 6% difference).

Now, to slam the door on float.

Let's apply DATA_COMPRESSION = PAGE to this data.
create table realtest (real1 float(24) ) with (data_compression = page)
create table floattest(float53 float) with (data_compression = page)
create table decimal19test(decimal19 decimal(19,7) )with (data_compression = page)
...

tablename       row_count    sizemb
floattest       2097152      26.382812
decimal19test   2097152      24.320312

We can clearly see that page compression reverses the small advantage that floattest enjoyed.

Since page data compression is a very popular, very widespread and very useful tool on modern Enterprise edition SQL Servers, we can rule out float as having any space saving advantage. In short, float loses the last ground it stood on.

The real and float data type columns only compressed by roughly 15% and 23% respectively. The fixed decimal data type compressed, in this case, by roughly 33% (36.5mb down to 24.3mb).
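If you'd like to preview these savings on your own tables before rebuilding anything, SQL Server ships an estimator: the system procedure sp_estimate_data_compression_savings (available since SQL Server 2008). A sketch, using one of the test tables from this post:

exec sp_estimate_data_compression_savings
 @schema_name = 'dbo'
, @object_name = 'decimal19test'
, @index_id = null
, @partition_number = null
, @data_compression = 'page'
go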

If we scale the page compressed data comparison up to 134 million rows...

tablename       row_count     sizemb
floattest       134217728     1686.070312
decimal19test   134217728     1556.015625

...we confirm that floattest has lost its so-called space savings advantage, and is now more than 8% larger than the same data in a proper decimal data type.