Monday, February 04, 2013

Adding the Clustered Key To Your Nonclustered Index?

A colleague of mine was understandably confused when using the almost-perfect missing index engine in SQL 2008. The engine recommended he include the clustered index key in a new nonclustered index. Read that again - you're right, it doesn't make a lot of sense.

From my colleague:
Why would you ever need to have your primary key or clustered index key as a key field on a non-clustered index? That don't make any sense to me. If the SQL engine needs to make a lookup on the clustered index have the primary key as an include on your non-clustered index. ...I researched it and the only difference is that my current nonclustered has the primary key as a key field and now it's suggesting making it a include column.

I've got a wild hunch that the clustered key might make sense to have as a key in the nonclustered index as long as it isn't the first key, and the query is ordering on a nonclustered key. However, I can't think of a representative situation to prove that, so I might be wrong. I'm open to suggestions for a reproducible lab sample, perhaps your starting point would be the sample script below.

That said, I don't think it would make any sense to put the clustered key in the INCLUDE of a nonclustered index.  I'm puzzled as to why the missing index feature would suggest this.

Here's a lab for proof. In this scenario, the index was 19.320312 MB without the include, and 19.320312 with the include. SQL doesn't store an extra copy of the clustered key in the nonclustered index just because you asked for it in the INCLUDE or in the key. That's because the clustered key is already a part of any nonclustered index key.

This reinforces the point that the clustered key should always been as unique (so it's helpful) and narrow (so it doesn't waste space) as possible. That's why for almost all tables, an integer IDENTITY(1,1) key is a perfect clustered index and primary key. 4 bytes per row. And that's why for all tables, a GUID is a less efficient choice for your clustered key. 16 burdensome bytes per row...which is then lugged around by all your nonclustered keys...  </GUIDrant>
use somedatabasenotinproduction 
go
create table dbo.testclusteredinclude 
(      id int identity(1,1) not null primary key 
,      text1 varchar(30) Not null) 
go
insert into dbo.testclusteredinclude (text1) values ('test1'), ('test2') 
go 
insert into dbo.testclusteredinclude (text1) 
select text1 from dbo.testclusteredinclude where text1 = 'test2' 
go 20 
--turn on show actual exec plan 
select id, text1 from dbo.testclusteredinclude where text1 = 'test1' 
go 
create nonclustered index idx_nc_testclusteredinclude_text1  
on dbo.testclusteredinclude (text1) 
go 
select id, text1 from dbo.testclusteredinclude where text1 = 'test1' 
go 
drop index idx_nc_testclusteredinclude_text1 
on dbo.testclusteredinclude 
go 
create nonclustered index idx_nc_testclusteredinclude_text1 
on dbo.testclusteredinclude (text1)  include (id) 
go 
select id, text1 from dbo.testclusteredinclude where text1 = 'test1'
 

1 comment:

w said...

Edit: some formatting issues with the code removed some EOL spaces, apologies.