Tuesday, February 05, 2013

Adding the Clustered Key To Your Nonclustered Index? Part 2

Of course, a rather obvious answer presents itself the next day.  After reviewing the actual environment that my colleague was working in, it popped quickly into mind.

What if the table has a compound primary key?  See comments for a play-by-play.

drop table dbo.testclusteredinclude
go
create table dbo.testclusteredinclude
(             id1 int not null
,             id2 int not null
,             id3 int not null
,             text1 varchar(30) Not null
,             constraint pk_testclusteredinclude primary key  (id1, id2, id3)
)
go
insert into dbo.testclusteredinclude (id1, id2, id3, text1) values (1,2,3,'test1'); --put in our seed row

--filler data of 10000 rows
with cte3pk (id1, id2, id3) as
(select id1=2,id2=3,id3=4
union all
select id1+1, id2+1, id3+1
from cte3pk
where id1 <= 10000
)
insert into dbo.testclusteredinclude (id1, id2, id3, text1)
select id1, id2, id3, 'test2' from cte3pk
OPTION (MAXRECURSION 10000);
go
alter index all on dbo.testclusteredinclude  rebuild
go

--turn on show actual exec plan

--Second key of the Clustered Key can benefit, this easy to understand.
-- We're simply giving SQL a narrower set of data, via the nonclustered index.
select id2, text1 from dbo.testclusteredinclude where
id2 = 1001
go
create nonclustered index idx_nc_testclusteredinclude_id2_text1
on dbo.testclusteredinclude (id2, text1)
go
select id2, text1 from dbo.testclusteredinclude where
id2 = 1001
go
drop index idx_nc_testclusteredinclude_id2_text1 on dbo.testclusteredinclude
go

--Still, putting a subsequent key of a compound clustered key in the include column doesn't help.
-- SQL can still do an index seek on id2, even when the index doesn't contain it (idx_nc_testclusteredinclude_text1).
select id2, text1 from dbo.testclusteredinclude where
text1 = 'test2'
go
create nonclustered index idx_nc_testclusteredinclude_text1
on dbo.testclusteredinclude (text1)
go
select id2, text1 from dbo.testclusteredinclude where
text1 = 'test2'
go
create nonclustered index idx_nc_testclusteredinclude_text1_inc_id2
on dbo.testclusteredinclude (text1) include (id2)
go
select id2, text1 from dbo.testclusteredinclude where
text1 = 'test2'
go

drop index idx_nc_testclusteredinclude_text1 on dbo.testclusteredinclude
drop index idx_nc_testclusteredinclude_text1_inc_id2 on dbo.testclusteredinclude
go

One final note - none of the SELECT statements above generate any missing index suggestions in SQL 2012 SP1 or SQL 2008 SP2, even though without any nonclustered indexes they all generated Clustered Index Scans.

Monday, February 04, 2013

Adding the Clustered Key To Your Nonclustered Index?

A colleague of mine was understandably confused when using the almost-perfect missing index engine in SQL 2008. The engine recommended he include the clustered index key in a new nonclustered index. Read that again - you're right, it doesn't make a lot of sense.

From my colleague:
Why would you ever need to have your primary key or clustered index key as a key field on a non-clustered index? That don't make any sense to me. If the SQL engine needs to make a lookup on the clustered index have the primary key as an include on your non-clustered index. ...I researched it and the only difference is that my current nonclustered has the primary key as a key field and now it's suggesting making it a include column.

I've got a wild hunch that the clustered key might make sense to have as a key in the nonclustered index as long as it isn't the first key, and the query is ordering on a nonclustered key. However, I can't think of a representative situation to prove that, so I might be wrong. I'm open to suggestions for a reproducible lab sample, perhaps your starting point would be the sample script below.

That said, I don't think it would make any sense to put the clustered key in the INCLUDE of a nonclustered index.  I'm puzzled as to why the missing index feature would suggest this.

Here's a lab for proof. In this scenario, the index was 19.320312 MB without the include, and 19.320312 with the include. SQL doesn't store an extra copy of the clustered key in the nonclustered index just because you asked for it in the INCLUDE or in the key. That's because the clustered key is already a part of any nonclustered index key.

This reinforces the point that the clustered key should always been as unique (so it's helpful) and narrow (so it doesn't waste space) as possible. That's why for almost all tables, an integer IDENTITY(1,1) key is a perfect clustered index and primary key. 4 bytes per row. And that's why for all tables, a GUID is a less efficient choice for your clustered key. 16 burdensome bytes per row...which is then lugged around by all your nonclustered keys...  </GUIDrant>
use somedatabasenotinproduction 
go
create table dbo.testclusteredinclude 
(      id int identity(1,1) not null primary key 
,      text1 varchar(30) Not null) 
go
insert into dbo.testclusteredinclude (text1) values ('test1'), ('test2') 
go 
insert into dbo.testclusteredinclude (text1) 
select text1 from dbo.testclusteredinclude where text1 = 'test2' 
go 20 
--turn on show actual exec plan 
select id, text1 from dbo.testclusteredinclude where text1 = 'test1' 
go 
create nonclustered index idx_nc_testclusteredinclude_text1  
on dbo.testclusteredinclude (text1) 
go 
select id, text1 from dbo.testclusteredinclude where text1 = 'test1' 
go 
drop index idx_nc_testclusteredinclude_text1 
on dbo.testclusteredinclude 
go 
create nonclustered index idx_nc_testclusteredinclude_text1 
on dbo.testclusteredinclude (text1)  include (id) 
go 
select id, text1 from dbo.testclusteredinclude where text1 = 'test1'
 

Wednesday, January 09, 2013

Try a Database Design Exercise At Your Next SQL User Group



Tried something different for the January 2013 meeting of the Baton Rouge SQL Server User Group (brssug.org), which meets every second Wednesday at the Louisiana Technology Park (latechpark.com).

I think this was one of the most enjoyable meetings we've ever had, and certainly the most interactive.  As the chapter president, this took very little preparation (printing, writing implements) and really required me to do nothing more than lead the discussion.  My most excellent user group colleagues did all the heavy lifting, with lots of smiles.

I took my Relational Database Design exercise from the October 2012 Regional AITP Conference, printed out a bunch of copies, and handed them out to the attendees.  I'd recommend this to anyone looking for a change-up in their user group.  We couldn't find a speaker for January, so I thought this would be good content for our first meeting of the year.

For about 20 minutes, they worked with pens (not pencils, we're professionals!) on scratch paper for rough drafts of their design.  I reminded them that knowledge of American college football was not required, and that if you were making design decisions based on sports knowledge, you're probably not on the right track.

I hooked up my laptop to the big screen TV and started up a blank database diagram in SQL Server Management Studio 2012, which many of them had not seen.  The Database Diagrams tool built into SSMS is perfect for an exercise like this.  I hit "New Table" and said, "What table is first?"  Later, "Any more columns we need here?"  "What's the next table?" "What would be the best data type for this?" While on the keyboard, I tried to make as few decisions as possible, and encourage discussion of everyone's ideas and suggestions.  A couple times, we backed up the design after changing our minds.  No worries!


The 15 of us got through most (not all) of the design in about 80 minutes, time which flew by.  We had a great mix of very friendly folks.  Some had experience back to SQL 4.2, some with lots of Business Intelligence experience, and some .net devs with only accidental SQL experience.  We covered a ton of topics that were educational for everyone, and stumbled across too many design choice point-counterpoint discussions to remember.  

Do we need audit fields?  What attributes do we need to store per year, per player, or per game?  Are attributes dependent on the primary key?  How should we handle static student data vs student data that is measured annually?  The storage of games, schedules and rosters was a big source of conversation - always polite and professional and with lots of jokes and side conversations.  While most of our design conversation was vendor agnostic, we also touched on data types and indexes in SQL Server.

Along the way, added indexes, unique constraints, computed columns (to record if our football team won/lost/tied), foreign keys, identity columns and more, using only the Management Studio Database Diagrams.

The point of the user group meeting was not to complete the exercise - there wasn't enough time - but to raise all the kind of database design decisions that would come up in "real life."  With more time, we could have completed the design, filled in sample data, or even created some basic reports and a data warehouse to serve them.  

Here is a link to the problem statement for the contest, which I suppose now I can't ever use again for a conference competition:


Database Design Contest


Here's a second panoramic I took, with much less success.  My apologies to the folks whose heads were vanished by the Pano app on my android phone.


Thanks to everyone who attended, see you in February!