Friday, May 03, 2013

Hunt Down Tables Without Clustered Indexes

This one's a gimme, but an underrated utility script when I enter a new environment and look for low-hanging fruit for easy improvements.

You'd be surprised how many vendors ship applications without a single index, much less properly aligned clustered indexes.

Heaps are bad.  You can't defragment a heap by performing index maintenance, despite old myths about the create/drop a clustered index trick.

Similarly, the myth of a heap performing better on pure inserts vs a table with a clustered index is an academic one.  Heaps are impossible to order, which means selecting from a heap is always a scan.

If your table will be constantly written to and rarely read, put a clustered index on it.  The writes will be sequential, resulting in fewer page breaks and less fragmentation, which mean more efficient storage and maintenance.

EDIT: Page "breaks" (I meant splits) aren't possible on heaps, but forwarding pointers, which create a maze of bad performance in heaps with nonclustered indexes. Thanks to commenters for pointing this out!

And when you have to look in that table for a stray logging record, it won't be a full table scan. If your table will be constantly written to and never read, why is it in your database?  Such a table doesn't exist.

This script is an easy one to use for finding heaps in a database, along with the row count and size of jumbled data that is haunting your database.  It is in my DBA toolbox folder alongside many other scripts.

--Doesn't work on 2000 databases or databases in 2000 compatability mode.  Need to change the db_id() syntax if so.

 [Database Name] = db_name()
, [Table Name] = + '.' +
, p.row_count
, SizeMb= (p.reserved_page_count*8.)/1024.
 sys.objects o
inner join  sys.schemas s on o.schema_id = s.schema_id
inner join  sys.dm_db_partition_stats p  on p.object_id = o.object_id 
inner join  sys.indexes si on si.object_id = o.object_ID
WHERE si.type_desc = 'Heap'
and  is_ms_shipped = 0
order by SizeMb desc

As an aside, but a different blog post altogether - a good clustered key is narrow, unique, unchanging and sequential - which is why IDENTITY integer columns are the perfect clustered key.  Compound clustered keys can be a big waste of space.


Anonymous said...

Hi William, I have a question which hopefully you can answer. I am the classic 'accidental DBA', and just trying to pick stuff up as quickly as possible, so don't judge my simple/silly question too harshly! We run Dynamics AX 2009 in our company, and have purchased a Service Management addon for AX.

When I run the above query, it returns hundreds of records, but all for tables from the Service Management addon.

If I am reading this correctly, this query is showing me which tables have heaps - does this also mean these tables dont have an index? The biggest records returned are 811MB, 445MB, 406MB - is this too big for a heap? The database data size is 15GB.

Hopefully you can give me some feedback, as I am very new to SQL, and learning from experts as much as I can


Anonymous said...

Hi, sorry to correct you but joining sys.indexes not using index_id, makes the query to give extra rows if the table have any non clustered index.

If the main point is to highlight which tables are heaps, I guess there's no need to have more than one row per table.

Good article though, I've been working on this topic recently and I came up with a very similar query.


Anonymous said...

Thanks for the post. I am confused by the last statement that 'Compound clustered keys can be a big waste of space.'. Since a table is stored in the single clustered key allowed on a table, why does it matter how many columns are in the the sort order?

Anonymous said...

heaps dont suffer from page splits, clustered tables can at the leaf level if the data is not entered in ascending order. So its not always a no-brainer that every table needs a clustered key

Anonymous said...

"If your table will be constantly written to and never read, why is it in your database?"

I can think of a number of reasons for this, and you can have tables written to at a source and then only ever read at a target. This could be used to optimise writes and have minimal indexes on the source table. The log reader can be used either through replication or cdc to pick up the details from the transaction and apply them to a target table. Hence the source table is write only and the target table can be better optimsied for reads.

Anonymous said...

"IDENTITY integer columns are the perfect clustered key. Compound clustered keys can be a big waste of space."

You compare the advantages of a single integer column directly to a compound clustered key. What about a single column that represents a commonly selected range, such as a transaction date. It will be these dates in the WHERE clause, not the ID numbers. Wouldn't this in at least most cases make the better clustered index.

Anonymous said...

The same goal can be acheived by evaluating the 'HasClusteredIndex' property of the Table Facet using Policy Management. No extra script to keep track of. You can evaluate as desired, or automatically to warn of users who would create a heap without you knowing.

Anonymous said...

Yes, definitely join 'si' to 'p' on both object_id and index_id to return just one row per heap table.