Friday, May 03, 2013

Use the Default System Trace to Find Autogrowth Events

The SQL Server Management Studio built-in standard report for "Disk Usage" is very handy for viewing the performance-sucking autogrowth events for a database... until you want to see whether autogrowth events have been occurring in any database on the server.

Autogrowth events are very problematic and can cause sudden, unpredictable and difficult-to-troubleshoot application timeouts and performance drops. This applies to any SQL Server database file, including those in SharePoint databases, system databases (such as the tempdb), and any user database data file or log file.


[As an aside, your friends in SharePoint know all about Autogrowth settings for their content databases as well.  Friend of the blog Drew at WerePoint posted this relevant blog entry recently about the timing of growth and recycle-bin emptying events. http://werepoint.blogspot.com/2013/05/on-proper-care-and-feeding-of-content.html ]


Ideally, as professional DBAs, we don't want to ever see an autogrowth event occur on our databases.  By proactively monitoring the space in our database files, we can grow database files on our own terms, on our own schedule (outside of busy hours), and without interrupting a user transaction.
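For example, here's a minimal sketch of growing a data file manually, ahead of demand, during a maintenance window. The database and logical file names below are hypothetical - substitute your own, and size the file to your environment.

-- Hypothetical example: proactively grow a data file to 50 GB on our own schedule,
-- so the engine never has to autogrow it in the middle of a user transaction.
ALTER DATABASE [SalesDB]
MODIFY FILE (NAME = N'SalesDB_Data', SIZE = 51200MB);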

Autogrowth events occur when a data file or log file has exhausted its reservation, and SQL Server is forced to ask the operating system to add more space to the file. While the request and the file growth are in progress, very little can happen in that file. This typically occurs during a user transaction - bad news!

The increment of the autogrowth might be defaulted to just 1 MB on your system, which means a single large transaction could result in a series of sequential autogrowth events before the transaction finally commits.  In the end, the argument over fixed autogrowth size versus percentage-based autogrowth settings is academic - a vigilant DBA should try to avoid autogrowth events in the first place.

But, while we're on the topic: 10% is a completely reasonable growth rate for data and log files for most databases, and depending on your available storage space, a fixed growth increment (100 MB - 500 MB) may be more appropriate.
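If you do move to a fixed increment, it's a one-line change per file. A sketch, again with hypothetical database and file names:

-- Hypothetical example: replace percentage-based growth with a fixed 256 MB increment
-- for a data file and its log file (pick an increment that suits your storage).
ALTER DATABASE [SalesDB] MODIFY FILE (NAME = N'SalesDB_Data', FILEGROWTH = 256MB);
ALTER DATABASE [SalesDB] MODIFY FILE (NAME = N'SalesDB_Log',  FILEGROWTH = 256MB);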

Instant File Initialization can also make a big difference in the time required to grow a database data file (as well as to restore database backups). Note that this may also need to be enabled at the virtual guest and SAN level in some virtual environments.
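On newer builds (roughly SQL Server 2016 SP1 and later - the column below does not exist on older versions), you can check whether Instant File Initialization is in effect straight from the engine; on older versions, check whether the service account holds the "Perform volume maintenance tasks" privilege instead.

-- Reports the engine service account and whether Instant File Initialization is enabled.
-- The instant_file_initialization_enabled column is only present on newer builds.
SELECT servicename, service_account, instant_file_initialization_enabled
FROM sys.dm_server_services
WHERE servicename LIKE 'SQL Server (%';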

Here's a script to view the available space in every database file in a SQL Server instance, using the hacky old sp_MSforeachdb undocumented stored procedure:
exec sp_MSforeachdb  'use [?]; 
SELECT 
  ''DatabaseName_____________'' = d.name
, Recovery = d.recovery_model_desc
, ''DatabaseFileName_______'' = df.name
, ''Location_______________________________________________________________________'' = df.physical_name
, df.File_ID
, FileSizeMB = CAST(size/128.0 as Decimal(9,2)) -- size is in 8 KB pages; dividing by 128 converts to MB
, SpaceUsedMB = CAST(CAST(FILEPROPERTY(df.name, ''SpaceUsed'') AS int)/128.0 as Decimal(9,2))
, AvailableMB =  CAST(size/128.0 - CAST(FILEPROPERTY(df.name, ''SpaceUsed'') AS int)/128.0 as Decimal(9,2))
, ''Free%'' = CAST((((size/128.0) - (CAST(FILEPROPERTY(df.name, ''SpaceUsed'') AS int)/128.0)) / (size/128.0) ) * 100. as Decimal(9,2))
 FROM sys.database_files df
 cross apply sys.databases d
 where d.database_id = DB_ID() 
 and size > 0
'
Two important pieces of information we can get from the default trace when we look for autogrowth events are the time and duration of each autogrowth event.  I've personally used these two pieces of information to prove that random application timeouts experienced by a client were caused by database file autogrowth events stalling their transactions for whole seconds while more space was added.

How far back does the default trace go? It depends on how much activity is being written to the default trace (which obviously needs to be enabled for this to work). The active default trace file holds up to 20 MB of data, and up to five 20 MB rollover files are kept. This script only reads the latest file.
DECLARE @tracepath nvarchar(500)

SELECT 
 @tracepath = path 
 --select *
FROM sys.traces 
WHERE is_default = 1

--The trace automatically finds the _n rollover files; trim off the _nnn portion of the file name.
 select @tracepath = substring(@tracepath, 0, charindex('\log_', @tracepath,0)+4) + '.trc'

 print @tracepath

 SELECT
   DBName               = g.DatabaseName
 , DBFileName           = mf.physical_name
 , FileType             = CASE mf.type WHEN 0 THEN 'Row' WHEN 1 THEN 'Log' WHEN 2 THEN 'FILESTREAM' WHEN 4 THEN 'Full-text' END
 , EventName            = te.name
 , EventGrowthMB        = convert(decimal(19,2), g.IntegerData*8/1024.)   -- IntegerData = number of 8 KB pages by which the file increased
 , EventTime            = g.StartTime
 , EventDurationSec     = convert(decimal(19,2), g.Duration/1000./1000.)  -- Duration is reported in microseconds
 , CurrentAutoGrowthSet = CASE
                            WHEN mf.is_percent_growth = 1
                            THEN CONVERT(varchar(11), mf.growth) + '%'
                            ELSE CONVERT(varchar(30), convert(decimal(19,2), mf.growth*8./1024.)) + 'MB'
                          END
 , CurrentFileSizeMB    = convert(decimal(19,2), mf.size*8./1024.)
 , TracePath            = @tracepath
 FROM fn_trace_gettable(@tracepath, default) g
 inner join sys.trace_events te
 on g.EventClass = te.trace_event_id
 inner join sys.master_files mf
 on mf.database_id = g.DatabaseID
 and g.FileName = mf.name
 WHERE te.name in ('Data File Auto Grow','Log File Auto Grow')
 order by g.StartTime desc
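As a quick sanity check before relying on the script above, you can confirm the default trace is turned on and see its current path and rollover limits from the standard catalog views:

-- Is the default trace enabled? (1 = yes)
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'default trace enabled';

-- Where is it writing, and what are its rollover limits?
SELECT id, path, max_size, max_files, is_rollover
FROM sys.traces
WHERE is_default = 1;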

 



EDIT: A smarter way to search across the trace's _n rollover files has also been reflected in the "autogrow events.sql" file in the toolbox.

Hunt Down Tables Without Clustered Indexes

This one's a gimme, but an underrated utility script when I enter a new environment and look for low-hanging fruit for easy improvements.

You'd be surprised how many vendors ship applications without a single index, much less properly aligned clustered indexes.

Heaps are bad.  You can't defragment a heap by performing index maintenance, despite the old myth about the create-and-drop-a-clustered-index trick.

Similarly, the myth of a heap performing better on pure inserts vs a table with a clustered index is an academic one.  Heaps are impossible to order, which means selecting from a heap is always a scan.

If your table will be constantly written to and rarely read, put a clustered index on it.  The writes will be sequential, resulting in fewer page breaks and less fragmentation, which means more efficient storage and maintenance.

EDIT: Page "breaks" (I meant splits) aren't possible on heaps, but forwarding pointers are, and they create a maze of bad performance in heaps with nonclustered indexes. Thanks to commenters for pointing this out!

And when you have to look in that table for a stray logging record, it won't be a full table scan. If your table will be constantly written to and never read, why is it in your database?  Such a table doesn't exist.
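As a sketch of the fix (the table and column names here are hypothetical), adding an IDENTITY column as the clustered primary key turns a shipped-as-a-heap logging table into an ordered table with a narrow, unique, unchanging, sequential key:

-- Hypothetical logging table that arrived from the vendor as a heap.
ALTER TABLE dbo.AppLog ADD LogID int IDENTITY(1,1) NOT NULL;

-- The clustered primary key orders the table; new rows land at the end of the index.
ALTER TABLE dbo.AppLog
ADD CONSTRAINT PK_AppLog PRIMARY KEY CLUSTERED (LogID);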

This script is an easy one to use for finding heaps in a database, along with the row count and size of jumbled data that is haunting your database.  It is in my DBA toolbox folder alongside many other scripts.

--Doesn't work on 2000 databases or databases in 2000 compatibility mode.  Need to change the db_id() syntax if so.

select 
 [Database Name] = db_name()
, [Table Name] = s.name + '.' + o.name
, p.row_count
, SizeMb = (p.reserved_page_count*8.)/1024.
from 
 sys.objects o
inner join  sys.schemas s on o.schema_id = s.schema_id
inner join  sys.indexes si on si.object_id = o.object_id
inner join  sys.dm_db_partition_stats p on p.object_id = o.object_id
        and p.index_id = si.index_id -- count only the heap itself, not its nonclustered indexes
WHERE si.type_desc = 'Heap'
and  o.is_ms_shipped = 0
order by SizeMb desc


As an aside, but a different blog post altogether - a good clustered key is narrow, unique, unchanging and sequential - which is why IDENTITY integer columns are the perfect clustered key.  Compound clustered keys can be a big waste of space.