
Tuesday, December 18, 2018

Actual Emails: What is a "soft delete"?

What's a "soft delete", and why should DBAs be aware of such behavior in tables?

So, a DELETE statement is a “hard” delete. The data is gone.

However, it is a common practice to build into tables a set of auditing fields…

For example:
CREATE TABLE dbo.whatever(
Id int identity(1,1) not null primary key,
WhateverInt int not null,
…
Createddate datetimeoffset(0) not null,
Createdby varchar(250) not null,
Modifieddate datetimeoffset(0) null,
Modifiedby varchar(250) null,
IsActive bit not null CONSTRAINT DF_whatever_IsActive
DEFAULT (1)
)
The IsActive field, or anything similarly named like IsArchived or IsDeleted, is a common strategy to “soft” delete something and remove it from resultsets, but still retain history. All the queries on the table would then use WHERE IsActive = 1 to make sure they only viewed Active data, for example.

So, if you UPDATE a record to IsActive = 0 instead of DELETEing it, it disappears from queries and reports and screens, but it’s still there in case it’s useful. There may be some queries that intentionally want to query IsActive = 0 for historical or auditing purposes. 
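For instance, a soft delete of a single row might look like this (a minimal sketch, reusing the audit columns from the table above):
--"Soft delete" row 123: hide it from normal queries, keep it for history
UPDATE dbo.whatever
SET IsActive = 0
, Modifieddate = SYSDATETIMEOFFSET()
, Modifiedby = SUSER_SNAME()
WHERE Id = 123;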

Often, a filtered nonclustered index (introduced in SQL 2008) can be created to match the WHERE IsActive = 1 predicate. The index is smaller as a result, which allows SQL Server to dramatically improve the performance of these queries. This is really only effective if a majority of the data has actually been soft-deleted.

For example: 
CREATE NONCLUSTERED INDEX IDX_NC_whatever_WhateverInt
ON dbo.whatever (WhateverInt) 
WHERE IsActive = 1
We had one client with an IsArchived flag (or similar) on a table with tens of millions of rows. Only 1% of the data was “active” and waiting to be archived. Their queries specified WHERE IsArchived = 0, but still performed very poorly – there was a lot of data to access. We added and modified some nonclustered indexes as filtered indexes to account for IsArchived = 0, and not only did this greatly reduce the size of the indexes, but the query time went from minutes to instantaneous. Huge difference. We could also have added similarly-filtered nonclustered columnstore indexes.


DBAs need to understand table design and recognize "soft delete" keys and the associated business logic in applications, because the Missing Indexes feature and other common query tuning tools won't include them in recommendations for filtering.

Furthermore, we can even enforce filtered uniqueness using a unique nonclustered index. In this way, we could make sure that there is only one active record for a given key set, but allow many inactive records for the same key set!
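For example, a sketch against the dbo.whatever table above, assuming WhateverInt is the business key:
--Allow only one active row per WhateverInt, but any number of inactive rows
CREATE UNIQUE NONCLUSTERED INDEX UQ_whatever_WhateverInt_Active
ON dbo.whatever (WhateverInt)
WHERE IsActive = 1;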



Wednesday, February 05, 2014

Actual Emails: More Grief for GUIDs

Wrote this email exchange with some developer colleagues about to embark on performance tuning.
Subject: bag o' tricks from DMV talk
From: A. Developer
Hey William,
We are about to spend around 4 weeks performance tuning an application we built. Much of the performance tuning is going to be the database. Could you send over those dmv queries that you used in your DMV talk? Or do you still have them available for download? I think they will be able to help us out quite a bit.
I know one of the big things you talked about is not using GUIDs, however, we are using them because of replication. Do you have any thoughts on helping ease the fragmentation because of this?

From: William Assaf

Yep, I have all that info here. This is the last time I gave that talk: http://www.sqltact.com/2013/09/houston-tech-fest-2013-sql-admin-best.html
Of course, if you have any questions, let me know.

So, as you know, I hate GUIDs because they are 4x as large as an integer yet serve the same purpose, or 2x as large as two integer columns to allow for multiple device synchronization.

But the biggest problem with GUIDs can happen when the first key of a table's clustered index is a GUID column. When you create new GUIDs, are you generating random GUIDs or sequential GUIDs?

If you’re creating them with a SQL default value (which you’re probably not, but as an example), this would be the difference between the newid() function (bad) and the newsequentialid() function (not as bad).
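For illustration only, here's a sketch of a table whose GUID key defaults to the sequential option (the table and column names are made up):
CREATE TABLE dbo.SyncedOrder (
 OrderGuid uniqueidentifier NOT NULL
  CONSTRAINT DF_SyncedOrder_OrderGuid DEFAULT (NEWSEQUENTIALID())
  CONSTRAINT PK_SyncedOrder PRIMARY KEY CLUSTERED,
 OrderDate datetimeoffset(0) NOT NULL
);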

Using sequential GUIDs can allow you to create a clustered index that actually make some sense when it is ordered, and can have index maintenance performed on it to REBUILD or REORGANIZE, reducing fragmentation. Problem is, when you restart the SQL Service, the sequence also resets. So you won't have one contiguous string of sequentially-inserted GUIDs in a table over its lifetime. 

On random GUIDs, you can REBUILD or REORGANIZE all you want, but the data order still won’t make any sense. Changing from random to sequential GUIDs may be really easy in your database or application code. If you’re already using sequential GUIDs, there’s not really much more you can do to mitigate the performance and storage letdowns of GUIDs that you would not also do on tables that use integer IDENTITY columns.

As for basic indexing, run this script first to find any tables that are still heaps. Those should be the first things you target in your performance tuning phase. http://www.sqltact.com/2013/05/hunt-down-tables-without-clustered.html

If I can help or get in on that performance tuning stuff, let me know! Good luck, keep in touch.

From: A. Developer
Thanks for the info.
One question though. Will sequential GUIDs work with replication? Wouldn't there be collisions if they are sequential?


From: William

So, in theory, it is mathematically "unlikely" that GUIDs will have collisions. That math always troubles me when multiple machines are generating GUIDs, even though the network card is used to generate a unique seed for the GUID on each machine. It is even more troublesome when GUIDs are generated sequentially, but we are supposed to trust the math. :/ 
I’m not super knowledgeable about that math but from what I know, your concern is understandable but “unlikely” to be an issue. I know, not comforting. Here is some comforting reference material though. http://blogs.msdn.com/b/sqlserverfaq/archive/2010/05/27/guid-vs-int-debate.aspx 
Uses words like “practically guaranteed” and “virtually impossible” for sequential GUID collisions.



I welcome comments and additions to this somewhat religious topic. The "oil rig" scenario that is commonly faced in the Gulf South provides a recurring fresh argument for/against GUIDs. In many scenarios, a pair of integer columns can provide the benefits of sequential, unique keys. Assuming one of the pair identifies the rig/device/source of the data, you also have a built-in foreign key column in the unique key, something you would have to store anyway in a table keyed on a GUID.

Wednesday, September 04, 2013

Replicated Stored Procedure Execution

One feature that many of my clients are unaware of is that the batched execution of a stored procedure can be replicated under certain circumstances. This can result in a significant performance improvement for stored procedures on replicated databases.

This MSDN article does a good job of breaking down the basics, but here's a demonstrable example you can use.

The rules about batching up the stored procedure executions are fairly strict, and require the SERIALIZABLE isolation level for the stored procedure's transaction. The goal is to see a single 100-row transaction, not 100 single-row transactions. The latter, inefficient behavior is the default, and it's what we want to avoid. We'll prove it later.

Step one: Execute the below script block in an existing replicated database. (Not production, obviously.) Or, set up a new simple transactional replication between two databases, then run the below script block.
--Create testing table
CREATE TABLE [dbo].[TestArticle](
 [id] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
 [text1] [nvarchar](100) NULL,
 [text2] [nvarchar](100) NULL,
 [text3] [nvarchar](150) NULL,
 [text4] [nvarchar](150) NULL,
 [text5] [nvarchar](200) NULL,
 [int1] [int] NULL,
 [int2] [int] NULL,
 [int3] [int] NULL,
 [int4] [int] NULL,
 [int5] [int] NULL,
 CONSTRAINT [PK_TestArticle] PRIMARY KEY CLUSTERED ( [id] ASC )
)
--Populate with random data
declare @x int
select @x = 1
WHILE @x < 100
BEGIN
 --Insert filler data
 INSERT INTO dbo.TestArticle
 (text1,text2,text3,text4,text5,int1,int2,int3,int4,int5)
 SELECT
     replicate(char(round(rand()*100,0)),round(rand()*100,0))
 , replicate(char(round(rand()*200,0)),round(rand()*100,0))
 , replicate(char(round(rand()*300,0)),round(rand()*100,0))
 , replicate(char(round(rand()*400,0)),round(rand()*100,0))
 , replicate(char(round(rand()*500,0)),round(rand()*100,0))
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 select @x = @x + 1
END
go
CREATE procedure [dbo].[addrows] 
as
 --Insert filler data
 INSERT INTO dbo.testarticle
 (text1,text2,text3,text4,text5,int1,int2,int3,int4,int5)
 SELECT top 100
     replicate(char(round(rand()*100,0)),round(rand()*100,0))
 , replicate(char(round(rand()*200,0)),round(rand()*100,0))
 , replicate(char(round(rand()*300,0)),round(rand()*100,0))
 , replicate(char(round(rand()*400,0)),round(rand()*100,0))
 , replicate(char(round(rand()*500,0)),round(rand()*100,0))
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 FROM dbo.testarticle
GO

Step two: Add both the table and the stored procedure to a new publication as articles. The stored procedure must be added as an article with Replicate option = 'Execution in a serialized transaction of the SP'.
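If you prefer T-SQL over the wizard, that option corresponds to the 'serializable proc exec' article type in sp_addarticle. A hedged sketch (the publication name here is a placeholder):
EXEC sp_addarticle
 @publication = N'TestPublication'
, @article = N'addrows'
, @source_owner = N'dbo'
, @source_object = N'addrows'
, @type = N'serializable proc exec';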

Step three: Set up a default "TSQL" trace on the Subscriber instance.

Step four: Execute the below on the Publisher instance so that you can see the replicated stored procedure execution in action.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN testrepl
exec dbo.addrows
COMMIT TRAN testrepl

--In the trace, you should see one trace event for SQL:BatchStarting, textdata =

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; EXEC "dbo"."addrows"  ;SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
 
Step five: Now let's look at what happens normally without any modifications. Start up a default "TSQL" trace on the Subscriber instance and execute the below on the Publisher instance.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN testrepl
exec dbo.addrows
COMMIT TRAN testrepl


--In the trace, you will see 100 trace events for RPC:Starting, textdata = 


exec [sp_MSins_dbotestarticle] '8C39EB22-71BF-488A-8066-00C4AFABA497', .... 

exec [sp_MSins_dbotestarticle] .... 

exec [sp_MSins_dbotestarticle] .... 

exec [sp_MSins_dbotestarticle] .... 

....
This is the default behavior, and it's a bit shocking to see for the first time.

Remember the requirement for the SERIALIZABLE isolation level; this means batched execution isn't necessarily appropriate for all environments. This change could lead to an increase in locking during the execution of your procedure, so educate yourself on heightened isolation levels and why they are important.

Tuesday, September 03, 2013

Actual Emails: TL;DR: Stop using varchar(max)

Wrote this email recently to a crew of developers who were shooting themselves in the foot with a database rich in varchar(max) data types.

Hey folks-
TL;DR: Stop using varchar(max). We’re not storing books.
We need to review and avoid the varchar(max) data type in our tables. Here’s a short treatise as to why.
In SQL Server, varchar(max) is intended to replace the old text data type, which was different from varchar(n) because it was designed to store massive amounts of data.  Massive being greater than 8000 characters, and all the way up to 2gb worth of data. That’s what the varchar(max), varbinary(max), and nvarchar(max) data types are optimized for – HUGE blocks of data in a single cell. We should only use it if we're actually intending to store massive text and use the fulltext indexing engine (a completely separate and specific topic for text blocks).
This is an oversimplification, but varchar(max) is designed to store data differently, especially for large text blocks. It appears to behave the same as a varchar(n) field, and that’s deceptive when we are only throwing 100-200 characters into each row.
The big drawbacks biting us right now about varchar(max) have to do with indexing, and this is regardless of how much data is actually in a varchar(max) field. A varchar(max) column can’t be the key of a nonclustered index, even if it never stores more than 8000 characters, and can’t have ONLINE index maintenance performed.  As a result, it is generally a giant pain for indexing, a pain you only want to put up with if you absolutely have to.
Furthermore, we’re doing ourselves a disservice for performance, straight up. Unless you’re storing books, (max) hurts performance. Check out this blog post: http://rusanu.com/2010/03/22/performance-comparison-of-varcharmax-vs-varcharn/ 
In short, varchar(max) is burdensome overkill for our datasets.
So here’s the solution… Change varchar(max) to varchar(n), where n is a generous but appropriate number for that column’s data. If the import wizard creates varchar(max) columns for us when pulling data in from Excel, change them to varchar(8000), which is the highest length you can assign to a varchar field.  Or better yet, once the data is in SQL, use this simple syntax to find out the max length of a column and then pad it.
For example: select MAX(LEN([yourcolumn])) from yourtable 
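Then the change itself is a one-liner per column. A hedged sketch, with placeholder table/column names and a length based on that audit:
SELECT MAX(LEN(Foo)) FROM dbo.SomeTable; --confirm the actual max length first
ALTER TABLE dbo.SomeTable ALTER COLUMN Foo varchar(200) NULL; --specify the column's existing NULLability explicitly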
Problem is, our SSIS packages are all very picky about data types and will break if we just change them. So, after making these table changes, you’ll need to open your SSIS package, open the data flow destination or other affected object, hit OK to apply the new metadata, then save and deploy it again. No actual changes necessary.
This all came up because we have data quality issues with the fields Foo and Bar. Both of those columns are varchar(max). I’m dumping the varchar(max) data into temp tables with varchar(200) to get the queries to return in a reasonable amount of time. 
Let me know if you have any questions!
 William 

I like to use the word treatise to prepare my audience for verbosity.

Friday, May 03, 2013

Use the Default System Trace to Find Autogrowth Events

The SQL Server Management Studio built-in standard report for "Disk Usage" is very handy to view the performance-sucking autogrowth events for a database... until you want to see if autogrowth events have been occurring in any database on the server.

Autogrowth events are very problematic and can cause sudden, unpredictable and difficult-to-troubleshoot application timeouts and performance drops. This applies to any SQL Server database file, including those in SharePoint databases, system databases (such as the tempdb), and any user database data file or log file.


[As an aside, your friends in SharePoint know all about Autogrowth settings for their content databases as well.  Friend of the blog Drew at WerePoint posted this relevant blog entry recently about the timing of growth and recycle-bin emptying events. http://werepoint.blogspot.com/2013/05/on-proper-care-and-feeding-of-content.html ]


Ideally, as professional DBAs, we don't want to ever see an autogrowth event occur on our databases.  By proactively monitoring the space in our database files, we can grow database files on our own terms, on our own schedule (outside of busy hours), and without interrupting a user transaction.

Autogrowth events occur when a data file or log file has exhausted its reservation. SQL Server is forced to request more space for the file from the operating system. During this request and the file growth, very little can happen in that data file. This typically occurs during a user transaction - bad news!

The increment of the autogrowth might be defaulted to just 1 MB on your system, which means a single large transaction could result in a series of sequential autogrowth events before the transaction finally commits.  In the end, the argument over fixed autogrowth size versus percentage-based autogrowth settings is academic - a vigilant DBA should try to avoid autogrowth events in the first place.

But, while we're on the topic, 10% is a completely reasonable growth rate for data and log files for most databases. Depending on your available storage space, moving to a fixed growth rate (100 MB-500 MB) may be more appropriate.
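For example, a hedged sketch of setting a fixed growth increment and of growing a file proactively on your own schedule (the database and logical file names are placeholders):
--Set a fixed growth increment instead of the default
ALTER DATABASE [YourDatabase]
 MODIFY FILE (NAME = N'YourDatabase_Data', FILEGROWTH = 256MB);
--Better yet, grow the file yourself, outside of busy hours
ALTER DATABASE [YourDatabase]
 MODIFY FILE (NAME = N'YourDatabase_Data', SIZE = 50GB);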

Instant File Initialization can also make a big difference in the time required to grow a database file (as well as restore database backups.) Note that this may need to be enabled at the virtual guest and SAN level for some virtual environments.

Here's a script to view the available space in every database file in a SQL Server instance, using the hacky old sp_MSforeachdb undocumented stored procedure:
exec sp_MSforeachdb  'use [?]; 
SELECT 
  ''DatabaseName_____________'' = d.name
, Recovery = d.recovery_model_desc
, ''DatabaseFileName_______'' = df.name
, ''Location_______________________________________________________________________'' = df.physical_name
, df.File_ID
, FileSizeMB = CAST(size/128.0 as Decimal(9,2))
, SpaceUsedMB = CAST(CAST(FILEPROPERTY(df.name, ''SpaceUsed'') AS int)/128.0 as Decimal(9,2))
, AvailableMB =  CAST(size/128.0 - CAST(FILEPROPERTY(df.name, ''SpaceUsed'') AS int)/128.0 as Decimal(9,2))
, ''Free%'' = CAST((((size/128.0) - (CAST(FILEPROPERTY(df.name, ''SpaceUsed'') AS int)/128.0)) / (size/128.0) ) * 100. as Decimal(9,2))
 FROM sys.database_files df
 cross apply sys.databases d
 where d.database_id = DB_ID() 
 and size > 0
'
One important piece of information we can get from the default trace when we look for autogrowth events is the time and duration of each autogrowth event.  I've personally used these two pieces of information to prove that random application timeouts experienced by a client were because of database file autogrowth events stalling their transaction for whole seconds while more space was added.

How far back does the default trace go? It depends on how much activity is being written to the default trace (which obviously needs to be enabled for this to work). The current active default trace file keeps up to 20mb of data.  Five default trace files are kept of 20mb each. This script only reads the latest file.
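Before running it, a quick sanity check that the default trace is still enabled (value_in_use should be 1):
SELECT name, value_in_use
FROM sys.configurations
WHERE name = 'default trace enabled';
Now, the script itself: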
DECLARE @tracepath nvarchar(500)

SELECT 
 @tracepath = path 
 --select *
FROM sys.traces 
WHERE is_default = 1

--The trace automatically finds _n files, trim off the _nnn portion of the file name.
 select @tracepath = substring(@tracepath, 0, charindex('\log_', @tracepath,0)+4) + '.trc'

 print @tracepath

 SELECT 
  DBName    = g.DatabaseName
 , DBFileName   = mf.physical_name
 , FileType   = CASE mf.type WHEN 0 THEN 'Row' WHEN 1 THEN 'Log' WHEN 2 THEN 'FILESTREAM' WHEN 4 THEN 'Full-text' END
 , EventName   = te.name
 , EventGrowthMB  = convert(decimal(19,2),g.IntegerData*8/1024.) -- Number of 8-kilobyte (KB) pages by which the file increased.
 , EventTime   = g.StartTime
 , EventDurationSec = convert(decimal(19,2),g.Duration/1000./1000.) -- Trace Duration is reported in microseconds; converted to seconds here.
 , CurrentAutoGrowthSet= CASE
         WHEN mf.is_percent_growth = 1
         THEN CONVERT(char(2), mf.growth) + '%' 
         ELSE CONVERT(varchar(30), convert(decimal(19,2), mf.growth*8./1024.)) + 'MB'
        END
 , CurrentFileSizeMB = convert(decimal(19,2),mf.size* 8./1024.)
 , @tracepath 
 FROM fn_trace_gettable(@tracepath, default) g
 cross apply sys.trace_events te 
 inner join sys.master_files mf
 on mf.database_id = g.DatabaseID
 and g.FileName = mf.name
 WHERE g.eventclass = te.trace_event_id
 and  te.name in ('Data File Auto Grow','Log File Auto Grow')
 order by StartTime desc

 



EDIT: There is a smarter way to search across the trace's rollover (_n) files. This has also been reflected in the "autogrow events.sql" file in the toolbox.

Hunt Down Tables Without Clustered Indexes

This one's a gimme, but an underrated utility script when I enter a new environment and look for low-hanging fruit for easy improvements.

You'd be surprised how many vendors ship applications without a single index, much less properly aligned clustered indexes.

Heaps are bad.  You can't defragment a heap by performing index maintenance, despite old myths about the create/drop a clustered index trick.

Similarly, the myth of a heap performing better on pure inserts vs a table with a clustered index is an academic one.  Heaps are impossible to order, which means selecting from a heap is always a scan.

If your table will be constantly written to and rarely read, put a clustered index on it.  The writes will be sequential, resulting in fewer page breaks and less fragmentation, which mean more efficient storage and maintenance.

EDIT: Page "breaks" (I meant splits) aren't possible on heaps, but forwarding pointers, which create a maze of bad performance in heaps with nonclustered indexes. Thanks to commenters for pointing this out!

And when you have to look in that table for a stray logging record, it won't be a full table scan. If your table will be constantly written to and never read, why is it in your database?  Such a table doesn't exist.
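When you do find a heap that deserves a clustered index, the fix can be as simple as this sketch (table and column names are made up; check first whether an existing natural key would serve better):
ALTER TABLE dbo.AppLog ADD LogId int IDENTITY(1,1) NOT NULL;
GO
ALTER TABLE dbo.AppLog ADD CONSTRAINT PK_AppLog PRIMARY KEY CLUSTERED (LogId);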

This script is an easy one to use for finding heaps in a database, along with the row count and size of jumbled data that is haunting your database.  It is in my DBA toolbox folder alongside many other scripts.

--Doesn't work on 2000 databases or databases in 2000 compatibility mode.  Need to change the db_id() syntax if so.

select 
 [Database Name] = db_name()
, [Table Name] = s.name + '.' + o.name
, p.row_count
, SizeMb= (p.reserved_page_count*8.)/1024.
from 
 sys.objects o
inner join  sys.schemas s on o.schema_id = s.schema_id
inner join  sys.dm_db_partition_stats p  on p.object_id = o.object_id 
inner join  sys.indexes si on si.object_id = o.object_ID
WHERE si.type_desc = 'Heap'
and  is_ms_shipped = 0
order by SizeMb desc


As an aside, but a different blog post altogether - a good clustered key is narrow, unique, unchanging and sequential - which is why IDENTITY integer columns are the perfect clustered key.  Compound clustered keys can be a big waste of space.

Thursday, October 11, 2012

Performance Tuning Presentation from October 2012 BRSSUG Meeting

Thanks to the 15+ folks who attended the Baton Rouge SQL Server User Group meeting last night!  I hope everyone enjoyed "everyone gets a book" night!

I presented on SQL Server Performance tuning, which began with Michelle Ufford's article from GoDaddy on how she saw massive performance gains just by correcting columns to more appropriate data types.  Here's a link to that article: https://web.archive.org/web/20130629210916/http://inside.godaddy.com/scaling-database-data-types/

Then we went through database compression, indexing and execution plans, index INCLUDE structure, the missing indexes feature, how foreign keys can affect performance, isolation modes and MAXDOP.

I've uploaded all my sample scripts in a .zip file to the brssug.org website here for you to download.

Thanks for attending, see you November 14!

UPDATE 20141104: Link to a cached copy of Ufford's article.

Saturday, July 18, 2009

How many threads is too much for SQL?

Here's a redacted version of a Q&A I had with one of the devs (who is not a DBA) at my consulting company. I thought it was worth sharing.

I got a question on multithreading in SQL 2005. Do you know much about it? Our client would like to know how many threads can be open before performance starts to suffer.


And my response to my colleague:

That's a really hard question to answer, since SQL Server doesn't operate in application "threads". SQL Server, with only one processor (and therefore one scheduler), can handle many, many threads from an application. More processors will enable it to respond to multiple threads more quickly, and SQL 2005 detects the processor count automatically and scales. "Open threads" is not really something to measure, as a multithreaded application doesn't correlate directly to SQL Server. Sure, you can see connections open and close, but to measure performance, the count of threads is really not useful.

If they are worried about CPU capacity, there are a number of perfmon counters and internal dynamic management views (DMV's) to look at to see if CPU is a bottleneck.

In Perfmon, you want to be looking at

\System\Processor Queue Length (shouldn't exceed 2x # of CPU's)
\Processor(_Total)\% Processor Time (self-explanatory, lower number the better)
\Process(sqlservr)\% Processor Time (self-explanatory, shouldn't differ from the above if no other major apps are running on the server)

In DMV's, look for

sys.dm_os_wait_stats (You don't want to see a lot of CPU-related wait types)
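For a quick, hedged example of what to query there - signal wait time is time spent waiting on a CPU/scheduler after the resource wait ends (the exclusion list below is abbreviated):
SELECT TOP (10)
 wait_type
, waiting_tasks_count
, wait_time_ms
, signal_wait_time_ms --time spent waiting for a scheduler (CPU pressure indicator)
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN ('SLEEP_TASK','LAZYWRITER_SLEEP','SQLTRACE_BUFFER_FLUSH','BROKER_TASK_STOP')
ORDER BY signal_wait_time_ms DESC;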

Here is a good link from MS: http://technet.microsoft.com/en-us/magazine/2007.10.sqlcpu.aspx

There's obviously much more to look at with SQL server, but not knowing my audience I'm not sure how much detail you want. Sorry for the unclear answer but there really is no hard number of application threads to stay under. It depends a lot upon the other subsystems of the server as well (RAM and physical IO). Sounds like they want to do performance tuning. Sounds like an opportunity.



"You always admire what you really don't understand." -Blaise Pascal

Wednesday, May 27, 2009

Fancy Missing Indexes

UPDATED 5/30/2019, the latest version of this script is in the Toolbox here: https://github.com/SparkhoundSQL/sql-server-toolbox/blob/master/missing%20indexes.sql

UPDATED 5/28/2013, see notes below.

You've seen fancy ketchup. Here's fancy missing indexes.

This script is database specific. It does not use the server-wide DMV's alone; it joins to the sys.tables and sys.schemas catalog views to generate index create scripts. This is a very fast way to enter an environment and take a peek at the indexing situation, something I do regularly as a DBA consultant.

SELECT 
 mid.statement
, create_index_statement = 'CREATE NONCLUSTERED INDEX IDX_NC_' + replace(t.name, ' ' ,'')
 + replace(replace(replace(ISNULL(replace(mid.equality_columns, ' ' ,''),'') , '],[' ,'_'),'[','_'),']','') 
 + replace(replace(replace(ISNULL(replace(mid.inequality_columns, ' ' ,''),''), '],[' ,'_'),'[','_'),']','') 
 + ' ON ' + statement 
 + ' (' + ISNULL (mid.equality_columns,'') 
    + CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END 
    + ISNULL (mid.inequality_columns, '')
 + ')' 
 + ISNULL (' INCLUDE (' + mid.included_columns + ')', '')  COLLATE SQL_Latin1_General_CP1_CI_AS
, unique_compiles, migs.user_seeks, migs.user_scans, last_user_seek, migs.avg_total_user_cost
, avg_user_impact, mid.equality_columns,  mid.inequality_columns, mid.included_columns
, quartile
--select *
FROM sys.dm_db_missing_index_groups mig
INNER JOIN --get only the top 20% of suggestions based on cost.
(select *, quartile = NTILE(5) OVER (ORDER BY avg_total_user_cost asc) from sys.dm_db_missing_index_group_stats) migs 
ON migs.group_handle = mig.index_group_handle
--and migs.quartile = 1
INNER JOIN sys.dm_db_missing_index_details mid 
ON mig.index_handle = mid.index_handle
inner join sys.tables t 
on t.object_id = mid.object_id
inner join sys.schemas s
on s.schema_id = t.schema_id

WHERE 1=1
and mid.database_id = db_id() -- current database only
--and  (datediff(week, last_user_seek, getdate())) < 6
--AND  migs.unique_compiles > 10
--and  migs.quartile >= 3
--and  migs.user_seeks > 10
--and  migs.avg_user_impact > 75
--and  t.name like '%salesorder%'

--order by avg_user_impact * avg_total_user_cost desc 
order by create_index_statement
Pardon blogspot's inane raw code posting ability.
  • The improv calc in the alternate ORDER BY is simply to evaluate the relative use of a potential index against others that show up in the report. It's not terribly scientific, but it weighs usage and value.
  • The WHERE clause items can be used depending on the situation. If your server has been operating for a while and you'd like to see what indexing needs recent changes have been generating, use the date filter. Have a lot of ad hocs? Filter on unique_compiles > 1. The third WHERE clause is the same as the improv value above, helps me filter out things that are too small to care about. Lastly, I filter out anything that won't have at least a 50% impact. Oftentimes, I'll set this last value much higher.
  • Joins to the sys.tables and sys.schemas catalog views to generate a proper index create statement. It follows a naming convention I prefer, IDX_NC_ for nonclustered indexes, followed by the table name and the key fields. For covering indexes, I'll remove the key fields from the name and just put "covering". Your naming convention for indexes is probably different.
  • Before creating any indexes from the missing index DMV's, do a little research on them and keep in mind that they reset when the instance restarts. It may be useful to retain these over time, spanning multiple restarts (see the sketch after this list).
  • Oftentimes, you'll see many of the rows in this result overlap. The INCLUDES might be in a different order, or you'll see one suggested index that has all but one other field that another suggestion has. Intelligently combine these together, keeping in mind that order DOES matter in the key fields.
  • Lastly, if you've examined existing indexing on the table, use the provided creation script in the create_index_statement column.
  • This script, aside from a few tweaks here and there, is hardly an original creation of this blog. As part of your research on missing index DMV's, don't miss: http://blogs.msdn.com/bartd/archive/2007/07/19/are-you-using-sql-s-missing-index-dmvs.aspx
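As mentioned in the list above, here's a hedged sketch of retaining the missing index DMV output over time (the destination table name is made up; run it on a schedule in each database of interest):
SELECT SYSDATETIME() AS capture_time
, mid.statement, mid.equality_columns, mid.inequality_columns, mid.included_columns
, migs.unique_compiles, migs.user_seeks, migs.user_scans
, migs.avg_total_user_cost, migs.avg_user_impact
INTO dbo.DBA_MissingIndexHistory --first run only; switch to INSERT INTO ... SELECT on later runs
FROM sys.dm_db_missing_index_details mid
INNER JOIN sys.dm_db_missing_index_groups mig ON mig.index_handle = mid.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
WHERE mid.database_id = DB_ID();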
Other linkage:
https://learn.microsoft.com/en-us/previous-versions/sql/sql-server-2008-r2/ms179325(v=sql.105)?redirectedfrom=MSDN
https://www.mssqltips.com/sqlservertip/1634/find-sql-server-missing-indexes-with-dmvs/

Updated 5/28/2013 to reflect some changes, including getting rid of the INFORMATION_SCHEMA views in favor of the system catalog views and updating the formatting.

Thursday, May 21, 2009

clear out DTA indexes and statistics

Update: There's a saying that when you look at your old code, you should feel the urge to improve it. This is one of those cases. WDA 20180927

Here's a helpful script I keep around in my portable HD toolbox. When the Database Tuning Advisor fails or is aborted for whatever reason, it can leave behind its hypothetical indexes and stats. This script will script out DROP statements for those leftover hypothetical indexes and stats in your database.

It uses the is_hypothetical flag to find indexes; this is reliable. It uses the _dta prefix to find stats; this is less reliable because it depends on a naming convention. This assumes that when you implement DTA-recommended stats, you change their name (always a best practice). If you have implemented DTA-recommended statistics objects and not changed their names, this script will script out dropping those too.
SELECT 'drop index [' + i.name+ '] on [' + schema_name(o.schema_id) + '].[' + object_name(i.[object_id]) + ']'
FROM sys.indexes i
INNER JOIN sys.objects o ON i.object_id = o.object_id
WHERE 1=1
and o.is_ms_shipped = 0
and o.type = 'u'
and i.name is not null
and i.is_hypothetical = 1  --find hypothetical indexes left by the DTA

select 'drop statistics [' + schema_name(o.schema_id) + '].[' + object_name(i.[object_id]) + '].['+ i.[name] + ']'
FROM sys.stats i
INNER JOIN sys.objects o ON i.object_id = o.object_id
WHERE 1=1
and o.is_ms_shipped = 0
and o.type = 'u'
and i.[name] LIKE '_dta%' --find stats from the DTA

See also:
I originally posted this on tsqlscripts.com (drink!)
http://www.graytechnology.com/Blog/post.aspx?id=e21cbab0-8ae2-478e-a027-1b3b14e7d0b9
http://weblogs.sqlteam.com/mladenp/archive/2007/07/16/60257.aspx
http://www.sqlservercentral.com/Forums/Topic398549-360-1.aspx


"The great tragedy of science - the slaying of a beautiful hypothesis by an ugly fact." -T.H. Buxley