Saturday, September 28, 2013

Houston Tech Fest 2013 - SQL Admin Best Practices with DMV's

Here is the downloadable file with all my sample scripts and the slide deck from my SQL Admin Best Practices with DMV's presentation at the 2,000+ person Houston Tech Fest 2013 at the Reliant Center in Houston!

Thanks for attending! If you attended my talk, shoot me some feedback here: http://speakerrate.com/william.assaf

Download scripts and slidedeck

Wednesday, September 04, 2013

Replicated Stored Procedure Execution

One feature that many of my clients are unaware of is that the execution of a stored procedure can itself be replicated under certain circumstances, rather than the individual row changes it makes. This can be a significant performance improvement for stored procedures that modify data in replicated databases.

This MSDN article does a good job of breaking down the basics, but here's a demonstration you can run yourself.

The rules for batching up the stored procedure execution are fairly strict, and they require the SERIALIZABLE isolation level for the stored procedure's transaction. The goal is for the Subscriber to see a single 100-row transaction, not 100 single-row transactions. The latter, inefficient behavior is the default, and it's what we want to avoid. We'll prove it below.

Step one: Execute the script block below in an existing replicated database. (Not production, obviously.) Or set up a new, simple transactional replication between two databases, then run the script block.
--Create testing table
CREATE TABLE [dbo].[TestArticle](
 [id] [int] IDENTITY(1,1) NOT FOR REPLICATION NOT NULL,
 [text1] [nvarchar](100) NULL,
 [text2] [nvarchar](100) NULL,
 [text3] [nvarchar](150) NULL,
 [text4] [nvarchar](150) NULL,
 [text5] [nvarchar](200) NULL,
 [int1] [int] NULL,
 [int2] [int] NULL,
 [int3] [int] NULL,
 [int4] [int] NULL,
 [int5] [int] NULL,
 CONSTRAINT [PK_TestArticle] PRIMARY KEY CLUSTERED ( [id] ASC )
)
--Populate with random data
declare @x int
select @x = 1
WHILE @x <= 100
BEGIN
 --Insert filler data
 INSERT INTO dbo.TestArticle
 (text1,text2,text3,text4,text5,int1,int2,int3,int4,int5)
 SELECT
     replicate(char(round(rand()*100,0)),round(rand()*100,0))
 , replicate(char(round(rand()*200,0)),round(rand()*100,0))
 --char() returns NULL for values above 255, so keep the random character codes in range
 , replicate(char(round(rand()*255,0)),round(rand()*100,0))
 , replicate(char(round(rand()*255,0)),round(rand()*100,0))
 , replicate(char(round(rand()*255,0)),round(rand()*100,0))
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 select @x = @x + 1
END
go
--This stored procedure is the article we'll replicate as a serialized execution
CREATE procedure [dbo].[addrows] 
as
 --Insert filler data
 INSERT INTO dbo.testarticle
 (text1,text2,text3,text4,text5,int1,int2,int3,int4,int5)
 SELECT top 100
     replicate(char(round(rand()*100,0)),round(rand()*100,0))
 , replicate(char(round(rand()*200,0)),round(rand()*100,0))
 --char() returns NULL for values above 255, so keep the random character codes in range
 , replicate(char(round(rand()*255,0)),round(rand()*100,0))
 , replicate(char(round(rand()*255,0)),round(rand()*100,0))
 , replicate(char(round(rand()*255,0)),round(rand()*100,0))
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 , round(rand()*10000,0)
 FROM dbo.testarticle
GO

Step two: Add both the table and the stored procedure to a new publication as articles. The stored procedure must be added as an article with Replicate option = 'Execution in a serialized transaction of the SP'.
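
If you prefer to script the publication rather than click through the New Publication Wizard, the articles can be added with sp_addarticle. This is only a sketch; it assumes an existing transactional publication named TestPub, so adjust the names to your setup.

--Run in the publisher database
EXEC sp_addarticle 
    @publication = N'TestPub',
    @article = N'TestArticle',
    @source_owner = N'dbo',
    @source_object = N'TestArticle',
    @type = N'logbased'
GO
EXEC sp_addarticle 
    @publication = N'TestPub',
    @article = N'addrows',
    @source_owner = N'dbo',
    @source_object = N'addrows',
    @type = N'serializable proc exec'
GO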

Step three: Set up a default "TSQL" trace on the Subscriber instance.
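
If you'd rather not leave Profiler running, an Extended Events session on the Subscriber (SQL Server 2012 and later) captures the same two events the TSQL template watches. This is just a sketch, and the session name is my own invention.

--Capture incoming batches and RPC calls on the Subscriber
CREATE EVENT SESSION [repl_watch] ON SERVER
ADD EVENT sqlserver.sql_batch_starting(ACTION(sqlserver.sql_text)),
ADD EVENT sqlserver.rpc_starting(ACTION(sqlserver.sql_text))
ADD TARGET package0.ring_buffer;
GO
ALTER EVENT SESSION [repl_watch] ON SERVER STATE = START;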

Step four: Execute the below on the Publisher instance so that you can see the replicated stored procedure execution in action.

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
BEGIN TRAN testrepl
exec dbo.addrows
COMMIT TRAN testrepl

--In the trace, you should see one trace event for SQL:BatchStarting, textdata =

SET TRANSACTION ISOLATION LEVEL SERIALIZABLE; EXEC "dbo"."addrows"  ;SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
 
Step five: Now let's look at what happens normally, without the serialized transaction. Start up a default "TSQL" trace on the Subscriber instance and execute the below on the Publisher instance, this time under READ COMMITTED.
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
BEGIN TRAN testrepl
exec dbo.addrows
COMMIT TRAN testrepl


--In the trace, you will see 100 trace events for RPC:Starting, textdata = 


exec [sp_MSins_dbotestarticle] '8C39EB22-71BF-488A-8066-00C4AFABA497', .... 

exec [sp_MSins_dbotestarticle] .... 

exec [sp_MSins_dbotestarticle] .... 

exec [sp_MSins_dbotestarticle] .... 

....
This is the default behavior, and it's a bit shocking to see for the first time.
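
Either way, the Subscriber ends up with the same rows; only the amount of work to deliver them differs. A quick sanity check, once the Distribution Agent has caught up, is to compare row counts on both sides:

--Run on both the Publisher and Subscriber databases; the counts should match
SELECT COUNT(*) AS TestArticleRows FROM dbo.TestArticle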

Remember the rule about the SERIALIZABLE isolation level: it means this approach isn't necessarily appropriate for all environments. Running the procedure under SERIALIZABLE can increase locking and blocking during its execution, so educate yourself on the heightened isolation levels and why they matter before making this change.

Tuesday, September 03, 2013

Actual Emails: TL;DR: Stop using varchar(max)

Wrote this email recently to a crew of developers who were shooting themselves in the foot with a database rich in varchar(max) data types.

Hey folks-
TL;DR: Stop using varchar(max). We’re not storing books.
We need to review and avoid the varchar(max) data type in our tables. Here’s a short treatise as to why.
In SQL Server, varchar(max) is intended to replace the old text data type, which differed from varchar(n) because it was designed to store massive amounts of data. Massive meaning greater than 8,000 characters, and all the way up to 2 GB worth of data. That's what the varchar(max), varbinary(max), and nvarchar(max) data types are optimized for: HUGE blocks of data in a single cell. We should only use them if we're actually intending to store massive text and use the full-text indexing engine (a completely separate and specific topic for text blocks).
This is an oversimplification, but varchar(max) is designed to store its data differently, specifically to handle large text blocks. It appears to behave the same as a varchar(n) field, and that's deceptive when we're only putting 100-200 characters into each row.
The big drawbacks biting us right now about varchar(max) have to do with indexing, and this is regardless of how much data is actually in a varchar(max) field. A varchar(max) column can’t be the key of a nonclustered index, even if it never stores more than 8000 characters, and can’t have ONLINE index maintenance performed.  As a result, it is generally a giant pain for indexing, a pain you only want to put up with if you absolutely have to.
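For instance, here's a quick way to see the index limitation for yourself. The table and index names below are made up for the demo, not from our schema:

--A varchar(max) column cannot be an index key column; this fails with Msg 1919
CREATE TABLE dbo.MaxDemo (id int IDENTITY(1,1) PRIMARY KEY, notes varchar(max) NULL)
CREATE NONCLUSTERED INDEX IX_MaxDemo_notes ON dbo.MaxDemo (notes)
--It can only ride along as an included column
CREATE NONCLUSTERED INDEX IX_MaxDemo_id ON dbo.MaxDemo (id) INCLUDE (notes)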
Furthermore, we’re doing ourselves a disservice for performance, straight up. Unless you’re storing books, (max) hurts performance. Check out this blog post: http://rusanu.com/2010/03/22/performance-comparison-of-varcharmax-vs-varcharn/ 
In short, varchar(max) is burdensome overkill for our datasets.
So here's the solution: change varchar(max) to varchar(n), where n is a generous but appropriate length for that column's data. If the import wizard creates varchar(max) columns for us when pulling data in from Excel, change them to varchar(8000), which is the largest length you can assign to a varchar column. Or better yet, once the data is in SQL, use this simple syntax to find the maximum length actually stored in a column, then size the column generously above that.
For example: select MAX(LEN([yourcolumn])) from yourtable 
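To find every (max) column we need to review, something like this should do the trick (it's plain INFORMATION_SCHEMA, where a reported length of -1 means (max)), and then an ALTER right-sizes each one. The table and column names in the ALTER are placeholders:

SELECT TABLE_SCHEMA, TABLE_NAME, COLUMN_NAME, DATA_TYPE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE DATA_TYPE IN ('varchar','nvarchar','varbinary')
AND CHARACTER_MAXIMUM_LENGTH = -1
ORDER BY TABLE_NAME, COLUMN_NAME

--Placeholder names; pick n from the MAX(LEN()) check and keep the column's existing nullability
ALTER TABLE dbo.yourtable ALTER COLUMN yourcolumn varchar(200) NULL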
Problem is, our SSIS packages are all very picky about data types and will break if we just change them. So, after making these table changes, you'll need to open each SSIS package, open the data flow destination or other affected component, hit OK to accept the new metadata, then save and redeploy it. No other changes are necessary.
This all came up because we have data quality issues with the fields Foo and Bar. Both of those columns are varchar(max). I’m dumping the varchar(max) data into temp tables with varchar(200) to get the queries to return in a reasonable amount of time. 
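For reference, the workaround looks roughly like this; the source table name is a placeholder:

--Cast the (max) columns down before querying; 200 characters is plenty for the actual data
SELECT CAST(Foo AS varchar(200)) AS Foo, CAST(Bar AS varchar(200)) AS Bar
INTO #narrow
FROM dbo.YourSourceTable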
Let me know if you have any questions!
 William 

I like to use the word treatise to prepare my audience for verbosity.