Sent: Friday, January 20, 2012 10:30 AM
To: William Assaf
Subject: dba has a question
To: William Assaf
Subject: dba has a question
William,
Can you send me references to support my contention that NOLOCK is the epitome of evil?
That is, that it can return invalid and/or unrelated data.
He is still suggesting that folks use NOLOCK (rather than READPAST).
We have to work around other poorly written queries, so some of these are inevitable.
Thanks,
[my colleague's name redacted to protect the innocent]
--------------------------------------------------------------------------------------------
The basic danger is that with NOLOCK, the data can change out from  underneath the query while it is in progress.  This doesn't require the query to  run for a long time, it could happen even on simple queries.  
He should know that SQL doesn't read data out of a database  like an excel spreadsheet, it reads a series of addresses all across a hard  disk.  Without even a shared read lock, the data in those addresses can change  out from underneath a query using NOLOCK or READ UNCOMMITTED (same thing,  different keywords.)
The more appropriate and sometimes-used term for this  isolation level is CHAOS.  You could (and there are labs out there to prove  this) end up with blank data, malformed bytes, or worst - wrong data.   The likelyhood is low admittedly, but the potential for data corruption is  misreporting is not something that should be invited.
I would be strongly against using this isolation level on  any query that could be attached or provide source data from an  update/insert/delete statement, which would commit the bad data.  Most select  statements are the basis for some update, unfortunately.  
Perhaps on a query that is used frequently on a  constantly-refreshing dashboard would be a situation in which NOLOCK would be  appropriate.  But then, there are better, safer alternatives like SNAPSHOT  isolation mode, or better architectural decisions to make data more available  for read-heavy queries.
Good examples and info here:
-William
 
2 comments:
In my application, I need to calculate a time interval between adjacent rows. To do this, I use a CTE:
WITH PointSampleRows AS
( SELECT ps.*, ROW_NUMBER() OVER (ORDER BY ps.PointID,UTCDatetime) AS rn FROM [tbl_PointSample] ps join
[tbl_Point] p on ps.PointID = p.PointID)
INSERT INTO [analysis].[IntervalHistogram]
SELECT mc.PointId
, DATEDIFF(second, mc.UTCDatetime, mp.UTCDatetime)
, count(*)
, CAST(0 as bit)
FROM PointSampleRows mc JOIN
PointSampleRows mp
ON mc.PointId = mp.PointId
and mc.rn = mp.rn - 1
group by mc.PointId
, DATEDIFF(second, mc.UTCDatetime, mp.UTCDatetime);
I was concerned that this query running over the entire table will end up getting escalated to a SHARE TABLE lock, locking out concurrent writers to this table, so I ran it with NOLOCK. This query returns utter garbage everytime. SNAPSHOT ISOLATION works great.
Steven-
GREAT example. Thanks for sharing!
-William
Post a Comment