Saturday, January 21, 2012

Actual Emails: Is "NOLOCK is the epitome of evil?"

Sent: Friday, January 20, 2012 10:30 AM
To: William Assaf
Subject: dba has a question
William,

Can you send me references to support my contention that NOLOCK is the epitome of evil?
That is, that it can return invalid and/or unrelated data.
He is still suggesting that folks use NOLOCK (rather than READPAST).
We have to work around other poorly written queries, so some of these are inevitable.

Thanks,
[my colleague's name redacted to protect the innocent]

--------------------------------------------------------------------------------------------

The basic danger is that with NOLOCK, the data can change out from underneath the query while it is in progress. This doesn't require the query to run for a long time, it could happen even on simple queries.
He should know that SQL doesn't read data out of a database like an excel spreadsheet, it reads a series of addresses all across a hard disk. Without even a shared read lock, the data in those addresses can change out from underneath a query using NOLOCK or READ UNCOMMITTED (same thing, different keywords.)

The more appropriate and sometimes-used term for this isolation level is CHAOS. You could (and there are labs out there to prove this) end up with blank data, malformed bytes, or worst - wrong data. The likelyhood is low admittedly, but the potential for data corruption is misreporting is not something that should be invited.
I would be strongly against using this isolation level on any query that could be attached or provide source data from an update/insert/delete statement, which would commit the bad data. Most select statements are the basis for some update, unfortunately.

Perhaps on a query that is used frequently on a constantly-refreshing dashboard would be a situation in which NOLOCK would be appropriate. But then, there are better, safer alternatives like SNAPSHOT isolation mode, or better architectural decisions to make data more available for read-heavy queries.

Good examples and info here:

-William

2 comments:

Steven Liszewski said...

In my application, I need to calculate a time interval between adjacent rows. To do this, I use a CTE:
WITH PointSampleRows AS
( SELECT ps.*, ROW_NUMBER() OVER (ORDER BY ps.PointID,UTCDatetime) AS rn FROM [tbl_PointSample] ps join
[tbl_Point] p on ps.PointID = p.PointID)
INSERT INTO [analysis].[IntervalHistogram]
SELECT mc.PointId
, DATEDIFF(second, mc.UTCDatetime, mp.UTCDatetime)
, count(*)
, CAST(0 as bit)
FROM PointSampleRows mc JOIN
PointSampleRows mp
ON mc.PointId = mp.PointId
and mc.rn = mp.rn - 1
group by mc.PointId
, DATEDIFF(second, mc.UTCDatetime, mp.UTCDatetime);
I was concerned that this query running over the entire table will end up getting escalated to a SHARE TABLE lock, locking out concurrent writers to this table, so I ran it with NOLOCK. This query returns utter garbage everytime. SNAPSHOT ISOLATION works great.

w said...

Steven-
GREAT example. Thanks for sharing!
-William