SQL Tact: Actual Emails: Is "NOLOCK is the epitome of evil?"

Saturday, January 21, 2012

Actual Emails: Is "NOLOCK is the epitome of evil?"

Sent: Friday, January 20, 2012 10:30 AM
To: William Assaf
Subject: dba has a question

William,

Can you send me references to support my contention that NOLOCK is the epitome of evil?

That is, that it can return invalid and/or unrelated data.

He is still suggesting that folks use NOLOCK (rather than READPAST).

We have to work around other poorly written queries, so some of these are inevitable.

Thanks,

[my colleague's name redacted to protect the innocent]

--------------------------------------------------------------------------------------------

The basic danger is that with NOLOCK, the data can change out from underneath the query while it is in progress. This doesn't require the query to run for a long time, it could happen even on simple queries.

He should know that SQL doesn't read data out of a database like an excel spreadsheet, it reads a series of addresses all across a hard disk. Without even a shared read lock, the data in those addresses can change out from underneath a query using NOLOCK or READ UNCOMMITTED (same thing, different keywords.)

The more appropriate and sometimes-used term for this isolation level is CHAOS. You could (and there are labs out there to prove this) end up with blank data, malformed bytes, or worst - wrong data. The likelyhood is low admittedly, but the potential for data corruption is misreporting is not something that should be invited.

I would be strongly against using this isolation level on any query that could be attached or provide source data from an update/insert/delete statement, which would commit the bad data. Most select statements are the basis for some update, unfortunately.

Perhaps on a query that is used frequently on a constantly-refreshing dashboard would be a situation in which NOLOCK would be appropriate. But then, there are better, safer alternatives like SNAPSHOT isolation mode, or better architectural decisions to make data more available for read-heavy queries.

Good examples and info here:

http://blogs.msdn.com/b/craigfr/archive/2007/06/12/query-failure-with-read-uncommitted.aspx

http://sqlblogcasts.com/blogs/tonyrogerson/archive/2006/11/10/1280.aspx

http://summit2009.sqlpass.org/Agenda/PrePostConferenceSessions/BenGanQA.aspx

http://blogs.msdn.com/b/sqlcat/archive/2007/02/01/previously-committed-rows-might-be-missed-if-nolock-hint-is-used.aspx

http://blogs.msdn.com/b/davidlean/archive/2009/04/06/sql-server-nolock-hint-other-poor-ideas.aspx

-William

2 comments:

Steven Liszewski said...: In my application, I need to calculate a time interval between adjacent rows. To do this, I use a CTE:
WITH PointSampleRows AS
( SELECT ps.*, ROW_NUMBER() OVER (ORDER BY ps.PointID,UTCDatetime) AS rn FROM [tbl_PointSample] ps join
[tbl_Point] p on ps.PointID = p.PointID)
INSERT INTO [analysis].[IntervalHistogram]
SELECT mc.PointId
, DATEDIFF(second, mc.UTCDatetime, mp.UTCDatetime)
, count(*)
, CAST(0 as bit)
FROM PointSampleRows mc JOIN
PointSampleRows mp
ON mc.PointId = mp.PointId
and mc.rn = mp.rn - 1
group by mc.PointId
, DATEDIFF(second, mc.UTCDatetime, mp.UTCDatetime);
I was concerned that this query running over the entire table will end up getting escalated to a SHARE TABLE lock, locking out concurrent writers to this table, so I ran it with NOLOCK. This query returns utter garbage everytime. SNAPSHOT ISOLATION works great.; January 24, 2012 at 7:02 AM
w said...: Steven-
GREAT example. Thanks for sharing!
-William; January 24, 2012 at 7:34 AM