pointers, solutions and scripts for the SQL DBA
Not intended to replace common sense

1/21/2012

Actual Emails: Is NOLOCK "the epitome of evil"?

Sent: Friday, January 20, 2012 10:30 AM
To: William Assaf
Subject: dba has a question
William,

Can you send me references to support my contention that NOLOCK is the epitome of evil?
That is, that it can return invalid and/or unrelated data.
He is still suggesting that folks use NOLOCK (rather than READPAST).
We have to work around other poorly written queries, so some of these are inevitable.

Thanks,
[my colleague's name redacted to protect the innocent]

--------------------------------------------------------------------------------------------

The basic danger is that with NOLOCK, the data can change out from underneath the query while it is in progress. This doesn't require the query to run for a long time; it can happen even on simple queries.
He should know that SQL Server doesn't read data out of a database like an Excel spreadsheet; it reads a series of addresses scattered across the disk. Without even a shared read lock, the data in those addresses can change out from underneath a query using NOLOCK or READ UNCOMMITTED (same thing, different keywords).
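To make the danger concrete, here is a minimal two-session sketch of a dirty read. The table and column names are hypothetical, and both hint and SET syntax are shown to illustrate that they behave identically:

```sql
-- Session 1: change a row inside an open transaction, but don't commit yet.
BEGIN TRANSACTION;
UPDATE dbo.Account              -- hypothetical table
SET Balance = Balance - 500
WHERE AccountID = 1;
-- ...transaction still open...

-- Session 2: NOLOCK and READ UNCOMMITTED are two spellings of the same thing.
-- Both can read the uncommitted -500, even if Session 1 later rolls back.
SELECT Balance FROM dbo.Account WITH (NOLOCK) WHERE AccountID = 1;

SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
SELECT Balance FROM dbo.Account WHERE AccountID = 1;

-- Session 1: undo the change. Session 2 has already reported money
-- that never officially existed.
ROLLBACK TRANSACTION;
```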

The more appropriate, and sometimes used, term for this isolation level is CHAOS. You could (and there are labs out there to prove this) end up with blank data, malformed bytes, or worst of all, wrong data. The likelihood is admittedly low, but the potential for data corruption or misreporting is not something that should be invited.
I would strongly advise against using this isolation level on any query that could be attached to, or provide source data for, an update/insert/delete statement, which would commit the bad data. Unfortunately, most select statements are the basis for some update.

Perhaps a query feeding a constantly-refreshing dashboard would be one situation in which NOLOCK is appropriate. But even then, there are better, safer alternatives, such as SNAPSHOT isolation, or better architectural decisions that make data more available to read-heavy queries.
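For the dashboard scenario, a minimal sketch of the SNAPSHOT alternative (database and table names are hypothetical). Readers see a transactionally consistent version of the data without taking shared locks or blocking writers, at the cost of row-versioning overhead in tempdb:

```sql
-- One-time database setting; enables row versioning.
ALTER DATABASE SalesDB SET ALLOW_SNAPSHOT_ISOLATION ON;

-- Per-session: the transaction reads a consistent snapshot of the data
-- as of the moment the transaction started, with no shared locks taken.
SET TRANSACTION ISOLATION LEVEL SNAPSHOT;
BEGIN TRANSACTION;
SELECT Region, SUM(Amount) AS Total
FROM dbo.Sales               -- hypothetical table
GROUP BY Region;
COMMIT TRANSACTION;
```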

Good examples and info here:

-William

2 comments:

  1. In my application, I need to calculate the time interval between adjacent rows. To do this, I use a CTE:

    WITH PointSampleRows AS
    (
        SELECT ps.*, ROW_NUMBER() OVER (ORDER BY ps.PointID, UTCDatetime) AS rn
        FROM [tbl_PointSample] ps
        JOIN [tbl_Point] p ON ps.PointID = p.PointID
    )
    INSERT INTO [analysis].[IntervalHistogram]
    SELECT mc.PointId
         , DATEDIFF(second, mc.UTCDatetime, mp.UTCDatetime)
         , COUNT(*)
         , CAST(0 AS bit)
    FROM PointSampleRows mc
    JOIN PointSampleRows mp
        ON mc.PointId = mp.PointId
        AND mc.rn = mp.rn - 1
    GROUP BY mc.PointId
           , DATEDIFF(second, mc.UTCDatetime, mp.UTCDatetime);

    I was concerned that this query, running over the entire table, would end up escalated to a shared table lock, locking out concurrent writers to the table, so I ran it with NOLOCK. That query returned utter garbage every time. SNAPSHOT isolation works great.

  2. Steven-
    GREAT example. Thanks for sharing!
    -William


All comments are moderated before publishing. If you find something wrong or disagree, please comment so that this blog can be as accurate and helpful as possible.