Friday, March 08, 2013

Conversation History: The Continuing Case Against GUIDs

We had a cooperative relational database design exercise at the office last week as part of our regular department "Lunch and Learn" series, and inevitably one topic that came up is the use of GUIDs as the unique key for a table.

And before long, I had been goaded onto my soapbox to deliver my hellfire-and-brimstone sermon against the use of GUIDs as the primary key, much less the clustered index, of a table. (If you haven't heard this, you need to attend more Baton Rouge SQL Server User Group meetings.)

The case for GUIDs traces back to an oil rig case study, an actual design example we (at Sparkhound) encountered here in the Gulf South. There are hundreds of oil rigs out in the middle of the Gulf of Mexico, each with a sometimes-available Internet connection, and the need to sync back home to a onshore server.  By using GUIDs, each server can write a unique key to the same table, so the argument says, without any conflicts.

(This is not my complete soap-box against GUIDs.  Another day.)

Why not use a compound key of integers?  Surely, you must still identify what oil rig is sending the data, then using a rig-side identity column would provide a compound primary key with guaranteed uniqueness and at half the cost (4 bytes x 2) of a GUID (16 bytes).  That storage space adds up, and has a big impact on nonclustered indexes, fragmentation, page splits, etc.

After that lunch and learn ended, one of our bright new hires engaged me afterwards...

C.M. 1:22 PM
So, to play a little devil's advocate, take your oil rig example, and make it a mobile app
it may be impractical or impossible to keep a record of each client running your application
a user might run the app from multiple devices so you can't use their login

William A. 1:38 PM
the mobile app would still need to be registered to identify it as some sort of key
we may need to know about the equipment they're using or other business data 
that registration process would assign an ID to that equipment
it'd have to be linked to a user so its data can be reported consistently every time they check in
or if not to a user, some sort of hardware id.

C.M. 1:41 PM
i was thinking more like a consumer application, without any central management to register new devices, but i suppose the software could connect to the server the first time it runs and obtain a permanent device ID
depending if it's acceptable to require them to be online when they start using it for the first time

William A. 1:44 PM
even then, the local repository could be required to be updated with an account/device/user ID before syncing

C.M. 1:46 PM
so, i'll concede that there's almost always a way to avoid the guids

William A. 1:46 PM
i agree!
though I assume a system like SharePoint would be a nightmare without GUIDs
another alternative is ranges of ID's
so this oil rig gets values 1-1 million, the next gets values of 2-3 million, but i dislike that for a bunch of reasons.
but i've seen it done
at that point it turns the number into a smartcode, which i dislike.

C.M. 1:49 PM
the other context i see them being useful is when you need uniqueness among several different kinds of entities, but i know there are alternative ways to do that too
guids just make it easy

William A. 1:50 PM
easy... until you get to scale
and easy... if you're not the one in charge of index maintenance
and easy... if storage space is unlimited.
and easy... if you can hide bad performance with iron
otherwise... easier(?) maybe :)

C.M. 1:51 PM
yeah, like i said, easy :P

William A. 1:51 PM

C.M. 1:51 PM
anyway, i enjoyed the lunch and learn, thanks

William A. 1:51 PM
thanks for coming! nice to "work" with you for the first time
next time you're at a big convention with SQL tracks, there's usually someone there who will RAIL on GUID's
Kimberly Tripp is a SQL MVP and author who gives a 90min anti-GUID presentation that will make you CRY either in despair or laughter

C.M. 1:52 PM

William A. 1:52 PM
depending on how much you've used GUIDs

C.M. 1:52 PM
i haven't used them much personally

William A. 1:53 PM
was that an interview question? If not... it should be :)

C.M. 1:53 PM
only really used them when coding against the TFS API or recently Microsoft CRM
however, i've read about the [theoretical] use of them in some contexts

William A. 1:53 PM
yeah, as a result, SharePoint (and CRM) have different rules for treating database indexes and statistics and their own elaborate internal index maintenance plans

C.M. 1:54 PM
like then a client needs to generate an ID itself instead of letting the server/database do it
then -> when

William A. 1:54 PM
that's a good scenario when GUID's could be helpful too
in SQL 2012 there's a new feature called SEQUENCE
where an app can go and get the "next" value and use it to reliably insert their new primary key value.
Oracle has a similar feature

C.M. 1:56 PM
yeah, that's pretty nice

William A. 1:56 PM
sequences are the way to avoid having to insert, retrieve @@IDENTITY, then insert related tables, etc.
can get a sequence out of a function, perform one transaction worth of inserts

C.M. 1:57 PM
is a particular sequence tied to a table or what?

William A. 1:57 PM

C.M. 1:58 PM
i'll read up on it, good stuff

William A. 1:58 PM
welcome aboard man, have a great weekend too

C.M. 1:58 PM
you too, thanks

No comments: