May 23, 2015 UUID Primary Keys in PostgreSQL
This weekend Dr Nic shared with us a well written article discussing the shortcomings of auto-incrementing (serial) keys and an alternative approach. While discussing the article the question came up about how to go about using UUID keys in our applications and what we have to do within PostgreSQL in order to use them. PostgreSQL out of the box defines a UUID Data Type which is a great start. We then have three options for generating UUID's,
- Within your application code
- Within the database using the extension uuid-ossp
- Within the database using the extension pgcrypto
Generating them within our application will work quite nicely and is a good way to go, unless you want the database to handle creating these automatically for you which is usually where our point of view.
Only use uuid-ossp if you have a real and specific need for the functions it provides If your only need is to generate and index them then you do not need uid-ossp. (Update: See below.) For generating UUID's within the database an easy place to start is to use the
gen_random_uuid () function from the pgcrypto extension.
So how do we use this function? First we need to create the extension in the database we wish to use it with,
CREATE EXTENSION pgcrypto;
This is how you load pre-compiled shared library code which adds functionality into your PostgreSQL database.
Note specifically that the extension must be created (loaded) once for each database in which you wish to use it. Once it has been loaded into a running instance of the database server it will be there for use from then on spanning restarts.
Also be aware that if you have dumped and restored the database from one server to another running instance then depending on the method of the dump/restore you may need to load it into this new instance after restoring.
Once you have done this you can generate a random id,
Which returns a UUID Data Type.
Let's create a table with a UUID primary key and see how we use the
gen_random_uuid() function to populate our IDs for us,
CREATE SCHEMA IF NOT EXISTS snw; CREATE TABLE snw.contacts( id UUID PRIMARY KEY DEFAULT gen_random_uuid(), name TEXT, email TEXT );
We can now add entries into our newly created Stark & Wayne Contacts table,
INSERT INTO snw.contacts (name,email) VALUES ('Dr Nic Williams','drnic'), ('Brian Mattal','brian'), ('Wayne E. Seguin','wayneeseguin'), ('Long Nguyen','long'), ('Bill Chapman','bill'), ('Chris Weibel','chris'), ('Jeremey Budnack','jrbudnack'), ('Ruben Koster','rkoster'), ('Jamie Van Dyke','jamie'), ('Quintessence Anx','qanx'), ('McGowan','mcg'), ('高，秀娇 (XJ)','xj'), ('Geoff Franks','geoff'), ('Van Nguyen','vnguyen'), ('John Longanecker','jlonganecker') ; INSERT 0 15
Let's examine our table now,
SELECT * FROM snw.contacts; id | name | email --------------------------------------+------------------+--------------- 0d60a85e-0b90-4482-a14c-108aea2557aa | Dr Nic Williams | drnic 39240e9f-ae09-4e95-9fd0-a712035c8ad7 | Brian Mattal | brian 9e4de779-d6a0-44bc-a531-20cdb97178d2 | Wayne E. Seguin | wayneeseguin 66a45c1b-19af-4ab5-8747-1b0e2d79339d | Long Nguyen | long bc8250bb-f7eb-4adc-925c-2af315cc4a55 | Bill Chapman | bill 200393bc-8aaa-45a8-9093-80c4792348cd | Chris Weibel | chris cd881764-bea1-4249-b86d-f8fb8182eec1 | Jeremey Budnack | jrbudnack 970972dd-dce8-4c65-a85b-63735ada0fc9 | Ruben Koster | rkoster 1c225a3a-2c70-4d95-b87f-f086cbd20366 | Jamie Van Dyke | jamie 9f0bb16e-fc25-47f3-b60a-635b6224225a | Quintessence Anx | qanx 9788c636-936e-4dd6-b9d5-f340329142bd | McGowan | mcg 3d8b664f-ef5f-4587-a45c-f2991a1fc029 | 高，秀娇 (XJ) | xj f618f5a8-380e-44fb-9b4e-b3286f29dcc8 | Geoff Franks | geoff 35cb41dd-3edb-4483-a1fe-fc315243d2f8 | Van Nguyen | vnguyen 74b37cd4-75aa-4871-b17b-a5160428e589 | John Longanecker | jlonganecker (15 rows)
We see that each row has a UUID
id field which is the primary key and was automatically generated for us.
It was pointed out by @drewblas (thank you Drew!) that using
pgcrypto has a negative side effect with respect to keyspace fragmentation on disk for the tables. Drew told us that:
Random produces very fragmented inserts that destroy tables. Use
uuid_generate_v1mc()[instead] ... the keys are seq because they're time based. So all inserts go to the same data page without random io.
This makes sense due to the random probability distribution of the keys, it should be fragemented. This fragmentation however is not so good for efficiency of the database system itself. In order to gain the benefits of using UUID primary keys with lower keyspace fragmentation perhaps Drew points out that it is better to use
uuid_generate_v1mc() from the
uuid-ossp extension because underneath it is using a time based seq algorithm which you can read about in the postgresql documentation.