According to best practices, when modeling a MySQL relation that has a UUID as its id, you should still create a hidden numeric
auto_increment field for the primary key. Doing so has all sorts of performance advantages (see this post from the founder of Percona or these interesting benchmarks).
However, if you’re used to writing upsert queries with MySQL’s
ON DUPLICATE KEY UPDATE... syntax, you’ll soon find that doing so is a recipe for wasting auto increment values.
For example, consider the following create table statement:
Performing an upsert on the table will consume an auto increment value each time, regardless of whether the query only updates a row.
In the insert case, a record with the specific UUID doesn’t exist, and the query auto increments
hidden_id as you would expect. However, when you trigger the duplicate key part of the query during an update, an auto increment value is also consumed because MySQL first tries to perform the update (consuming an id), detects the duplicate, and then performs the update.
This is a simple oversight that is easy to make, but left unchecked, can waste much of your id space. It should also be noted that the same problem can be experienced when using
INSERT IGNORE.... To avoid either type of query, you should do the following:
- Try to insert the record (
INSERT INTO contrived_examples (id, updateable) VALUES (0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa, 42))
- Check/catch the case that no rows were inserted
- If no rows were inserted, update the existing row (
UPDATE contrived_examples SET updateable=42 WHERE id=0xaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa)
This issue is why I spent time changing all our fancy upserts into separate update and insert queries shortly after our DevOps Monitoring Platform, Blue Matador, was released. We monitor servers and applications all day every day and it didn’t take long for us to notice a very large gap in the ids that were being generated.
Hopefully this post helps you avoid the same level of tedium.
Monitor your MySQL databases with Watchdog, the forever free server monitor from Blue Matador.
Mark is a full-stack engineer who cut his teeth on professional software development at Lucid Software where he specialized in building scalable backend systems. He's a huge fan of open source software and is one of the primary authors of Relate (database access library) and Cumulus (AWS infrastructure management). Mark loves watching shows with his wife, studying Chinese, and hunting Minecraft chickens.