Become a SQLite expert

When deciding on primary keys, I generally recommend using integers, especially in SQLite, because they are efficient and align with the row ID, minimizing lookup times. However, if you are generating IDs on the client side or need to prevent ID conflicts, using UUIDs, GUIDs, or similar options can be beneficial, as they offer many permutations and avoid coordination issues. Ultimately, the choice depends on your specific application needs, but with SQLite, you have flexibility since the secret row ID prevents issues like B-tree fracturing.

Much ink has been spilled or maybe to put a modern spin on it, much footage has been recorded. Trying to answer the question, what should I use for primary keys? Should I use a big int? Should I use a u uid? Should I use a GUID?

Should I use a UUID? I don't know, man. I don't wanna ever tell you what to do because I'm not in your shoes. I don't know the scenario that you're operating in. And so I'm gonna try to give you as much information as possible and offer some suggestions and then let you, the very smart viewer, possible and offer some suggestions and then let you, the very smart viewer, determine what is right for your setup.

So my general recommendation is to use integers. Integers are great especially in SQLite because it becomes an alias of the row id which means you avoid that second lookup through the primary key to then go find the actual row in the clustered index. You avoid that because your primary key and your row ID are now the same thing. That's pretty nice. When you have, your primary key written to disk, pretty small compared to something like a uid or a uid or a guid or something like that.

It's much much smaller than a big long string of data. Right? Now, when it comes to u uids or or GUIDs or UUIDs or whatever, it's you don't have that same penalty. So when you're reading online and you're reading documentation and people are like don't use UUIDs as a primary key because you'll fracture the b tree. That's not true in SQLite because it uses the secret row ID as the clustered index.

So, these sorts of these sorts of keys that are not auto incrementing, integers can be really good in a lot of in a lot of scenarios, especially in the scenario where you're generating an ID maybe on the client side. So you're doing something like an optimistic UI and you create some of entity on the client side and you need to give it an ID without coordinating with the database. To do that, you cannot use incrementing big IDs because you don't know what the next one is. Especially, you don't know what the next one is by the time it gets to the database. So you could use something like a uid or a guid or a snowflake or a ulid.

You could use any of those. Generate the ID on the client side and then send it back to the application which then writes it to the database. And you're almost a 100% guaranteed to not have ID conflicts because there are so many variations. There are so many permutations of and stuff. You're basically guaranteed to never have a collision.

So that's one way that, a non integer primary key could be really helpful. It could also be really helpful if you're just doing a massive number of inserts and getting a hold of the next incrementing ID becomes a bottleneck, then you might go to something where you can create the IDs without coordination. I don't know that that is really gonna be the case for most for most, applications. Honestly, I think, Optimistic UIs generating IDs without coordination on the client side, that is a much more reasonable use case in my opinion for uids or GUIDs or snowflakes or whatever you want. So it is it is kind of up to you.

You do have more freedom in SQLite because you're not concerned about the b tree fracturing because it uses the secret row ID. So my general recommendation, use an integer that aliases the row ID If you have a very good reason to use u uids, gudids, uids, snowflakes, any of that, that's great. I don't really have a problem with that. They're gonna be bigger on disk, but you're not gonna fracture your clustered index because it's not being used for the clustered index. So with all of that information hopefully you can make a good decision for your application and your scenario.

Full Course

SQLite for Production

Summary

Video Transcript