High Performance SQLite

In this video, I explain the importance of SQLite's unique feature, the row ID, which acts as a secret primary key. I show how the row ID can be an alias for a created primary key, ensuring stability and how using an integer as a primary key aligns it with the row ID for better database management. I also discuss the quirks of the row ID assignment algorithm and stress not relying directly on row IDs as they may change.

We've been talking a lot about data types and this is important because we need to know how to build out correct schemas. Right? So we've talked about columns and their affinities, strict typing, flexible typing. Now we need to talk about something else that is very very important and pretty unique to SQLite and that is the row ID. The row ID is a secret primary key for your table.

What what do I mean by that? Let's take a look at a few examples and then I'll explain the theory in more detail. If we were to create a table called example with a, with a column called n and we'll declare that as an integer. So if we were to create this table and then insert into example values 1 and 2 and 3.

Select * from example and that looks pretty good. Okay. So that's that's pretty normal so far. The thing that is different here is there is a secret primary key in this table. So if we were to select row ID and everything from example, there's our secret primary key.

This is the secret key that SQLite keeps track of. It goes by many names. So we could say select row ID, select row ID, select ID and select everything from example. And all of those come back as row ID. They're all aliases for each other.

So what is going on with this secret row ID? Well, a few things. One is it's a historical artifact. The creator of SQLite says, I wish it didn't work this way but to preserve backwards compatibility with trillions of databases, it works this way. I offer my sincere condolences.

So that's kinda humanizing. I like that. But what do we do with a row ID table and why is it there? So in most cases, in most other databases, the way that it actually works under the hood is you declare a primary key. You declare the primary key and then that is how the database keeps track of your data on the disk.

So if you have a auto incrementing big integer primary key and the keys are 12345, all of the rows, row 1 is gonna be right next to row 2, is gonna be right next to row 3, when it's actually physically written to the disk. This is called the clustered index. The clustered index defines how the data is stored on the disk. That's very important. In SQLite, the row ID is the clustered index.

So I'm gonna show you an example here where we create a primary key ourselves, but SQLite's still gonna use the row ID to actually physically put the data on the disk. Let's take a look. We're gonna go ahead and create a new table. Let's say, example UUID. We're gonna pretend here that we are using uuid's as primary keys, which we'll probably talk about that later.

So we'll say, uuid is a text and it is the primary key. So let's create this table and then we can insert into example UUID, values of, let's say, 12341234. Now, I know that is not a real UUID. This is just an example. Great.

So we insert that and then we'll do select star from example UUID. K. There's our UUID. And if we were to try to insert it again it would say, unique constraint failed. K.

That's that seems right. Primary key should only there should only be one of them. Something interesting is insert into example uuidvaluesofnull and select star from example, you you it. That's interesting. So we have just inserted a null primary key which is interesting.

But remember, that's not really the primary key. So if we were to select, let's say, row ID and everything from example UUID, there is your actual primary key. This UUID is just a secondary index key. This is just a secondary index with a unique constraint applied. This is 1 of those quirks of SQLite.

You can have a null primary key which seems crazy to me, but that's because you have that backup row ID that is actually the clustered index. That row ID declares how the data is stored on disk. When we talk about b trees, which we will, we're gonna talk a lot about b trees so brace yourselves or get excited 1 or the other. When we talk about b trees, I will show you that this, primary key that we have created is just a secondary index and why this could be a little bit slower because it has to traverse that secondary index and then jump back to the row ID to look up the actual data. Now, there are some times where the row ID and the primary key are the exact same thing.

So let's take a look at when that is the case. We're gonna create a new new table, not new, create a new new table called example, and let's just call this one example ID. And in this case, we're gonna say that ID is an integer that is a primary key. And if we say that and then we insert into example id values of 1 and well, I need to do 2. Select * from example ID.

That's the same as the u uid that we've been doing. Right? But now watch this. Select row ID, row ID, o ID, and everything from example ID, and they're all called ID. So what has happened here is we've created a table where the ID is the primary key.

We declared it as an integer primary key. And when we declared it as an integer primary key, SQLite looked at that and said, yes, that's the same as the row ID. I'm just gonna make those 2 aliases for each other. So now instead of having this separate primary key and the secret row ID, they're the exact same thing. I think this is probably the way that you want to go.

Because if you have a secondary key that is actually, you know, you declared it as your primary key, you'll have to do 2 index lookups to get to the data that you're looking for. I like IDs being integers. That's just something that I am used to. I like it. I there are reasons when we talk about btrees.

There are reasons where I think it's a good idea. But another reason I think it's important to alias the row ID. Here's another reason I think it's important to alias the row id. If you don't create a column that is an alias of the row id, then the row ID is subject to change. So in our example where we have, let's just look at it again.

So in our example where we have select star from example UUID, in this example, these row IDs right here, those row IDs are not necessarily guaranteed to be stable. Those probably are because they're compact and they're early on, they're 12, so there's nowhere that they could really go. But you are not guaranteed that that row ID will not change. In the case of select row ID from example ID, you're guaranteed that those will not change because that's now your data. Right?

Because you've created a column called id that happens to be an alias of row id, you're guaranteed that the ID column, that's stable. That would be weird if you put stuff in the database and it changed. But because row ID is secret and not supposed to be used by you, you really should never you should never address row ID directly. You especially shouldn't count on the row id's being stable, so don't take these row id's and put them in your application and think I can count on those. If the database is vacuumed or something else, then those row IDs could change, they could become more compact.

So never ever ever address the row ID directly. Always address your alias of or otherwise your primary key which would be the you you would in that earlier case. There's 1 more thing. 1 more thing I wanna show you about row IDs which is also a quirk. The row ID assignment algorithm tries to pick the next biggest number, but can't always do that.

So if we were to insert into example UUID, and this time we're gonna provide our own row ID along with the UUID. And so we'll say that the values are the biggest integer possible. And then 3456, 3456. Great UUID construction there. Select row ID and everything from example UUID.

We get that huge number, 92,000,000,000,000,000. I don't know what that number is. Now, let's say we are to insert another row without declaring a row ID. Insert into example uuid values of where did 3456, so let's do 4567, 4567. Now we select that back and look, it just picked one kind of in the middle.

I don't even know if that's the middle. It just picked one that was available. If we were to insert it again, we'll change that to 8 and then we'll change that to 9 and let's make some space for the big show. There we go. They're kind of all over the place.

So 3, 4, 5, 6 was the first 1 ending in 7, ending in 8, ending in 9. So the row ID selection algorithm will reuse or go back and put, fill in blanks. And so that's a that's another quirk. There's a way to prevent this which we'll talk about in the next video. But regardless of if you are using row ID or using an alias for row ID, it is possible to have them not be ever increasing.

And this is a little bit of an edge case, but there's a very real case where you delete a bunch of rows out of the middle and SQLite comes back and reuses those row IDs which I don't super love. I think it's a historical artifact. We're gonna look in the next video at the auto increment keyword which changes the row ID selection algorithm.

Full Course

SQLite for Production

Summary

Video Transcript