Become a SQLite expert

In the video, I explain how database indexes work, emphasizing that they read from left to right and stop at the first range condition. I show that multiple indexes sharing the same left prefix can be redundant, and you can often delete the simpler one to save space. However, if an invisible ID affects your queries, you must be careful not to remove the wrong index, as it could slow down data retrieval.

Hopefully, by now, you have seared into your brain left to right, no skipping, stops at the first range. That is how an index is used. It starts at the left. It moves to the right. It stops when it encounters the first range condition.

This also means 1 other thing in that indexes that share a common left prefix, stay with me, indexes that share a common left prefix are duplicative. Woah. What does any of that mean? Let's look at the code. Let's take a look at what we have so far.

And we've got nada, which is what we're after. And as a reminder, we've got bookmarks, migrations, and users. Migrations doesn't matter. We've got users and bookmarks. Okeydoke.

Let's say we need an index on, an index on email. That seems common enough. Let's do create index email email on user's email. So we do that and then we're we take a look and we're like, great. We've got our index.

Actually, you know what? We do a lot of querying by email and is pro. So if we can remember, select star from users, limit 2 is fine. That box mode is too bad. Is pro is there.

So let's create another index and we'll say create create index email is pro on users email is pro. So we create that index and now we have 2 indexes. Let's pull this guy back up. We have 2 indexes that have email as a leftmost column. They share a left most prefix.

That means that this index right here is a duplicate of this index even though it has an additional column in there. So let's clear that line and we'll say select star from users where email email equals aaron.frantis at example.com. So it says search users using index email. So it's using that other index, which that's fine. We would probably expect that.

But if we were to drop index email and we were to run this again, it just uses email is pro. So it just uses the other index. So the the takeaway here is if you look at your indexes and indexes will accumulate over time, right, as business needs change or as your data schema changes, indexes just kind of pile up. And so go dig through your indexes and look to see, do any of these indexes share a leftmost prefix? So, usually what you'll find is that you have an index on a single column like email and then an index that is a composite index that has email as the left column.

That does you no good. All that does is take up space, potentially confuse the query planner, Although, the query planner is pretty sophisticated but it is worthless. The only tiny tiny tiny caveat is that the index on email is actually an index on email row ID because remember every secondary index contains a pointer to get back to the main table. And in our case, id is an alias of row id. So when we create an index on email, actually we're creating it on email comma ID.

And when we create it on email is pro, we're actually creating it on email is pro ID. Why would this ever matter? Left to right, no skipping stops at the first range, so you can imagine that this might matter. This is the the outside outside use case where, 2 indexes are not duplicative of each other. So I'm gonna error that out and we'll say select star from users where email equals [email protected].

That's gonna use our email is pro index just fine. But if we threw an order by ID descending on there, now is pro is in the way, right? So is pro is blocking our access to this invisible ID that is automatically added at the end. So if we ran that, you'll see use tempbtree for order by. So it used the first part of the index, got to the second part, and said, ah shoot, that is not in my query so I can't use it, which means I don't have access to the third part of the index for sorting.

So let's do, create index on users or create index email on users email. Let's create that 1 again. And now if we were to do the same thing where email equals whatever order by id, we'll see it's using that other key, that other index email and the sort by temp b tree thing, that has gone away. Because it was able it was able to use the first part of that index for the strict equality lookup and the second part of that index for ordering. So what does this what does this all mean?

This means that any 2 indexes that share a left most prefix, so it has the same columns in the leftmost side of the index, those are duplicates of each other, and you can delete the 1 with fewer columns because the 1 with more columns in the index encompasses that 1 with fewer columns. The only caveat is if you are relying on that secret ID or row ID being there, then you probably can't delete that duplicate index. Most people are not relying most people don't even know that that second like that secret ID is in there and can be used. And so most people aren't relying on it. But if you are relying on it and you delete it, then your ordering may become quite a bit slower and I don't want you blaming me for that.

Full Course

SQLite for Production

Summary

Video Transcript