Aaron is anything but boring, he makes learning engaging and has given me more than one trick up my sleeve. I have been working with relational databases both professionally and for all my side projects and I've learned things about databases I didn't even know existed. Any course this man produces is an instant auto-buy because he makes it easy to do. You can tell he's put his heart and soul into this course and it'll stick with me longer than the current season of 'The House of the Dragon.' Wow is there any topic this guy can't make fun!Alex Maven
Build with and integrate SQLite into your production applications.
I've made my test data available for you to use and follow along.
Moving along at a pretty good pace and we're building up step by step. So far everything we've done has been putting an index on a single column. We have put an index on the birthday column, for example. Now, we're gonna start putting an index on multiple columns at once. So instead of just birthday, we're gonna put an index on first name, last name, and birthday.
But importantly, not as 3 separate indexes but rather as 1 composite or compound index. So we're gonna take these 3 columns and put them together in an index and we're gonna see when it works and when it doesn't. If databases are just rules, indexes are just rules also. And there are several rules with composite indexes. Let's take a look.
As a reminder, we have select star from users, limit 1. You can get this data down in the description below and you can spin it up on if you want. We still have dot EQP on, which that generates this query plan here which says we're just scanning the users. We're gonna leave that on because it's kind of important. That's the whole thing we're doing right now.
We've got first name, last name, email, birthday is pro, deleted at, created at, and updated at. We have been putting an index on birthday. Let's go ahead and create a better or at least more interesting index on several things. Create index. We'll call it we'll just call it multi on users and we're gonna put it in first name, last name, and birthday.
Okay. Let's let this run actually. I am going to talk to you about order quite a bit and I'm going to tell you that order matters quite a bit. The order that I'm talking about is this order right here. I'm talking about the order that you declare the index in.
I'm not talking about when we go to write select star from users where something something something. That is not the order that I'm talking about. To a database, a equals 1 and b equals 2 is the exact same thing as b equals 2 and a equals 1. Okay. So it doesn't matter what order you write your queries in.
This is that that part we talked about real early on where SQLite is gonna look at it, it's gonna parse it and it's gonna turn it into that virtual machine code. And so in your actual query, order doesn't matter. In your index, order matters quite a bit. I'm gonna give you the rules for composite indexes. I'm gonna tell you what they are and then I'm gonna show you that those rules are true.
So if you don't understand while I'm telling you, that's fine. Then we will look at it together and we will prove those rules to be true. The first rule, indexes are usable from left to right without skipping any columns. That's all you that's all you gotta know. Left to right, no skipping.
The second rule is it stops at the first range condition. Left to right, no skipping, stops at the first range condition. Okay. With that in mind, let's take a look. Okay.
So we have this multi index. So let's do that, pragma thing. We'll do prag index info and I think we just named it multi. So you can see here first name, last name, birthday and the sequence number. So this tells us what order we declared the index in.
First name is 0 then last name then birthday. So first name, last name, birthday. Left to right, no skipping. So that being said, select star from users where first name equals Erin. Let's put a limit on that because I think there are a lot of us in there.
Great. Search users using index multi first name equals. Okay, so we have this multi index that contains 3 columns but we're only using 1 of those columns in this query. But because it forms a leftmost prefix on that index, we're okay. We're using first name and the first column in that index is first name.
So we're accessing that index left to right. Watch this. So if we were to say select star from users where last name equals Francis limit 5. If we were to do this, you'll see it scans the entire table. But wait a second.
We put an index on first name, last name, birthday. Yes. But we have to access that index left to right, no skipping. And because we skipped first name here, that last name column is blocked by the first name column in that index. And so this is not this is not indexable.
Watch this. Let's hop back up to that guy and then we'll take that guy out and last name equals Francis. There aren't that many of those so we can just do that. And if we do that, you see look, now search using index multi first name and last name. So SQLite is telling us, great job.
I was able to use 2 parts of that multi index whereas before when we were just doing first name I was only able to use 1 part. So do you see the difference here? Still using we're still using that multi index twice, but when we have a first and last name, it says, great. I can use both parts. So So what we're learning so far is left to right no skipping is true, but also you don't have to use the entire index.
As long as you form a left prefix, you can just use parts of the index. So you could use first name, you could use first name, last name, or you could use first name, last name, birthday. You don't have to use the whole thing. What is this, what is this stops at the first range? So I think we've proven out the first rule which is you must go left to right.
Let's prove it 1 more time by saying select star from users where birthday equals, 1989-0214, limit 5. So scan on users. We were not able to use that index whatsoever. And if we do it again and last name equals Francis, you'll see scan on users. Shoot, we're still blocked by that first name and first name equals Aaron.
Now, that entire index is available to us because we started at the left which was first name and then went last name and then went birthday. And you see that in this query, I I put them in the query backwards, birthday, last name, first name because this order doesn't matter. This order matters 0% for anything ever. What matters is the order in the index. So, left to right, no skipping.
We're good. Gotta start at the left, gotta move to the right, no skipping. You can use part of it, you can use all of it, you just can't skip anything. Stops at the first range. What does stops at the first range mean?
Well, let's take a look. This query here, this query here is the 1 that allows us to use the entire index, first name, last name, and birthday. So what if we change what if we come over here to last name? And this is a little bit of a goofy query to say like last name less than. But we'll say last name is less than Francis.
First name is Aaron and birthday is 1989-0214. So if we do that, we have lost the ability to use this birthday because it stops at the first range scan. And so it used first name. It was a strict equality strict equality on first name. First name equals Aaron.
So it said great, first name equals Aaron. First name is the leftmost column. That's great. I can use that. Last name is less than Francis.
Okay. I can use that but because this is a range condition I'm gonna throw up the big stop sign and say no more. So then when we get to birthday equals it's like, sorry, we we had to stop at the first range condition. So what happens here is, when this index is being used, it goes through that b tree and finds directly, it finds all of the people named aaron with the first name of aaron. And then when it's down there in the leaf nodes, down at the bottom of that index, it starts scanning and looking for people with the last name of Francis.
And once that scan kicks off, there's no more direct access, there's no more traversal. It's just, alright, I'm down here. I'm gonna scan across and that's it. And then it takes those row ids, goes back to the main table and eliminates anyone with a wrong birthday. So left to right, no skipping, means you have to start with first name, last name, birthday.
You have to go in that order. You don't have to go all the way, but you have to start at the leftmost prefix. Stops at the first range means any time you have a range condition SQLite will use the index to satisfy that range condition but no further parts of the index can be accessed. So in this case, last name less than throws up a giant stop sign and says that's the end of the index. Even though birthday is actually there, it's just no longer accessible.
Left to right, no skipping, stops at the first range. With these rules in mind and now hopefully like a more deep understanding of them now that we've seen it here in the code, we can start to kinda like, update our rules of thumb on what columns we should index. We know now that like high cardinality, high selectivity are good. We also know, the most common equality conditions should probably go at the beginning of our composite indexes because 2 reasons. Most common means you can use a composite index and just use the first part of it.
If you have a bunch of searches that search on first name but don't include last name or birthday, this multi index is still great for that because first name is at the beginning. So now we're starting to see that maybe an index can satisfy multiple different queries. So common equality conditions should go at front. So common, we've already covered equality. Because it's going to stop at the first range scan, if you have any, queries that usually use range scans, so like created at is less than, or created at is greater than, or birthday is less than, or age is less than, anything like that.
You want those conditions to go at the end of your composite indexes because nothing is usable after that. You can't use anything beyond a range condition. So common equalities with high selectivity, high cardinality go at the beginning and range, columns that you normally use for a range scan should go at the end. That's gonna give you your most powerful and most flexible indexes to meet the needs of many different queries.