High Performance SQLite is an excellent and immersive course, featuring real-time typed commands and valuable edge case explanations. I highly recommend it for enhancing your SQLite skills.Han-Hsun Liu
Build with and integrate SQLite into your production applications.
I've made my test data available for you to use and follow along.
So far, we have learned a lot about indexes except for how to use an index. We've learned all of this, like, academic theoretical stuff that is valuable. I stand by that. I think it is valuable, but we gotta learn how to use these things, not just, like, what data structure they live in, which is pretty cool. But let's learn how to use these things.
So you may remember that I told you when you're building out your schema, you should look at your data and try to derive a schema out of that. When you're, figuring out where your indexes should go, you have to look at your queries. You have to look at your access patterns. If you showed me a table full of data and said where would where would the indexes go on this? I would say, I don't know.
What are your queries? How are you accessing the data? That's where you would drive your indexes from. Now you may be thinking, well, Aaron, indexes are good. I'm gonna add an index on every column because I'm a very, very smart boy.
That's a really good try but that's not what you're looking for. If you add an index on every column you must remember indexes are not not free. Indexes are not free. So it is a separate data structure that maintains a copy of part of your data. And if you add an index on every column, you're functionally duplicating your table.
Your, inserts, updates, and deletes are gonna be a lot slower because those separate structures must be maintained. And, frankly, your read, when you're actually just reading from the database, it's not gonna be as fast as you want because usually an index on a single column or many single columns is worse than a composite or a multi column index. So you may have heard another rule of thumb that you should index anything that shows up in the where clause. This is a better rule of thumb, honestly, but for us, since we're professionals now, I want us to consider the entire query. The where is important.
But, what about the ordering? What about the grouping? What about the even the selecting? We need to look at the entire query as we figure out where should these indexes go. So those are just some rules of thumb.
We're gonna be expanding on all of this over the next many videos, but let's take a look at a users table and start working on that right now. We're gonna be using this users table here. So I've got select star from users limit. Let's just do limit 1 so we can see. So I've got first name, last name, email, birthday.
I is short for I think is pro or something. It's just a boolean and we'll I think we have like 25% of our members are pro. Deleted at which is null for this person. 1,000,000 users in here. The, 1,000,000 users in here.
The, the mode is a little bit bad. So if we do mode list and run that again, you'll see a 1000000 users in this table. If you look down below this video, you'll find the ways you can get this table, this data set. So if you wanna play along, there's a way to get this loaded into Terso quite easily. If you want to just spin this up in the cloud, you can do that on Terso.
I highly recommend that. Or you can download it and play with it locally. Let's keep going. So the first thing, the first thing that I want to show you is if we do let's actually do, EQP on. This turns on something called the Explain query plan.
So if we turn that on, that is what's generating this part right here. Scan users covering index that user. So if we turn EQP off and run it again, you'll see that that is no longer being run and so this is a good way to turn the explain query plan on. Now, let's switch the mode back to box because I do like that a lot better. So that's good.
Now, if you're not operating on the CLI and you don't have EQP on, let's go ahead and turn that off. You can do explain and then do select star from, let's do explain select count star from users. Now this is gonna be well, just brace yourself. What? What?
I don't I I'll be honest. I have absolutely no idea what this means. There are this is this is well documented. There are there are you can go and you can read the documentation and learn exactly what all of this means. I think this is the actual, virtual machine instructions.
I don't know, man. There's no way I could read that. So instead of explain, which is explain as traditional in, other databases, explain query plan, select count star from users. Wow, that is a lot better. So I recommend using explain query plan.
There is another dot command, that we can look at here in a little bit, but we'll go over the explain query plan kind of as we're going, and I think I will do a full video on Explain query plan. But this is the best way, I think, to figure out what's going on under the hood. Explain, if you're some sort of super genius, you might be able to figure this out. This is not helpful to me. Okay.
So with that in mind, knowing that we can do explain query plan on or we can just do explain query plan, let's look at some stuff. So we'll just leave it on. That's fine. If we do select star from users where birthday equals what do we wanna do? Let's do, 1989, 0214 which is a very special day.
So if we do that, we get all of these people back. That's a lot. Let's go ahead and limit it down to 5. That's a little bit better. So you see that our query plan right now is telling us it is scanning the user's table.
So instead of doing any sort of fast b tree traversal, this is just scanning the entire users table looking for birthdays that match. So So the first thing we're gonna do is we're gonna add an index on birthday and then we're gonna see where that index can be used. To create this index, we just say create index b day. We're just gonna call it b day because we're fun, we're carefree. Create index b day on users and then the column that we're actually creating it on is birthday.
Now if we check the dot indexes, we'll see b day is there. You can also do pragma index list. So we're gonna list from the users table. There you go. You see that bday is not unique and it is not a partial index which is, we'll go over that later.
It's pretty interesting. It's it's it's a nice feature of SQLite. You can also do pragma index, I think it's index info and then you put in the name of b day. So there you see the column name is birthday. And if we were to do users email unique, you would see that that is on the email column.
So this, has proven that what has it proven? We've added the index and it's on the birthday column. That's about all it's proven. Now EQP on. So let's do select star from users where birthday birthday equals 1989-0214.
And if we run that, that's too many. Let's do limit 10. That's still too many. I don't want the whole screen taken up. There we go.
So if we do this again, limit 5, you'll see search users using index b day. So this already tell us tells us that SQLite is using that index which means indexes are great for strict equality lookups. So we're kind of like intuiting this as we go and the query plan is reading that back to us and saying, yes. That was a good index for that query. The next place this can be useful is if we do select star from users where birthday is less than 19 89, 02, 14 and we'll limit that down to 5 so we can see it all on the screen.
You'll see search using index birthday, birthday is less than. So we've got strict equality. We've got an unbounded range. We can also do a bounded range. So we'll say birthday, let's say birthday between and we wanna find all the people born in 1989, 1989, 12 31.
And you'll see that now we have a bounded range. So we've got strict equality, unbounded range, bounded range. Those are 3 places that, the index has helped us so far, but let's consider the entire query. Let's look at ordering, maybe by birthday. So if we order by birthday, so if we do select star from users, order by birthday descending limit 5, are we still using the index?
Yes. We are still using the index. So what this is gonna do is this is gonna go scan that birthday index for the first five birthdays and then take those row IDs and go back to the table and grab the rows that match. So we get index assisted ordering here. And just to prove it, if we were to say order by what is something that is not indexed, created at is not indexed, you'll see Oh, goodness.
We're scanning the entire users table and then we're using a temporary b tree to order by. So we're not getting any index assisted ordering here. I mean, we are, but that's because they created they constructed that, that, data structure for itself because it thought, man, it's gonna be a lot faster if I had a bee tree. And it says, alright, I'm gonna make a bee tree out of this. So, this is a way to show you, well, there might be there might be a good, a good index to add there.
I'm gonna go ahead and tease a little bit. We can turn expert mode on. And if we were to run that again, this is gonna tell you, it might be better if you created an index on created at and put it in descending order. What? SQLite has a built in index suggestion.
This is just a tease. We're gonna do a video on expert mode. And then we grouped by birthday instead of ordering it. We'll still throw a limit on there just for fun. You can see the counts come up over here and the birthdays come up over there.
But most importantly, still using an index. In fact, we're using a covering index which is pretty awesome. We'll talk about that later. But, so, what have we learned? We've learned, indexes can help with strict equality, unbounded ranges, bounded ranges, ordering, and grouping.
So, we've learned several places that indexes can be useful. We've learned a way to determine if SQLite is using them and we've teased expert mode and covering indexes. The next several videos, we're just gonna keep going on indexes.