Aaron and Steve are a killer combo for high quality courses like High Performance SQLite. The course shows at great detail the ins and outs of SQLite and how to scale it to compete with any other database. I love it!Nik Spyratos
Build with and integrate SQLite into your production applications.
I've made my test data available for you to use and follow along.
Okay. It's time to get a little bit academic but not terribly academic. I'm not a computer science guy. I just am a database enthusiast. So we're gonna talk about B trees.
B trees are the underlying data structure that make indexing effective, that make it powerful, that make it fast. So the b tree is the way that the data is arranged when you create an index. For this example, we have a users table. So if I do select star from users, it's a very simple table. It has 10 people in it.
And what we're gonna walk through is we're gonna walk through me creating an index on the name column and then I will show you, just kind of visually what that structure looks like. And then we're gonna walk through that tree looking for a particular name, pretending that we're the database. So taking these 10 names and putting an index on them, you get a structure that ends up looking like this. Maybe calling this a tree is generous. It kinda looks like a tree, but you can see at the top we have 2 names, Isaac and Steve.
That is the root node. Now, if we jump all the way to the bottom, you'll see across the bottom are all 10 names and importantly, those 10 names are in order. So it starts on the left at Aaron and then it goes all the way across to the right with Virginia, the last name in the tree. The bottom of this tree, these nodes down here at the very bottom, these are called the leaf nodes. So sticking with the tree metaphor, the thing at the very, very end, those are the leaves.
That's what's at the very, very bottom. Now, the leaf nodes are interesting because they contain the data. The the data lives in the leaf nodes and we'll walk through this in a second but also the pointer back to the table lives in those leaf nodes. Now, let's let's walk through this tree as if we're looking for the name Jennifer. So we're the database and we have been issued a query that says, hey.
In fact, let me just write this query out. Let's say, we have been issued a query that looks like this. Select star from users where name equals Jennifer. And using this index that we have created, we're gonna walk through how the database actually does that. Okay?
Let's go. Every tree traversal is going to start at the root node. So we have Jennifer. We look at the root node and we have Isaac on the left and Steve on the right. Alphabetically, Jennifer falls in the middle, so we're gonna take the middle path.
That leads us down to this interior node here with just the name Simon. So we compare Jennifer to Simon and Jennifer alphabetically is less than Simon and so we follow the left path of the tree down into the leaf node. And then once we're down in the leaf node, we just search for the name Jennifer and there is Jennifer. Let's do this again looking for the name Taylor. So we're gonna start at the root node again, and this time Taylor is greater than the right side of the node, which contains Steve.
So because Taylor is greater than Steve, we're gonna go down the right side of the tree. And then we hit a node that actually contains Taylor. And because it is equal to Taylor, we follow the right side of the tree again. And then down in the leaf nodes, we find what we're looking for which is Taylor. So what is the point of having an index?
It is cool. You must admit that it is cool. But what's the point? Well, the point is you want to avoid reading the entire table. So our alternative to having an index is something called a table scan, which scans the table.
It's all right there in the name. So if you don't have an index and you ask SQLite to find, the first name Taylor, it's gotta literally go and read every single row in the table looking for Taylors. When you create an index like this, it puts the names in order and it puts it in a structure that is easily traversable. So you can imagine that it's faster when you have 10 rows. That doesn't move me that much.
But once you hit a 1000, 10,000, a 1000000000 rows, you need this sort of so that you can skip over, you know, 0.9999000000000 and you just traverse the nodes to find exactly what you're looking for. So an index is that secondary data structure that maintains a copy of part of our data. And, in the next video, we'll see that it contains a pointer back to the main table.