High Performance SQLite

I explain how SQLite's partial indexes allow me to create indexes over just a subset of rows, making them smaller and faster than full-column indexes. By using partial indexes, I can optimize database queries for specific groups, like "pro" users or active emails, while also enforcing uniqueness where needed. This feature of SQLite helps me keep indexes efficient and tailored to my most common query needs.

One of SQLite's coolest features that MySQL doesn't even have is partial indexes. So what partial indexes allow you to do is put an index over a subset of the rows. So we've been putting indexes on entire columns, all the way down top to bottom. You can put an index on a part of that column or a subset of the rows. And this is really helpful when the index could potentially be huge, but you just want really a small part of it, or you want to enforce uniqueness across a subset of rows.

We'll look at those two examples, but first let me show you how to do it. In case you don't remember, this is what we're working with. Users, limit, 5, and this is the shape of the table, ID, first name, last name. This is is pro, and then we've got deleted we've got our timestamps over there. So let's look at this.

Select count select count star from users where is pro equals yeah. Let's take account a look at how many pro users we have. We have 44,000 pro users which means we have 900 and 55,000. Hey. Look at that.

955,000 not pro or free users. If we wanted, if we wanted to do a bunch of queries just against the pro members, Well, what would we wanna do? We could do something like we could say create index, email is pro. Email is pro on users, and we could say email is pro. We could do something like that, but a and that would that would work just fine, but we can get a more compact, more useful index if we create a partial index.

So let's clear this out and take a look. We can say create index email. We can go ahead and still call it, email is pro. That might be confusing with the naming convention so we'll just say pro emails. Usually, when I do email is pro, I expect it to be across the 2 columns and so I don't wanna confuse myself later.

So we'll say pro emails on users and then we're just gonna put it on the email, but we can add a where to this create index statement, which is so cool to me. So where is pro equals 1. So if we create that and then we do indexes, we see pro emails. If we do pragma index info, I think it might be one word. Pro emails.

Oh, great. So there we see we have created our index. We don't get a lot of details out of this, unfortunately. Let's go ahead and run select star from users where email we can just throw a like on there because I don't know exactly what I'm looking for. Like, let's say, AA percent and is pro equals 1.

Let's go ahead and limit that down, limit to 4. So we see we do use the scans of users table using the pro emails index. Now, if we were to let's do this again and just do let's see which one, SQLite will pick. Email is pro, email is pro, and then we're just gonna drop this off altogether. So if we create that, one, it's gonna take a whole lot longer.

It looked fast, but it's a lot longer. It's gonna take a lot longer because you're putting an index over a 1000000 records instead of 44,000. So it's gonna take a lot longer to create and it's gonna take a lot longer to use. And so we'll see we'll see if SQLite is smart enough to there you go. So it's still going to use that smaller index.

This index is better because it is so much smaller and it's not even having to do a secondary like elimination so it finds all the emails that are like AA and then it scans the index for is pro. We know by definition that any email that is in this index is a pro email because those are the only emails that went into the index in the first place. One drawback to partial indexes is it only works when your query matches the way that you defined the index. So if we were to run this again, but we do it with an is pro equals 0, that index is not going to be used. Whereas if we change it to is pro equals 1 and we run it again, it uses that pro emails.

So when you're looking at your data, you might look and see, am I always querying something and never querying something else? Right? Am I always querying, non deleted users and very rarely querying deleted users. Well, that might be a good way to make your index a little bit smaller. That example is also a good way to enforce uniqueness over a subset.

Let's look at that. I've added myself twice to this data set. So if we did select star from users where email equals aaron. Francis at example.com, you'll see I'm in here twice. Once as Aaron Francis, once as Guido Nicolas, which is much cooler name than Aaron Francis.

Here's the problem. I used to have a unique index on this. I dropped that because what I want to do is I want to allow users to delete their account but then later recreate it with the same email address. But I still want only one active email address per person. Even if you have 50 deleted accounts, I don't mind.

You can only have one active with that email address. Partial indexes can help us here. In fact, partial unique indexes can help us here. So what I'm gonna do is I'm gonna create a unique partial index. And this is gonna fail, but it will still be interesting.

Create unique index. We'll call it active emails on users. And then we'll say users email where deleted at is null. So this is the way that I declare who is active and who is not active. And amongst the active cohort, they must have a unique email address.

Watch this fail. Unique constraint failed and that is because both of these people are included in the active cohort and their email is the exact same. So update users set deleted at equal to datetime now where id equals 208. Sorry, Guido. 4 6 o.

So if we run that and then we will do the select again. Let's make some space. Let's do the select again. You see this person has now been soft deleted or, tombstoned or trash canned, whatever you want to call it. This person is still alive.

So if we were to try to create our unique index again, we would be able to do that. So now we have the ability to enforce uniqueness across a subset which is really interesting and opens up a lot of use cases. You also have the ability to create a smaller, faster, better index that meets the needs of your most used queries, which could be, users that aren't deleted or users that are pro. And so you can keep your indexes very, very small for your 95% use case and kind of eat the unindexed pain for your much less used queries. This is a great pattern that I'm really excited about that SQLite has.

I try not to gush too much, but I love this about SQLite.

Full Course

SQLite for Production

Summary

Video Transcript