High Performance
SQLite
Introduction
Introduction to this course
Introduction to SQLite
SQLite is safe
It's just a file
Running SQLite locally
Good uses for SQLite
SQLite's limitations
The documentation
SQLite Internals
SQLite's structure
SQLite's file format
Dot commands
Pragmas
Virtual tables
Schema
Flexible types
Types
Strict types
Dates
Booleans
Floating point
Rowid tables
Auto increments
Without rowid
Generated columns
Optimizing SQLite
Locking
Rollback mode
WAL mode
WAL vs Journal benchmarks
Busy timeout
Transaction modes
Vacuum
Analyze & Optimize
Suggested pragmas
Faster inserts
Indexes
Introduction to indexes
B+ trees
Primary, secondary, and clustered indexes
Without rowid tables
Benchmarking without rowid tables
Imposter tables
Primary key data types
Where to add indexes
Index selectivity
Composite indexes
Composite ordering
Covering indexes
Partial indexes
Indexes on expressions
Automatic indexes
Duplicate indexes
Indexing joins
Advanced SQL
Explain
Index obfuscation
Joins
Subqueries
Unions
CTEs
Recursive CTEs
Window functions
Dealing with NULLs
Row value syntax
Indexed sorting
Upserts
Returning
Aggregates
Triggers
Operating SQLite
Altering schema
Altering schema with tools
Multi-database
Multi-tenancy
Backups
Exports
How to corrupt SQLite
Advanced JSON
Intro to JSON
JSON vs. JSONB
Argument types: path, value, JSON
JSON5
Valid JSON
Creating JSON objects + arrays
JSON Extraction
Updating JSON
JSON aggregates
JSON table functions
Indexing JSON
Full Text Search
Creating FTS tables
Performing searches
Ranking results
Highlighting matches
Bonus Interviews
DHH discusses SQLite in Ruby on Rails
Distributed SQLite with Litestream and LiteFS
Offline-first, multiplayer SQLite
Production SQLite with Turso and libSQL
Migrating from Postgres to SQLite with Kent C. Dodds
Ruby on Rails with SQLite
Moving from Redis to SQLite with Mike Buckbee
Locked video

Please purchase the course to watch this video.

Video thumbnail
Advanced SQL
Subqueries

Full Course

$
179
$229
USD, one-time fee
Aaron is an incredible! Every video is packed full of useful information explaining concisely, I learned so much from this course!
Derek Dorr
Derek Dorr

SQLite for Production

Build with and integrate SQLite into your production applications.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Turso

Summary

In this video, I explain why I love using subqueries to filter data from one table based on another in situations where traditional joins might create unnecessary duplicates. I demonstrate how to find users with more than 16 bookmarks using a subquery and also explain the importance of avoiding correlated subqueries to keep performance efficient. By watching, you’ll learn how to effectively use subqueries for better database querying and avoid common pitfalls that can slow down your queries.

Video Transcript

In this video, we're gonna talk about sub queries, which I personally love. So a sub query is really, really helpful when you have data in 1 table, we'll say table a, and you want to eliminate, or include things in table a based on data in table b. Now what you could do in some cases is you could join stuff together and then filter it out after. But if you do a left join, you're gonna end up with potentially tons and tons of rows in the first table because of that cartesian multiplication. Right?

So if you have users that have many bookmarks and you join the bookmarks in, you end up with a bunch of duplicate users which may be fine, but in a lot of cases that's not exactly what you're looking for. So let me lay out a scenario for us here. We want to find users that have more than, we'll say, 15 bookmarks. Now, there are a few ways we could do this. So we'll do we'll start with select star from users and we could say something like left join bookmarks on users dot id equals user id.

I don't like this by the way. On user id group by users dot ID. Oh, this feels bad already. Having count greater than 15. So the reason I don't like this is we're bringing along a bunch of data that we we don't actually need.

We're not actually after the bookmarks, we're after the users that have that many bookmarks. You can imagine you want users with more than, 5 projects and projects are in a related table. Or you want projects that have, fewer than 1 task in it because that's a project that's likely been abandoned and you wanna send an alert or a notification to somebody saying, hey, you made this project and then you never added any tasks. Don't you wanna add some tasks? So this is the same this is the same kind of query, that you could imagine in your domain application.

So let's take a look at how we can do this with a sub query. We're gonna start by figuring out how we can figure out the user IDs of users that have more than 15 bookmarks and then we're gonna wrap that up into a subquery. So the first thing that we're gonna do is we will do select user ID from bookmarks. We're just gonna limit that down. Okay.

User ID from bookmarks and we will group by user ID, group by user ID having count greater than 15. I wonder how many that is. 75. That's really, that's not that many. What if we change it to 16?

That's that's more workable. So what we've done here is we've created, we've started by writing the inner part of the query. So now we have a bunch of user IDs. What do we do with those?

This having just means it operates after the group by and if there is no group by, having is functionally equivalent to where. This wouldn't work because there's no longer a group by, but when you have a group by, you can throw a having on and it will filter after the grouping. Kinda nice. Alright. Again, an embarrassment of riches with these, this line spacing.

We don't need that much. This is probably fine. Everybody's gonna be complaining because it's not exactly the style that you prefer, but that's okay. Alright. So we've got the user ID's.

Now, we have a few options here. Let's, make some space and come up here. We're gonna select select first name and we'll select ID first name and last name from users limit 10. That's probably what we're after. Is email in there?

I bet email is in there too. Emails, this is a good report. So the boss man or woman is asking for, hey, give me a report of the best people, meaning the people that have made the most bookmarks. And so this is something that looks right. Hey, boss.

Here's ID, first name, last name. This works really well. What we wanna do though is we wanna limit it down to people that have more than 16 bookmarks. So we can do that I hate that. We can do that by so from users limit 10 where so from users where ID is in a subquery.

So we're gonna plop the subquery right there. You may be familiar with something like this where you hard code, some IDs in there or something. Totally legit. That's still completely fine. I would say that if you're doing something like we're doing here, a subquery is going to be better because this allows this gives SQLite all the information.

And under the hood, it might turn it into something called a semi join or an anti join or it might use some other optimization strategy. It might use some sort of index assisted lookup. Whereas if you just hard code the ID's, you're kind of taking a little bit of that context or that power away from SQLite. Also, it is possible to run this query, get all of the ID's back and then in your application form a new query, form this query with the ID's embedded and send it back over to SQLite. This is less of a problem in SQLite because there's no network time in another database.

You're having to then make 2 round trips, 1 to get the ID's out and then you build up this giant query on the application sub query in. So we're gonna take that subquery that we wrote, get that guy out of there, and now we have to change actually we don't. So we only selected user ID. So ID and user ID, those are the keys that should match each other. So now if we run this we should see we have 19 people.

And is that how many? So if we were to just run the subquery, we get 19. So we should have 19 up here and we do. Those are the 19 people that have more than 16 bookmarks. And if you wanted to join in the bookmark count, then you're back to a join.

But you can still use this subquery to build up the temporary table. So let me show you how to do that. In this case, we're gonna take select we'll we'll just duplicate this down here and we will add count. And then instead of all of this, where the ID in, we're gonna say from users left join and then we're gonna open up a subquery here and say that we want to select We'll do select, star and count as count, and then we can change this little guy down here, greater than 16 on as temp. So we have to give this table a name as temp on users dot id equals temp dot user id.

So if we run that, id is ambiguous. You are correct. SQLite users dot ID. So now if we run that we see that the data has been joined in except it's wrong. We did a left join.

I only want the ones that have 16 or more. So I did the wrong type of join. There we go. So now we have all of the users that have more than 16 bookmarks and we've joined in the number of bookmarks using a subquery. I'm gonna show you 1 more thing that could potentially wreck your sub query performance, but just keep in mind when you're filtering a table based on another table, a sub query is great there.

When you're trying to join stuff together, you can still use a sub query to build up a temporary table and then join that in if you need to. So the thing that could potentially wreck your performance is you could accidentally so let's take a look at this. This is pretty good. You could accidentally explain query plan, search using scan subquery. Okay.

We'll keep this in mind. And let's copy this and come down here. You could accidentally create a dependent or a correlated subquery. So if you were to do something like this, where exists and then select, doesn't matter. Select star from bookmarks, group by having and we're gonna reference the outer table, users dot ID, user ID equals users dot ID.

So in this case, we're still going to we're we're gonna get back the exact same thing, but it's going to take forever, so fast in fact or so slow in fact that I'm gonna have, producer Steve speed this part up. But we will see down here the query execution time. So let me run this without the explain query plan. Let me run this. I will see you on the other side.

This was a terrible idea. Oh, man. Do I just give up? I thought this was gonna take, like, 3 minutes. Boy.

Oh, boy. Okay. That took fully 20 minutes and it didn't finish. So what I was trying to show you is that there's a way that you could get in big trouble and I got myself in big trouble. So if we do explain query plan down here, this is all I wanted to show you.

I don't know why I wasted 20 minutes on this. This is all I wanted to show you. It's a correlated subquery. So what this means is, the subquery, this part is correlated to the outer query and that is because we reference the outer table in the inner query. In MySQL, MySQL's optimizer will look at this and say that's that's not really correlated.

They call it trivially correlated. They look at that and say, that's not really correlated. I'm gonna undo that because a correlated subquery has to run once for every outer row. So we were we were running this a million times, which is why it wasn't working. So be extremely careful when you create correlated subqueries because they're usually a performance boondoggle.

You can uncorrelate it yourself by saying where id in, select user id from here, and then remove the reference to the outer table. And if we explain that query, we no longer see we no longer see the correlated part. And so now it's just a subquery that is run once instead of 1, 000, 000 times. So hopefully you enjoyed that little snafu that cost me 20 minutes of my life. Hopefully it cost you a few seconds.

Thanks to producer Steve's editing. But the the things you need to remember, if you're filtering 1 table based on data from another, great use for a subquery. Be careful that your inner query doesn't reference the outer query. You want them to be uncorrelated as much as you can because as we saw, that's gonna help performance quite a bit to only run the inner query once, not for every row on the outer query.