High Performance SQLite

In this video, I explain why I love using subqueries to filter data from one table based on another in situations where traditional joins might create unnecessary duplicates. I demonstrate how to find users with more than 16 bookmarks using a subquery and also explain the importance of avoiding correlated subqueries to keep performance efficient. By watching, you’ll learn how to effectively use subqueries for better database querying and avoid common pitfalls that can slow down your queries.

In this video, we're gonna talk about sub queries, which I personally love. So a sub query is really, really helpful when you have data in 1 table, we'll say table a, and you want to eliminate, or include things in table a based on data in table b. Now what you could do in some cases is you could join stuff together and then filter it out after. But if you do a left join, you're gonna end up with potentially tons and tons of rows in the first table because of that cartesian multiplication. Right?

So if you have users that have many bookmarks and you join the bookmarks in, you end up with a bunch of duplicate users which may be fine, but in a lot of cases that's not exactly what you're looking for. So let me lay out a scenario for us here. We want to find users that have more than, we'll say, 15 bookmarks. Now, there are a few ways we could do this. So we'll do we'll start with select star from users and we could say something like left join bookmarks on users dot id equals user id.

I don't like this by the way. On user id group by users dot ID. Oh, this feels bad already. Having count greater than 15. So the reason I don't like this is we're bringing along a bunch of data that we we don't actually need.

We're not actually after the bookmarks, we're after the users that have that many bookmarks. You can imagine you want users with more than, 5 projects and projects are in a related table. Or you want projects that have, fewer than 1 task in it because that's a project that's likely been abandoned and you wanna send an alert or a notification to somebody saying, hey, you made this project and then you never added any tasks. Don't you wanna add some tasks? So this is the same this is the same kind of query, that you could imagine in your domain application.

So let's take a look at how we can do this with a sub query. We're gonna start by figuring out how we can figure out the user IDs of users that have more than 15 bookmarks and then we're gonna wrap that up into a subquery. So the first thing that we're gonna do is we will do select user ID from bookmarks. We're just gonna limit that down. Okay.

User ID from bookmarks and we will group by user ID, group by user ID having count greater than 15. I wonder how many that is. 75. That's really, that's not that many. What if we change it to 16?

That's that's more workable. So what we've done here is we've created, we've started by writing the inner part of the query. So now we have a bunch of user IDs. What do we do with those?

This having just means it operates after the group by and if there is no group by, having is functionally equivalent to where. This wouldn't work because there's no longer a group by, but when you have a group by, you can throw a having on and it will filter after the grouping. Kinda nice. Alright. Again, an embarrassment of riches with these, this line spacing.

We don't need that much. This is probably fine. Everybody's gonna be complaining because it's not exactly the style that you prefer, but that's okay. Alright. So we've got the user ID's.

Now, we have a few options here. Let's, make some space and come up here. We're gonna select select first name and we'll select ID first name and last name from users limit 10. That's probably what we're after. Is email in there?

I bet email is in there too. Emails, this is a good report. So the boss man or woman is asking for, hey, give me a report of the best people, meaning the people that have made the most bookmarks. And so this is something that looks right. Hey, boss.

Here's ID, first name, last name. This works really well. What we wanna do though is we wanna limit it down to people that have more than 16 bookmarks. So we can do that I hate that. We can do that by so from users limit 10 where so from users where ID is in a subquery.

So we're gonna plop the subquery right there. You may be familiar with something like this where you hard code, some IDs in there or something. Totally legit. That's still completely fine. I would say that if you're doing something like we're doing here, a subquery is going to be better because this allows this gives SQLite all the information.

And under the hood, it might turn it into something called a semi join or an anti join or it might use some other optimization strategy. It might use some sort of index assisted lookup. Whereas if you just hard code the ID's, you're kind of taking a little bit of that context or that power away from SQLite. Also, it is possible to run this query, get all of the ID's back and then in your application form a new query, form this query with the ID's embedded and send it back over to SQLite. This is less of a problem in SQLite because there's no network time in another database.

You're having to then make 2 round trips, 1 to get the ID's out and then you build up this giant query on the application sub query in. So we're gonna take that subquery that we wrote, get that guy out of there, and now we have to change actually we don't. So we only selected user ID. So ID and user ID, those are the keys that should match each other. So now if we run this we should see we have 19 people.

And is that how many? So if we were to just run the subquery, we get 19. So we should have 19 up here and we do. Those are the 19 people that have more than 16 bookmarks. And if you wanted to join in the bookmark count, then you're back to a join.

But you can still use this subquery to build up the temporary table. So let me show you how to do that. In this case, we're gonna take select we'll we'll just duplicate this down here and we will add count. And then instead of all of this, where the ID in, we're gonna say from users left join and then we're gonna open up a subquery here and say that we want to select We'll do select, star and count as count, and then we can change this little guy down here, greater than 16 on as temp. So we have to give this table a name as temp on users dot id equals temp dot user id.

So if we run that, id is ambiguous. You are correct. SQLite users dot ID. So now if we run that we see that the data has been joined in except it's wrong. We did a left join.

I only want the ones that have 16 or more. So I did the wrong type of join. There we go. So now we have all of the users that have more than 16 bookmarks and we've joined in the number of bookmarks using a subquery. I'm gonna show you 1 more thing that could potentially wreck your sub query performance, but just keep in mind when you're filtering a table based on another table, a sub query is great there.

When you're trying to join stuff together, you can still use a sub query to build up a temporary table and then join that in if you need to. So the thing that could potentially wreck your performance is you could accidentally so let's take a look at this. This is pretty good. You could accidentally explain query plan, search using scan subquery. Okay.

We'll keep this in mind. And let's copy this and come down here. You could accidentally create a dependent or a correlated subquery. So if you were to do something like this, where exists and then select, doesn't matter. Select star from bookmarks, group by having and we're gonna reference the outer table, users dot ID, user ID equals users dot ID.

So in this case, we're still going to we're we're gonna get back the exact same thing, but it's going to take forever, so fast in fact or so slow in fact that I'm gonna have, producer Steve speed this part up. But we will see down here the query execution time. So let me run this without the explain query plan. Let me run this. I will see you on the other side.

This was a terrible idea. Oh, man. Do I just give up? I thought this was gonna take, like, 3 minutes. Boy.

Oh, boy. Okay. That took fully 20 minutes and it didn't finish. So what I was trying to show you is that there's a way that you could get in big trouble and I got myself in big trouble. So if we do explain query plan down here, this is all I wanted to show you.

I don't know why I wasted 20 minutes on this. This is all I wanted to show you. It's a correlated subquery. So what this means is, the subquery, this part is correlated to the outer query and that is because we reference the outer table in the inner query. In MySQL, MySQL's optimizer will look at this and say that's that's not really correlated.

They call it trivially correlated. They look at that and say, that's not really correlated. I'm gonna undo that because a correlated subquery has to run once for every outer row. So we were we were running this a million times, which is why it wasn't working. So be extremely careful when you create correlated subqueries because they're usually a performance boondoggle.

You can uncorrelate it yourself by saying where id in, select user id from here, and then remove the reference to the outer table. And if we explain that query, we no longer see we no longer see the correlated part. And so now it's just a subquery that is run once instead of 1, 000, 000 times. So hopefully you enjoyed that little snafu that cost me 20 minutes of my life. Hopefully it cost you a few seconds.

Thanks to producer Steve's editing. But the the things you need to remember, if you're filtering 1 table based on data from another, great use for a subquery. Be careful that your inner query doesn't reference the outer query. You want them to be uncorrelated as much as you can because as we saw, that's gonna help performance quite a bit to only run the inner query once, not for every row on the outer query.

Full Course

SQLite for Production

Summary

Video Transcript