Become a SQLite expert

In the video, I explained how SQLite handles aggregates like count, sum, min, and max, and highlighted some unique quirks and useful tricks. We explored using filters and order-by clauses within aggregate functions, the difference between sum and total for handling nulls, and the group_concat function for gathering data into a single string. I also emphasized the importance of using indexes for faster queries, especially when grouping by expressions.

SQLite supports aggregates. Just like every other database, count sum, min, max, all of the stuff that you would expect, there are a few quirks and a few gotchas and a few cool things that SQLite does that I don't think anybody else does. So let me show you what we are working with. The first thing I wanna show you is a little bit of a quirk. So if we do select star from users limit 2 and we run that, we get 2 users, both of whom are not pro.

So let's do this. Let's do, select count star. So we're gonna count up all of the rows and we're just gonna select everything as well. Group by and we're just gonna group by is pro over here because those first two people are not pro users. And so if we run this, you see something that you don't really expect.

We see that we have 945,000 non pro users. 440 I'm sorry. 44,000 pro users, but who why Bertie, Wysoski, and N. Dibbert? So where where did these people come from?

And this is the first maybe weird thing about SQLite grouping is that you can just select columns that aren't a part of an aggregate expression even when the the query is being grouped. Other databases will complain about this, rightly so in my opinion, because these results are useless. You cannot rely on any order here. You can't rely on n dibbert showing up and you can't rely on Bertie Wissowski showing up because those are not a part of any group and so those, those results are arbitrary. The only thing that we can really count on here is the count and the is pro because the count is up here as an aggregate and the is pro is what we're grouping by.

And so we're guaranteed that those are correct. But these could just be coming from anywhere. So I don't recommend selecting things that you're not grouping by if you are relying on them, which I don't know why you would select it if you're not relying on it. So, that is the first thing I wanna show you, but there's that's a quirk. Let's look at some interesting things.

One thing I think is super interesting is you can filter down your aggregate expressions. So if we do select star from users, I should have put a limit on that. If we do select star from users, limit 10 and we look at that, you see we've got these 10 users here. We can say something like count star filter where, this is crazy syntax to me, filter where is pro equals 1. And if we do that, it's gonna count it's gonna count across the whole table because we don't have any group by here.

And so it's gonna count it down and there's there's that 44382 that we see. And this can be really helpful when you're doing stuff like this. You're trying to get either multiple sums or multiple counts. You can do like sum, you know, purchases, filter where, and you can put a filter in there and you could say, like, I don't know. We put something in there and then as as pro purchases or something like that.

And this is a really nice way to get a bunch of aggregates all in one go. So, you could also do something like this where you say, alright, I need the pro members, the non pro members, and for whatever reason, I need all the people born in 1989. Who knows why, but you could do it. So you could say filter where we need a strftime again of percent year and on birthday equals, let's do 1989, great year. And if we run that again and we properly wrap that as 1989 and we do that again, So now you can see, you can get multiple aggregates across the same dataset but counting up different things.

And this would work, you know, this would work with any of them. With sum or min or max or anything like that. Anything where this evaluates true is then going to go into whatever aggregate function you're running on that, on that part of the query. SQLite has another trick up its sleeve when it comes to aggregating, the total or the sum of a set of values. So if we look at this, we'll say with numbers as the c t e.

Remember c t e's? So we'll do select 1. Let's just call this as n and union all selects 1, that's fine. So then we can do select sum in from numbers. So we run that, we get 2.

Why is that interesting? Well, that much isn't interesting. However, we have total n. What is the difference? Well, first of all, we do see that total is a float and sum is an integer.

Now, sum is the SQL standard and total is this new guy that they introduced. So why did they introduce this? If we change this to null, we see still we get 1 and 1. But if we change this to null, we get null and 0.0. So a decision was made to keep sum adhering to the SQL standard but also introduce total such that if there are no rows or all of the values are null, you still get, 0.0 back which you can work with on your application side.

And so this is another one of those handling null situations where the creators of SQLite decided, honestly, most of the time when you sum up something that's missing, the sum should be 0 and they just made a decision there without impacting how sum works under the hood which I gotta respect him for. I like that. There is another one. We've talked about floating point numbers and so if these are all, you know, long floating point numbers, we've talked a little bit about how those get messy. You do have to load that decimal math extension, but there is also a decimal sum.

Once you load that extension into SQLite, it is not there by default, but that can give you exact, an exact representation of a sum of floating point numbers which can be pretty important. Now we're gonna look at one final trick for aggregating which again I think is pretty cool. So if you did select star from users, we'll do actually let's do select star from categories because there aren't quite as many of those. One of the functions that you can use is something called group concat, and you can put, a column name in there and it's gonna shove it all together. By default, comma separated, you can pass, the separator of your choice in there and that will be used as the separator.

It is the same as, I think, string ag. Okay. So string ag and group concat are the same thing. So whichever one, you like better, use that one. That's great.

Now, one thing you can do with, inside of inside of this function, so you could let's back out and say select star from users. So we're gonna go to the users table, but we're gonna do group concat first name and then we're gonna say order by first name and let's call it ascending. So if we run that, we're gonna get a bunch of first names, which I think this is really interesting in here that you can add an order by. I think that is pretty cool. This doesn't matter order by doesn't matter for stuff like sum and min and max.

I don't know that SQLite is actually even going to order it. It might ignore it. But for these string concat functions, it can be, it can be extremely helpful. So you can do something like this. You can also say like, well, that sure is a lot of Aaliyah's, isn't it?

Like, I'm happy for her but that's a lot of Aaliyah's. You can do distinct first name. And now we get, oh, those are those are all of the names those are all of the names, distinct and so we get one representation of each. And you could, of course, you could, of course, go even further. You could say group by group by strftime%, year and birthday.

And then we're gonna bring this guy up here and then we'll say as birth year and we'll say, we need a comma there and we can say, as names. So if we run that we see, hey, they're all the same. They're all they're all the same because, the library I use generates the same name based on not birth year. But you can see, great. Now we've got the birth year and all of the names for people in that birth year because that is what we are grouping by.

So you have, the ability to filter your aggregate functions. You have the ability to order within your aggregate functions. And so SQLite is honestly pretty great when it comes to aggregates. If you are going to do something like this, group by an expression, You can see that this ran in, a second, 925 milliseconds. That's not super great.

I would if I were doing this query for real, I would either put an index on this expression or I would make a generated column out of this expression and just call it birth year and then group on that. So anytime you can do index assisted grouping, it is going to be better, faster, cheaper, easier, all the things you want. This will of course work like we've just seen, but if this were my application and I was being charged with creating, this query here, I would advocate for an index on the expression or a generated column that I later indexed.

Full Course

SQLite for Production

Summary

Video Transcript