High Performance
SQLite
Introduction
Introduction to this course
Introduction to SQLite
SQLite is safe
It's just a file
Running SQLite locally
Good uses for SQLite
SQLite's limitations
The documentation
SQLite Internals
SQLite's structure
SQLite's file format
Dot commands
Pragmas
Virtual tables
Schema
Flexible types
Types
Strict types
Dates
Booleans
Floating point
Rowid tables
Auto increments
Without rowid
Generated columns
Optimizing SQLite
Locking
Rollback mode
WAL mode
WAL vs Journal benchmarks
Busy timeout
Transaction modes
Vacuum
Analyze & Optimize
Suggested pragmas
Faster inserts
Indexes
Introduction to indexes
B+ trees
Primary, secondary, and clustered indexes
Without rowid tables
Benchmarking without rowid tables
Imposter tables
Primary key data types
Where to add indexes
Index selectivity
Composite indexes
Composite ordering
Covering indexes
Partial indexes
Indexes on expressions
Automatic indexes
Duplicate indexes
Indexing joins
Advanced SQL
Explain
Index obfuscation
Joins
Subqueries
Unions
CTEs
Recursive CTEs
Window functions
Dealing with NULLs
Row value syntax
Indexed sorting
Upserts
Returning
Aggregates
Triggers
Operating SQLite
Altering schema
Altering schema with tools
Multi-database
Multi-tenancy
Backups
Exports
How to corrupt SQLite
Advanced JSON
Intro to JSON
JSON vs. JSONB
Argument types: path, value, JSON
JSON5
Valid JSON
Creating JSON objects + arrays
JSON Extraction
Updating JSON
JSON aggregates
JSON table functions
Indexing JSON
Full Text Search
Creating FTS tables
Performing searches
Ranking results
Highlighting matches
Bonus Interviews
DHH discusses SQLite in Ruby on Rails
Distributed SQLite with Litestream and LiteFS
Offline-first, multiplayer SQLite
Production SQLite with Turso and libSQL
Migrating from Postgres to SQLite with Kent C. Dodds
Ruby on Rails with SQLite
Moving from Redis to SQLite with Mike Buckbee
Locked video

Please purchase the course to watch this video.

Video thumbnail
Advanced SQL
Aggregates

Full Course

$
179
$229
USD, one-time fee
It's a really well-structured course that helped me understand the intricacies of SQLite through short and easy-to-grasp videos. The content is on point, non-opinionated, thoughtfully presented, and has good real-life applicability. Thanks to Aaron and the team!
Viacheslav Azarov
Viacheslav Azarov

SQLite for Production

Build with and integrate SQLite into your production applications.

Test Data

I've made my test data available for you to use and follow along.

Download
or
Use on Turso

Summary

In the video, I explained how SQLite handles aggregates like count, sum, min, and max, and highlighted some unique quirks and useful tricks. We explored using filters and order-by clauses within aggregate functions, the difference between sum and total for handling nulls, and the group_concat function for gathering data into a single string. I also emphasized the importance of using indexes for faster queries, especially when grouping by expressions.

Video Transcript

SQLite supports aggregates. Just like every other database, count sum, min, max, all of the stuff that you would expect, there are a few quirks and a few gotchas and a few cool things that SQLite does that I don't think anybody else does. So let me show you what we are working with. The first thing I wanna show you is a little bit of a quirk. So if we do select star from users limit 2 and we run that, we get 2 users, both of whom are not pro.

So let's do this. Let's do, select count star. So we're gonna count up all of the rows and we're just gonna select everything as well. Group by and we're just gonna group by is pro over here because those first two people are not pro users. And so if we run this, you see something that you don't really expect.

We see that we have 945,000 non pro users. 440 I'm sorry. 44,000 pro users, but who why Bertie, Wysoski, and N. Dibbert? So where where did these people come from?

And this is the first maybe weird thing about SQLite grouping is that you can just select columns that aren't a part of an aggregate expression even when the the query is being grouped. Other databases will complain about this, rightly so in my opinion, because these results are useless. You cannot rely on any order here. You can't rely on n dibbert showing up and you can't rely on Bertie Wissowski showing up because those are not a part of any group and so those, those results are arbitrary. The only thing that we can really count on here is the count and the is pro because the count is up here as an aggregate and the is pro is what we're grouping by.

And so we're guaranteed that those are correct. But these could just be coming from anywhere. So I don't recommend selecting things that you're not grouping by if you are relying on them, which I don't know why you would select it if you're not relying on it. So, that is the first thing I wanna show you, but there's that's a quirk. Let's look at some interesting things.

One thing I think is super interesting is you can filter down your aggregate expressions. So if we do select star from users, I should have put a limit on that. If we do select star from users, limit 10 and we look at that, you see we've got these 10 users here. We can say something like count star filter where, this is crazy syntax to me, filter where is pro equals 1. And if we do that, it's gonna count it's gonna count across the whole table because we don't have any group by here.

And so it's gonna count it down and there's there's that 44382 that we see. And this can be really helpful when you're doing stuff like this. You're trying to get either multiple sums or multiple counts. You can do like sum, you know, purchases, filter where, and you can put a filter in there and you could say, like, I don't know. We put something in there and then as as pro purchases or something like that.

And this is a really nice way to get a bunch of aggregates all in one go. So, you could also do something like this where you say, alright, I need the pro members, the non pro members, and for whatever reason, I need all the people born in 1989. Who knows why, but you could do it. So you could say filter where we need a strftime again of percent year and on birthday equals, let's do 1989, great year. And if we run that again and we properly wrap that as 1989 and we do that again, So now you can see, you can get multiple aggregates across the same dataset but counting up different things.

And this would work, you know, this would work with any of them. With sum or min or max or anything like that. Anything where this evaluates true is then going to go into whatever aggregate function you're running on that, on that part of the query. SQLite has another trick up its sleeve when it comes to aggregating, the total or the sum of a set of values. So if we look at this, we'll say with numbers as the c t e.

Remember c t e's? So we'll do select 1. Let's just call this as n and union all selects 1, that's fine. So then we can do select sum in from numbers. So we run that, we get 2.

Why is that interesting? Well, that much isn't interesting. However, we have total n. What is the difference? Well, first of all, we do see that total is a float and sum is an integer.

Now, sum is the SQL standard and total is this new guy that they introduced. So why did they introduce this? If we change this to null, we see still we get 1 and 1. But if we change this to null, we get null and 0.0. So a decision was made to keep sum adhering to the SQL standard but also introduce total such that if there are no rows or all of the values are null, you still get, 0.0 back which you can work with on your application side.

And so this is another one of those handling null situations where the creators of SQLite decided, honestly, most of the time when you sum up something that's missing, the sum should be 0 and they just made a decision there without impacting how sum works under the hood which I gotta respect him for. I like that. There is another one. We've talked about floating point numbers and so if these are all, you know, long floating point numbers, we've talked a little bit about how those get messy. You do have to load that decimal math extension, but there is also a decimal sum.

Once you load that extension into SQLite, it is not there by default, but that can give you exact, an exact representation of a sum of floating point numbers which can be pretty important. Now we're gonna look at one final trick for aggregating which again I think is pretty cool. So if you did select star from users, we'll do actually let's do select star from categories because there aren't quite as many of those. One of the functions that you can use is something called group concat, and you can put, a column name in there and it's gonna shove it all together. By default, comma separated, you can pass, the separator of your choice in there and that will be used as the separator.

It is the same as, I think, string ag. Okay. So string ag and group concat are the same thing. So whichever one, you like better, use that one. That's great.

Now, one thing you can do with, inside of inside of this function, so you could let's back out and say select star from users. So we're gonna go to the users table, but we're gonna do group concat first name and then we're gonna say order by first name and let's call it ascending. So if we run that, we're gonna get a bunch of first names, which I think this is really interesting in here that you can add an order by. I think that is pretty cool. This doesn't matter order by doesn't matter for stuff like sum and min and max.

I don't know that SQLite is actually even going to order it. It might ignore it. But for these string concat functions, it can be, it can be extremely helpful. So you can do something like this. You can also say like, well, that sure is a lot of Aaliyah's, isn't it?

Like, I'm happy for her but that's a lot of Aaliyah's. You can do distinct first name. And now we get, oh, those are those are all of the names those are all of the names, distinct and so we get one representation of each. And you could, of course, you could, of course, go even further. You could say group by group by strftime%, year and birthday.

And then we're gonna bring this guy up here and then we'll say as birth year and we'll say, we need a comma there and we can say, as names. So if we run that we see, hey, they're all the same. They're all they're all the same because, the library I use generates the same name based on not birth year. But you can see, great. Now we've got the birth year and all of the names for people in that birth year because that is what we are grouping by.

So you have, the ability to filter your aggregate functions. You have the ability to order within your aggregate functions. And so SQLite is honestly pretty great when it comes to aggregates. If you are going to do something like this, group by an expression, You can see that this ran in, a second, 925 milliseconds. That's not super great.

I would if I were doing this query for real, I would either put an index on this expression or I would make a generated column out of this expression and just call it birth year and then group on that. So anytime you can do index assisted grouping, it is going to be better, faster, cheaper, easier, all the things you want. This will of course work like we've just seen, but if this were my application and I was being charged with creating, this query here, I would advocate for an index on the expression or a generated column that I later indexed.