High Performance SQLite
Introduction
Introduction to this course
Introduction to SQLite
SQLite is safe
It's just a file
Running SQLite locally
Good uses for SQLite
SQLite's limitations
The documentation
SQLite Internals
SQLite's structure
SQLite's file format
Dot commands
Pragmas
Virtual tables
Schema
Flexible types
Types
Strict types
Dates
Booleans
Floating point
Rowid tables
Auto increments
Without rowid
Generated columns
Optimizing SQLite
Locking
Rollback mode
WAL mode
WAL vs Journal benchmarks
Busy timeout
Transaction modes
Vacuum
Analyze & Optimize
Suggested pragmas
Faster inserts
Indexes
Introduction to indexes
B+ trees
Primary, secondary, and clustered indexes
Without rowid tables
Benchmarking without rowid tables
Imposter tables
Primary key data types
Where to add indexes
Index selectivity
Composite indexes
Composite ordering
Covering indexes
Partial indexes
Indexes on expressions
Automatic indexes
Duplicate indexes
Indexing joins
Advanced SQL
Explain
Index obfuscation
Joins
Subqueries
Unions
CTEs
Recursive CTEs
Window functions
Dealing with NULLs
Row value syntax
Indexed sorting
Upserts
Returning
Aggregates
Triggers
Operating SQLite
Altering schema
Altering schema with tools
Multi-database
Multi-tenancy
Backups
Exports
How to corrupt SQLite
Advanced JSON
Intro to JSON
JSON vs. JSONB
Argument types: path, value, JSON
JSON5
Valid JSON
Creating JSON objects + arrays
JSON Extraction
Updating JSON
JSON aggregates
JSON table functions
Indexing JSON
Full Text Search
Creating FTS tables
Performing searches
Ranking results
Highlighting matches
Bonus Interviews
DHH discusses SQLite in Ruby on Rails
Distributed SQLite with Litestream and LiteFS
Offline-first, multiplayer SQLite
Production SQLite with Turso and libSQL
Migrating from Postgres to SQLite with Kent C. Dodds
Ruby on Rails with SQLite
Moving from Redis to SQLite with Mike Buckbee
Indexes
Index selectivity


Test Data

I've made my test data available for you to use and follow along.

Download the data, or use it on Turso.

Summary

I learned that not all indexes are good for speeding up database searches, especially when a column has few unique values, like a name column where every row holds the same name. Selectivity, the ratio of unique values to total rows, and cardinality, the count of unique values, are what decide whether a column is a good fit for an index. For the most effective indexes, I should focus on columns with high cardinality and high selectivity, since those let SQLite narrow down rows quickly.

Video Transcript

Unfortunately, not all indexes are created equal. Some columns are just not good candidates for indexing, even if you're using a strictly typed column. The time it's a bad idea is when the index doesn't help you narrow anything down. Imagine that this users table was full of people named Aaron.

Every single person that is one of our users is named Aaron. I would fit in well there, but that would be a weird group of people. Now, if you put an index on first name and then you say, show me people where first name equals Aaron, who is that helping? It's the entire table. There's only one value in the table, and it's Aaron.

The index doesn't help. So what we need to talk about now is cardinality and selectivity. Cardinality refers to the number of unique values in a column as an absolute count. Selectivity refers to that same thing as a ratio. So having two unique values might be fine if you only have two rows.

But if you have two unique values and 1,000,000 rows, the cardinality is 2, while the selectivity (2 / 1,000,000) is nearing 0. So let me show you how we can figure this out. We're gonna start by calculating the cardinality of the birthday column. Now, cardinality is a good word to know. It comes up in the documentation.

It comes up in Q&A. You can use it when you're talking to ChatGPT. But it's okay if you don't remember the word cardinality. What we're trying to figure out here is: is this column a good candidate for indexing? And it's not a good candidate if it doesn't have many unique values. So, cardinality.

So let's count the distinct birthdays in users. That's the cardinality. There are 10,950 unique birthdays in this table. That's honestly not very many. I guess that's the whole birthday problem.
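
As a sketch, assuming the demo table is users and the column is literally named birthday:

    SELECT COUNT(DISTINCT birthday) FROM users;
    -- 10,950 on this dataset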

Right? Once you have a bunch of people, the likelihood of a birthday collision goes way up. This is the cardinality. So let's do another one. Let's count the distinct is_pro values from users.

There are two distinct values there: yes and no. It's a boolean column, so there are only two distinct values. The cardinality here is very, very low. Now, how do we figure out selectivity?
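
Roughly, assuming the flag column is named is_pro:

    SELECT COUNT(DISTINCT is_pro) FROM users;
    -- 2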

Or rather, what even is selectivity? Selectivity is the number of distinct values over the total number of rows, which turns it into a ratio, which is a lot more helpful. So we do select count distinct birthday, and we're gonna coerce that into a float. Then we'll wrap this whole thing up: select count distinct birthday divided by count star.

And then let's go ahead and wrap that whole thing up too, and we'll say as selectivity from users. So there we go. Let's turn the column widths off, because the word wrap is messing me up here. There we go.
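
Put together, the query looks something like this; the cast to REAL is what forces floating-point division instead of SQLite's integer division:

    SELECT CAST(COUNT(DISTINCT birthday) AS REAL) / COUNT(*) AS selectivity
      FROM users;
    -- roughly 0.01: 10,950 distinct birthdays over ~1,000,000 rows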

Selectivity is about 0.01. Is that good or is that bad? I could tell you, but I don't like memorizing things.

So if we were to use our powerful brains and think for a second: what is the most selective column possible? Well, that would be the ID, because the ID is a unique key. It's the primary key. We could use email too, because email is unique on this table, I think. But instead of making you memorize whether low selectivity or high selectivity is good, let's just see if we can intuit it together.

So if we come all the way down here, we'll say: alright, I know that ID is the most selective thing possible, so it's gonna be very good. What is ID? ID has a selectivity of 1.0. That makes perfect sense to me, because ID is distinct across the entire table.

When you put 1,000,000 IDs over 1,000,000 rows, you get 1. And that tells me that 0.01 is honestly not that good. So this is a good way to look for candidates for indexing. I don't like memorizing things; I like intuiting things, because I'm more likely to actually remember them if I intuit them.
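
The same calculation against the primary key, as a sketch:

    SELECT CAST(COUNT(DISTINCT id) AS REAL) / COUNT(*) AS selectivity
      FROM users;
    -- 1.0, because every id is unique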

So we know that this birthday column has worse selectivity than the ID column. Let's run a few experiments with some of these other columns and see if SQLite agrees with this theory that we're working with. So let's create an index. We'll do create index deleted_at on users (deleted_at). And, yep, that should be good.
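
Something like this, assuming the column is named deleted_at (the index is named after the column here, but the name is up to you):

    CREATE INDEX deleted_at ON users (deleted_at);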

And let's do a quick count. We'll do select count star from users where deleted_at is not null. So we see that out of 1,000,000 rows, 10,000 of them have been deleted.
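
That count, as a sketch:

    SELECT COUNT(*) FROM users WHERE deleted_at IS NOT NULL;
    -- about 10,000 of the ~1,000,000 rows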

We can call these tombstoned, we can call them soft deleted, we can call them awaiting cleanup. That's insane. Let's just call them soft deleted.

So we have 10,000 rows that are soft deleted. Now, what do we have? We've got two indexes: birthday and deleted_at. And the users email index? Yeah.

That's fine; we're not touching that right now. So if we do select star from users, yep, that's fine. Select star from users where birthday equals 1989-02-14 and deleted_at is null.

This is an exercise for you. So just listen to my voice and think with your brain about what is going to happen here. We're selecting from the users table: all the people that were born on February 14, 1989 and have not been soft deleted. We'll go ahead and limit this down to 5.
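
Here's the query as a sketch; I'm assuming birthdays are stored as ISO-8601 text, so '1989-02-14':

    SELECT *
      FROM users
     WHERE birthday = '1989-02-14'
       AND deleted_at IS NULL
     LIMIT 5;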

We have indexes on both birthday and deleted_at, and so SQLite is now given a choice: which index do you want to use? In most cases it's gonna use one index only. There are some optimizations that allow a kind of index merge to happen, which is pretty cool, but usually it's just gonna pick one index, and usually we want to create indexes that are pickable. So having listened to all of that, is it gonna choose birthday or deleted_at?

Let's take a look. It is using birthday. We don't know exactly why, but presumably because birthday is more selective than the deleted_at condition, it's saying: alright, I'm gonna find all of the people that were born on that day using the birthday index, take those row IDs back to the main table, and then further filter out anyone who has been soft deleted. That beats going to the deleted_at index, grabbing fully 99% of the table, and then going back and weeding out everyone who wasn't born on the right day.
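
You can see the planner's choice with EXPLAIN QUERY PLAN; the exact wording varies by SQLite version, but it will be something along these lines:

    EXPLAIN QUERY PLAN
    SELECT * FROM users
     WHERE birthday = '1989-02-14' AND deleted_at IS NULL
     LIMIT 5;
    -- SEARCH users USING INDEX birthday (birthday=?)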

So whichever index it can use to select most of the correct rows, it's gonna choose that: go back to the main table, and then weed out any rows that don't match the other conditions. This is how selectivity and cardinality all play in together. Sometimes it can be great to put an index on a boolean column, even though there are only two values and it has very, very low selectivity.

That's the case when the query you're running is searching for those very, very few people. If 99.9% of the table is, let's say, guest users and 0.1% is admin users, putting an index on that column can be great when you're searching for the little bitty segment of the table. But if you're searching for the giant segment of the table, SQLite's gonna look at it and go, man, it's probably faster if I just scan the table, and it might ignore the index altogether. So, rule of thumb: put indexes on columns that have a lot of distinct values, a high cardinality, and a high selectivity, and that is going to make SQLite more efficient and more effective when it uses those indexes.