Enter your email below to watch this video
Thank you for the excellent High Performance SQLite course! It’s been incredibly useful in deepening my understanding of SQLite. The sections on SQLite internals, schema, and optimization techniques are particularly insightful and have significantly improved my ability to work with SQLite in production environments.Ali Kasmani
Build with and integrate SQLite into your production applications.
I've made my test data available for you to use and follow along.
Remember back when we talked about the structure of SQLite? It is one file with a bunch of pages inside of it and you can calculate the size of the database based on the number of pages and the page size. So what happens if you delete basically everything in the database? Does the file size go down? Probably not.
Let's take a look. So in here we've got a bunch of databases that I have been working with. Let's do let's copy database dot SQLite and name it, we'll call it vacuum dot SQLite. A little hint about where we're going. So SQLite 3 vacuum SQLite.
So now we're in this vacuum database. We've got parent, or we've got bookmarks, child, migrations, parent, users. Just some tables that have been filling up. Again, you can find this data down below. You can spin it up on Terso pretty easily, but we'll just keep going here locally.
So if we were to run let's do pragma page count. This tells us how many pages are in the database and pragma page size will tell us the size of those pages. So if we were to do this number times 4096, we get 740,000. So if we do here, vac 740,000. So that we're calculating the size of our database here.
Let's get back into the database and do one other thing. We can do pragma free list, I think, free list count. And this tells us how many pages are on the free list. So when you delete a bunch of data, it doesn't actually all go away unless you use secure delete which is a thing. It doesn't actually all go away.
It just gets marked as, you can reuse this page later. This data, you can override it. It doesn't matter. So if we were to do select count star from users, There are a 1000000 users in there and if we look at bookmarks, there are 5,000,000 bookmarks in there. Which one do you wanna delete?
Let's, let's delete from bookmarks. So what could go wrong? Here we go. So select count from bookmarks. Wow.
That was really fast. And now if we check the free list count, you'll see, woah. We got a whole bunch of free pages there which means those pages are now marked as reusable. And if we exit, we'll see that the size of the database has not changed one bit. Literally literally one bit.
It hasn't changed at all. That is because those pages still exist. If we were to crack open this database and look, we would see all of that data is still there. And there are a few ways to get rid of it. Based on the name of the database, what do you think we're gonna do?
We are going to vacuum it up. So if we hop back in here to vacuum and clear this out, you can issue this command, vacuum. That's it. So this is going to take the database. It actually makes an entire copy of the database, and compresses it down so that all of those free pages are gone.
So if we were to check the free list count, we're down to 0 free pages. And if we were to exit, we would see that the size of the database has been reduced dramatically because all those pages all the populated pages were moved together so it reduced fragmentation and then the free list was moved to the end and then cut off. And so the database got a lot smaller. Now there are two potential reasons you might want to manually vacuum your database. The first is the one we just looked at which is size.
Now, in the world in which I live, which is the web application world, disk space is not really that big of a concern to me. So I'm never probably gonna vacuum just to reclaim a little bit of disk space. That doesn't concern me that much. You may live in a very different world. You may live in mobile devices or embedded devices or anything like that where disk space kind of is at a premium.
Or maybe you're working somehow in the browser and you need to you need to compress that database back down after a big delete. That would be a great use case for vacuuming. So it kinda depends on where you live and where your application lives. But for me, super not that important. The other reason you might vacuum is to reduce fragmentation.
So over time, as you're inserting, updating and deleting, you might have rows being put into those free list pages and so eventually the rows are kind of fragmented. And everything is gonna work faster if the data is all right next to other relevant data. And so there's unfortunately no way to, like, get an absolute read on this. When it comes to compressing it for size, you can compare the free list count to the page count and after it reaches a certain threshold you might say, goodness, 30% of my database is empty. I'm gonna go ahead and compress that.
That's great for the size argument, but for the fragmentation argument, there's there's no such stat that you can run. So you kinda just have to watch your queries. And if they start to get slow over time, you might consider looking into vacuuming. The problem with vacuuming is it can be slow and it requires an exclusive lock to, commit all of that data back into the database. It can also, in fact it does, require more space because it's making an entire copy of that database.
And so for a period of time, you're gonna have a lot more used space before it's all written, the compressed version is all written back in. So the two reasons would be make the database smaller and reduce fragmentation. It's kinda hand wavy when you wanna reduce fragmentation. There are auto vacuum modes. You can have a full auto vacuum or an incremental auto vacuum.
By default, it's off and I think that's probably just fine. I think it's probably just fine. I don't want to incur the penalty of continually vacuuming, every time a transaction or a connection is closed out. Then it's gotta acquire that lock. It's gotta vacuum.
It's gotta do all of that stuff. And that just I'm just not moved by that very much. But I do want you to know that this is something you can look out for. If you are doing a lot of maybe potentially thrashing to the table, inserting and deleting a lot of stuff, it might be a good idea to consider vacuuming.