I've made my test data available for you to use and follow along.
We're gonna benchmark two different key-value tables: one has a rowid and one is declared WITHOUT ROWID. Otherwise, these tables are identical. This benchmark is helpful for comparing these two things to each other, but like every benchmark, it's fake. It's fabricated.
I tried to make it as real as possible, but this is as good as we can get with an artificial benchmark. So benchmarks are all wrong, but some of them are useful, and I hope this one is useful. Let's take a look at the methodology. Don't worry about any of this stuff; this is all just Laravel's database setup here.
So we've got two different tables, kv1 and kv2. We're gonna set WAL mode on for both of those. And then these statements here create the tables, and they're pretty simple: CREATE TABLE kv, with key as a TEXT primary key and value as ANY. Both tables are STRICT, but you'll see that kv2 is WITHOUT ROWID and kv1 is not WITHOUT ROWID, which means it is a regular rowid table. Let's take a look at the benchmarking code.
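For reference, here's a minimal sketch of that setup using Python's built-in sqlite3 module. The course itself uses a Laravel command, so the file names and the exact schema here are my assumptions:

```python
import sqlite3

# Two separate database files, one per table variant.
# kv1 is a normal rowid table; kv2 is the same table declared WITHOUT ROWID.
for path, tail in [("kv1", ""), ("kv2", ", WITHOUT ROWID")]:
    con = sqlite3.connect(path)
    con.execute("PRAGMA journal_mode = WAL")  # WAL mode on for both
    con.execute(f"""
        CREATE TABLE IF NOT EXISTS kv (
            key   TEXT PRIMARY KEY,
            value ANY
        ) STRICT{tail}
    """)
    con.close()
```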
I tried to make this as close to production as possible, but it's not actually production. So here's what we're gonna do. This is the bench command. We can pass in the number of processes. We'll probably keep that at 25.
We'll probably bump up the number of queries to 50,000 instead of 15,000. Then let's hop down to the child process down here. The first thing we do is figure out if we're gonna do a read or a write. We're only gonna write 5% of the time, and we're gonna read 95% of the time. That seems like a pretty good mix to me.
The key is 8 random characters. The value is 64 random characters. Now, I have seeded these random functions, so we will get reproducible random values. They're still fully random; they're just gonna be the same every time we run them.
What that allows us to do is test with the kv1 table and then test with the kv2 table, and the values being used for these tests are gonna be the same, but I don't have to generate a million random strings up front myself. All I did was seed the random number generator. Then the only things I wanna measure are these statements. If it's a write, we're gonna do this statement.
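A rough sketch of that seeding and read/write split, again in Python rather than the course's Laravel code; the seed value and character set are my assumptions:

```python
import random
import string

rng = random.Random(1234)  # fixed seed, so every run produces the same values

def next_op():
    # 5% writes, 95% reads, matching the mix described above.
    is_write = rng.random() < 0.05
    key = "".join(rng.choices(string.ascii_letters + string.digits, k=8))
    value = "".join(rng.choices(string.ascii_letters + string.digits, k=64))
    return is_write, key, value
```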
And if it's a read, we're gonna do this statement. I don't wanna accidentally measure this overhead here. Not that it would be that much, but I don't wanna do it. So for a write, here's what we're gonna do. Let's see if I can get this on one line. There we go.
INSERT INTO kv (key, value). We're gonna bind them in. There they are. ON CONFLICT (key) DO UPDATE SET value = excluded.value. So this is saying: if we're inserting a key that already exists, just update the value instead.
This is an upsert. Pretty cool that SQLite does this. I do like that. If it's a read: SELECT * FROM kv WHERE key equals the key. That's it.
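Reconstructed, the two measured statements look something like this; the timing wrapper is a sketch, and the parameter binding style is my choice:

```python
import time

UPSERT = (
    "INSERT INTO kv (key, value) VALUES (?, ?) "
    "ON CONFLICT (key) DO UPDATE SET value = excluded.value"
)
READ = "SELECT * FROM kv WHERE key = ?"

def run_one(con, is_write, key, value):
    # Time only the statement itself, not the key/value generation around it.
    start = time.perf_counter()
    if is_write:
        con.execute(UPSERT, (key, value))
    else:
        con.execute(READ, (key,)).fetchone()
    return time.perf_counter() - start
```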
And then we put out W or R for write or read, write all the timings to a JSON file, and then we calculate the percentiles, etcetera. So that is the methodology. Hopefully it is close enough to the real world to be useful, but it's at least close enough to each other for a relative comparison. So, let's do some comparisons. I've got them up side by side here. The first one we're gonna do is the kv1 database, and we're gonna run 50,000 queries per process, and I think we're gonna go ahead and run 25 processes, which is the default.
So, let's run this and see what happens. It's about to start... and we're off to the races. You can see each one is trying to get up to 50,000, and there we go. So, the DB was kv1, and as a reminder, kv1 is the rowid table and kv2 is the WITHOUT ROWID table. So this is the rowid table.
25 processes, 50,000 queries each. So what does that come out to, a million and a quarter queries? And here's the p50 through p99.
Honestly, I mean, a rowid table is plenty fast. The p99 is at 0.89 milliseconds. So let's run this again. We'll set the bench DB to kv2 and queries to 50,000. So this is the one with no rowid.
And I think what we're about to discover is that it doesn't make a difference on this machine. This is running on an M1 MacBook. So you'll see here, if we look at the p90 or the p95, it's slightly faster. So the reads are slightly faster, except for the p99.
We got one errant read out there. And, weirdly, the writes are slightly faster. But this is so slight to me that it's noise. It's not real. This is too close to be real.
I mean, we're talking about less than a millisecond of difference. It's just not real. There's no difference. And so it's interesting here that the WITHOUT ROWID table is not that much faster. One thing we can see, though: if we change into the database directory and list it out, we should see... there we go.
kv2 is smaller than kv1. So the database for the table that has no rowid is in fact smaller. So let's open this up. I don't remember the commands to make it pretty, so we're gonna look at 3 megabytes versus 3.4 megabytes. Because the table that has the rowid is maintaining two B-trees and the table without the rowid is only maintaining one B-tree, there's less data in that second database.
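If you want to check the sizes yourself, here's one rough way to do it; the file names are my assumptions about how the lesson's databases are laid out:

```python
import os
import sqlite3

for path in ("kv1", "kv2"):
    con = sqlite3.connect(path)
    pages = con.execute("PRAGMA page_count").fetchone()[0]
    page_size = con.execute("PRAGMA page_size").fetchone()[0]
    con.close()
    # page_count * page_size is the logical size; the file on disk may differ.
    print(f"{path}: {os.path.getsize(path)} bytes on disk, "
          f"{pages * page_size} bytes of pages")
```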
So my suggestion to you is that you don't worry about WITHOUT ROWID as an optimization until you're near the end. Once you're near the end of development and you look around, knowing everything you know, you might run some tests on your data to see if there's a WITHOUT ROWID optimization to be made. But I wouldn't worry about it too much, and it's definitely not the first thing I would reach for. The first thing I would reach for is indexes.