Distributed SQLite with Litestream and LiteFS

Well, I am here with Ben Johnson. And, Ben, I have something to show you that I did not warn you about. Tell me. Do you do you recognize any of this? Do you recognize I do.

These headers? It looks better in color, though. I gotta say. It it does look better in color. I printed out.

So and and in part and part of my studying for this course, of of course, the first place I went was the Fly blog to print out all of your, articles over there. And they are super good. So thank you so much for writing all of those. And it has that same, like, that's I don't know if it's you or fly, but it has that same kinda, like, irreverent, casual, fun, like, conversational tone that I I really like. So good job on all those.

Well, thank you. Yeah. I appreciate that. Hey. I hope you're enjoying this.

Just as a reminder, you can learn more about SQLite at high performance SQLite.com. I'll get out of your way. Back to the interview. So tell me, by way of introduction, where you're working. I I just spoiled it, but where you're working, what you do, and how you ended up how you ended up there.

Yeah. So I'm at fly.io right now. We do, you know, application hosting and, server host yeah. We put put applications everywhere kind of as our our gig. And I ended up there a couple years ago.

I did a project called LightStream, which is essentially trying to make, like, SQLite, make it work on basically a single node. So you're gonna have, like, automatic backups and, like, for a lot of applications, that's it's probably good enough. I mean, unless you need some crazy, like, high availability thing going on, then, otherwise, I mean, like, a simple SQLite can go a long way. And then before that, yeah, did a lot of consulting work in Go. I've done a lot of work on databases.

I did a database called BoltDB. That's in, like, the Go world. Not know that was you. Yeah. I did that maybe 10 years ago.

I was actually at Shopify at the time. Okay. Yeah. We were using LMDB at the time, for, like, an analytics database. We kind of I had actually written an analytics database before that.

And that that's why I kinda ended up at Shopify. We started doing analytics, and we kinda wrote our own database there. And, LMDB worked great, which is, like, another key value store. Mhmm. But we had some issue, and we could not ever track it down.

But, like, every couple months, we'd have, like, one server that would just fill up the disk, and we had no idea why. And at the time, I'd kind of been playing with, like, doing a port of LMDV to go, but more just like an educational thing. And then, you know, it eventually got to the place where it was like, we couldn't figure out this bug, and we had spent forever on it. And, we're like, hey. Why don't we just try this this little project you're done on the side, which is which is Bolt at the time.

So Mhmm. Yeah. I did that. Before that, I've done a lot of work in, like, data visualization. I was actually an Adobe Flex developer, if anybody is old enough to remember what the hell that is.

I I feel like I've been around for a while, but I don't know what flex is. Yeah. It's basically if you took, like, like React, but, like, 15 years ago, and it was on the, the Flash platform, essentially. Oh, okay. It was back when, like, before HTML and, like, browsers could really do a whole lot.

So it was actually kinda cool, but, yeah. Adobe kind of tanked with that that kind of stuff. Flash, I remember. I was a I was an ActionScript developer, but I didn't ever know I didn't ever know Flex. So Yeah.

Most people, most people would say you should never write your own database. And here you are having written at least 2 that we know of and then some sort of file system for other databases and a little sidecar. And so you're just, like, you're doing all the things that you're not supposed to do, and you're doing them very well, it seems like. I mean, I think the first ones weren't very good. That's kind of the that's the trick.

It's like Yeah. Make sure you get the bad ones out first, and then, no. I mean, like, I think that it's kinda one of those things where, like, it's, like, you know, rolling your own crypto. Like, you shouldn't do that, but, obviously, there was somebody that rolled their own crypto first. And I think you just, like, kind of you find something some little niche and you just dig into it really deep and just keep going, you can just find fascinating little little areas to dive into.

So, I think I had a, like, a little scratch I wanted to itch. I worked at a company that did, like, behavioral analytics, and I thought they could do it much better. I was actually the data visualization guy at the time at that place, not the database guy. And this is back in, like, Hadoop, like, when Hadoop first came out. And I was like, hey.

We should use Hadoop for this thing. It'd be great. So we're spending forever, like, trying to actually run these reports. And I started just playing around with trying to make it faster and faster, and eventually, I just wrote my own database. I think that's kind of where everyone ends up.

Right? That is the end state for trying to make things faster for sure. Yeah. So I wanna talk to you about LightStream first. So we got we got we have 2 light products, LightStream and LightFS.

Mhmm. And we can we can go super technical, so don't be shy. So I wanna hear I wanna hear about, LightStream and how you kinda, I guess, came up with the idea and why you wrote this, and then we can talk about kind of how it works technically. Yeah. Sure.

So I think the basis for it was kind of the work I did with Bolt, way back in the day. I realized, like, when you have a database right next to the application, it's really fast. Like Mhmm. Especially when you're used to using Postgres or I mean, not that there's anything wrong with client server databases. They're great.

But there's just something about not having any overhead, like, network overhead that's just, like, kinda blows your mind. And, it's really simple too. Like, there's no external dependencies. You just all you bundle it all up, and it's right there. Mhmm.

So I think for me, it was more, I was trying to make, like, application development with Bolt try to work. And it was just there's a lot of nice things you get with, like a SQL database, like schemas and, like, foreign keys and all that, you know, little things that Yeah. You wanna fill yourself. Like schemas. Indexes.

You know? Yeah. So I just eventually, I was like, you know what? This is way too hard probably for anyone that's not a database developer. Mhmm.

And, I tried to look around for other, SQL databases that were embedded, and really it's just SQLite out there. That's like the the big one. Works on almost every language and yeah. It's fantastic. So, I started developing with that and I I kind of got this itch of like, what is like, for most applications, I think most developers write.

Like, if you wanna deploy it out, it needs to have, you know, pretty good uptime, maybe, like, 3 nines or something. Mhmm. But, like, you don't need, like, 5 nines. You don't need, like, crazy anything. You just want something that's simple, that's easy to run, that's cheap.

And the biggest thing I had an issue with was, like, the disaster recovery piece of SQLite. Where you could do, like, you could do, like, a daily backup or even, like, an hourly backup, which honestly actually works great for a lot of people. We even even have documentation on LightStream about alternatives, and one of those is just back it up every hour, and that's that's fine. Mhmm. But if you wanna actually, like, track your changes so that you only have kind of this really small window of down of, data loss in the event of, like, catastrophic, events, then, you know, something like LightStream where you can continuously back up, like, incrementally back up each little set of transactions every few seconds, you really, like you minimize your actual risk of data loss.

And that's kinda the the the end goal. So, yeah, I I wanna make application development super simple. Run it runs as a separate process, so you don't have to, like your application doesn't have to know about it in particular. Mhmm. That was kind of a big design goal as well.

Mhmm. Anyway, yeah, I'm just kinda rambling now, but that's kinda where it came from. This is this is great. And I think one of the things that, was so fascinating to me about kind of the structure of it or the the the architecture of it, and forgive me if this is naive and or wrong. It it seems like you the final product is you looked at SQLite and you're like, how do I do this without basically touching SQLite internals?

Right? Because they're very they're very, like, we're open we're source available, but we don't take PRs. We don't take anything. It's us three guys that have been doing it since the early 2000. We're gonna keep doing it.

And so that kinda puts, that kinda puts an interesting constraint on I wanna do something with SQLite, but I can't really, like, get inside it. And so I think the the approach you took was kinda trick it to be this long running, like, it works, you know, it keeps, like, a transaction open and it hijacks the checkpoints or something like that. And that to me is like, oh, that's incredibly like, that's very clever. Because then people still get to use, like, vanilla out of the box SQLite and then put your thing beside it. So can you tell me a little bit about, like, the the architecture and the the structure of it and how that, like, trickiness works?

And I say trickiness in a good way because I think it's fascinating. It it's all hacks out there. Honestly, we're all we're moving electrons around. I feel like everything's a hack, honestly. So, yeah.

I mean, the way that it works is there's basically 2, journaling modes. So journaling mode in a database, I don't know if you've covered this before, but, like, essentially, it's how you safely, like, write changes to a database so that they all get written and they're durable and that if if it needs to roll back, like, everything can roll back. So one is a journal mode where, essentially, you are copying out the the previous version of the changes, like the rows, into a separate file called the journal file. And then you'll make changes to the actual database, and then you write the changes. Like, you you basically delete the journal file after you're done.

So at any point, you can recover back, like, if you need a rollback. So this is the journal mode, and that's like the old old mode of SQLite. It has a lot of drawbacks. It's not very fast. You don't have, like, concurrency and all that.

And kind of the better mode, generally I mean, there's a couple trade offs, but by and large, like, the wall mode, which is right ahead log, that's kinda like the best mode to really run SQLite in. You get a lot of concurrency. You can have a bunch of different readers at the same time, one writer, but they can all kind of intermix at the same time. Mhmm. So what the way that the the wall mode works, w a l, is that every transaction you make, you'll make changes to pages.

So pages are kinda like blocks in your database. And every time you make a change to a page, it'll kind of tack it onto the end of this file called the the write a head log, and then it can reference those changes. Can I can I pause your Oh, yeah? Please do. Yeah.

So the, correct me if I'm wrong, but edification of the listener, the 2 trends or the 2, mode the journal modes are rollback and write ahead. And write ahead, we call wall because it's write ahead logging, w a l. So we've got rollback and write ahead. Rollback is traditional, and still default because SQLite is super on about backwards compatibility. They just love it so much, which is great.

Yeah. Love it. So we've got these 2 modes, rollback and wall. Wall is the, wall is the better one for better, higher concurrency. You can have a bunch of readers.

Still one writer, which is fine, but you can have a bunch of readers working. Rollback mode writes to the actual database file, And if you wanna roll back, it grabs the thing that it set aside, and it puts it back in if you need to roll back. Right ahead, instead of writing to the actual database, file, it creates this new file that's like a dash w a l, and it writes there and leaves the database, untouched until such a time where it takes the wall takes the the second file and basically merges it back into the first file, and then we start all over again. Is that fair so far? Yeah.

Yep. That's great. Yeah. And then whenever it merges those changes back in, that's called checkpointing. And, and the the great thing about the wall is that you essentially you have every version of a page until you checkpoint it back in.

So you can have different, readers kind of have a different snapshot of the database at every point in time when they started. So it has this nice that nice property. It's called serializability. So, yeah, that's essentially how the wall works. And the the issue whenever you have like a a kind of disaster recovery kind of streaming backup system, like, LightStream, is that you essentially need to track the changes from the last time you you sent an incremental backup.

Mhmm. And, like, the checkpoint is where you kind of have a potential to lose that, where, like, the database could write some changes, and then at checkpoints, it clears the whole write a head log. And then you never caught that last little transaction at the end before it did that. And so at that point, your database has changed, but you don't necessarily know how, and so you can't send off that increment to to some other place to store. It's now become an amorphous blob again that is the database, and now your right of head log is empty.

Is that right? Yeah. Exactly. And you don't really you can't detect that necessarily. Mhmm.

So you actually just even worse than, like, missing the transaction is, like, you're actually correcting your database, potentially. So that's bad. You don't want that. That's bad. Yeah.

So the the one nice trick, a little this is the little hack is that SQLite doesn't allow you to actually checkpoint if there's any open readers on the database because they might potentially have changes they're tracking within the wall. So what LightStream does is essentially it keeps a long running read open, on the transit on the database. And then whenever it decides it's big enough to tran like, to actually do the checkpointing, it'll kind of control that process of checkpointing and make sure it has all the data before the process happens. And then it just immediately clicks back in and grabs that, read again or read transaction again. Mhmm.

So there's like So Yeah. It's it's very yeah. Awesome. I I love this so much because this is a such a clever solution, working from the outside in. Like, you didn't have to talk to anybody at SQLite.

You didn't have to make any requests. It's amazing. Tell me on a technical level, how so when you who first of all, who decides, in in a LightStream setup, who decides that a checkpoint is ready? Because I know in a traditional setup, SQLite decides, well, it's gotten big enough checkpoint ready. Let's write it.

And a certain commit could take a little bit longer because it's doing the checkpointing as a part of that commit. In a LightStream setup, who decides, alright. It's time to it's time to commit. And then how do you are you literally plucking it out of the wall file before you allow SQLite to commit, or how do you handle that, that process there? Sure.

Yeah. So what we ended up doing is, yeah, we essentially copied the the same settings as what SQLite defaults to. I think it's 4 megabytes, for a wall. Like, once it exceeds that, then it'll attempt to start copying stuff back over. We essentially use the same settings and then kind of we don't even we don't implement the actual checkpointing process.

We actually call into the the functions inside of SQLite to do the checkpointing. So that's not implemented by us. They do document their whole wall format. So it's pretty easy to read. It's basically the page data, and then it has some check sums and, like, an incremental check sum and some salt and little all those things as well.

So there's a lot of, like, little checks within there just to make sure that you're not, like, getting the wrong data or getting, whatnot. So, yeah. So it essentially handles it on its own and just copies the the settings. Okay. And so when you reach that point, let's say it's 4 megabytes, when you reach that point, you have lights you being livestreamed, you have decided, alright.

It's time to transfer this into the full on the full on, dot SQLite file. So then you presumably have some sort of internal state that says the last whatever check sum or the last page that you have sent off to s 3 or s 3 compatible. And so then you investigate the WAL file yourself, find where that is, send anything after that, and then let, SQLite do the rest and put it back into the dot SQLite file? Yeah. Pretty much.

And we actually have one little difference is that we actually have a it's called, like, a shadow wall. So, essentially, we're copying stuff out to, like, a a directory next to the database. And that's it'll store kind of the incremental things to send off because you might not have connection to s 3 or wherever you're sending to for some period of time. So it doesn't prevent you from checkpointing, without sending off. Oh, that's interesting.

Shadow wall Mhmm. That when you let SQLite commit, in case you can't reach s 3, you have put it aside such that when s 3 is reachable again, that'll be sent off? Yes. Yep. Oh.

And you can track stats of this stuff within, like it has, like, Prometheus metrics. You can attach, you know, Grafana or whatnot too. Super clever because you wanna stay man, I'm just learning so much. That's why I keep repeat I keep repeating it back to you to make sure I have it. So you wanna stay you need to be safe because you gotta stay out of the way of the database.

And one place that could fall down is you're trying to send off changes and east one is down again. You can't reach it. And now everybody is trying to use this database that you are blocking because s 3 is down. And so to get around that, you've got a shadow wall that you can just stash. Yeah.

Exactly. Did you did you bake that in from the beginning, or did you have did you have that? Early on. I mean, part of it is that you kind of need to know the page that you read, just to make sure that it lines up. Because every page that's written essentially has this rolling check sum that goes through the whole right head log.

And then the event that your, you know, LightStream dies for whatever reason and comes back, it needs to be able to check to make sure that there's not new transactions written or that hasn't been a checkpoint, things like that. So it does a bunch of those checks, but it has to have the previous data in that shadow wall to actually Yep. Check it against. Okay. That makes a ton of sense.

Okay. So And then it just kinda worked out that it happened to be a shadow wall, and it it's disconnected from s 3. That makes that makes a lot of sense. So they really say want downtime, like from your your disaster recovery solution. That sucks.

No. You don't want, then you are the disaster. That's the opposite of what we're looking for. Okay. So you send incremental, backups off to s 3 or s 3 compatible, right, which would include at this point a Backblaze r 2, Cloudflare something something.

Yeah. We also do GCS, Sorry. Google Cloud Storage. We also do Azure. I should know the other ones, but they're yeah.

Most a lot of them are s 3 compatible. So Okay. So then in s 3, what would you find? Would you find a bunch of little files that are all, like, checkpoints? Or what do you find in s 3?

Sure. Yeah. So it has essentially 2 subdirectories. You have one that is gonna be your snapshots. So you can configure LightStream to snapshot every, by default, I think it's once a day.

And then it'll, next to that, it'll have like a a set of wall files. And those wall files will essentially just keep incrementing up, and they can get replayed in order. Cool. And then you basically can reproduce. It's nice because you can actually reproduce the state of your database at any point in time between the snapshot and now.

So if you did do this, like, you know, delete table kind of business and you need to get back to, like, 5 minutes ago, we can do that. Man, that rules. Okay. Yep. That makes a ton of sense.

Okay. And then what is the recovery story or, like, the restore story for LightStream? And how far do you go to aid the developer to do that recovery? Sure. It's just a restore command within LightStream.

So you say LightStream space restore, and then you give it, yeah, the name of the database kind of where the replica where it's going up to. It'll pull it down, and then it'll essentially start with a snapshot. And then just pull down a bunch of those little incremental changes and keep replaying those. It actually uses SQLite to actually replay them. That can actually take a little bit of time if you have, like, thousands of them.

Yeah. So, it can be beneficial if you actually snapshot more often if you have a lot of changes. So maybe once an hour, depending on your database size. But the nice thing about, like you Oh, go ahead. No.

No. No. Please. The nice thing about, like, B tree databases, which is what SQLite is and BoltDB, is that, like, just by default, they tend to have a bunch of space in them. You tend to, like, split trees a bunch, so you end up having you never have, like, a, like, really tightly packed, database.

And that plus is, like, all text data, typically Mhmm. Is that you can actually just compress the the crap out of it. So it's like, you know, you might have your database is, like, you know, 16% of the size of the actual real thing. It's crazy. So it's not it's not we do, I think, l z 4 compression, so we'll do that and send it up.

How do you, how do you make, how do you trick SQLite into replaying these these incremental walls? Do you just, like, I'm gonna guess, and then you tell me why I'm wrong. So you pull down a snapshot, which is like a, you know, dot SQLite whatever. Then you pull down a wall file and then open and close some sort of connection, and SQLite says, oh, there's a wall. I'm gonna commit it.

Is that it? Bigger. Bigger. Yeah. I mean, we'll pull down the the wall files in parallel behind the scenes and then basically do a bunch of open and closes.

Control what shows up what what's visible to SQLite with naming conventions and stuff. Yeah. And we can actually improve the the open time or, like, the the restore time if we actually did it ourselves. It's not that complicated to replay the wall, honestly. So I might do that in the future, but, yeah.

It's not it's not too bad. But there's just there's something so, I if I were you, I would feel such great relief to know that that's not your job. Your job is just to like put the stuff where it goes and then let the, you know, the 25 year history of SQLite handle, you know, doing the right thing. Yeah. Which feels kinda like, yeah, I don't have to worry about that.

That's a whole set of problems you don't have to develop for, test for, future proof. That just feels like, that feels kinda nice. Yeah. I mean, there's definitely that benefit as well. Yeah.

We got into a lot more of the the nitty gritty gritty details of, SQLite behavior when we did, like, light effects next, and we can talk about that as well. But, so, yeah, it's that part's not too scary. At this point, we actually had to do replays of walls and things like that. Mhmm. But, yeah, no.

I I agree with you. It's it's nice. It's not it's not my problem kinda thing. Yeah. It's not your problem.

I love not having problems. Okay. So let's talk, unless there's anything else about LightStream, I feel like the level of complexity you were like, that was too easy. Let's double up and write something else. So unless you have anything else on LightStream, I wanna talk about FS.

Sure. Yeah. No. Sounds good. So my, my understanding of Light FS is another it's like, this is so great.

I feel like I love your style. It's it's, another approach of making SQLite do something without telling SQLite that it's doing anything. So so my my take on litefs is, SQLite has this layer that, like, integrates or or talks to the operating system. Is it, like, VFS? Is is that what they that the layer they call it?

So the so you can do a VFS within SQLite, which is called a virtual file system. Uh-huh. And it has its own API within SQLite. But the problem with that is you actually have to like, SQLite has to know about your underlying system. It has to get loaded in manually and all this stuff.

And we really wanted to make it very invisible. So we actually use FUSE, which is like a file oh, god. I should know the name of or what this stands for. It's a user level file system within Linux, basically. Okay.

And so I thought where this was going was instead of, you know, using the the Windows or Linux or the OSX like VFS, it uses LightStream or LightFS. That's wrong. So tell me how how LightFS works, because clearly I I had the wrong idea on this. Sure. Yeah.

I mean and this actually stemmed out of LightStream. So light LightStream, one of the versions like, one 2 of the requests we got more than probably anything else was that people wanted to be able to have something to roll over to, like switch basically the primary node, like have a failover when they do deployments because you can get down downtime otherwise. And there's things you can control with that or, like, get around with that. So people wanted to fix, you know, the downtime when you switch over, and they also wanna read replicas. So you might have, like, your main server in Virginia, but you might you know, it's 100 milliseconds to Europe or 250 milliseconds over to Asia.

So, like, you might wanna put read replicas over there and make them super fast for everyone in the world. Mhmm. And I originally tried to do that in LightStream Mhmm. And it I kind of got something working. But there's just like, since you don't have control over SQLite and, like, the actual, the right calls and the reason everything, you just have you don't have quite enough control to really guarantee that you're not gonna get corruption or that you're not gonna send wrong files or whatnot.

It's just it's a really touchy thing, when you get down to that level. So the idea was basically to kind of take over some control over that. So we made this essentially pass through, file system called litefs. Mhmm. And what it does is when SQLite does a write to to the file system, it'll go through our file system.

We so can detect it, and we basically pass it off again to the underlying the real file system. So we can track every single write call that goes through, and Mhmm. That gives us also, we can track our the locks on the actual database too when they because it's like a bunch of locks that they they use. So we can detect when a a write starts, when it ends, and all kinds of things. Just a lot a lot of it's from the locks so we essentially can demarcate the start and end of all transactions, and then we can track all the rights that occur in between them.

And then we can kind of read out those changes for each write Mhmm. And then stuff them into what we call an ltx, light FS transaction file. And Okay. That's essentially like a little package of, like, a change set for a transaction Mhmm. That we can ship off to other replicas.

So So they can replay that same transaction, and it's, again, just like page changes. Mhmm. So we can have essentially a primary, and that gets all the changes, and then it can ship them off to other, replicas. So help me understand, and this is this is my low level systems knowledge or rather my lack of, low level systems knowledge showing. How do you sneak in?

I I'll be honest. I don't know what FUSE is. I know it's all caps, and then and I know that neither of us know what it stands for. But how do you sneak in in between SQLite and the actual file system? What what what does that even mean?

Sure. Yeah. So it's a so I did actually remember what it is. It's it's file system and user space. So you're essentially writing an actual file system.

Okay. And has calls for rights and, you know, make directory and all that stuff. Mhmm. And they're they're actually kind of fun to build. Honestly, I've I've built one a long time ago where you like a good like the guy that builds databases for giggles.

So take that with a grain of salt, everyone. No. Maybe one like a long time ago where it's like I wanted to be able to see my GitHub. Like, look through GitHub and, like, you can, like, basically look at GitHub as a directory tree. So you can, like, look at repos and users and, like, there's Oh, it's pretty cool.

Yeah. Yeah. It's cool. Yeah. Yeah.

Just a little throwaway kinda fun little project. You got me. That's pretty cool. Yeah. I think it was like mentoring some some students one time on like some stuff and we're like, oh, let's make a project.

And it's just what we came up with. So Yep. Just a random anyways. Yeah. Those students are gonna be just fine if that was their little little projects.

They'll be okay. So we ended up, oh, yeah. So, yeah, file system. So, essentially, they're, like, loaded up as a file system as you would, like another file system. I mean, there's a there's a mount command within Fuse.

So, essentially, like, it's literally a file system. Okay. And then you also give it the a target for where you want it to write its data, which is on the Mhmm. The normal file system that you're already using. So it kinda has its own data directory and whatnot behind the scenes on your file system.

It's just it kind of makes a little shim in between the 2, essentially. And then for the user, for the developer, how do they make SQLite aware of litefs? Sure. So they just have to point it at the directory that comes up from that file system, and then they can start making databases just in the way they normally write them. Like, you just open a connection, write your database.

All the changes are happen invisibly, and they get kind of shutted off behind the scenes. So it looks exactly like a regular SQLite database. Mhmm. Just on a different file system. Man, that is super clever.

Okay. So you have, light f s is a custom file system that is basically like a like a decorator or a proxy or something. And so you're listening, not listening. Their act SQLite is actually writing to you. Mhmm.

And then you're passing that through and setting it aside. And then when you reach a certain point, maybe the transaction closed or you've gathered a few transactions. I don't know. When you reach a certain point, you fire this this custom packet off somewhere, and light FS is waiting on the other side to receive this this unified trend like, this file format that you both speak. Mhmm.

And then you turn that back into SQLite on the other side and pass it through to the underlying file system over there? Yeah. Essentially, it's I mean, yeah, more or less. So we we package up every transaction separately, and then it stores on the primary node for some period of time. And then the other nodes actually connect up to the primary, and they'll just continuously stream down anything that comes in.

So Oh, I see. From the time you actually write the transaction, you know, it takes less than a millisecond to actually replicate, plus any latency, like physical latency. Yeah. Plus some speed of light. Okay.

Yeah. That dang speed of light. Yeah. I know. We'll get it one day.

So you have these other you have these replicas. These other these other servers have light FS on it, and the other servers are aware of where the primary is or rather, I guess, what the primary is. Mhmm. And it has some process there that is constantly pulling or there's some something fancier to watch for new transactions it needs to pull. Is that right?

Yeah. It's just the constant connection with its Okay. Yep. Okay. Stream yeah.

Streams them down. And then, yeah, it has a bunch of checks on me and all that stuff in there to make sure it's, everything ends up in the appropriate state. We actually do, like, a a rolling check sum, which I don't I'm sure other databases have done this. I don't think we're novel by any means. But, like, it's interesting that we actually we'll take the pages that are changed, and we'll essentially do an XOR check sum to remove the old version of the page from the check sum and then put in the new version of the page as a check sum.

So can actually compute, like, a a rolling like, an actual checksum for every single transaction in the database. Mhmm. So we actually can really clearly see if we have any corruption anywhere in the process, which has been nice. And we can actually gather some go ahead. Yeah.

No. I was gonna say that seems like rule number 1 is Yeah. You've gotta make sure that everything that's supposed to get there gets there and nothing that's not supposed to be there gets there. Because if you end up with, you know, a corrupted replica down in Australia, you're hosed. Like, that's that's a that's the that was what you were trying to solve here.

Yeah. And people some people also wanted the failover piece where, like, you could have multiple primary or not primaries. You can have multiple candidates essentially. So you might have 2 that are in, one region, and they can just flip flop when you do a deployment. We actually use console.

Like, HashiCorp's console, is one of the the things we use for that. So it's optional. You can either have, like, a fixed primary, one machine is always a primary, or we can actually have it where it does a leader election, and then it can flip flop over. But so we have some looser guarantees when you use something like console about who can actually like, when a transaction has actually persisted, and these actually get to the other candidate. Mhmm.

But you can have a scenario where you get it right that's actually written down and not replicated to the, the candidate yet. So when it does flip, like, you could lose that, you know, little sliver of time. Mhmm. We need to be able to check to make sure that the checksums, you know, if they do change between different nodes, that we can re snapshot and send those down to the replicas. Okay.

So talk to me about, how do you do? So you've got, somebody you got a primary in Dallas and a replica in Australia. Somebody in Australia does a write. How do you do write forwarding, inside of of fly. Io?

Because I understand this is, like, kinda y'all's y'all's bread and butters. Put put stuff on the edge, and we'll figure out the rest. And then is it possible outside of Fly dot io? Sure. Yeah.

So, like sequel or sorry. LightFX isn't specific to to Fly. We we certainly want people to use it on there. But it's open source. It's all Apache 2.

So you can go do whatever you'd like with it. And we'll work on any system, really. There is some stuff we do in there. So one one issue we ran into after we got, you know, stuff working was that, we originally had like, probably the best way to do it is to have the application send the rights, like, send all rights to the primary, and then you can read all your reads from the replica. But, again, that speed of light thing where you might send a right to the primary, but then you read from the replica before the right actually gets down to the replica.

So you can get this out of order kind of appearance in the application. And we spent quite a while trying to figure out the best way to to to manage that. And, essentially, like, you know, we don't we don't want everyone to have to, like, check for, like, a transaction ID on every single read they do and do a bunch of, you know, playing around to make sure they have, like, a consistent view of the data. So we actually have a an optional, like, HTTP proxy you can run-in front of your, application. So it'll actually check for the, essentially, the the right transaction when it went in, and make sure that your reads are caught up before the next read transaction comes in and does that kind of invisibly to the application itself.

So we kinda, like, sandwich, like, your applications here, and and there's, like, a proxy here and, like, a file system here, like, you know, lovingly hugging your application Yeah. To give it consistency and all that. So, Yeah. So right now, it's it's mainly on the application side we typically recommend. We did some work with write forwarding within SQLite, but it was just it was more of a pain, I would say, it's worth, typically.

Yeah. That sounds that sounds pretty complicated. So what's the what's the story on like, if somebody somebody did wanna spin up light f s, what is the, like, how do you how do you get it? How easy is it? It sounds sounds like you've got it pretty nailed.

But tell me, what would what would one do? Yeah. We have some some docs on on Fly. You can check out as well. I think it's fly.iodoc/litefs.

Mhmm. You can go on there and, essentially, you we have it all bundled up as a as a Docker container or Docker image. The funny thing is we actually don't use Docker as, like, the thing because you actually need to pull the binary in to your your own Docker image. So we just actually copy it out, which is it's like a really convenient way to use Docker to, like, copy binaries if you're using Go because it's all static binaries. And, yeah, so we essentially you can copy it down to your your development for your, your Docker image, and you run it and then it kind of runs your application as a supervisor.

And then beyond that, there's if you have an application where your your writes all go through, like, post and patch and all those kind of standard HTTP verbs, and then your reads all go through Gits, then it kinda just works more or less. You essentially just point your your application. You can actually do it with, like, WordPress. I think we've done that before. That is If we just spin up WordPress and, like, point it at light FS, and it just magically does the right things, which is kinda cool.

I bet that felt pretty cool. Yeah. Yeah. That was that was a good one. If you if you can do it if you can just make it work with WordPress, that's how you know you've got it you've got it sorted.

Oh, yeah. That's pretty amazing. So what do you see for what do you see for the, like, philosophical demarcation between LightStream and LightFS, and where do you wanna take both of them moving forward? Because is this what you do at Fly? Is this your full time you just you just write SQLite stuff and talk about SQLite and build LightStream and light FS?

Is that all you do? No. No. It's not all the it's not the one thing I do. I do that, but I also do we do a bunch of, like, back end work reliability stuff.

We do a lot of, You you do that? Mhmm. Yeah. Oh, okay. So just kinda I thought you got to test.

Just play in the lab all day. Sometimes. Yeah. Sometimes we do. Yes.

I mean sorry. What was the question again? The philosophical demarcation between the two things, and then what your vision for the future of them is. Sure. So LightStream was I kinda realized, like, adding all the replication stuff and all that just made a lot of pieces a lot more complicated or a lot more complex.

And essentially, LightStream is kind of a disaster recovery only solution. Like, that's all it does. So if you have a single node and you wanna stream up do a streaming backup somewhere to s 3 Mhmm. It's great for that. Like, you can do that all day long.

And then LightStream is never gonna be more than that. That's kind of its its goal. And then LightFS is kind of the distributed systems. Like, you need replicas or you want replicas, you need, you know, low latency and all that stuff around the world, or you want failovers, all kinds of stuff like that. It's has a lot more complexity.

We also have a thing called LightFS Cloud, where you can stream your backups up to to fly, and we'll save them, give you, like, point in time, recovery as well. So, would it be reasonable to say that you could use light f s to get embedded replicas everywhere and also use LightStream to stash all that away on s 3? You can. It gets a little more complicated, but it is possible. You can run it.

Because you might have multiple primaries. That's kinda where the issue comes in. And you'd essentially have to replicate each separate primary. Oh. So it's not as much fun doing both.

But, but LightFS cloud is kind of the disaster recovery side of the LightFS piece. Okay. Okay. That makes sense. And what do you see, beyond or maybe inclusive of building out Light FS Cloud?

What do you see for the future of Light FS? Sure. I mean, I think we might I mean, Light FS Cloud is paid service. $5 a month. You can get backups up there.

And I think, you know, we have some work we wanna do around moving some of that stuff into, like, a simpler version into LightFX itself. So you can do streaming backups to s 3. It gets a little more complicated when you don't have, like, a single service that can kinda manage Mhmm. Like, leadership and things like that. So there's some little hairy problems to get in there.

But I would say as far as light FS, I mean, a lot of it is, you know, I think improving performance is always a good thing, optimizing and just squashing bugs. But, you know, I don't I don't have, like I wouldn't say I have, like, grand features. I mean, we have some things that people have requested, like having a an HTTP query API. So you can actually query light effects over HTTP, which I know. Actually, we I think we talked to, actually, to see what their format was for that to try to use the same one.

Mhmm. So it's, yeah. There's there's been a handful of requests like that. Yeah. That's that's super interesting.

Yeah. But I like to keep it mostly simple, though. I mean, I I think a lot of database companies just tack on features all the time. I used to be an Oracle DBA, like, 20 years ago. Oh, yeah.

And, like, the number Honestly, I see. Okay. The number of books that you have to, like, read just to, like Mhmm. Understand the basics of Oracle is crazy. These are just these are just my secret right here.

I mean, that's yeah. They just keep going. You kinda know a lot to Yeah. Run these things. Okay.

I like that. It feels to me like, this is, like, LightStream and LightFS are spiritual cousins of SQLite in that way that it's, like, small, fast, reliable, and kinda done. Like, there's you know, SQLite will continue to add things as we invent, you know, CTEs and window functions and all that stuff. They they continue to push forward. But in terms of, like, hey.

The you know, Lite FS is good at what it does. We'll make it faster. We'll make it more reliable, but we're not gonna just continue to bolt nonsense on there. It feels like, ah, it has a home with SQLite. Y'all are y'all are twins.

Yeah. I really love projects that have a scope and like, like bullet was kind of like that. At one point I just I've marked it as 1.0 and I was like, this is done. I mean, obviously the bug thing bugs can come up and whatnot. But essentially, like, I wasn't adding any more features.

Like it's key value story. Like you get keys and values. Like there's not a lot more than that. Yeah. It's pretty pretty straightforward.

Yeah. So, yeah, that's kinda where that I I like projects that are like that. I feel like Mhmm. There's too many projects to just go way too deep and, like Mhmm. Add too many things.

And I feel like whenever people talk about their favorite projects, it's never like, oh, I love this grandiose complex thing. You're like, oh, I like r sync or like what like, some, like, simple tool does one job. It does it great. Yeah. So The same with, like, their favorite little Mac apps or whatever.

It's like this random little thing you've never heard that solves this one problem perfectly, and everybody's in love with it. Oh, yeah. Yeah. Monodraw? Have you ever used Monodraw?

No. What is that? It's like a, an ASCII art editor. So you can draw diagrams in ASCII art. It's like I mean, it's a good product And Mhmm.

Honestly but they're not, like, adding crazy features. There's no, like, sharing to Twitter, I don't think, or, like, you know, whatever. Like, every product seems to add. Mhmm. It's great, though.

There's no AI in there? Not not yet. Not yet. Yeah. Yeah.

No. God forbid. Yeah. Okay. I kinda wanna finish by talking about, this, like, sequel lite renaissance and, like, what is your which I think you have helped drive with these tools.

So what is your perspective on why now? You know, it's been around since the year 2000. But why now? What what what is driving it, and what where do you think, like, its best and highest use case is? Sure.

Yeah. I think that I think the Renaissance is driven by a couple things. I would say complexity is a big thing Mhmm. Where it just seems like there's so many dependencies you need to add to a project just to do a hello world at this point. And it's just it's a little absurd for certain things.

I mean, not for I mean, everything has its place, obviously. Sure. There are tools that are written for a reason, but you don't need to do some Google scale project when you just need it to do it. Mhmm. So I think complexity is a big piece.

Just having one file you mean, like, the, the rage against current complexity and you feel like this is part of swinging it back? Yeah. Yeah. For sure. Okay.

Especially for certain projects where you don't need all that stuff. Like Mhmm. When you have SQLite, you don't really need a cache, like Redis or something or memcache. Like, everything's right there right next to you in memory half the time. And, like, adding a cache would be ridiculous most of the time.

So I think that there's, a lot of pushback around that. I mean, also other tools like, you know, people use Elasticsearch, but you may not need that based on the full text search that's built into SQLite. Or, like, you have JSON support within SQLite. And, like, there's all kinds of little great tools within there that do a lot of the job. It's kinda like I write and go mostly.

And, like, the Go standard library has, like, 90% of what I need for, like, most applications, surprisingly. I I've never used Go, but that is the thing that I keep seeing on Twitter. It's like, oh, yeah. Go Go has that. Oh, Oh, yeah.

Go has that. It's like, that seems cool. It's Yeah. That's great. It's Yeah.

So, I think there's a lot I think there's a lot you can do with just a basic, super light app, which is great. Mhmm. And I think the other thing too is just the the speed of our machines is just wild at this point. Where, like, you can really I mean, you can serve I mean, again, I wrote Go, but, like, Go and SQLite, I can serve 100 or thousands of requests per second with a a fairly minimal setup Mhmm. Which is wild.

And, you know, most people are getting maybe less than one request per second on their app. Like, you don't need some crazy deployment. Or at least not to start. You know, you can always what's that? We We are getting less than yeah.

It's like we get thousands of requests per month, and you're like, oh, okay. You'll be just fine. Even per day. I mean, there's like 8 86000 seconds per day. I mean, like Yeah.

Yeah. So I think that there's there's a place, I think, for SQLite, for, I would say, the vast majority of applications. And once you start to scale, there's reasons to scale. There's reasons to break things apart Mhmm. Especially when you get, like, a big organization.

But a lot of times, you're writing, you know, either, like, an internal app for somebody or you're starting a startup or those things. And those don't need to be crazy, I don't think. Yeah. I'm also a big believer that, like, whenever I mean, I think it's been said where, like, every time you scale up, like, an order of magnitude, you have to rewrite your application. Maybe that's not, like, quite the exact I get that.

We'll go with it. Yeah. Yeah. It's not the exact good to me. Yeah.

Idea, maybe every, you know, 2 orders of magnitude you gotta restart. But, Yeah. I think it's you eventually have to rewrite anyway, I would say, to some degree. And that may be that you move to a different database down the road if you need to, but you don't need to plan for that at the moment. Yeah.

And I think one of the one of the selling points of SQLite is that it's, you know, it's very powerful, but it's relatively constrained. And if you need to move on to Postgres or MySQL, you're gonna have a pretty easy time doing that. Like, you'll have to set up a bunch of stuff, but your actual queries and stuff most likely are gonna work, especially if you move towards Postgres. I feel like they took a lot of inspiration from Postgres versus MySQL. Whereas going from Postgres to SQLite is gonna be a lot harder.

Yeah. That you don't want it to go that way. So it does seem like the graduation story, should you need to get there, is is pretty good. It's pretty straightforward. Yeah.

No. Yeah. I think 100%. And there's a lot of cool stuff in Postgres. By all means, like, I don't mean to knock any other database.

You can use Mongo if you want to. Like, there's a place for everything. But, yeah. Yeah. The graduation stories.

The what's what's the opposite of the on graduation stories? Yeah. Yeah. It's gonna be rough probably. Or, yeah, getting held back, something like that.

There's cool stuff like lateral joins and, Postgres. You know, that's cool. Mhmm. But you don't have that in SQLite. Mhmm.

Although, I would add, like, a lot of things you don't actually need when you actually have a database right next to you in memory. Mhmm. Like, I I feel like a lot of, like, complex, big, huge queries, you could break up into small queries if you don't have any, you know, per query overhead, like, latency. If there's no 25 millisecond penalty to get even get to the database, you're not having to do everything in one shot. Yeah.

You can do, like, hundreds of queries in a a page, and that's not a big deal in in SQLite, which is nice. I think the SQLite docs show how many queries are run on a single page. Like, 200 queries. It's like yes. It's like 220 queries.

And then it says, you know, page generated in point, you know, 0.001 seconds. And you're like, ah, okay. That's kinda cool. I get it. I get it.

So I like I like that. Okay. So is there anything else you wanna cover about SQLite? This, I'll tell you what, this has been I've learned so much and this has been such a blast and I'm very grateful for your time to to educate me. So is there anything else you wanna cover about SQLite, litefs, lite stream, or fly dot io?

No. I think it was great. I mean, I think that there's still a lot of potential out there, like, of things people can build on SQLite and tooling. And, you know, there's always, actually, one one thing I really wanna, like, make, and I kinda, like, half made it. Here we go.

I just didn't actually, like This is what I'm talking about. Give it to me. So, like, I did, like, a proof of concept of this, and it worked. Okay. I just didn't actually build it out, and it kinda needs to be written in something more low level, like Rust or sig or c or something like that Mhmm.

Is that I want to make, basically, like, a a driver for SQLite that will run how do I explain this? So, like, it connects over SSH and then communicates with the SQLite 3 binary on the other side to perform queries. So you actually get it actually works like as a, this is such a crazy idea. It's essentially like a a remote connection for SQLite. It's client server SQLite.

It's client yeah. 100% client server SQLite. So I actually called it I think it was like SQLite. It was like a mix of, like, SSH and SQLite. And, but the idea is not I mean, you could use this in something, you know, production.

Like, if you have, like a lot of times, if you have, like, workers, they might be in separate nodes and might need to connect up to the database. But it was mainly for, like I find the biggest hurdle people have is for, like, GUIs. Like, they need to have like, they wanna have a GUI for SQLite, and it's hard to do that when you have a remote SQLite data. It is. Yeah.

That is definitely hard. Yeah. This is insane, and you should absolutely you should devote all your time to this now. Okay. So the Internet said you need to do this, and you should you should devote all your time to this now, because that sounds that sounds amazing.

This is like this is that kind of like, this is that kind of crazy where you're like, yeah. This is this is crazy. I'm gonna do it. I like it. That sounds that sounds awesome.

You did a proof of concept. How far did you get? I got it working. I mean, we're actually so it does some stuff. I I actually made a request to to SQLite to see if they'd make a binary format.

Because the biggest issue is actually connecting, and then you have to, like, you know how it'll say, like, SQLite and then do, like, a a greater than sign for the, the prompt. Like Mhmm. Essentially, for results and, like, for parsing out errors and stuff like that, I need to, like, remove all the prompting it does, and it has some like weird prompting rules. So it's actually not that, not that hard. Yeah.

It was really more just a bunch of weird little things. Yeah, But it actually does work. It'll actually communicate over and run a query and get the results back. And, yeah. So anyway, like this in the Laravel ecosystem, of which I'm a part, where it's like, you're a mad scientist.

The stuff the stuff that you do you know, this guy the stuff that he does with PHP, it's like nobody should ever do that, but this is awesome. And I feel like you're the you're the SQLite mad scientist. This is crazy, and I love it. I think there's some cool people doing cool work as well. Like, Alex Garcia does, he's done a bunch of extensions.

I think he did, like, a vector database within, SQLite. Philip O'Toole did RQ Lite, which is basically SQLite over yeah. We used to work together, actually. It's funny because That's cool. We didn't work on SQLite at all at the same time, but we separately just did our own paths.

That's cool. And we're like, oh, hey. We're up doing this stuff. But, yeah, he does that as, like, a distributed system over raft and, which is a consensus protocol. Mhmm.

So, yeah, there's a lot of cool stuff out there. And then I know that Somebody just wrote, Yeah. There's, of course, Turso. And then somebody just wrote, a, like I don't know if it's wire compatible, but somebody just wrote a read a Redis clone in c Oh, I did see that. It's a red red cut or something.

Red cut, I think. Yeah. I saw that. I think I think that's right. Yeah.

I think Cloudflare does d one and Uh-huh. Like, it's it's nice. I like communities that are still pretty small, and I feel like Mhmm. Still kinda early days. We're, like, everyone's not super competitive.

So, like, some people will borrow ideas from other people and, like Mhmm. I borrowed some stuff from Terso for their HTTP API and Mhmm. Yeah. I think their local replicas use some of the stuff from Light FS, and so that's fun. Yeah.

It's fun. I I I do like the proliferation of, like, good options in this space. And funny funny enough, like, one of the I feel like one of the drawbacks that everyone is always talking about, is at least on Hacker News is, like, SQLite doesn't have types. And you're like, it's a long story. But Yeah.

If you're using it for a key value store, that suddenly is a great benefit. Yeah. Because you can put anything as Anything of it. For the value. Yeah.

And so it's kind of fun to see people, like, do a, you know, a Redis reimplementation because it's like, well, you can just throw anything in there and we'll hang on to it for you. Yeah. I used to work a lot in, like, key value stores and really it's like it seems like it has lots of types compared to that. Mhmm. Yeah.

So yeah. I guess that's true. Yeah. It's kinda funny. Okay.

Well, I really appreciate you doing this. Like I like I said at the beginning, I've read all your work. I'm very, I'm a big fan, and so this is a delight for me. Where can people, keep up with the stuff that you're doing? And I guess where can they follow you?

Sure. Yeah. I'm on Twitter. I've been as active lately, but, I have some other work I'm gonna be doing soon, so I'll probably be a bit more active there. Yeah.

The fly blog is usually a good spot for for content as well, or just open an issue on GitHub or, yeah, just reach out. I'll say hi. Okay. Well, thank you so much. This has been wonderful.

Yeah. Thanks for having me. Appreciate it. Yep. See you.

Full Course

SQLite for Production

Summary

Video Transcript