High Performance SQLite is more than just an SQLite course, it is a general SQL masterclass. The course has unlocked new breakthroughs for query performance in my day-to-day, it's a great level-up!Eckhardt Dreyer
Build with and integrate SQLite into your production applications.
I've made my test data available for you to use and follow along.
So today, I'm talking to my friend Carl. Carl and I didn't ever work at the same company, but we worked together. He used to be at at a database company called SingleStore, and I wrote the SingleStore driver for Laravel, and so we got to work together a bunch. But these days, Carl's doing something extremely interesting, and so I wanted to talk to him. So, Carl, welcome to this not show about SQLite, and tell us, what you're up to and where you're at now.
Thanks, Aaron. Super excited to be here. Yeah. So I mean, as Aaron said, I I worked at SingleStore for 10 years. SingleStore used to be called MemSQL.
So it's a little database company. We started with this dream of making, you know, high performance combo of, like, analytics and transactions in same system. And I think we did a pretty good job. Single store, after 10 years at single store, I decided that, I really wanted to try some new things. Still in the data space, but just doing my own thing.
And so I decided to leave Singlesore and start, something new. Currently, I am technically an entrepreneur in residence at Amplify Partners. Amplify Partners is a venture capitalist in San Francisco. And I decided to take on this role because it allows me to pursue my own ideas for some some amount of time. So it's sort of starting my own company without committing a 100% into going right into, like, finding product market fit and finding, like, that sort of company drive.
So it's sort of a a wiggle room space. Yeah. With Amplify, I have sort of started a company, I guess. It's called Orbiting Hail. It's just me right now.
So it's more just like me and bunch of ideas, around data and databases and web technologies. And most recently, I released a project, my first project called SQL Sync, which is probably we're gonna be spending most of the time talking about. SQL Sync is basically completely vanilla SQLite. Nothing nothing changed there. And the wrapper basically basically implements multiplayer SQLite.
So you can have SQLite running in a bunch of different web apps or the same web app, but in a bunch of different tabs or windows or whatever. And they can all communicate with each other through SQLite as sort of a central database point of communication. So that's what I've been up for. So let's, I do wanna talk yeah. I mean, I do wanna talk about, SQL sync a lot, but I, this entrepreneur in residence thing, that's super interesting.
So you were you were, like, my my perspective on you at single store was you were, like, the mad scientist. Like, you knew you knew you knew everything. I didn't ever really know what your job was, but you were, like, oh, if there's a problem, you just talk to Carl. Carl will fix it. And so you came out of that.
And how did you get hooked up with this, this VC firm? And are you the only entrepreneur in residence over there? Great great questions. Yeah. So I started chatting with Amplify about 2 ish years before I left single store.
Mhmm. Not not with the intention of starting anything or leaving a single store. Mhmm. It's just that while I was a single store, as you said, I was a bit of mad scientist. My my last project I'm glad you agree with that.
I'm I'm glad I'm glad you felt the same way. Yeah. You know, everyone at startups fulfills different roles. I was definitely in the, like, generalist space where it was sort of, you know, if you don't know who to do something, you can always send it to Carl and he'll figure it out. That was sort of a role I really enjoy, at especially at a small super agile company like SingleStore.
So so yeah. The last project I had to work on at SingleStore was adding WebAssembly to SingleStore. So the idea that my team had was what if we took WebAssembly and we allowed people to write WebAssembly programs and embed them directly in the database, as user defined functions, table value functions, user defined aggregate. So basically, add extensibility to the database via WebAssembly. And so my team was working on that, which is super super fun.
And through that, I met Amplify because Amplify was really excited about web assembly. So I met, Renee and Sunil through that sort of connection within the, like, web assembly space and and everything. Stayed stay in touch with them. And then when I eventually got to the point where I was, like, oh, maybe I wanna do something new, I talked to pretty much everyone in my network including Amplify. And Amplify sort of proposed this idea of doing an entrepreneur residence.
It's not something that, you know, every VC does or they do with everyone. But for people like me where I I knew I wanted to do something but I wasn't entirely sure what it was. I needed that that space and that time to sort of figure it out. That's exactly what the entrepreneur residence program is sort of designed to to provide. That is super interesting.
It sounds I mean, it sounds awesome. 10 years, by the way, is forever. So I I can understand that you, like, were looking for a new challenge after 10 years. I don't know if I've done anything for 10 years. Coming up soon in in December, I'll have been married for 10 years, so that that counts.
But, like, I don't know what my longest job is. Maybe 4 or 5 years. So now you're out there. You're the you're the entrepreneur in residence. You're out there.
You, you have created this thing, called SQL Sync. And you say it's vanilla SQLite, which I am intrigued, but there's some sort of wrapper around it, and it works in the browser. And I played with your demo for a little bit. And so, like, give me the give me the high level of, like, what the, I guess, what the use case is, and then the high level of how it's implemented, and then we can dive into some technical details. Yeah.
Yeah. So so we'll start with sort of like the use case, like, why why would you want something like SQL sync? Mhmm. My my dream of SQL Sync was to just make it really easy to make multiplayer web apps that specifically needed, like, a relational data model. Right?
So, like, there are many ways to implement multi layer web apps. There's a lot of companies right now playing around with these ideas. You know, like, Electric SQL, for example, which I'm pretty sure you know pretty well because I think they're you're working with electric SQL right now. Right? I'm working with, Terso with Libs With Terso.
Sorry. Sorry. There's too many there's too many SQL. There's a lot. Yeah.
So you're working with Terso but, there's also Electric SQL which is doing, really similar things to to do, like, you know, collaborative sort of actually, they're using sort of more like postscripts. They all do they also use SQLite under the hood. So there's a bunch of companies doing this. The idea was, can I make this just, like, super super easy? That was, like, idea 1.
Idea 2 was, can I eliminate the, like, can I can I make the way in which multiplayer actually is implemented simpler to reason about? So that that was sort of option 2, and we can talk more about how I did that and, like, how I think I did that. I hope it's easier to reason about at least. And then, so so that was the 2 things. And then option 3 was, like, can I keep this as vanilla SQLite as possible?
Like, can we make this, like, very, very much just, like, it's just regular SQLite, and then we're just doing something creative and and sort of magical to make it multiplayer. Okay. So sort of trying to combine those goals, into something called SQL sync. Okay. So this is great.
I love this so much because when you're when you're the the host of the show, you get to ask all the questions. And I have so many questions because I understand very little of what this is what this is. And so this is, like, my learning opportunity right here, and it delights me. So let me ask you this. You you talked a lot about, like, real time and collaboration.
I'm not super familiar with something like Party Kit. Is this in the same space of something like that, or do you know what that is? Because I've seen a lot of people talking about, like, add real time with Party Kit. But as far as I understand, your distinction is relational database real time. Is that close?
That's actually like a really really good question. Like, a really insightful question. Because we can sort of break up how you implement multiplayer for an app Mhmm. Into a couple different domains or or like Okay. Different areas.
Party Kit satisfies the transport part of multiplayer. Basically, you could think of Party Kit as providing, like, a session back end that a bunch of different clients can all connect to and can exchange messages. And then once you have a party kit back end, you can host a data structure in that back end that all of these different clients can sort of communicate and and exchange information about. And so party kit, you can think of as essentially a a sort of like a a stateful serverless process that a bunch of different clients can all connect to without having to preconfigure the existence of that server. So, like, in traditional server architectures, I would spin up, like, my VPS or, like, my Amazon EC 2 instance or or whatever some instance that would be my server.
And then all my clients would connect to it. Right? So I'd have one big server for all my clients. Party Kit sort of takes it one step further to say, well, actually, what if we have sort of, like, one server? We make those servers dynamic so they come up whenever they're they're needed and they go away when they're not needed.
And we associate each server with, like, a a concept of, like, a session. So if you wanna use Party Kit to implement, like, a chat, for example, you would implement a party, which is essentially a durable session server. Oh, silly. But yeah. Okay.
A party for each of your different, like, chat rooms. Right? And so everyone who's connected to the same chat room would be connected to the same durable sort of, like, dynamic serverless server. And in that model, I know you I know you don't write PartyKit, but this is you seem to know everything about it. In that model, who owns the durable part?
Is PartyKit a SaaS? Or who like, the serverless thing, someone has to orchestrate it. Who is that? I I will caveat. I may not know everything about I already I probably make some other things.
So if anyone on the internet catches me on the please let me know, you know, I don't They they will. Don't worry. They'll let you know. But but as far as I know, party kit, was based on or or is based on Cloudflare durable objects. Got it.
I'm not sure if they have, like, other other kind of back ends. But you can think of this durable object abstraction as, like, a of the infrastructure that is necessary for something like party kit to exist. Right? Okay. And the reason why I sort of know a bunch about it is because single, sorry.
Sorry. I almost said single store. It's been a couple of years and I still say it. Yeah. SQL sync, uses durable objects as its infrastructure provider.
Right? For the exact same reason that party kit does. So Okay. And we're saying when we say durable objects, we're saying capital d, capital o. So CloudFlare Durable Objects.
Yeah. Exactly. So it's a durable object service at Cloudflare. Okay. And SQL sync needs that same sort of functionality because, basically, the way that SQL sync works from a networking perspective is that we have a bunch of clients.
So we have a bunch of different like, let's say, you and I are working on an app together using SQL Sync as our multiplayer system. Right? So the to do app, which I guess we can, like, link somehow and when you post Yeah. I'll I'll put it in some notes. The to do app is a good example.
So let's say we're both using this to do app. So it's a shared to do list. What's happening that what what sequencing does is it uses Cloudflare durable objects to allocate a durable object that is for the database that backs that to do list that we're both working on. Okay. And so our 2 web browser tabs, for example, are not communicating directly with each other, like, peer to peer.
It's communicate they're both communicating with that central durable object as being that sort of what what we call in sequencing terminology. I call it, like, the authoritative version of SQLite. Like, it's it's the authoritative database. Okay. So in other terms, would source of truth be right?
Yep. Okay. So let me say it back to you to make sure I have it. So it sounds like Party Kit is similar, but different. It sounds like PartyKit, primarily sends and receives messages.
Was that close enough, you think? I think that's that's a more or or rather, like, PartyKit is more general than than SQL sync. In this sense, that you you take PartyKit and that provides a nicer and, like, easier to use thing than Cloud for Durable Objects for initializing these, like, shared objects and making easy connect to them and managing all of that. And then you sit your app on top of that and you need to provide your own data structures and your own message types and you need to provide your own Got it. Uplift management and all of these these different aspects of multiplayer.
Okay. Equal sync Okay. SQL sync technically could be built on top of Party Kit. I guess it's a good Got it. Area.
Okay. So then, maybe it's more accurate to say, party kit is a little bit lower on the abstraction tree, and SQL sync is a little bit higher and handles more for you. Is that right so far? Yeah. Okay.
So then on the more opinionated, basically. Okay. Then on the SQL sync side, you also, rely on a durable object as the authority or the source of truth. And then presumably, you've got, SQLite databases in the browser that's doing some sort of, local first stuff and then coordinating with the source of truth to then have that be distributed to the other browser or browsers. Is that Yep.
Is that close? Yeah. Exactly. So so Okay. Now it comes down to the question of, like, how do you do that coordination?
And that's that's a big aspect of the multiplayer space. Yeah. Okay. So I think I have a little bit of the lay of the land architecture stuff. So how do you do that coordination?
Like, well, that sounds like honestly, this whole, like, local first discussion that has been going on on Twitter, that is just very it's very tiring. But local first seems really hard because then you have, you have 2 potential sources of truth and maybe n many if, you know, a bunch of people are connected and writing stuff to the database. Then how do you do the conflict resolution? I know there's, like, proper, you know, computer science algorithms for it. But tell me for, an accounting major, how do you do conflict resolution?
That seems really hard. Yeah. It it is super super hard. So there's there oh, man. There's so many different ways to address this topic.
And so feel free to, like, stop me and and ask me so many questions as I go. Because it's a pretty complicated and I would it's not even just complicated. There's a there's a lot that is involved with conflict Mhmm. Handling. Right?
So the first thing that I think is important to, to talk about is in in my mind, I split up conflict management to, like, 2 categories, like, 2 sort of broad categories. And I call them I don't know if this is the the best term for it, but it makes sense to me. So I'll I'll use it to explain it. Mhmm. I call them, like, physical conflict handling and logical conflict handling.
Okay. So physical conflict handling is conflict handling that ensures that the data structure that everyone sees Mhmm. Is not corrupted. Okay. You said you said physical conflict handling ensures that the data structure that everyone sees is not corrupted.
Yes. So I'll give you like a a simple example of that. Let's okay. Wait. Let me actually think of a good simple example.
Yeah. Yeah. Make it simpler. Make it simpler. Yeah.
Man, like, all the simple cases are, like, so simple they're that they're almost not that instructive. But Does does let me ask you a clarifying question. Does structure you you were you were, careful to say the data structure is not, corrupt. Does that stand opposed to something else? Is it the data that you count as logical conflict?
So physical as structure and logical as data? Yeah. Sort of. So so I guess a a way to say this is, like, for example, let's say that we have a, like, a set. A set is it's like a list without an order.
We don't care about the order. We just care about the contents of the set. Okay. So, like, let's say we have a set data structure. And that data structure is like, if you're using, like, JavaScript and you're just doing, you know, basic web app program is probably just represented as, like, a JavaScript array.
Right? Or or a JavaScript map depending on how you manage your set. Okay. The super simple data structure. And like, if we are if we are trying to get to a consistent view of that set, so, like, you and I have 2 different versions of the set, and we wanna get to, like, the same version of the set.
There Okay. Can we can we make it concrete? Can we say, I have some items and you have some items? Yep. Okay.
So Aaron has, items a and c, and Carl has items a and b. Is that a fair starting point? That's a great starting point. So I have a and So I have a and c, you have a, b. Carry on.
So given that sort of starting state, physic like, when I talk about physical sort of correctness, let's say that, use like, just we're gonna implement, like, a very simple, set algorithm. You're just gonna set send your whole set to me. Okay. So I have a b and you have a c. So you send me a a c, and I look at my 2 sets.
I have a b and a c, and I, like, merge them together. Right? Okay. And now I have a b c. I see the the full set of both of our our sets.
There are 2 things that are happening in that merging though. That's, like, that's very subtle. One thing that's happening is, like, there's an actual, like, data structure manipulation where I'm manipulating the the JavaScript data structure which is like, I have an array on the left side. I have an array on the right side. I'm iterating through the array and, like, checking to see which objects are not contained and moving them in.
Right? There there are physical operations happening. And then there's a logical operation which is when I'm merging these two things, I don't wanna end up with a b a a b c. I don't wanna end up with that. I wanna end up with a b c because it's a set.
Okay. So that's what I mean when I'm talking about physical and logical. Physical is like the actual act of manipulating the data structure. Uh-huh. And logical is like the act of enforcing business logic.
Got it. So let me see if I got this. Physical would be putting the 2 sets together. Logical would be looking at that result and saying, hey, a a is nonsense. Those were similar items in the discrete sets.
And when we put it together physically, potentially, we got a a b c, but then we add the logical part, and it's like, come on y'all. A b c. Let's do a b c. Is that right? Yeah.
That's exactly right. Okay. So okay. So so that's that's, in my mind the distinction between, like, physical and logical. And and I Let me it's like so simple that it's it's almost like not that interesting.
But let's talk about a relational database. So I think probably most people watching, your your your shows and your, you know, your channel and stuff would are familiar with, like, relational databases. But generally, you have tables, which are basically, like, sets interestingly. Mhmm. And you have tables and one really common thing that that exists inside relational databases is like a foreign key.
So you have a table a and table b and like a row in table a references a key of a row in table b. K. Right? So that's a foreign key relationship. When we are, like, when we are implementing a multiplayer system for something like a relational database, we have to ensure that when when a client so like, let's say I have my version of the database, you have your version of the database.
When we are manipulating these two databases, we have to ensure that 2 things are are true. 1 is that my my physical database never becomes corrupted. Like, if it's SQLite, SQLite has a internal file file format, store format, that's quite sophisticated. There's like b trees. There's a lot of internal pointers.
There's a lot of, like, pretty low level sort of things that have to be correct. Otherwise, SQLite will just be like, I don't know what this is. Like, it's just money You mucked it up. I can't open it. Game over.
Because this is the physical side of rep of of like Okay. Where we have to make sure that no matter what happens that whenever you or I look at our database, it's always in a correct and consistent state from a physical perspective. Seems important. That's that's state that's sort of the first aspect of multiplayer. The second aspect of multiplayer is we also want to ensure the logical correctness.
I e Mhmm. If I have a foreign key that points at, like, something else and that thing doesn't exist, that's like a logical error, for example. Yep. Okay. Love it.
And so going back to, you know, the original the the source of this question is just like Mhmm. How does SQL sync work? And how do, like, how do you implement multi player on data structures? The reason why I went down this, like, very long tangent now that I would think about it is that Hey, listen. We got all the time in the world and this is great so far.
So don't be shy. Keep going. So the reason I went down this tangent is because I actually knew it's very important when you're reasoning about multiplayer systems to think about these two things as, like, separate things that both have to exist. But Okay. You need to, like, be able to reason about them them separately.
Because achieving physical consistency may not result in achieving logical consistency. Right. And so you may need, like, another layer on top to, like, achieve that logical consistency. Okay. Okay.
So that that that's a side number 1. Mhmm. So Oh, there's more. So so now, I will talk about how I think the next the next logical thing to do is talk about how SQL sync actually does the replication aspect like the multiplayer aspect. Please.
And then I think probably from there, there might be some interesting things we could talk about about, like, other systems and how they differ from SQL sync. Great. Love it. So so SQL Sync, the again, going back to, like, my original goal of SQL Sync, I wanted to make something that was hopefully easy to reason about. Now, any multiplayer system is not easy to reason about, but I wanted to try to keep it, like, as simple as possible.
So SQL sync models the distributed state as every single every single so I'm gonna let's let's give things terminology so we can talk about them because Mhmm. I don't have any slides to show. So this is just gonna be, like, in our head. So we're gonna call the code that is running inside, like, web browser tab. We're gonna call that the SQL sync client.
So the client is gonna be like the code that's running, like, on your computer. Okay. And then we're gonna call the code that's running in the CloudFlare durable object, the SQL sync coordinator. Coordinator. Cool.
Love that. Great. Right. So we have a client, we have a coordinator, and the goal of the client is to And can I pause real fast? You could have potentially n many clients connected to one single coordinator.
Exactly. Okay. Carry on. And actually I'll clarify even one more thing, which is every coordinator maps 1 to 1 to a SQLite database. Every coordinator maps Okay.
So every coordinator has one SQLite database that is the authority. Yep. And that is part of the coordinator's purview is keep the authority, and then all of the clients connect to that coordinator that is sitting next to a single SQLite database. Yes. Exactly.
So has it within it or what? Like, as a as a web browser tab, like, let's say that you wanted to open 2 different SQL sync databases. Mhmm. In this case, you would be connected to 2 different SQL sync coordinators. Got it.
Okay. Each coordinator manages 1 SQL like databases. Cool. Love it. Alright.
Perfect. Keep going. Cool. So, so we have this coordinator up in the cloud somewhere, some serverless magical thing. And we have our our local client.
And then both you and I have a little client because we're, like, let's say, going back to the to do list thing, we're both working on the same to do list. That means that we're both connected to the same coordinator. Now SQL sync, we'll we'll just talk about, like, the basic replication flow. So the first thing that happens is, like, let's say that I add to do in my local SQLite database. Let's make it groceries.
We're sharing a grocery list. We're we're best friend roommate. So you add milk. Great. Great.
I add milk to my my local SQLite database. Mhmm. So the first thing that happens is that SQLite just straight up commits that and just shows me right away. Like, no network connection at all has happened. Okay.
So in in the client, in the actual browser, you've got a SQLite database that has milk written into it. Bingo. So so SQL syncs first and primary goal is committing that to the local state, like, durably on in the local, client. And then immediately, we're we're showing that to the to the, to the code so that the client can render it. So there's no need to have any kind of, like, you know, in a lot of, like, front end, like, react code, for example, or or view code, you would have your your, like, your database say over here and then you'd have, like, your client side cache code.
Right? So the whole idea is we eliminate that client side cache. You just you can you can synchronously query SQL sync and get back that milk right away. Okay. So you have in this in this architecture, you have achieved, local first in that you hit you type in milkenter.
Boom. It's immediately in the UI. Now it could, and we haven't gotten there yet, but it could take a second to get out to Cloudflare and to the other person. And if that fails, presumably you do something in the local UI. But local UI is immediately updated, so you have achieved this panacea of of local first that everyone is after.
Is that good so far? Yep. That's exactly right. Okay. Alright.
Carry on. What happens next? So so now SQL sync's happy. It's like, alright, great. We've committed this thing.
But it does it it does it in a little bit of a tricky way. And this is where the first aspect of where We love we love trickies. Yes. This is great. The SQL sync uses regular SQLite, but SQL sync Okay.
Tricks SQLite. And what it does is it, what I did is I implemented a a VFS. VFS Yeah. You did. Like a virtual file system.
Yeah. And so when SQLite goes and tries to, like, write to the disk, it thinks that it's, like, writing to, like, a normal file system. But my VFS code tells SQLite that it's, like, it it handles all the rights on behalf of SQLite. And it it just does some magical things, and it tells SQLite, it's all good. Like, you're good.
We have the right, you know, you can read it. It's all good. Okay. So let me let me pause there for for people who aren't in the internals as much as you are. The the architecture of SQLite is such that, you know, there's there's a bunch of stuff on top of virtual machine.
When you finally get to the bottom, it has to write to the disk. It's pretty important. And the way that that is implemented in SQLite, to my understanding, is through this there's this abstract layer called the VFS, and then there are 2 concrete implementations, 1 for Unix system, 1 for Windows systems, and then everybody has their own. 3, what's the 3rd? Memory.
Memory. Yes. Okay. Perfect. So you have these 3 that are, concrete from DRH himself, SQLite guy.
But you can totally write your own VFS, which is what I thought LightStream was, but that it's not the case. So you have written you have written some sort of, VFS that conforms to the interface. So SQLite is happy to hand you data and trust that your VFS is good, and you do something with that data. And so this is kind of like one of the ways that you can modify SQLite is you just kinda sneak in there and you're like, I'm writing to the disk now. Give me all the data.
Is that right? Hey. I hope you're enjoying this. Just as a reminder, you can learn more about SQLite at high performance SQLite.com. I'll get out of your way.
Back to the interview. Yep. Exactly. So, so my VFS presents like, if it's, just like a little aside on the VFS stuff because Yeah. Your yours might find this this sort of cool.
So SQLite VFS looks like a sort of, when you implement it, you're basically implementing like a Unix file system abstraction. Like, you have the ability to, like, create files, delete files, truncate files. Right. Yeah. I don't even think you can rename files.
It's a very simple file system abstraction. And then once you have the file handle so, like, you give SQLite, like, this file handle thing, which it just thinks that it's this, like, opaque blob. Mhmm. And this file handle has all your normal files to some operations like, you know, writing to the file, reading from the file, and so on. Right?
So, it's a very simple abstraction. And so what what the SQL sync VFS does is it actually internally maintains this it's like a, like, it's like a layered file system, basically. Okay. So it it maintains what what it calls the, like, snapshot, which is the which is a snapshot of the SQLite database that has been fully committed on the authoritative coordinator. Okay.
So, oh, that's good. Okay. So hang on. So the VFS knows somehow through communication with the coordinator, it had some sort of high watermark of this has been committed into stone and you can trust it. Yes.
And then beyond that is, like, the pending or something. Exactly. So so, like, if you think about the first time you, like, load the web page. So you load the web page, like, the grocery list app. It has no database.
But let's say that we already have had been using this database for a while, so it has a bunch of merchant list items. The first time I load that app on my, like, phone, for example, it's gonna download that SQL like snapshot. Boom. Mhmm. So it has a full copy of the database as of some, like, known authoritative point.
Okay. And so that's that's its snapshot. And that's where it's gonna start from. And then when I go into the app and I say, like, create new grocery list item, add milk, it's going to internally, the VFS, like, when SQLite goes and writes pages into the files or tries to edit the SQLite file, it's going to keep those edits to the file as, like, a separate layer. It's gonna present to SQLite, like, as c when SQLite looks, it's gonna just see one file that's just contiguous.
Mhmm. But internally, I sort of split it out into snapshot, and then, like, all the edits, like, local edits, basically. Okay. Hang on. So you have your snapshot that you can trust, and then somebody writes eggs to the to do list or the the grocery list.
Do you write are you keeping track of like, do you go ahead and write that, to the SQLite database just in like a normal form, and then write it aside to a pending changes file? Or is there some way in the SQLite database that you mark it as not committed or pending or whatever? So I do store some some information about the transaction that was executed, and we'll talk about that as well. But internal internally, like, on the file system abstraction side, the file system just keeps, like, the last authoritative snapshot, which is, like, the the base version of the file. And then it just keeps track of, like, all the edits to the file.
And these are just, like, raw edits to, like, the bytes of the file. It doesn't even know what those edits are. It just Got it. Okay. Make changes to the file.
So it's almost like a bin log. This is that right? It's not even like a log because it just it it simply has, like, 2 layers. It just has, like, snapshot, which I which is that, like, you know, version 1, and then it has, like, all the changes to that to that version as just, like, essentially, like, a sparse page map, essentially. Okay.
Okay. Alright. Keep going. Keep going. Okay.
But the reason that SQL sync does this and, like, the value of this, like, why why is this valuable? It allows SQL sync to instantaneously roll back to the snapshot. Like, all SQL sync does drop this, like, layer of, like, all your local changes, and it's instantaneously identical to the original snapshot from the server. Yeah. Right?
Okay. This is a very cool property that we're gonna need for, like, the late a later point. K. Love it. That's good foreshadowing.
I love that. Yeah. When are we gonna need it? Alright. Keep going.
So what we've what we established so far is, like, when we add milk to our Gertrude list, we we are doing regular SQLite stuff. SQLite just runs the transaction just like normal. SQLite sees like a normal file system and writes the change to the normal file system, like nothing has changed from SQLite's perspective. And all we've done internally is we've just sort of tracked. This was like the last version of the database that we, like, know is fully consistent and shared with the authoritative state.
And then here's all, like, the local changes that have happened. So now the question is, what's next? So what's next is, let's talk about transactions. Mhmm. So regular, like, relational transactions, you know, you have, like, your begin, insert 1 row, insert 2 rows, you know, select whatever, commit.
Right? This is like how relational databases work. SQL sync has a transaction system that it forces you to use. So one thing about SQL sync that is weird is you can't just write directly to SQLite. Like, it doesn't give you, like, a normal, like, SQLite connection handle that you can just use and run an insert command.
Okay. That's interesting. Instead, it forces you to and this is a design decision that's specific to SQL sync. It's not necessarily the perfect design decision. It's something I'm gonna change in the future.
But let's just say what it does right now. Okay. What it does is it forces you to write all of your transactions, in web assembly. It forces you to write all of your transactions WebAssembly. It forces who to write their transactions.
As in the developer of the app. Okay. So as a developer of an app who uses SQL sync, you write what SQL SQL sync calls, like, the mutator. And k. The mutator is essentially a web assembly blob, like a blob of web assembly code, which you can compile from Rust or other languages that support web assembly.
Okay. You write some some code, you compile it to web assembly, and this blob implements an API. And on the, like, on the input side of the API, the the side that your app sees. Okay. Essentially, it looks like almost like a normal web API.
Like, if you think about implementing, like, a rest API or, like, a graph QL API Mhmm. On, like, the input side, it's just a bunch of handlers. So for example, in this grocery list app, we would have, like like, add grocery item, remove grocery item, you know, all of our all of our, any type of endpoint in our app that Mhmm. Is changing the state of the system. Okay.
Okay. I think that makes sense. So if I'm the developer and I'm writing the add grocery item endpoint in air quotes, what am I talking to if not a raw SQL connection or raw SQL like connection? So I I guess I I was a little bit overly aggressive about that. Inside your web assembly code, you get like a Mhmm.
Completely normal SQLite connection. Got it. Okay. It's just that you have to you could only access that SQLite connection from inside this web assembly object. Got it.
Okay. So you have this concept, this this firewall concept where, this what did you call it? The mutator? The mutator. Yeah.
The mutator is allowed to talk to the SQLite database, whereas the application code must talk to the mutator. Exactly. And so it Okay. Like, if you actually look at your when you're done with building your app on top of SQL sync, when you look at the code, it sort of looks like all your mutations are just, like, hitting, like, a web endpoint. Very similarly to how they would look if you were using like a regular web server.
That's kinda cool. Okay. So maybe I need need to name these things differently. I think the mutator thing is too much. I I don't know.
I mean, there's a lot to learn, and I'm I think I'm following so far. So I think you're doing okay. So carry on. Yeah. So so basically, we have this mutator, which you can think of is just like like a really lightweight API server because it's this web assembly blob and it can run, like, because it's web assembly, we can actually just run it directly inside of the client.
So it's almost like we brought, like, our API layer, the part of our API layer that is able to mutate the database. It's almost like we've teleported that down into the client. Okay. So why must it be in in WASM? It doesn't really have to be in web assembly, in terms of, like, all of the ways I could build this.
Uh-huh. The reason why I chose web assembly is a couple of things. The first thing is that the mutator logic has to run on both the server and the client. And we'll talk about that in a little bit. But basically Both the coordinator and the client?
Yes. It runs on the coordinator and runs on to the client. And so we needed a portable, like, it can be easily portable sort of abstraction for this, like, arbitrary code that the developer is writing. I could have used JavaScript. I think that would be, like, the other logical choice because we can run JavaScript from the client, we can run JavaScript from the server.
But the reason why I went with WebAssembly is because JavaScript is harder to bundle in, like, a really clean and, like, sort of sanitized way. Boy, isn't it? Okay. That makes that makes a lot of sense. Okay.
Whereas, WebAssembly, sort of, naturally bundleable. Like, you get, like, one web assembly object, like, you're done. That's that's the bundle. And I I know that I don't have to worry about sandboxing. I don't have to worry about anything.
It just is this nice, like, opaque blob that I can treat as an API. I I don't wanna I don't wanna jump ahead, but I think I see where this is going. If if if the mutator must live in the coordinator as well. But, alright. So keep going.
So now we have we've got a a a SQLite database in the client. We have this firewall that is this web assembly, mutator. And then on the application side, the application code speaks to the web assembly thing as if almost as if it's an API, but it's all it's all right there. It's all local. It's all in the client.
And then that mutator talks directly to the SQLite database. Yep. Okay. So, if we sort of visualize what that code looks like in the client. So I don't click on, like, the add grocery button, I type in, like, milk, I hit enter.
Mhmm. That is literally going, like, in my react code or my view code, it's gonna be calling that which is like add grocery item mutator in my web assembly code. And that internally is going to run a transaction against SQLite. It's going to mutate the database and and that's what happens, like, locally. So then what SQL sync does, and this is the fun part, is that we record in a log.
We record that we ran add grocery, like, the add grocery mutation. Mhmm. And all we have to store is essentially the name of the mutator in the web assembly object. Mhmm. And then all the arguments to the mutator.
Mhmm. Right? So we just have this relatively, like, lightweight log. And this log actually could be potentially represent very large changes to the database. Like, you could have a mutator that said, delete everything.
You could have a mutator that said, like, add a 1,000,000 rows to the database. So the rep or the change size doesn't actually relate directly to the, like, mutators. Totally. That's sort of a nice property. You can think of these mutator this mutator log as essentially a representation of all of, like, the business logic transactions that happened on your app.
This almost feels like event sourcing where you're just keeping track of the events to later replay them. This is exactly event sourcing, actually. Yes. So Love it. I was hedging.
I was hedging. I didn't wanna come in too confident. I guess I guess exactly is probably not right, but it's very, very Oh, don't walk it back now. I was right. I was exactly right.
Well, I have to walk it back because I know this is going on the Internet. And, like, sometimes people like you gotta let it say exactly. Good. Okay. So basically, we record this log locally that says, okay, like, Aaron on his computer said that I'm gonna add milk and then he said I'm gonna add eggs and they said I'm gonna add whatever.
And this is just a local log. Again, we haven't hit the inner network yet. When your computer has network access, your computer is gonna connect to the coordinator. K. And it's going to send the log of all mutations you did on the database.
Okay. Just the log. It doesn't send any changes you made to the database. It sends the, like, description of the changes you made to the database, I e the Okay. List of mutations.
So to clarify, there are 2, as far as I understand, there are 2 sets of, like, uncommitted changes, we'll say. In the SQLite database, you have a high mark of what is committed by the coordinator, then you have other stuff on top of that. But that's not what we're sending. What we're sending is a stream of names of functions and function arguments to be replayed by the coordinator. That's exactly right.
Okay. Alright. So now the coordinator, is connected to both of our computers, let's say. So it's connected to my computer, it's connected to your computer, And both of us are like anxiously adding all of our grocery list items for the upcoming week because like we know we need to go to the grocery store. And so these the coordinator receives this like completely random list.
Some of the items come from Carl, some come from Aaron. And what it has to first do is it has to decide on some ordering of all of these mutations. And so the server the coordinator has the advantage that it's like a single authoritative sort of state. And so it just decides on the ordering is based on when it receives each of these mutations. Okay.
So it has its own global log and that log contains all of our mutations put together into one log, ordered by whenever it was received by the server, by the client. Okay. So there's no timekeeping in the clients? Exactly. So the client doesn't need to worry about time.
We just send the transactions as we create them, the transactions and mutations as we create them to the coordinator. The coordinator simply orders all the mutations into this global log that is per database that it manages. Cool. Mhmm. And then it what it does is it just simply applies that global log using the same web assembly object that we're running inside our clients Mhmm.
Onto the SQLite database. And so it's essentially building a new version of c the SQLite database that contains all of my my changes and all of your changes. Mhmm. And we'll talk about the well, let's actually, this is a good time to talk about conflicts, because this is exactly where conflicts actually is gonna happen. Right?
Right. So if you're offline for 20 minutes and I'm not offline and I add milk, but I remove eggs, and then I don't know what the conflict there would be. Yeah. It turns out grocery lists are really easy to do automatic confirmation. Shoot.
Let's say I change the name of eggs to eggs parenthesis two dozen, and you delete eggs, then what? Yeah. Exactly. That's a that's a pretty we can start there, and then we can add more conflict, or more complex conflicts later. Okay.
So this is where we can talk again about this, like, why did why did Carl talk about physical and logical conflicts? Like, why is that? So I just thought it was interesting. But sure. Tell me why.
It is interesting. I mean, I'm a huge nerd about this stuff. So this is super interesting to me. But because we've ordered, like, all of the mutations globally on the coordinator, and the coordinator just runs, like, one mutation at a time against against the SQLite database. We've eliminated the possibility of there to be any kind of, like, physical conflicts in the SQLite database.
Because SQLite knows how to handle running one SQLite database. Because SQLite knows how to handle running one transaction at a time. If we can represent the transaction in SQL, SQLite guarantees that it will not corrupt the database. Right? Correct.
Now this would be at odds with, like, a naive version of of multiplayer would be like, let's I'll just edit the database and you edit the database, and let's just exchange all the changed, like, low level pages, like, the actual Mhmm. Changed bytes in the database. And let's just hope that it works. In this case, we'll probably corrupt the database very quickly because we're just, like, exchanging, like, random bytes about a file. Hilariously, SQLite will actually handle, like, a lot of changes this way, without corrupting.
But there will be a bunch of changes which it can't handle. Like, for example, it moves a b tree around or it changes changes certain things. Vacuum something up on Exactly. There's a whole bunch of changes that could happen, which in which exchanging all the edits we did would would involve physical corruption. So the first thing that SQL sync does really well is that it it completely solves physical corruption by just basically guaranteeing that there's like a single writer.
There's only like, we're writing each of these transactions in order and, like, the coordinator, we we can eliminate any kind of, like, physical conflict from Yeah. And you're staying outside of, like, the realm of, danger. So you're just, like, interacting with SQLite as it was made to be interacted with. You're not mucking about with the bytes in the file. Exactly.
So then there's there's the logical conflict. So the logical conflict that you described, which is, like, basically, you change the name of a grocery list item and I delete it. That that conflict, like, needs to be decided, like, what happens in that case, meet is a decision that the the developer has to make. That is fundamentally a business logic decision. Different solutions.
Right? One solution would be the last writer wins. Right? Like, if I delete and then you edit, your edit just disappears Because it or or potentially, we want the edit to, like, restore the the item. Right?
I deleted then you edit it. We want that edit to, like, bring the item back. Right. Potentially, we want to do some kind of, like, weird merge. Although, you can't really merge, edit and delete.
But let's say we both edited the the title at the same time. The merge would be, like, let's look at the titles and then let's, like, merge them together in some way. Okay. This is fundamentally a business logic decision. Like, it's something developer has to decide and the decision is going to have a lot of trade offs.
Like, some of those decisions are gonna be much more efficient to implement than others. Like, last writer wins is probably the most efficient. Just overwrite whatever is there and Mhmm. Forward. Whereas some kind of, like, conflict detection which involves some kind of, like, three way merge or some kind of, like Mhmm.
More complex thing is gonna be more more sophisticated. SQL sync allows the developer to manage that themselves inside of their WASM, their web assembly logic. Okay. The reason we use a web assembly instead of just having, like, the I mean, we could just express the mutations as just simply SQL commands. Right?
Like, why do we need to add the web assembly layer? We add the web assembly layer because that layer is where you logically should handle mutations. Or conflicts. Sorry. And you need some sort of you need some sort of programming language to do that.
You can't do that SQL thing. Code the business logic of conflicts Mhmm. Somewhere into the system. And SQL sync does that by allowing you to encode that into your web assembly program that you write. Okay.
And so your web assembly program, which let's say that it handles, it has a rename operation. And let's say that it sees that you you as the developer want to encode the property that a rename operation should, like, restore an item back from the dead if it was deleted. Okay. So in order to implement that in SQL sync, you'd have to implement first of all, you'd have to keep around the state of a deleted item. And so traditionally, in in most relational systems, the way we do that is that rather than deleting a row, we just have a a column which represents the row was logically deleted.
Column and then also mark that as false if it was marked true. So column, and then also mark that as false if it was marked true. Okay. So hang on. Okay.
So I, as the developer, have to anticipate that one possibility would be that I rename something that you have deleted. And in my so knowing that knowing that that is true, I would have to modify my delete mutation to tombstone it or like soft delete it or something. And in my rename operation, would I, in my rename operation, always rename it and set deleted at to null, or is there some sort of conflict method that is called? Or do or all of the like, does the rename setting the deleted to null, does that handle the conflict itself? Do do you see what I'm asking?
Like, is there a on conflict callback? Yeah. So, there's no on conflict conflict. You just you should implement the conflict management in pure SQL. And SQL Got it.
Is pretty good at managing this type of conflict. In a more complex type of conflict rename the column. Mhmm. And we do it at the same time and the question is, like, we wanna get, like, a merge version of those 2 things. You would have to add some additional metadata to your to your data structures, like, to your actual tables to be able to help your mutator know when it should perform, like, a merge and when it shouldn't perform.
But the nice thing is is that the mutator is, like, arbitrary logic. It's it's it's basically, you know, just arbitrary code. And And so the idea is that, like, you can run a select query in your mutator. Like, you can you can Oh. Curve the database.
You can get back some information about it, and then you can make a informed decision based on that observation. Cool. Which would be exactly what you would do if you were influencing some kind some kind of basic multiplayer inside, like, a traditional application architecture, like, in traditional back end API. Mhmm. Okay.
The idea of SQL sync is to, like, provide a experience of, like, writing this, like, mutator layer that is more similar to, like, a traditional API on top of a regular relational database than needing to learn about, like, automatic like, data structures to implement some kind of, like, automatic conflict management. Right. So this is, like, a major thing where where SQL sync is sort of at odds with, like, other systems. Yeah. Yeah.
So if I'm understanding if I'm understanding correctly, all of the I think it's important that you said, the conflict resolution is a business logic layer. Because at that point, I I I happen to agree that some library deciding who wins may not be what I would have decided. May not be what is right for my users or my app or whatever. And so what the stance that you have taken, is that conflict resolution must be handled by the developer. And the way that you do that is by providing this stream of events that can be played locally and at the coordinator.
And I guess the code is the same, but it doesn't matter until it gets to the actual conflict resolution. Yep. Exactly. Okay. So the so just to, like, fast forward to sort of the, like, full round trip.
So so far we've, like, we've essentially both edited the database. We've both trip. So so far we've, like, we've essentially both edited the database. We've both set our, like, change streams up to the the coordinator. The coordinator has built, like, a new version of SQLite, like, the database that that reflects all of our mutations together with conflict handling and everything.
So that is, like, there's this new authoritative version that's like, this is the version that we should all see, basically. Mhmm. So the last step of this is getting making it so that both of us can see those, like, new correct authoritative versions, and continue to make changes over time. Right? Okay.
And so the way that last step works is that the coordinator sends down raw changes to be, like, underlying file that backs SQLite. So this is where we're actually sending down, like, physical changes to the SQLite file to each of our clients. Changes. Okay. So you're sending down raw changes.
So it's in the coordinator. In the coordinator, you have, you have the database at a particular state. You receive a dozen events. You play those dozen events, and then do you do a diff of what it was before the dozen events came in? Like, what the actual underlying SQLite file was before the dozen events, after the dozen events, and then you send the diff, which is just like a bunch of bytes back to each client?
Yes. Yes. We send a diff. The way we produce the diff is very efficient. So we don't have to do, like, a full diff where we But basically, yes.
Like, so SQL sync is going to on the coordinator, it's gonna collect essentially the specific set of bytes that have changed between the last time that it sent a snapshot to you and to me. And interestingly, like, those two snapshot points can actually be different. And this is a very complex thing because you could, like let's say that you came online after I came online, your SnapDoc could actually be, like, reflecting a different starting point than mine. You're totally right. And so the coordinator actually takes that into account, and it builds custom diffs for every single client to be able to fast forward them as efficiently as possible to the latest version of the So does every client send their last known state along with the stream of events or their last, sort of committed state?
Yeah. Sort of. You could think of it that way. Like, essentially, the coordinator knows, like, the last committed like, the last authoritative LSM on every single client. Okay.
And then just for my edification, are you, like, listening to SQLite somehow and seeing which pages are changed? How how are you doing that efficiently? VFS. It's always the VFS. Son of a a gun.
Okay. Yeah. But the the VFS runs on the client, runs on the coordinator. The coordinator's VFS does a slightly more sophisticated thing, which is that it it sort of tracks changes as as a, like, a log of changes, like, a a series of dips to the, say, SQLite file. And then it can use that log to be able to efficiently build, like, Delta records to send to different clients.
Of course. That's freaking that's awesome. Okay. So then at that point, the coordinator has a diff a change set that could be sent to Carl and Aaron, and those could be different Yep. But they are correct for the clients that are connected.
Yep. Then what happens? Okay. So now your client so, like, think about your computer. Your computer actually has 3 different versions of the database at that point when it receives you the diff from the server.
It has the original authoritative state from when you started. Yep. It has local changes to the SQLite file that you've made. Mhmm. And then it has remote changes that the coordinators made that may actually reflect other, like, other people's changes that you don't even know about.
Okay. Right? So you have sort of 3 different versions of of the database file. And the way that SQL sync works, and this sounds really insane, but it totally works, is This all sounds insane, but I love every little bit of it. So, yeah, tell me tell me the insane part.
Alright. So this is the insane part. The insane part is basically SQL sync performs like a rebase operation. It's very similar to like how we get get rebase works. So step 1 is we throw away all your local changes.
So Naturally. Tech like, temporarily, SQL sync rolls back the coordinator. Okay. So the it rolls back to the last, like, version that we we started from before we made any local changes. Like, the last state of nowhere is is correct.
Then what it does is it fast forwards to the the coordinator authoritative version. Mhmm. So now it now now it just has one version of the database. It new authoritative version. Mhmm.
And then what it does is it needs to basically so so locally on your computer, you have a series like, you have a list of mutations. Mhmm. And some of those mutations have been applied by the coordinator, and some of those mutations are still pending. Like, they haven't yet been applied. Maybe you made them really, really recently or they're just, like, they haven't been processed yet on the coordinator.
Okay. So hang on. You're saying that the coordinator sent me back a diff, and I've got, 10 mutations, and 5 of those may be represented in that new stream of bytes or that that new diff. But it's possible that between the time the first five were sent off and handled by the coordinator and come back, I've written 5 more. Exactly.
So to make this look more concrete, like, let's say that you add, like, eggs, and then you add, milk, and then you add broccoli. Right? Yep. So the coordinator you send, like, just because of, you know, everything is happening concurrently and in parallel. So it's all happening at the same time.
So you send to the coordinator eggs and milk, but you haven't sent broccoli yet. So broccoli is still sitting there in your, like, outbound queue. Right? But you you know that I have all these 3 mutations. Like, these 3 mutations matter to the user.
I I can't I can't suddenly make, like, eggs disappear. I have to continue to show it. So we send those 2 mutations, eggs and milk, to the server. The the coordinator applies those mutations and then sends us back a diff of the database. And that diff of the database now may include eggs and milk.
But you don't know if it does. Mhmm. So Right. Because it wasn't what was sent back was not mutations. It was a a diff of a underlying file that is And it's basically completely opaque.
Like, we don't Yeah. Look at this thing. We just sort of treat it as, like, bytes. We apply it to SQLite. And then the question we have to ask the question is, like, of those 3 mutations that we had, we have eggs, milk, and broccoli.
Were any of them, like, are any of them offline? Like, are any of them committed in the authority version? And the way that SQL sync does this and this is an idea that I camp with at single store, when I was, like, working on some some, projects that they have there, and and it applies really well to SQL sync. So that what this is is that SQL sync on the coordinator, when it applies mutations, the mutations are editing your database. Right?
Like they're they're adding milk, they're adding eggs to your database. In addition, SQL sync implements, like, adds a hidden table inside SQLite. It's a regular SQLite table. Yeah. And this table is has one row per client that has ever worked on the database.
One row per client. Okay. So both, like, we're both connected to the same database. That table will have 2 rows. 1 represents Carl, 1 represents Aaron.
Mhmm. And that that row just simply records the client ID and the last like, the highest sequence number from that client. Like, the highest mutation ID from that client. So importantly, every mutation has a an increasing ID. A monotonic ID.
Yeah. Monotonic. There it is. Okay. So you I send off IDs 1 through 5, and at the coordinator level you update the last one that I saw was 5 for Aaron.
Yep. Then you send me back the diff which includes the secret hidden table. And so then if I'm jumping ahead, the diff is applied. So all my local changes, to the actual database are thrown away. My mutations are hanging out.
You apply the diff, then I'm guessing we inspect the secret table and say this is committed state at the coordinator. You can trust it. The last thing that the coordinator the last mutation that the coordinator saw was number 5. So we'll truncate anything below or 5 and below and then replace 6 through 10 locally and send them off. Bingo.
Hot dang. How about that? That is pretty freaking cool. And that is a full end to end multiplayer relational database. Wow.
That is very awesome. I must admit. And it's, I mean, it's not simple, but it's, like, straightforward. It's pretty straightforward. That's a Yeah.
That's a that's a good I don't know anything about these systems, but that's a good one. That's a great one. Yeah. What I like about it is that it's like I wanted to try to keep this again, like, my goal is I want vanilla SQLite. Right?
Sure. So if it's vanilla SQLite, I can't change how SQLite represents, like, its internal data structures. Because if I do that How would you want to? Yeah. Right?
So if I I want completely vanilla SQLite, I need all of this internal related structures, how it source data, how it manipulates b trees, how it does all of that management Right. Right. Has to be vanilla. Unchanged. And so I knew by definition, like, I couldn't manipulate the, like, physical layer of SQLite.
So I have to keep that. And then the second thing is is, like, how do you implement conflicts? And, you know, you can do logic logical conflict management where you actually add a layer between the, like, the the user and SQLite. And you have and basically, you have to process all of the, like, rights yourself, and you have to basically logically do, do your own sort of, like, rights to your own tracking data structures and then forward them SQLite. Like, you act it like as like an intermediary between the client and SQLite.
And they didn't wanna do that either because I wanted to expose, like, a raw SQLite transaction API to the user. Like, if you can if you wanna use a SQLite extension, you can install into SQL sync and use the extension. Like, there's nothing about it that knows literally anything about your data. It's completely opaque to SQL sync. And that was, like, that was the goal.
So I so I basically built all of the things all of the trade offs that SQL sync has and all of the the things about it that are interesting and unique are all driven from the fundamental goal of, like, completely vanilla SQLite. Like, your app should mostly feel like working with regular SQLite. This is, this is something I feel like keeps coming up, when talking to people about SQLite is they don't necessarily want to touch SQLite. And I feel like a big part of that is, 1, SQLite is very good. But 2, SQLite is incredibly stable.
Yeah. And so to get in there and start dorking around with the pages yourself, that I mean, frankly, that seems very risky to me because, you know, they've spent, what is it, 24 years now making it super stable. And so it seems like a lot of people are coming at it from the outside. Like, you've got your you've got your VFS. LightStream has its, like, little sidecar that's, like, keeping a transaction open and kinda sneaking into the wall file and mucking around with that.
And I really, like it seems like, it almost seems like a a fun challenge. Like, how can I do everything I want without opening this black box? Yeah. And it sounds like you did it, which is pretty awesome. Yeah.
It did feel really cool. Like, SQL sync, I'm really proud of. I I think it's it's a You should be. Okay. So tell me, SQL sync is fully open source?
Mhmm. Okay. And what is the plan like, what's the plan for, so is is everything you described where you're at right now, all that's handled, all that's done? Yep. Yes.
Okay. So what's the plan moving forward? Yeah. So SQL sync as of, like, right now, it works, pretty well, But it doesn't scale very much. Like that's sort of the the fundamental, like, limitation of it.
One of the biggest limitations of SQL sync today is that it requires every single client and coordinator to have a full copy of the entire database. And so it works pretty well for small databases, like, in the, like, megs size. It works great. It it's super fast at replicating state. It it can, you know, you can handle, like, relatively high amount of mutations.
It's a fairly good system for just small databases. But Okay. If you scale up to, let's say, a gigabyte, that's you just automatically a note non starter based on the the need to have to sync the entire every single device. So the limitation there is sync time. Is there a limitation of SQLite in the browser?
Yeah. I mean, browsers are pretty limited in, like, how much they can store. Like, browser tabs, you have you have memory limits that you're allowed to certain amount of memory you're allowed to consume as, like, a JavaScript process or, like, a web worker process. And then there's also, like, storage limits. So if you're storing in, like, IndexedDB or if you're storing an OPFS or you're storing in, like, any of the different ways of storing data inside the browser, all of those have their own limits.
And those limits get pretty bad pretty quickly. And generally, like, I try to avoid ever putting, like, more than, like, half a gig of data in in a browser tab. Once you get higher than that, like, there are ways you can get higher than that, but it's it becomes a lot more iffy and sketchy. And it's almost at that point, you're like, you might as well build a desktop application where you can have, like, dedicated storage, and you can actually work with the user. Because there's also the aspect of, like, most browser storage is not even, like, known by the user as to, like, where that storage is.
Right? Mhmm. It's treated as more of, like, a cache layer than, like, a durable storage layer. When you're storing, like, database state you potentially don't wanna lose. You may wanna actually know where that is.
You might wanna be able to back it up. Like, you might wanna be able to access out of band. So long story short, the current state of SQL sync doesn't really scale due to that limitation. Mhmm. It has a couple other things I'm not super happy with about it.
Those are very, very technical, like, very specific to to how I implemented it. One of the things is, like, I'm I'm not super happy with, like, using web assembly. Web assembly is great for systems people, like, for for low level programmers, for people who are willing to go get their hands really dirty. It's pretty, like, fancy at this point. It's also great for Rust programmers because it actually Mhmm.
Has like a nice user experience in Rust. But pretty much everyone else is sort of sucks. So I'm thinking about I'd like to have, like, more of, like, a JavaScript experience because I think JavaScript has it's just way easier to integrate into your app if you could just write, like, a JavaScript mutation as opposed to having to, like, build a web assembly module just to implement a mutation. So those are some of the the changes that I wanna do. But the short term change, like, the thing that I'm working on right now.
So basically, I released SQL sync back in, like, late last year. This year, since January, I've been working basically full time on a new project that I'm calling graft. K. And so graft is a storage engine that k. Will power SQL Sync.
Okay. Go on. And yeah. This is the first time I've actually talked about it. A storage engine that I'm building.
What is the storage engine? Like, it's essentially it it let it gives you sort of a file like abstraction. Like, it's here's a here's a blob and it lets you read and write bytes to that blob. Okay. But it it implements the way it reads and writes bytes to that blob.
It internally uses these data structures that are very efficient to synchronize over the Internet. And so for example What what does this replace in SQL sync? It replaces the VFS layer that I implemented, basically. Okay. So the VFS layer in SQL sync, works reasonably well, but it, it was implemented in a very, like, simple way.
Like, we just simply have, like, a layered file system. We track all the edits to the file, like, at every single transaction point. Mhmm. And so it has a lot of duplication over time. Like, if you edit the same page multiple times, for example, you'll store, like, multiple copies of that page.
But the clients don't actually need to care about every single version of that page. The client literally only Right. Be able to fast forward to, like, the latest version of the page Right. When the client synchronizes. So graph is a it's a sort of special purpose storage data structure that is really efficiently designed to do fast forward replication.
So it's that's that's its whole thing. And I'm building it as, like, a low level library that is independent from SQLite and from SQL thing. Because it itself is actually just a useful way to basically say, like, let's say you have a file in point a and you wanna have that file in point b, graph will do that for you. It will copy it. But where graph differs from something like, let's say, r sync, is that graph will continuously replicate changes to that file forever.
And it will do so using a close to minimum amount of bandwidth. Okay. So that's what I'm trying to that's what I'm building now. And that that will eventually power SQL sync. And also I'm experimenting with using it to do some other really cool things.
One of the ideas I had was, in the, like, data science and, like, AI model space, being able to, like, efficiently move, like, portions of AI models and, like, specifically, like, training datasets or, like, weight files between different machines and being able to only you only need to move, like, what you need right now. So, yeah. The the other property of graph that's cool is, like, it does fast forward replication, and it also does lazy replication. So I can you can when you're reading from graph, you can choose to only read a subset of the file. And so in SQL Think land, what this means is that I'll be able to run 20 gigabyte databases and have them run inside your browser tab.
Okay. So hang on. Let me let me let my brain catch up here a second. So you currently have this VFS that sneaks in, keeps track of stuff. You're writing graphed.
Is that does that sit next to the SQLite file, or is that what the is it like a file system itself? Like Pretty much. I I don't fully understand how it works. Yeah. So you can think of graph as providing, like, it's it's a it basically provides you a file system abstraction Okay.
Without the it doesn't provide the, like there's 2 parts of file system. There's there's, like, the directory metadata tier is the thing that implements, like, directories, file metadata, all of the things that you think of as, like, when you go in, like, LS inside your Unix terminal. Mhmm. That's the, that's, like, the what I think of as, like, the metadata tier. The graph doesn't provide that.
But what graph provides is it provides, like, a file back end. So if you Got it. If you create a file and then you back it with graph, the graft will provide the, like it will store the bytes of that file and it will add all these superpowers. Like, you'll be able to, like, replicate the bytes of those file somewhere else, and that somewhere else will be able to only pull the bytes that it needs of that file. It won't have to read the entire file.
Man, I thought SQL sync was, like, wizard stuff. This is I feel like this is next level. This is wild. Because at that at that at that yeah. I guess.
At that point, the whole, like, you're opening yourself up to physical corruption. Right? Because you're just, like, slinging bytes around, aren't you? Well, so Graph has a property which is that it's single writer. So Graph does not implement any kind of multiplayer layer.
It literally is just like, I have a writer at point a, and then I have infinite number of readers. Got it. I want all of those readers to see the same version of those files Got it. Okay. And I want them to be able to update themselves, like, to play this version of the file very efficiently.
And I want them to be able to, stream the version, like, the the file as it changes, and I want them to only have to pull the portions of the file that they care about. But it doesn't implement any kind of multiplayer layer. So it will it will implement the like Okay. Coordinator to client synchronization aspect of SQL sync. Got it.
Okay. That makes a lot of sense and is very very cool. So at that point, if I'm following correctly, at that point you've got graphed at least on the coordinator. And the whole the whole rest of the system basically stays the same more or less, and you're sending streams of events. And the difference is when in the coordinator, those are written to the underlying dot SQLite file, graft is in there.
It's got its hooks in that file, and it knows the bytes that were changed. And it just it just sends those bytes out in a more efficient way than is currently happening with, SQL sync. Yep. Exactly. Cool.
So cool. Man, that is wild. Yeah. So that's what I've been working on for the last, almost 6 months, I guess. And I I'm getting hopefully close to the point where I can actually have something released with that.
But that this this project, like, SQL sync was about 3 months of, like, full time work. And this project is probably about twice as big as SQL sync. So I'm hoping to pick it up in, like, the next month or 2. Wow. Cool.
And so one thing we didn't touch on, but it sounds like SQL Sync might be great for desktop apps. Is that right? Yeah. It's it's another area that I wanna really explore. It's like desktop and mobile apps.
Yeah. Like, it it it I sort of originally designed to really focus on, like, the browser, but there's nothing really apparently specific about it. The nice thing about SQL sync being pure SQL lite is it sort of works wherever SQL lite works. Right. Yeah.
One thing I am starting to experiment with though, which will make it even better is like, in SQL sync, there's there's 2 layers of SQL sync. There's the SQL sync wrapper, which implements, like, the web assembly logic, and, like, the multiplayer, like, rebase logic and stuff. That's outside of SQLite. And then there's, like, the part that's inside SQLite, which is, like, side of SQLite. And then there's, like, the part that's inside SQLite, which is, like, the VFS layer, the physical layer, stuff like this.
With graph, one of the things I'm experimenting with is building it as a completely native SQLite extension. So my current, like, graph test system that I use, actually just uses, like, I use brew install SQLite. Mhmm. And then I, like, install graph into that SQLite, and I can run Just like dot load Yep. Graphs?
Yeah. The rules. I'm going to the point where, like, you'll actually be able to drop graph into, like, iOS native SQLite and just add it as an extension or you could drop it into, like, wherever, you know, bun SQLite. Like, whatever implements native SQLite, you should be able to drop graft in. And suddenly you get basically, like, graft will implement just the physical tier of SQL sync.
So it will give you, like, one way super high performance replication. And then then SQL sync will be implemented as, like, another sort of layer that will be very, very lightweight. So you can add it directly on top of SQLite without having to edit SQLite. So I'm trying to build this in sort of, like, a more modular way, so it'll be easier to Mhmm. Port SQL sync to more places.
Wow. Dude, this is awesome. I I I knew that, like, SQL sync was cool, and I played with it in the browser. I was like, oh, this is really cool. This is blowing my mind.
This is very, very cool. Oh, I'm glad you enjoy it. And and thanks for getting me to, like, open up and talk about it, because I I realized I've been bottled up in me So it's it's fun to chat about. Yeah. I I had the easy job here.
Okay. So, two things. What is next for SQL Sync and Graph? I know that, what's next for SQL Sync is Graph, but, like, where's Carl in a year or 2 years? Is this does this become a a hosted offering?
Does this just bolster your credibility? Like, what's what's the goal for Carl here? So my immediate goal is make something that's useful. Like, I don't really care about, the the next steps as much as I care about making something that's actually useful and not just like a toy. SQL Sync as of right now is more of a toy than something that, like, I would be comfortable someone with someone building, like, a product on top of.
So my goal right now is to make something that's, like, I could really stand behind and say, I would use this to build a real product. And I knew we actually I'd like to build a product on top of it. Like like, make something real that that people can play with and use that's more Sure. More than just a toy demo. So that's sort of the the medium term goal.
Well, short term goal is finished graft. Medium term is, you know, something useful that is is interesting. And then, like, we'll see where that goes. If it goes towards people are like, hey, this is really useful and I wanna build a product on it, then I think I wanna start a company that's focused on that. Like, just completely making it as easy as possible for people to build interesting things on top of that technology.
Cool. If it's sort of like people are like, well, it's useful but maybe there's other solutions or the the market's too saturated or whatever, like, you know, if it doesn't make sense, I think that there's a lot of ways I can either pivot this or or sort of go into, like, a slightly different direction. One of the reasons I'm building graph the way I'm building graph is because I don't wanna pin myself completely into, like, multiplayer sequel light. I wanna be a little bit broader. And so because graph is just a storage engine, you can use it to do a whole bunch of cool stuff.
Like, if you're just replicating, you know, cool, like, model training data, for example, like, you can use it to do that. So I'm trying I'm trying out a little bit more of an experimental, like, breadth first approach. And then hopefully get some feedback later this year once I launch some stuff open source. This, this aligns with one of my deeply held beliefs that progress creates progress and motion creates motion. And, like, you started with SQL sync and it's, like, very good.
And then you realize, hey, there's this other thing that is needed. And it's potentially a lot broader and a lot bigger, but you never would have gotten there without following your curiosity in the first place. Exactly. And I've always, yeah. I mean, I I really like like, you you've been a proponent of, like, just, you know, you have to build your own luck.
Like, you have to Mhmm. Do that. And I I believe in that, like, very strongly. Like, I think by by pursuing what I find most exciting, most interesting, and taking it very seriously, getting really nerdy about it. A, I'm like learning.
B, I'm getting myself in a position where I can help others. And then c, I'm hopefully building something that I would wanna use myself. And if I wanna use it, then maybe someone else would wanna use it. Sort of my thought. God, I love that.
I'm so happy for you. This sounds like a lot of fun, and this sounds like purely within your zone of genius, and I'm just so freaking pumped for you. Very kind of you, Aaron. Thank you. Yeah.
Well, thank you for telling me all of this. Tell people where they can keep up with you, where they can keep up with SQL sync, and where they can hopefully, soon in the next few months learn more about graft. Yeah. I mean, so SQL sync dot dev, is like the main host URL for SQL sync. There you're gonna find like a blog post I wrote that goes pretty much in detail about a lot of the rationale behind SQL sync, how it works.
You'll also be able to find the to do list demo, which I encourage you to play with, try to break it, see what happens. And then, you'll also find a link to my, the, like, SQL sync, Discord server. That's probably the best place to go. Like, if you're interested in any of the sci fi talk to, you wanna chat with me, go on the Discord server, ping me, you know, I'm always happy to answer questions, Always happy to find more people who would like like nerding about these topics. Mhmm.
And I'm happy to chat about it. Awesome. Well, thank you again for your time, and it's just always so fun to talk to you. So thanks for doing this. Same with you, Aaron.
It's nice to see you. Yeah. You too. Alright. We'll talk soon.
Yep. Talk to you later. Bye.