Production SQLite with Turso and libSQL

Alright, we are back. I don't know whether to call this a podcast, because there's actually no podcast. These just go on my YouTube channel. But I've been talking to a lot of people in the SQLite world, and now we get to talk to the CEO of Turso. Glauber Costa is here with me today, and we're gonna ask a lot of questions about SQLite and libSQL.

But, Glauber, do you wanna introduce yourself first? Aaron, thank you so much for having me. What a pleasure to be here. Happy to introduce myself. I'm Glauber, and I've been in the software space, one way or another, more or less since the year 2000.

That was when I started getting interested in, you know, computers, software, things like that. In fact, the first code that I wrote that was not Pascal code to calculate the date of Easter, because that's the kind of stuff we were learning at university, was PHP in 2002, something like that. So I've done it, like, way back then. Everybody keeps telling me that PHP is nothing like what it used to be, and I'm just getting started understanding modern PHP a little bit better. But be that as it may, the thing that really clicked for me, and that I really liked, was low-level systems programming.

Mhmm. And I also took a very early liking to the then-crazy idea of just writing software and releasing it for free, under open source licenses. You know, today, this is how everybody does it. Back then, it was this completely cuckoo idea. So at the intersection of those things, there is Linux.

I was learning some C, so I started looking into the Linux kernel and trying to understand a little bit more about how operating systems work. So I started contributing to the Linux kernel as a volunteer, and when I say contributing to the Linux kernel, don't think too much of me back then. It was really some, like, tiny couple of things here and there, around 2004. And then in 2005, I got hired by Red Hat. So I had a career doing this.

I worked on a lot of things in the Linux kernel: storage systems, file systems, virtualization. Virtualization was the largest part of my tenure. I joined a startup in 2013 that initially was doing something related to operating systems, but then pivoted to a database company, and that is Scylla today, in the NoSQL, petabyte-scale space. That is how I learned about databases. I didn't really have any exposure to databases before, not even as a user, by the way. Because I never really built applications, except in the very early beginning, I was just, like, working in the kernel.

I was actually not a user of databases. I still don't know SQL that well, although ChatGPT knows it extremely well. So I rely on ChatGPT to help me whenever I do need it. But for around 8 years, so from 2013 to essentially 2021, I was at Scylla.

And then I had a one-year stint at Datadog, which was, by the way, a fantastic company. I appreciate them a lot. But I clearly realized that I really like to be in small teams, building things and getting things done from the ground up. So I founded a startup, and here we are. That is quite the journey.

I'm glad to hear you're one of us. You're a PHP enthusiast. Let's forget you haven't touched it in 20 years, but you're a PHP enthusiast. We'll claim you nonetheless. Yes.

Exactly. And then, yeah, that's quite a journey, from not being a database user to basically writing a database and now being the CEO of a database company. So... You know Dax from Twitter? Yeah. Of course.

Everybody knows Dax from Twitter. Everybody knows Dax. Some people like him. I love him. You know, some people don't.

But, like, I was talking to him the other day, and he was telling me that his career path was unorthodox because he learned NoSQL before he learned SQL. And I said, dude, I wrote a NoSQL database before I learned either of them. Yeah. Yeah. Those are both pretty unorthodox.

I thought my path from being an accountant to a programmer was unorthodox, but I think writing a NoSQL database before learning SQL, yeah, was more unorthodox. So... The lesson here, Aaron, is that there is no orthodoxy. Right? We're all carving our own paths. Yes.

That is the lesson here, and something I like about our industry, by the way. So you caught us up to where you left Datadog after a year, wanting to, like, be on a smaller team, and that was, I think you said, 2021 or something. So now we're in 2024. You're CEO of a company called Turso. Turso.

You're the steward of a SQLite fork called libSQL. Mhmm. So catch us up between 2021, when you were like, I kinda wanna go be part of a smaller team, and how we got to forking SQLite. So I always liked being in small teams. That was always something that I appreciated, and Red Hat became a massive company.

But when I joined Red Hat, it wasn't really. When I joined Red Hat, I think it was around 500 people, which, again, is plenty big. But if you compare it to the options, like, I did have a small internship at IBM. There's no comparison, although now they're one and the same to some extent. Especially back then, IBM had hundreds of thousands of people.

And then you look at Google, Facebook, those companies, they're hyperscalers. You have your head count in the hundreds of thousands. And I never liked this very much because, at that point, unless you're one of the top executives, which you don't start your career as, you are a cog in the machine. And I always wanted to be a creative. I always wanted to be a person pushing things forward.

So this thing about, like, just go do open source and do whatever you want was very attractive to me. And Red Hat had a culture like that. When I left Red Hat, Red Hat was something like 4,000 people, and it felt too big for me. Right? Yeah.

So I always liked that. I just thought that, at this point in my career, I needed to go be in a large company like Datadog and see how it is. And there were a lot of things that I liked about it. I'm not gonna lie. But Pekka and I... Pekka is my cofounder.

And even back then, like, I've known Pekka for 15 years, and his career path is pretty much exactly the same as mine. We worked together on the Linux kernel, and then we worked together at Scylla. We always wanted to start a company together, and it was like, what do you wanna do? I don't know, but let's do it together. That was our mindset.

I think pretty much like you and Steve. I mean, that's something that I appreciate a lot, seeing the dynamic that you have between you and Steve. Except that Steve sounds like a decent person. Pekka is a freaking asshole. I don't like Pekka very much.

But the thing is... I can get Pekka on next so he can defend himself. No. No. But, like, we have this love-hate relationship. We're always like that, and for some reason, we're always throwing punches at each other. But secretly, he likes me.

I mean, it's not mutual, but I think he likes me. No wonder you vibe with Dax from Twitter. Y'all have the same funny sense of humor. Hairstyle. It comes from the hairstyle.

But be that as it may, I think that by the end of 2021, we were both ready to embark on that journey. Mhmm. When I left Scylla, which was, beginning of 2020... late 2020, I wasn't really ready, partially because you all remember what happened in 2020. I thought it was just a lot of risk to go do a startup. I had just had a kid at the time.

I have 3 now, and the world was completely bananas. Nobody knew what was going to happen. So I just felt like, at that time, starting a company was not the best career path for me. Maybe it would have been. I just couldn't take the risk.

But late 2021, I think Pekka and I were both ready to do it. Initially, Pekka was also not on board, but he came to a point where he was. And then in September 2021, we started this company. The company was called Chisel Strike, and it was a project that was in the data space, but it was not a database. One of the things that Pekka and I were always telling each other is, like, hey.

We don't know. We just know that we're gonna start together. We don't know what we're gonna do, but please let's not do another database company, because that sounds insane. Mhmm. Especially after 8 years of Scylla.

Let me tell you, if you're listening to this and you plan to start a database company, maybe don't, because it is really hard. It is really hard. And the reason it's really hard is just that there are so many. Like, look, if I may, I thought that there were too many database companies. Mhmm. And because of that, I did not wanna start a database company.

But then we got acquainted with the JavaScript ecosystem and the JavaScript frameworks. So, you know, maybe starting a database company is not that bad of an idea, because there are way more JavaScript frameworks than database companies. It's true. It's true. But we were using SQLite.

We were using SQLite very, very heavily in our project. So the project was essentially, and it was not our idea to describe it this way, but it is fair, looking back, to frame it as a JavaScript framework to write data applications. Okay. So that was the idea. And our idea was, look, embed SQLite into a runtime.

So you would have something like Deno running with SQLite right there, and you go deep into those technologies, change them if you need to, so that they essentially become one and the same. And then you can just write TypeScript that is, like, persistent TypeScript. That was the idea. So you write an array, and that array is persistent. You don't mess with databases.

You don't do anything like that. So that was essentially the idea of the Chisel Strike project. So we started doing this, and SQLite was the obvious choice for us because it's pretty much the only database with which you can hit this level of integration with anything, really. We did this for around a year. And around September 2022, so exactly 1 year in, we were growing frustrated because we saw a pile of bodies of people who had tried to contribute code to SQLite.

One of them, and I don't know if you have managed to reach the author of that. He'd be a great guest for you to interview about it. There was a project called dqlite. Mhmm. It was doing things very similar to what we wanted to do. Because think about it.

It was, like, a JavaScript system to write persistent applications. Then you need to deploy this thing somewhere, and you need to distribute data. So one of the problems that we wanted to solve was how to distribute data with SQLite, or at least something that scales horizontally.

And the architecture that we had in mind was very much like what dqlite had. And then we saw that they'd actually tried to contribute all their code to SQLite, and the answer was just no. Not gonna happen because, you know, this project is not open contribution, which, by the way, I wanna make clear, I fully respect. I don't think they're doing anything wrong about that. I wanna clarify this point because I understand some people have the misconception that I don't like this. Now, this is not the way I would handle a project.

I come from a different culture. I come from this background in the Linux kernel, where the Linux kernel was essentially this one big party in which Linus Torvalds essentially said, this is never gonna run on anything but my PC. And then 30 years later, it powers everything, because people came with those requirements and said, let's stretch it in this direction and in that direction and in that direction. So this is how I would run a project, but I don't think there's necessarily one right way, and there is in fact some merit to what SQLite does. With almost everything in life, there are advantages and there are disadvantages.

So you can't be too strict and say this is the way, this is not the way. But for us, we were really just like, look, the changes that we wanna make to SQLite can get quite intrusive, especially down the line if we go down this path. I don't think we wanna be maintaining just this private fork, because there are a lot of problems with that as well. Like, there are trust issues, because then you're not running real SQLite, and people wanna understand: do I trust this thing? Do I not trust this thing?

We also saw a lot of people who have done forks of SQLite before, which, by the way, is my understanding that the author of SQLite, at least externally, encourages, in the sense that if you don't like it, just fork it, which is the open source way. I'm not saying that he's happy or unhappy about that, but that seems to be his stance, at least from what I'm seeing. We saw a lot of forks of SQLite that were just SQLite plus this one feature. So SQLite plus encryption, SQLite plus this, SQLite plus that. And this also didn't resonate well with us because that's not how Pekka and I built communities in the past.

And the number of people that you can find that are, like, interested in this niche thing that you're doing is usually pretty small. Mhmm. So we had this idea: what if we just fork SQLite as a whole and create the open contribution version of SQLite? And we did it.

And the funny thing is that, when Pekka and I met, this wasn't our first idea. Our first idea was, let's rewrite this in Zig. Of course. Of course. Look.

Ten years ago, I think that is probably what we would have done. Uh-huh. But we're getting too old for this. Yes. That's exactly how Scylla was born, by the way: just rewrite an existing database in C++.

No way? Yes. Scylla was a rewrite of Apache Cassandra in C++. That's how it started. Yeah.

With a different architecture, et cetera. And it's not that it doesn't work. I mean, it can work, but I think we had PTSD from the experience. Yeah. There are a lot of problems.

Like, you know, it's been around 10 years now, and there are still some small compatibility issues, some things that work in Cassandra that don't work in Scylla, because you're gonna be chasing them forever. So we thought, hey, look, we are at a point in life where we just wanna ship stuff.

We don't wanna spend, like, 2 years just writing something from the ground up. We've done it, and we don't wanna be chasing compatibility issues forever. So let's just fork it, and then we add the stuff. And then the second thing was, okay, what is the minimum amount of code that we need to write to show that, hey, this is a fork of SQLite, and what are the things that we need to do to get people interested?

Because we really wanted to create a community of people that would come and build with us. That was our main goal. I said if everything fails, but we build a community here of developers that are building this thing together, we succeeded. Right? Just a now my kids are not gonna like that I succeed like that because I'm gonna be poor, but at least, you know, internally, I'll be happy.

Or at least I can claim that that I'll be happy. I don't know. And the answer to that question, what is the minimum amount of code that we needed to write to prove that, you know, and to show the direction that we wanted to take after maybe 2 or 3 days of deliberation, we got our answer. The minimum amount of code was 0. We just did not write any code.

We wrote a letter essentially saying, this is what we intend to do. A manifesto. Pretty much a manifesto. And we did it on purpose because we thought that it doesn't matter what we claim. If we write replication code, which we later did, we're gonna be known as the SQLite plus replication.

Right. If we write encryption code, it's yet another SQLite plus encryption. Whatever code we write, we're gonna be associated with the first thing that we put out there. And look, the reason we're doing this really is that we wanna extend this thing in all possible directions, and we wanna hear from the community, and we wanna give the community an opportunity to extend it the way they like. So let's not write any code.

Let's just write a manifesto and and see what happens. If if people hate it, they hate it. If people like it, they like it, then and then we'll see what happens. And then a lot of people on Hacker News actually accused us of being all talk and because we didn't write any code and etcetera. But by and large, we had a positive reaction.

And we got 1500 GitHub stars in 2 weeks. Wow. And, you know, we decided eventually that, hey. Look. I mean, we don't really wanna run a database company because that sounds crazy, but it seems like there is a lot more here.

And for comparison, Chisel Strike at the time had 1,000 GitHub stars after 1 year of work. Mhmm. And it's not that GitHub stars are the most important thing in the world, but they are a proxy for the amount of interest that this thing generates. And 1,000 GitHub stars, I mean, a lot of projects will never hit it. So it's not even a small number.

It it's a decent number for a company that has been doing this for a year. It it's it's plenty good. But we had in our hands, like, this project here. And and then this other thing that we just released and in 2 weeks got 1500 GitHub stars, like, which is 50% more. So it's pure interest with no code changes.

So the last part of the puzzle was, you know, and by now it was already November. The last part of the puzzle was just Pekka and I saying, okay, so what is the business here? Because we are, after all, a company. A fork is not a business, so we need to figure out something that is a business.

And then we came up with the idea of Turso. We coded a private beta of what Turso is today, essentially between December and February, put it out there, and the rest is history. It was a huge success. So at that point, was it just you and Pekka doing Chisel Strike? No.

No. We had a we had a, you know, we had raised a fair amount of money, and we hired we hired a bunch of people. So there were a lot of and there was a very uncomfortable time with the company because nobody truly knew. And we were very upfront when we saw the the results of that and say, hey, guys. Look.

I mean, just, it feels to us, like, this is a better direction, but I don't think we're ready to pull the trigger until we see a couple more data points. So couple of engineers kept working on the Chisel Strike tasks that they had. I said, I don't think we're ready. You know, it's also we discussed. I mean, can we let's just get the whole company to come and and do this new thing for 3 months just to but it can also be counterproductive because none of us truly knew exactly what we wanted.

So Pekka and I really just had to sit down and say, okay. This is the direction that we figure out we wanna try to go. So it was not going to be very productive to just bring the whole crew anyhow, but it was an interesting transition period at the company because people didn't really quite know what was going on. But by February, I think it was obvious, it was fairly obvious because once more in February, we put this landing page out that just said SQLite at the edge, sign up here, and we got 500 emails, right, just on on a landing page like that, plus the GitHub stars and and and people people started coming to the Discord community. So what what more validation do we need?

Right? Let's just... Yeah. Move the whole company. So around that time, we just completely abandoned Chisel Strike and went all in on this. Hey.

I hope you're enjoying this. Just as a reminder, you can learn more about SQLite at highperformancesqlite.com. I'll get out of your way. Back to the interview. Okay.

So you and Pekka and the company are working on Chisel Strike, which has some sort of dependency on SQLite. And you're starting to look at SQLite, and you're like, man, this is kinda frustrating. We can't land any of these changes that would make Chisel Strike faster, easier, better, possible, any of the above. And so you're looking at SQLite, and you're like, this is kind of frustrating that they're open source but not really open contribution.

And like you said, that's neither here nor there, neither good nor bad. It is reality though. It is. Like, if you're trying to land if you're trying to land changes and you can't, well, then you're kinda hosed. And so you and Pekka are looking at it, and you're thinking there might be there might be an opportunity to do this thing that we've never wanted to do, which is do a database.

And so... Yeah. You fork the database, you fork SQLite, and you write a manifesto. And the manifesto just sets out your vision, because if you can't sell people on your vision, then there's no sense in spending all this time writing code that people... Still, the intention was not to move the company towards it, because the thought was, obviously, we wanna create a successful community, because everything you try to do, you should try to be successful at. Yes. But our thinking was, I don't think, long term, we'll be healthy at Chisel Strike.

So that was still the main goal. We're not gonna have a healthy outcome running a specialized fork of SQLite. Mhmm. So we either abandon SQLite, which we didn't wanna do because we really loved it, or we try to create a community around this thing that will have all of those changes, because then the credence comes from that community. So say, hey.

I mean, just... Sure. So if we succeed in creating a community here, then Chisel Strike, the project, will benefit. So that was the goal. But in my mind, creating a community around this would take 6 months to a year, and it would be a very slow process. And that wasn't what happened.

So the huge success of the libSQL fork was what led us to think, okay, this thing is in fact so successful that maybe we should just do only that, and the rest of what we're doing is not that important. Right? This seems to be resonating a lot more. Mhmm.

But it was not our original intention. Yeah. So you you had forked it to serve the needs of Chisel Strike. Pretty much. And then were surprised by the reception and thought, hang on.

This is getting more interest than our current products, so we should at least consider exploring what this looks like. And this is interesting, I think, just as kind of a side principle for people listening: once you're in motion, things just happen, and ideas just present themselves. And I feel like this is the same story as Tailwind. I mean, Adam Wathan was live streaming himself building out a SaaS product that he was gonna use to sell his info products. And while he was streaming, everybody was like, hey.

What's that CSS library you're using? Hey. What's that CSS library? And he's like, it's not a library. It's just this file that I have.

I carry from project to project. And enough people asked that he finally just was like, great. I'll package it up, and it's Tailwind. And now, of course, we know Tailwind is an empire. And so I think it's just so like, it's such a good life lesson that once you get in motion, that just that just begets more motion and and more opportunities and more things that you can do.

And I just love that that is your story as well. It is. And you have to be paying attention and open to that change. Right? And and it's also when you when you tell the story, of course, you're telling a sanitized version of the story.

You don't you don't put all the struggle that goes inside you. But if I if I may be open about that, I mean, you you do have that struggle because when you tell it, it it sounds obvious. Oh, I had this sign and then that sign, but you also have doubts. Right? Yes.

It's something like, okay, so I've been building this thing for a year, and there is a community here, which, you know, is dwarfed in comparison. I think the Chisel Strike Discord channel had 400 people at the time, after a year of Chisel Strike. Turso today, our Discord channel has 4,000 individuals after one year of building. Now, after 1 year of building Turso, I can clearly look back and say, well, that was definitely the right choice, but it didn't necessarily feel like it at the time.

It was it was a very scary choice. Right? So it was okay. So maybe this is maybe we just got, like, a lot of people who are ready to press a button and and say star, but does that mean that they're gonna use this thing? And and I just say and at some point, you have to take the plunge.

But that's essentially what happened to us. Okay. So you and Pekka and the crew, you fork SQLite into libSQL. You write a manifesto. You get a ton of interest, and then you, like, decide to turn the ship.

And so what's the first thing you and the gang put into libSQL? And did that come directly from y'all to serve the needs of Turso, or did you start by incorporating a bunch of, like, open PRs that were on SQLite that were never gonna be addressed? Or what was the first move for libSQL after the manifesto hit and people are like, yeah, let's do it? There were no needs of Turso at the time, so definitely not. But the first thing that we did was to allow you to create user-defined functions in WebAssembly.

So... Cool. SQLite allows you to create... and by the way, this is something that ended up not being very interesting for us at Turso. It is not something that we allow people to do on the Turso platform, for a variety of reasons. But it is the first thing that we added to libSQL. SQLite allows you, as you know, to define functions.

Those functions are usually something that you write in C, and then you register the function. Now, we wanted something with a CREATE FUNCTION syntax in the language. Mhmm. In SQLite, you just take a symbol, a C symbol, let's say, and register the function. So we wanted a more flexible way to do this.
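For readers who have not registered a function in stock SQLite, here is a minimal sketch using Python's built-in `sqlite3` module, whose `Connection.create_function` wraps the C-level registration call. The function name `reverse_text` is made up for illustration. This shows the host-language registration that the conversation contrasts with a SQL-level CREATE FUNCTION; it does not show libSQL's WebAssembly syntax, which is not spelled out in the interview.

```python
import sqlite3


def reverse_text(value):
    """Hypothetical user-defined function: reverse a string, pass NULL through."""
    return None if value is None else value[::-1]


conn = sqlite3.connect(":memory:")

# Registration happens through the host language API, not through SQL.
# Arguments: SQL-visible name, number of arguments, Python callable.
conn.create_function("reverse_text", 1, reverse_text)

# Once registered on this connection, the function is callable from SQL.
result = conn.execute("SELECT reverse_text('libsql')").fetchone()[0]
print(result)
```

The registration only lives as long as the connection, which is part of why a declarative, in-database way to define functions can be attractive.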

We also happened to have somebody on the team at the time who is incredibly skilled with WebAssembly. He had done a similar thing for Scylla, to allow you to write WebAssembly functions. So it kinda just felt like, let's try and see if there is any interest here. But at the time, it was still very experimental.

And then the second thing that we added was essentially the ability to do native replication. And this is still one of the core parts of what Turso is doing. And so this native replication, was this for Chisel Strike at this point? Or are y'all starting to bat around Turso, and this native replication enables Turso to exist? More the latter because, by now, this was already, again, November, December.

So Turso did not exist yet as a concept, but Pekka and I knew a couple of things. So we knew that we we wanted to try to put something out there to see if we could build a company just around the fork and not around the other bits. So that was number 1. And and we also we knew we would involve replication in some way. The the the code that we had for Chisel Strike was way different.

The internal code that we had for Chisel Strike was using Raft for consensus, and it was much more complex. And so we had this idea in our minds of what Turso could become, and then we wrote the replication code to serve that. Right? Gotcha. So let's get a little bit technical.

So there are some ways, from the outside in, to do some replication, and people, like, treat SQLite as a black box and kinda do their stuff around it. How does native replication work in libSQL? So SQLite has... and the other day, I got into a Twitter fight with someone about that. Part of it is that people sometimes don't know that SQLite has two modes of operation. And the fight was around whether or not SQLite allows reads when it's writing to the database.

And the answer is, obviously, it does. But the reason the fight exists in the first place is that, at some point, it didn't. So SQLite has 2 modes of operation. One of them is called the journal mode. Mhmm.

And I think most people don't even use this anymore. Modern SQLite has something called the WAL mode. WAL stands for write-ahead log. So in WAL mode, you still cannot have multiple writers, but you can have readers concurrent with writers. And even the concurrent writer thing, I keep thinking this is so unimportant.

Like, people have this idea that, oh, if I write something, the other writer is gonna fail, so I can't really do concurrency in SQLite, which in WAL mode is completely untrue. At the lower levels of the database, yes, those writes will be serialized, and you're not gonna have 2 at the same time. So adding cores and adding threads will not help you, but what that means is that the second write will wait for the first. So from the point of view of the user, you just do a bunch of concurrent writes.

It just works. Right? You're limited in throughput, though, because you cannot do this from multiple cores. So that's essentially what it means. So there is this data structure in SQLite called the WAL that allows those things to happen, and it is already there.
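The behavior described here can be observed in stock SQLite. The following is a small sketch using Python's built-in `sqlite3` module; the table name, row values, and timings are arbitrary choices for the demo. One caveat worth stating: in stock SQLite the second writer only waits if a busy timeout is configured (here through the `timeout` argument), otherwise it gets a "database is locked" error right away.

```python
import os
import sqlite3
import tempfile
import threading
import time

path = os.path.join(tempfile.mkdtemp(), "demo.db")

# isolation_level=None means autocommit: the BEGIN/COMMIT statements below
# are exactly the transaction boundaries SQLite sees.
writer = sqlite3.connect(path, isolation_level=None)
journal_mode = writer.execute("PRAGMA journal_mode=WAL").fetchone()[0]
writer.execute("CREATE TABLE t (who TEXT)")
writer.execute("INSERT INTO t VALUES ('setup')")

# 1) Readers run concurrently with an open write transaction. They see the
#    last committed state, not the uncommitted row.
reader = sqlite3.connect(path, isolation_level=None)
writer.execute("BEGIN IMMEDIATE")
writer.execute("INSERT INTO t VALUES ('first')")
seen_during_write = [r[0] for r in reader.execute("SELECT who FROM t ORDER BY rowid")]


# 2) A second writer is serialized behind the first. With a busy timeout it
#    waits for the write lock instead of failing.
def second_writer():
    conn = sqlite3.connect(path, timeout=5.0, isolation_level=None)
    conn.execute("INSERT INTO t VALUES ('second')")
    conn.close()


thread = threading.Thread(target=second_writer)
thread.start()
time.sleep(0.2)          # keep the first write transaction open for a moment
writer.execute("COMMIT")  # releasing the write lock lets the second writer in
thread.join()

final_rows = [r[0] for r in reader.execute("SELECT who FROM t ORDER BY rowid")]
print(journal_mode, seen_during_write, final_rows)
```

From the application's point of view both writes succeed; they are simply applied one after the other, which is the throughput limit mentioned above.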

So the code that we wrote replaces the SQLite WAL. It does 2 things. First of all, it adds hooks in the SQLite core library so that you can have an external implementation of the WAL. And then it provides this external implementation of the WAL. libSQL does not work in journal mode, by the way, because of this, because the core relies on the WAL.

The WAL, as the name implies, is the write-ahead log. So the write-ahead log contains the newer changes. The way it happens is that you don't make changes to the database file. You make changes to the write-ahead log, and you keep writing different segments of this write-ahead log. And then every now and then, you come and compact the database file together with the new segments of the WAL.

Otherwise, you would end up using an infinite amount of space. So every now and then, you have to compact that back into the database file. So we have, first of all, the hooks to allow anybody to come up and provide their own virtual implementation of the write-ahead log. And then we have one particular implementation of the write-ahead log that is the implementation that allows replication to happen. So I can ship this write-ahead log to a different instance, and that instance can receive this new write-ahead log.

And then write that in the SQLite disk format. So when in memory, those things are not like SQLite, to allow replication to happen. But on disk, when it's materialized, it is always in the SQLite format, because then what happens is that if you open that file with SQLite, you would still be able to read it just fine. Interesting. Okay.
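The write-ahead log and the periodic compaction described a moment ago can be observed with stock SQLite. This is a small sketch using Python's sqlite3 module, not libSQL's replacement WAL; the table name and helper names are made up, and automatic checkpoints are switched off only so the effect is visible.

```python
import os
import sqlite3
import tempfile

def wal_size(db_path):
    """Size in bytes of the write-ahead log file that sits next to the database."""
    wal_path = db_path + "-wal"
    return os.path.getsize(wal_path) if os.path.exists(wal_path) else 0

def demo_checkpoint(db_path):
    # isolation_level=None puts the connection in autocommit mode,
    # so every INSERT below is its own committed transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("PRAGMA wal_autocheckpoint=0")  # no automatic compaction for this demo
    conn.execute("CREATE TABLE notes (body TEXT)")
    for i in range(100):
        conn.execute("INSERT INTO notes VALUES (?)", (f"note {i}",))

    before = wal_size(db_path)  # the changes so far live in the WAL
    # Fold the WAL back into the main database file and truncate the log.
    conn.execute("PRAGMA wal_checkpoint(TRUNCATE)").fetchall()
    after = wal_size(db_path)

    rows = conn.execute("SELECT COUNT(*) FROM notes").fetchone()[0]
    conn.close()
    return before, after, rows

db_path = os.path.join(tempfile.mkdtemp(), "notes.db")
print(demo_checkpoint(db_path))
```

The data is still all there after the checkpoint; it has simply moved from the log into the database file.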

So y'all have taken the write-ahead log idea and made it abstract, and presumably the, you know, the SQLite implementation is the first concrete version of that. Yes. Yes. And then you wrote a second concrete version. So this is already feeling similar to SQLite's internal architecture for, say, the VFS stuff.

So you have an abstract and a concrete. And so you've done the same thing, but for the WAL mechanism. SQLite has one concrete implementation, then you have this replication-friendly one as a second concrete implementation. And what you're saying is that as it is represented in memory, you're able to do something special where it sends it off to somebody who's listening, but once it writes it to disk, SQLite or libSQL will pick it up as if it's a regular WAL file. That's correct.
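The abstract-plus-concrete shape summarized here can be sketched in a few lines. This is a toy model in Python, not libSQL's actual hook interface: the class names, method names, and the idea of a `ship` callback are all hypothetical, and real WAL frames carry more than a page number and bytes.

```python
from abc import ABC, abstractmethod

class Wal(ABC):
    """Hypothetical interface: the hooks a storage core would call into."""

    @abstractmethod
    def append(self, page_no, data):
        """Record a changed page and return its frame number."""

    @abstractmethod
    def frames_since(self, frame_no):
        """Return the (frame_no, page_no, data) tuples written after frame_no."""

class LocalWal(Wal):
    """First concrete version: keep frames locally and do nothing else."""

    def __init__(self):
        self._frames = []

    def append(self, page_no, data):
        frame_no = len(self._frames) + 1  # frames are numbered from 1
        self._frames.append((frame_no, page_no, data))
        return frame_no

    def frames_since(self, frame_no):
        return self._frames[frame_no:]

class ReplicatingWal(LocalWal):
    """Second concrete version: the same log, plus shipping each frame out."""

    def __init__(self, ship):
        super().__init__()
        self._ship = ship  # callable that forwards a frame to whoever listens

    def append(self, page_no, data):
        frame_no = super().append(page_no, data)
        self._ship((frame_no, page_no, data))
        return frame_no

def write_pages(wal, pages):
    """Stand-in for the core: it only ever sees the abstract interface."""
    for page_no, data in pages.items():
        wal.append(page_no, data)

shipped = []
write_pages(ReplicatingWal(shipped.append), {1: b"header", 2: b"rows"})
print(len(shipped))
```

The caller in `write_pages` never knows which implementation it was handed, which is the same arrangement the VFS comparison points at.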

Yes. Okay. So then the implementation that, like, sends it off. So, how do you set that up? How do you set up, like, who is sending?

Who is receiving? Where are conflicts happening? Who's the source of truth? Talk to me a little bit about that story. Let me take a step back if I may.

So one of the things, and then I'll converge those things. One of the things that a lot of people will also claim is: I love SQLite, but I am deploying my things on serverless. I am deploying it on Cloudflare Workers. I am deploying it on Vercel. I am deploying it on AWS Lambda.

SQLite doesn't work well in those environments. Mhmm. So that was another thing that we wanted to make happen. So to make that happen, we also wrote something that initially we put in a completely separate repository, which in hindsight was a big mistake. And then we fixed this mistake.

The good thing about life, Aaron, is that if you make a mistake, 9 times out of 10, you can fix it. Most of the time. Yep. I'm not claiming here that all mistakes are fixable, but software does have... Software is 10 out of 10 fixable mistakes. Yeah.

No. No. Nine and a half out of 10. Let's give some room for doubt. But so we wrote something called the libSQL server, which today is part of libSQL, in the same repository. But at the time, the reason we wanted to do this is that we didn't wanna muddy the waters between, like, what is the thing, because people see SQLite as this thing that is just a file.

What is the thing that is the file? What is the thing that is the server? But I think our insistence on getting this into different repositories ended up muddying the waters even more. Because especially for Turso, we heard people saying, oh, those guys are sketchy because they're claiming that their things are open source, but you go look at the repository that they point you to and it's just the library, like, all of the server functionality is missing. And we say, no, but the server functionality is in this other repository.

But by the time I have to explain this, yeah, right, I already created a bad impression. And so we merged. And we also had a lot of practical build-time complications as well, of having to, yeah, synchronize two repositories. So we merged them together.

So there is something in libSQL called the libSQL server. And the libSQL server is essentially a protocol plus an implementation. So the protocol is also open and anybody can implement it. It's a protocol plus an implementation of SQLite over HTTP. So it allows you to essentially do SQLite over HTTP, including transactions, over fully stateless HTTP.

So you can do interactive transactions. You can do whatever you want. And this allows you to use libSQL on Cloudflare Workers, because what you do is that you put a server close by. Now, the binary is not going on Cloudflare Workers, but that's fine. But now you're connecting to libSQL over HTTP.

And when we released Turso, by the way, that was what Turso did. I have those replicas across the planet. Mhmm. And now you can connect to the closest one, and I handle how to select which is the closest one for you.

That was it. Right? But this server is what handles replication. So the server is where you connect to. Turso today, and libSQL in particular, have two modes of replication, one of which is server-to-server replication.

K. Server-to-server replication is something like: I have a primary, which is the source of truth, already answering some of your questions. The model is very simple. SQLite is very simple, and we want to keep everything that we do very simple. That is Mhmm.

The goal of this project. So there is a primary server, let's say, in Dallas, close to you. That's a good choice. And you can then create a replica in Toronto, close to me. Those are two servers.

You connect over HTTP and Turso will, and this is a part of Turso, not libSQL, because this is mostly infrastructure setup, Turso will connect you to the closest replica. But if you're really talking about pure open source libSQL, again, it's very simple. You have a primary there. You have a replica somewhere else, on this one server or on this other server, and then you just connect to one of them and get the data that you need.

Simple and easy. There is another mode. And in this mode, again, the primary is the source of truth and the replica polls the primary. So, hey, this is the last replication index that I received. Send me your stream since that point.

And then I keep track of the replication index, and I'm always asking the primary, give me all of the data since this point in time. And the primary is not sending things before the replicas ask for it. So it's a pull-mode replication. And then the other thing that we added later, which is one of the things today that people absolutely love about Turso, is that, if you think about it, this allows you to use SQLite in those environments that you couldn't before. So you can start a tutorial, you do your Laravel tutorial with pure SQLite.
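The pull-mode exchange described above, where the replica remembers the last replication index it received and asks for everything after it, can be modeled in a few lines. This is a toy in-memory sketch in Python; the class and method names are hypothetical, and the real system ships WAL frames over the network rather than key/value pairs.

```python
class Primary:
    """Source of truth: an append-only log of changes, indexed from 1."""

    def __init__(self):
        self.log = []

    def write(self, change):
        self.log.append(change)
        return len(self.log)  # replication index of this change

    def changes_since(self, index):
        # The primary only answers requests; it never pushes on its own.
        return self.log[index:]

class Replica:
    def __init__(self, primary):
        self.primary = primary
        self.index = 0   # last replication index applied; 0 means "send everything"
        self.state = {}

    def poll(self):
        """Ask for everything after our index, apply it, and advance the index."""
        changes = self.primary.changes_since(self.index)
        for key, value in changes:
            self.state[key] = value
        self.index += len(changes)
        return len(changes)

primary = Primary()
replica = Replica(primary)
primary.write(("dallas", 1))
primary.write(("toronto", 2))
print(replica.poll(), replica.state)
```

A replica that starts with index 0 receives the whole log on its first poll, and later polls only carry the delta.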

And then if you deploy this to a serverless environment, you can still use the same SQLite because now you're talking over HTTP. Right. But that's not really how SQLite works. I mean, a lot of the SQLite magic happens because of the file thing. And again, people already loved it that you could develop in a file and then just switch the URL and that goes over HTTP. That was already magical.

But what we allow today to happen is something that we call embedded replicas. So an embedded replica is essentially a file. And in the same way that a server replica can query the primary every 200 milliseconds and say, give me the delta since that point in time, the library itself can do that for you. So you now have an application running SQLite as you would at any time. And the writes, the writes are slower in all fairness, at least for now.

We plan to fix that in the future, but the writes are always going to this primary and getting back to you. But when you read, you read from the file, even if you have other writers. Right? If you don't have other writers, you can use this as a backup solution. It's not that interesting, but it does work. But the powerful scenario is the one in which you can now have many writers. So you can essentially scale out your application.

Mhmm. Each one of them will have a local file. So you're always reading from the local file at zero millisecond latency for networking, and at an interval configurable by you, 500 milliseconds, one second, or whatever you want, you poll this primary, and then you just get the data back. And that data is materialized in the file. Right?
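To make the embedded replica idea concrete, here is a toy model in Python using the standard sqlite3 module. It is a sketch under loud assumptions: `ToyPrimary` and `ToyEmbeddedReplica` are invented names, the "network" is a method call, and changes are replayed as SQL statements, whereas the real system ships write-ahead log frames. What it does show is the shape: reads hit the local file, writes go to the primary, and a sync pulls the delta into the file.

```python
import os
import sqlite3
import tempfile

class ToyPrimary:
    """Stand-in for the remote primary: applies writes and logs them in order."""

    def __init__(self, path):
        self.db = sqlite3.connect(path, isolation_level=None)
        self.log = []  # (sql, params) in commit order

    def execute_write(self, sql, params=()):
        self.db.execute(sql, params)
        self.log.append((sql, params))
        return len(self.log)

    def since(self, index):
        return self.log[index:]

class ToyEmbeddedReplica:
    """A local SQLite file: reads are local, writes are sent to the primary."""

    def __init__(self, path, primary):
        self.local = sqlite3.connect(path, isolation_level=None)
        self.primary = primary
        self.index = 0  # how much of the primary's log we have applied

    def sync(self):
        """Pull changes since our index and materialize them in the local file."""
        for sql, params in self.primary.since(self.index):
            self.local.execute(sql, params)
            self.index += 1

    def write(self, sql, params=()):
        self.primary.execute_write(sql, params)  # the slower, remote part
        self.sync()  # bring the change back so this replica sees its own write

    def read(self, sql, params=()):
        return self.local.execute(sql, params).fetchall()  # no network involved

workdir = tempfile.mkdtemp()
primary = ToyPrimary(os.path.join(workdir, "primary.db"))
app_a = ToyEmbeddedReplica(os.path.join(workdir, "a.db"), primary)
app_b = ToyEmbeddedReplica(os.path.join(workdir, "b.db"), primary)

app_a.write("CREATE TABLE visits (page TEXT)")
app_a.write("INSERT INTO visits VALUES (?)", ("/home",))
app_b.sync()  # in a real deployment this would run on a timer
print(app_b.read("SELECT page FROM visits"))
```

In this model a second application instance only sees another instance's write after its next sync, which is the trade the configurable polling interval controls.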

This, I think, is the most interesting and powerful thing. I think this is my favorite. So to say it back to you, you have a primary somewhere. So you have the primary, let's say, in Dallas. And then on every server or edge node or wherever you're deploying your application, you can have these embedded replicas.

And so anytime you have a write in your application, your application would have to have read/write set up already, which, you know, most frameworks have. So Laravel has the concept of a writing database and a reading database. Feel like that's pretty normal. So you send all of your writes to the primary, and then the libSQL that's sitting next to your deployed application is constantly polling that primary and pulling changes in. Mhmm.

So when 80% of your traffic, 90% of your traffic is reads and not writes, which I think is about standard, 90% of that traffic is hitting the database that's literally sitting on the same disk, right next to your application. So you still get that benefit of SQLite being embedded, but you get the benefit of a more traditional or a different style of database where you can have a bunch of writers coming to one point and then replicating out. Pretty much. As long as you have a file system, you can do this. And one misconception is that that means that it doesn't work on AWS Lambda.

It doesn't work on Vercel. That is half true, because it's usually not the recommended setup, but you do have a temporary file system in those environments. Mhmm. And this tends to be recycled. So for a lot of data, it doesn't work, because for a large amount of data, you have to be downloading the whole file all the time.

But for a small amount of data, and we actually have a blog post explaining that, you can write the database to the temporary file system and check it, because if you have to now contact the server to see if this is up to date, you already lost the benefits. Right? Might as well do HTTP. So what we recommend is that you use the temporary file system, you check the latest date of modification, like the timestamp on the file. If it's less than, let's say, a second ago, it means that you just read from the file.

If the modification timestamp is more than a second old, or if there is no file, that means that you've got a fresh container, or you got recycled, or the last update was at a point that you don't consider fresh anymore, then you just sync first and then you read. So you can do this for a small amount. The thing is that if you don't have a long-standing server, this is gonna happen a lot more often, and that is just not worth it if you have, like, a gigabyte of data. It still works well if you have, like, maybe a couple megabytes of data. But if you're deploying something, let's say, on Fly.io, on Render, on Koyeb, on AWS EC2, anything where you have a server, you have a file system, you can even use, like, ephemeral volumes, because if your container dies and you have to replace it, at that point Turso will automatically download the whole file again, because it's the first sync.

It's just that if you provide no index, you get the whole file. In a long-standing server, that doesn't happen very often. So it's still worth doing it. So you don't even need a persistent volume. You can use just whatever temporary file system, an ephemeral volume.
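The timestamp check suggested above for serverless temp storage is simple enough to sketch. This is an illustration in Python, not code from Turso's blog post: the function names, the one-second default, and the `sync` and `read` callables are hypothetical, and touching the file after a sync is a design choice added here so that a sync that brings no changes still counts as a fresh check.

```python
import os
import tempfile
import time

def needs_sync(db_path, max_age_seconds=1.0, now=None):
    """Return True when the local copy is missing or older than max_age_seconds."""
    if not os.path.exists(db_path):
        return True  # fresh or recycled container: there is no file yet
    now = time.time() if now is None else now
    age = now - os.path.getmtime(db_path)
    return age > max_age_seconds

def read_with_freshness_check(db_path, sync, read, max_age_seconds=1.0):
    """sync and read are caller-supplied callables (hypothetical hooks)."""
    if needs_sync(db_path, max_age_seconds):
        sync()  # one network round trip, paid only when the copy is stale
        if os.path.exists(db_path):
            os.utime(db_path)  # mark the file as just checked, even if nothing changed
    return read()

# Usage with stand-in callables; a real sync would download or update the file.
path = os.path.join(tempfile.mkdtemp(), "replica.db")
result = read_with_freshness_check(
    path,
    sync=lambda: open(path, "w").close(),
    read=lambda: "rows from local file",
)
print(result, needs_sync(path, max_age_seconds=60))
```

The check itself never touches the network, which is the whole point: asking the server whether the file is current would cost as much as just querying over HTTP.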

You can use whatever you want. That's interesting, and that makes a lot of sense. So is the blessed way to use Turso in a serverless environment, most of the time, without those caveats, the HTTP method? Yes. Yes.

Yeah. So it sounds like in certain cases, it might be okay to pull it down and put it in the temp storage. But for the majority use case, you're gonna use that HTTP mode of Turso, which, yeah, that makes a lot of sense. Do you see a lot of people, as far as your customers, deploying in serverless environments? Or do you see a lot of people putting it in something like Fly, where it's like quasi serverless, but there's still actual servers, long-running servers?

Fly has nothing serverless about them. It's pure servers. But it feels so serverless because you can... Yep. You can still SSH in, so it's not serverless, but it feels so serverless. You have the granularity of a server, and you can have things like, Fly does allow you to have something that comes up, serves a request, and goes down.

Yeah. But this is always backed by a server. Right? Mhmm. So where do you see a lot of people, like, what's your split, if you had to guess, of where people are using Turso?

I think it will be close to 50/50. And it might be a little bit more, like, embedded replicas were transformative. I mean, when we released this, we just got a completely different class of people coming. Right? I think JavaScript is still our leading language. But when we released Turso, it was, like, by a large margin.

Like, 90% of the people were really coming from JavaScript, and this is, I think, just because most of the function-as-a-service things run with JavaScript. After embedded replicas, I think this became a lot more appealing to other communities. Yeah. Now we have a lot of PHP developers coming. You know that because I told you, we didn't have the bandwidth yet to do an official PHP driver, but we had three different community contributors contributing PHP drivers for usage with Turso.

And then we decided to sponsor one of them that seemed ahead of the pack. But again, look at that. I mean, before you can even have a driver, people are already, like, hooking something up to the best of their abilities because they really wanna use this. So this idea of embedded replicas is just really, really powerful, and it's something that marries the benefits of SQLite, which people have been knowing and loving for two decades, with how you run a production Postgres-like, MySQL-like database. Right?

Mhmm. Yeah. The reason I like this embedded replica system is because the majority of traffic is gonna be reading the database and not writing. And so you get the majority of the benefit by just treating SQLite as a nearby read, and then you can write, and those can incur a little bit of penalty because it's less frequent. And I wanna clarify that there is a performance penalty in writing, because you're not gonna be as fast writing remotely as writing locally.

That is obvious. And also the way we write, it's a little bit more expensive than a normal write as well, because now you have a full round trip to get things to come back. Because we do wanna provide read-your-own-writes consistency. If your replica writes something, you will read your own writes, which is much better than a fully distributed cache, in which you never have those guarantees. But you don't have to set up anything for that to happen.

So it's not that you have to be aware that you are... All you have to do is to write. So the code doesn't change. The code is still reads and writes. Oh. Reads and writes.

There is no setup involved. It's just that it is faster or slower. That's it. So from the perspective of the application with embedded replicas, am I just writing, and libSQL intercepts that and sends it off to the primary? Yes.

That's the ticket. Yeah. Yep. That's the move. So you don't have to have an external, oh, now I'm gonna write.

I'm gonna do an external thing, which, my impression was that you were understanding it that way. Like, you just write specially. You don't even have to do that. The writes are just slower because you're now being intercepted and writing to the primary. But from the perspective of the code, you just read and write.

There's nothing there. I like that. I like that you hide that behind your implementation, your library, not mine. And once more, this is only possible because we forked the whole thing. So we have all of those entry points where we can add the stuff that we needed.

Right? Right. Okay. So the thing that I've seen recently that I think your very smart cofounder Pekka has been working on is... We'll agree to disagree on that, but yeah. I saw that y'all just released some sort of vector search embedding stuff, which, of course, everybody is clamoring for vectors these days.

So tell me what your very talented cofounder has been cooking up over there. Yeah. So, just to clarify, I am mostly joking. Pekka is actually sometimes quite smart. Smart.

But here's the conclusion that we got: every database company today faces pressure to respond to the rise of AI. K. But what we never wanted to do was do it just for the sake of doing it. It just doesn't match our style. That's not what we wanna do, and that's not who we are as people.

Like, we wanna make sure that whatever we're doing has a big benefit that the market will appreciate. Otherwise, we've got tons of other things to do. Mhmm. We went through a laundry list of ideas of, like, do we have a natural advantage with SQLite for this thing in the AI market? We even said, can we run SQLite on the GPUs?

And we found some academic papers of people who tried and got good results, and, you know, we explored a bunch of those crazy directions. And eventually, we understood that vector search is not that different from any other form of database operation. And the same things that people love about Turso, that people love about SQLite, and that people love about this idea of, like, hey, run something locally, do embedded replication. And one of the things that people appreciate about Turso so much is how cheaply we can make the service available, with a very generous free tier. All of those things would be just as applicable to vector search.

There was an extension. There is an extension, which I understand is getting deprecated, called sqlite-vss, that some people in our community were using to do vector search on SQLite. So you can load this extension. This extension allows you to create a virtual table on SQLite.

And on that virtual table, you can essentially do vector similarity search. We talked to a lot of those users, and then we understood that one of the things that they didn't like about it was that loading the extension was not great. Like, I mean, it's okay, but I have to do something about it. And then every time that I'm using it, I have to load the extension. The developer experience, a lot of people told us, is just not there, because I have to create a virtual table and then I have to get this virtual table to work with the rest of my data, which is not in the virtual table.

Imagine, for example, you have a table of users and then you have to have some vector representing that user. Mhmm. How do I query those things together? None of those things were things that people in our community liked very much. And now, as I said, we have a 4,000-strong community on Discord, so we have a lot of data to just, you know, go and chat with some of those folks.

And then we said, look. The whole reason for the libSQL project is to be able to change SQLite. People don't like the fact that this is an extension. But people keep telling us that, look, other than that, this sounds incredible for vector search, because, another thing that we didn't explore, creating a database on SQLite is just creating a file. Right?

So it takes 100 milliseconds, and you can create a hundred thousand of them if you want. And if you wanna do a database per user, go do it. If you wanna do a database per body part of the user, go do it. If you wanna do a database per hair, you know, hair strand, you may be able to do it. I can't.

Like, just do it, man. Would it be a no-op for you? Exactly. Yeah. But I do have a lot of body hair, though.

So, like, if you're counting the whole thing, I need a lot of... Keep going. Keep going. Move on. But a lot of people seem to be interested in that. Look, if it wasn't for the fact that this extension is cumbersome and it had a lot of technical limitations, Turso would be a perfect database for running those AI applications, because it can run, like, per-user context.
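Since creating a SQLite database is just creating a file, the database-per-user idea takes very little code. A minimal sketch with Python's standard sqlite3 module; the file naming scheme, the `notes` table, and the helper name are made up for illustration.

```python
import os
import sqlite3
import tempfile

def open_user_db(root, user_id):
    """Open (creating on first use) the database file that belongs to one user."""
    path = os.path.join(root, f"user_{int(user_id)}.db")
    conn = sqlite3.connect(path)
    conn.execute("CREATE TABLE IF NOT EXISTS notes (body TEXT)")
    conn.commit()
    return conn

root = tempfile.mkdtemp()
for user_id in (1, 2, 3):
    conn = open_user_db(root, user_id)
    conn.execute("INSERT INTO notes VALUES (?)", (f"hello from user {user_id}",))
    conn.commit()
    conn.close()

print(sorted(os.listdir(root)))
```

Each user's data lives in its own file, so one user's rows can never show up in another user's query by accident.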

They just put different users in different databases. I can put this on my mobile device. Embedded replicas, by the way, we are a little bit slow there. The drivers are not that great, but conceptually, they can work on mobile devices. It's just that working with React Native, etcetera, it's complicated.

We're fighting that battle now. Mhmm. But conceptually, people say, oh, imagine that. I could keep those things completely separated and I could push to the user's mobile device. I could do all of that, but, like, loading the extension, it's hard.

And, like, the extension itself has a lot of problems. So Pekka, essentially, out of his wisdom, understood that the whole reason for this project's existence is to change SQLite, so let's change SQLite natively. So we now have a vector column. And this is what we ended up doing. So now you can create a text field, you can create an integer field, you can create a blob, you can create a null, or you can create a vector.

Right? And it just works like that. And this is something that the community so far has been appreciating very, very much. Everything you wanna do, by the way, if you wanna use this without Turso, just purely locally with libSQL, you can do this as well. Because if we're talking about all of those things that you can do, that doesn't mean you have to do them all at once.

So you can just put this on a mobile device, for example, and now you have native vector search. And we also implemented an indexing algorithm to index those vectors. And it just becomes a part of your SQLite table. You will write a query like a select star from movies where year is higher than 2020 and the vectors are similar to those other vectors. Just do that, and limit 3, and order by the author. Right?
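The shape of that query, an ordinary filter combined with a similarity ordering, can be imitated on stock SQLite to show what is being described. This sketch does not use libSQL's native vector type or its index: it stores vectors as plain blobs and registers a Python function named `cosine_distance`, a name chosen here for illustration. The movie rows and embeddings are invented.

```python
import math
import sqlite3
import struct

def pack(vec):
    """Store a vector as a blob of little-endian 32-bit floats."""
    return struct.pack(f"<{len(vec)}f", *vec)

def cosine_distance(a, b):
    """1 minus the cosine similarity of two packed vectors of equal length."""
    x = struct.unpack(f"<{len(a) // 4}f", a)
    y = struct.unpack(f"<{len(b) // 4}f", b)
    dot = sum(p * q for p, q in zip(x, y))
    norm = math.sqrt(sum(p * p for p in x)) * math.sqrt(sum(q * q for q in y))
    return 1.0 - dot / norm

conn = sqlite3.connect(":memory:")
conn.create_function("cosine_distance", 2, cosine_distance)
conn.execute("CREATE TABLE movies (title TEXT, year INTEGER, embedding BLOB)")
conn.executemany("INSERT INTO movies VALUES (?, ?, ?)", [
    ("Old Classic", 1999, pack([1.0, 0.0, 0.0])),
    ("Space Drama", 2021, pack([0.9, 0.1, 0.0])),
    ("Quiet Comedy", 2022, pack([0.0, 1.0, 0.0])),
    ("Robot Heist", 2023, pack([0.8, 0.2, 0.1])),
])

def similar_movies(conn, query_vec, min_year=2020, limit=3):
    """Regular WHERE clause plus a nearest-first ordering, in one SQL statement."""
    rows = conn.execute(
        "SELECT title FROM movies WHERE year > ? "
        "ORDER BY cosine_distance(embedding, ?) LIMIT ?",
        (min_year, pack(query_vec), limit),
    ).fetchall()
    return [title for (title,) in rows]

print(similar_movies(conn, [1.0, 0.0, 0.0]))
```

This version scans every matching row and computes the distance in Python, so it is only a picture of the query; the native column type and index discussed in the conversation exist to avoid exactly that cost.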

Right? You write a SQL like query that now has vectors on it. That is super cool. How long did it take how long did it take y'all to to, implement this? And what is, like, the I guess, what's the overhead or the maintenance complexity?

How easy was it to do? So again, it does help that Pekka and I have a lot of experience with C and very low-level stuff. I think the hardest part was really that some aspects of the developer experience didn't end up being exactly how we wanted, because some parts of SQLite were very hard to change. But by and large, it is fantastic. It took us around two months, and by the way, it was exclusively Pekka working on it.

We didn't have any other engineer. Pretty good. Yeah. You haven't seen, Pekka is a monster. He's out of this world.

Apparently. Yeah. And I mean, so for context, the SQL parts of the SQLite code base are two decades old. And, well, from what I understand, DRH writes his C in a very particular style that can be a little bit obtuse. And so to be able to do that in two months is pretty impressive.

I appreciate the way you phrase it, Aaron. I think you're just a much more polite person than I am. Or with a richer vocabulary. But yeah, again, and I wanna clarify that, SQLite is just a fantastic piece of software and is extremely reliable, but the coding style is hard to follow. And, mhmm,

I think that is a result of being maintained by a single person, or close to a single person, for two decades, and it is quite hard. But, look, go look at all the history that Pekka had in the Linux kernel. I was not the high-profile guy in Linux. I just create the Zara, but, like, Pekka was, and still is. I mean, Pekka is still one of the best engineers we have on the team. And as the CEO, I think that if I have to claim any credit on this, the credit was exactly for me to look around, and this is not to diminish the rest of the team, which are also fantastic engineers.

We have people working for us, for example, the maintainer of the Tokio library in Rust, and others that are extremely competent engineers. But they don't know C because they're all kids. And kids, for me, are anybody under 40. I think we can confidently call, like, you, Aaron.

You're a baby. Yeah. You're a baby with your own babies, but you're still... That's right. Many, many babies, but, yeah, still a kid. But look, what I told the company is that I will moonlight as both CEO and CTO for the time being, like, I'm running the show now on both fronts, and you guys just leave Pekka alone.

And now that it got to this point, that it's in beta, but it's released. Mhmm. Now we can start offloading parts of that, and we can just look at this as a normal project and go into a stability cycle. But, you know, Pekka alone, doing nothing else, just did this in two months. Man, that... Yeah.

He sounds like a pretty beastly engineer. That's very impressive. So what's next? Like, in terms of, I guess, both libSQL and Turso, what's next? They could be the same or they could be different.

What's on the horizon for each of those things? It's still incredibly important for us to grow this community. So if you are a person that wants to be a part of that, this is why we started. We have a company around that now, and we moved the whole company towards this, but we never lost this dream of, like, making this a large community project, which we did, by the way. We have over 60 contributors to libSQL today.

Man, that's awesome. So I would love to leave some of this future up to you, the listener, and, you know, show up. And we have contributors today doing things that we would not otherwise be doing. And this is exactly what we wanna see. That said, we do have a couple of things on the horizon for Turso.

And, obviously, the things that we're gonna be putting into the project are, first and foremost, for the benefit of the company, the things that we actively get involved with. First of all, we want to do more vectors. So there are a couple of things that Pekka is still working on and starting to hand out to other people. And not to get too technical, the audience may have heard of it.

If not, follow me on Twitter, ask me the question. For embedded replicas, we also wanna do offline writes. So again, today, you still depend on the primary to be writing. So you always write to the primary, which is slow because you're writing to the primary, and you can only write when online. We want to have a mode in which you essentially make your server the primary and Turso just becomes a distribution point.

So, we call this offline writes. That's one of the things that we wanna do. This would move us even closer to the fantastic model of, like, doing SQLite things, and, look, double down on a lot of the initiatives that we have. I love it. Man, I love it.

I'm so glad that you came on and explained all of this to me. I'm obviously very, very interested in the underlying technology, but also the business angle is super fun for me. So thank you for taking the time to do this. If people want to follow you on Twitter and argue with you about SQLite journal mode or WAL mode, where can they find you? They can find me on Twitter at @glcst, as in Glauber Costa.

Please don't follow Pekka, because Pekka and I have this competition internally. He just wrote a book that is terrible, about latency. I don't recommend anybody buy that book, because every now and then he comes to me and says, hey, I sold that many copies of the book.

And then he's trying to use this to essentially brag. And I just don't want to give him the fodder. So just please, if you're listening, just don't do it. But I'll leave the links to the book in the show notes. Aaron, you break my legs by doing this, man.

Okay. So follow Glauber, but not Pekka. glcst, which is a great, great handle. Five characters. Pekka is even better because his name is Pekka Enberg and then his handle is penberg, which I should not have said, but now that I... Just don't follow.

You can visit his profile. Just don't follow. There you go. Go admire his profile. Admire the book, but don't don't take any action.

Okay. Well, thank you again for doing this. And y'all go follow Glauber and check out Turso. And until the next time, talk to you later.

Thank you so much, Aaron.
