Video Transcript
(keyboard clicking) In this module, we're gonna look at SQLite's, full-text search capabilities. And I gotta be honest, they're pretty good. The full-text search available in SQLite is surprisingly good, given SQLite's narrow focus and the fact that they claim SQLite competes with fopen() and not with MySQL and Postgres. This full-text search totally competes with MySQL and Postgres, it's very good. If you need something more like faceted search, you're probably gonna reach for Elastic or something like that anyway. But for basic full-text search like you would have on a website or in an app, it's gonna be pretty great. There are a lot of flags and a lot of features and a lot of ways to tweak full-text search. I am gonna try to keep us on the straight and narrow and show you an example that is very useful for an application that is pretty run of the mill, not anything esoteric. You'll see in documentation, and in articles and videos, talk about full-text 3, full-text 4, and full-text 5. We're only gonna talk about full-text search 5, often called FTS5. That's the most recent version. I don't know what happened to FTS1 and 2. Those are lost to history, but FTS5 is built into the amalgamation. So odds are you don't have to do anything, it's just there. It is implemented through what's called a virtual table, which I think we've talked about briefly, but at this point, I can't really remember. But a virtual table is kind of like a shadow table or a fake table or a copy of a table that you don't really have super much control over. And so what we're gonna do here is we're gonna have a videos table that's gonna contain most of the videos from this course, along with their transcripts, and then we're gonna create a full-text search table to search those transcripts and return those videos. So that's what we're gonna do here. I'm gonna show you right now how to set up your first full-text search table. Okay, here's the table. Here's the non full-text search table. This is just the normal table. I've removed a lot of stuff. The actual production table obviously contains a lot more, like the links to the actual videos on Vimeo, the duration, whether they're free or paid, that kind of thing. But what I've done here is I've just pulled kind of the text content out of our production application. And so there is a lot of typing that I'm gonna have to do here. And so I've opened several tabs here with all of the commands already pre-written so that we don't waste time watching me type. So with this table in mind, with this videos table in mind, what we're gonna do next is we're gonna create a virtual table called video_fts. And so this virtual declares it as that virtual table that we talked about. And then it says USING fts5. So that's that FTS5 thing that is built into the amalgamation. And this is the virtual table driver. Other virtual tables are like CSV and then maybe file or readline or something. I don't remember hardly any of them clearly. But FTS5 is the driver for full-text searching. And so we have a couple of things going on here, which I think are interesting. So I'm only gonna bring over the title and the transcript. If you look back, actually, I can just type that one out. If you look back over here, there is a slug column. I'm not gonna bring that over. And what's interesting about that is you don't have to bring everything over. It's gonna search over every column in the whole table. And so if there's some column that you don't want to be included in the full-text search, don't put it in the full-text table. That's a pretty good start. So we're gonna leave that one out. I'm going to declare that the content table is the videos table, which will be very important here in a little bit. But what I'm doing is I'm telling SQLite these two things go together. And then this is extremely important: in the content table, which we've already declared as videos, in the content table, the row ID, or the primary key, the row ID is the ID column. And this is how the virtual table and the actual content table are linked together. So when we say, "Give me some stuff from the virtual table," it's gonna search through the virtual table, grab all the row IDs that match our full-text query, and then it's gonna go back to the content table and say, "Hey, I've got all these row IDs "and I have been told to look in the ID column "of the videos table to get the actual rows." And so declaring these two is pretty important. Another interesting thing about the FTS5 virtual table implementation is you cannot declare types at all. Can't declare types, you can't declare primary keys, you can't do anything. It's a very limited virtual table. And in fact, if you try to declare a type, it's just not gonna work. So let's go ahead and create this table. So we have virtual table videos_fts, and then I'm just gonna pop open the sidebar here and you'll see it actually made a bunch of different tables. And this is all internal implementation. So you shouldn't really, you shouldn't really look at this. Well, you can look at it, but you shouldn't really change anything here. And this is where, if you look, this is where our FTS content has already been populated. And it honestly, populated is a little bit of a lie because we haven't actually populated this full-text search table. What we're seeing when we look at it is the underlying content table, which is the videos table. But that full-text search index hasn't even been created yet because we haven't inserted the data. So unfortunately, this is a drawback in my opinion, of full-text search in SQLite with content tables. You are responsible for keeping the content table and the full-text search virtual table in sync. And I'm gonna show you how you can do that. It's not that difficult, it's just kind of annoying that you have to do it. But first, before we do that, let me prove, hopefully, let me prove to you that the index doesn't exist even though it appears to exist. So if we look at this select * from video_fts, hey, we've got all the data, that's awesome. Well, that's a lie. If we do where videos_fts match, and we'll go over the whole searching thing in the next video. But let's look for a word that I have definitely said at some point in this course, which is index. And we run that, nothing, nothing at all, because we haven't actually created the full-text search index, we've just created the virtual table. So what we need to do upon creating the table is we need to populate it. We need to populate it, and we need to say INSERT INTO video_fts the full-text search table, the row id, which is that secret ID, title and transcript, and then we're just gonna select the ID title and transcript out of the content table. So we're just saying, "Hey, copy everything over." And if we copy everything over, that's gonna be fine. And then if we were to search, or rather select *, it doesn't look any different at all. However, if we run it again, now you see, great, we have 32 videos where I said the word, index. That's too many videos to talk about indexing, frankly. But now we know that the full-text search is actually working. What happens when you update a column in the videos table? So let's do this, let's change introduction to this course to asdf qwer. Hopefully I haven't said that before in this course. And so we've got asdf qwer and now if we do select * from video_fts, hey, that looks like it was propagated. Ah, do not be fooled, my friends. If we go back over here and we say asdf qwer, it's in fact not propagated. So again, when you just select from the virtual table, you're gonna see the underlying content. However, the index itself does not automatically get updated. So one way that you can fix this, and this is the brute force way. This is the canon way, and then there is a fine-tuned scalpel way. And so one way that you can fix this is by running this seemingly magic command, which I'm just gonna paste right here. This is bizarre to me. This is a strange interface in my opinion. INSERT INTO the full-text search table, the full-text search table, VALUES('rebuild'). If there's ever been a magic string, this is a magic string. And there are in fact a few magic strings like this, and that's how you interface with the full-text search table. So if we were to run rebuild, it's going to rebuild from the content table, which is kind of nice. And then we were to run this again, you see now the underlying full-text search index has in fact been updated. So one thing you can do is you can say drop table videos_fts, we're just gonna get rid of everything, and then we're gonna try this again, create table, and then if we were to search for asdf, that's not there. But if we were to just straight-up rebuild, now it is there. So rebuild can be a viable mechanism instead of doing this INSERT INTO, SELECT, FROM. You can just issue a rebuild. Ah, that seems fine to me. So this is a way to get the initial set of data into the full-text search table. But then after that, you're dealing with this. This looks hairy, it is literally 12 lines of code. If you remove blanks, it's probably 10, I would think. Yeah, so it's 10 lines of code and this is just, these are just triggers. So what we're doing here, I'm not actually gonna run all of these, in fact, why not? Hey, why not? Let's live a little. Okay, so what we're gonna do is an AFTER INSERT, an AFTER DELETE, and an AFTER UPDATE. And so from here, we're going to INSERT INTO the full-text search, kind of the same thing, like kind of the same thing as this initial population where we insert the ROW ID title transcript as the ID title transcript. So row id, title, transcript, new.id, new.title, new.transcript. This is part of the trigger syntax. You have access to old. and new. So after an INSERT, we want to populate the full-text search with the new values. Let's add those white spaces back, 12 lines we can deal with. After a delete, we're gonna do a little bit of magic again. Here's another magic string. So we're inserting into FTS the table name, fts, and then the column values. And for the table name, we're inserting the word, delete with the old information, which you have to pass through. Very esoteric interface, I must admit. Very esoteric interface. And so when a row is deleted out of videos, then we're gonna delete it out of the full-text search table. Finally, AFTER UPDATE, there's no update. You cannot update a full-text search. You have to delete and insert a little. Eh, a little wonky, not my favorite. So CREATE TRIGGER, let's get some space here. CREATE TRIGGER, CREATE TRIGGER. Now, if we were to come back here and we were to update the videos table and say... What do we wanna say, what's something that definitely did not appear in the content of this, let's say foobar. It took me so long to come up with that and it wasn't even very good. So we've updated the videos table to have a foobar. And now if we were to search for a foobar, then it worked. So these triggers, what these triggers are gonna do is they're going to observe, they're going to watch this videos table, and anytime something changes in the videos table, it's going to propagate to the underlying full-text search table, which is very, very important. So if we come back over here and we say, "Introduction to this awesome course," awesome, that's what I should have said. And then we were to do awesome, you'll see there we go, "Introduction to this awesome course." And apparently, I thought 15 other things were awesome, which is about right. So this is all you have to do to set it up. I know that that feels like a lot, but after you do that, you're done. You're done, it's going to keep itself, it's going to keep itself in sync. If something goes catastrophically wrong, you can issue a rebuild command. In the next couple of videos, we're gonna look at how do you search these tables? What operations or operators are available to you? How do you highlight matches? So maybe you wanna get the transcripts back and highlight the word, awesome or the search term, wherever it appears, we're gonna look at that next. But now you know how to create a full-text search table that is driven by a content table.