Become a SQLite expert

In the video, I show you how to improve search results by ranking them for relevancy in two ways. First, I explain how to use a rank column from the full text search table to order results, which works well for basic searches. Then, I introduce the bm25 function for assigning different weights to columns, helping you fine-tune searches by prioritizing certain fields like the title over others like transcripts.

(keyboard keys clacking) So far we have created, populated, kept up to date and searched a full text search table. But there are a few more things I want to show you, because I want you to be able to implement this in your applications. I want to teach you practical things. So, if we were to run this as we have been running it, we would be returning results to the users that are correct, which is always a good start. We want to be correct first, but it's not entirely relevant yet. So I'm gonna show you how you can rank the results by relevancy in two different ways. We're back to this query here, which we join in the videos table 'cause this contains the actual data that we're looking for and we're pretending that the user has typed on the website indexes. And so if we were to run that, you see, I mean, yeah, but look at this. SQLite's structure comes first and composite indexes comes ninth. That's not very good. So what we can do is let's go ahead and just add, let's just select title to narrow it down a little bit. So we're gonna select videos.title, and then there's this secret other thing, the secret other column that contains the rank. So if we look at the rank, you see it's this big floating point number and the smaller the number is, the more relevant it is to the user's search. So you can see that this negative 1.0 is at the top, but in fact negative 1.6, that one that we looked at earlier is frankly a much better match. So we can simply say order by rank. And this is provided by the full text search table. We didn't create it, it's just there. So now if we do that again and we run it, boy, that if I'm a user and I searched indexes, introduction to indexes is in fact what I would be looking for. This is very good. This is a great start. And if you stopped here, I would be happy. I'd be totally fine with that. In this case, however, there are cases where this might not be exactly what you're looking for. So imagine you've got 10 columns. So instead of just title and transcript, you've got title, transcript, comments, slug, maybe you're searching across the URL slug as well. And you want to like, you want to give different weights to different columns because the word index showing up in a title means a lot more than the word index showing up in someone's comment, right? So we have another option here. We have a a function. So you have to do open, we have a function here called bm25. And if you run that, it's not gonna work. And so what you have to do is you have to pass in the full text search table. And now we've got BM 25 and it didn't do anything. Look, they're all identical. They're all the exact same. So that's not entirely helpful. Where it gets helpful is you can pass through different weights. So remember when we declared this full text search table, we did, we did like create virtual FTS over title transcript using FTS, whatever, right? We did something like this. The important part is right here, title, transcript. So when you're using the bm25 function, you can follow this order and give different weights to different columns. In this exact example, it is more valuable if the word shows up in the title than if it shows up in the transcript. And so here we're saying, hey, we are, we wanna assign 10X the weight to the title versus 1X to the transcript. Now, we could do, if we had multiple columns, we could say five for the transcript and one which is the default for any column after that, ten five when we only have two columns. Doesn't make a lot of sense because you're saying this is worth 10 and this is worth five, which reduces down to this is worth two and this is worth one. So when we only have two columns, let's just say that a title hit is worth five X a transcript hit. And now if we were to run that again, you'll see it does, honestly, it does slightly change a little bit. I don't know if it changes the order a little bit, which is a good indicator that we had it right the first time. If we wanna overweight it and say a title hit is worth 15 x, that is what we would get there. And then of course we're just looking at it. But you could plop that right down there and say rank it by that. Now this looks pretty good to me, introduction to indexes, automatic indexes. So we're getting all of the ones with index in the title. Here's one which must have a bunch of mentions of index in the transcript to beat out these down here. So this is, this is an interesting case and I wonder if, so these up here are a strict match on indexes, so I bet those get more points where this one down here is a singular index. And so I wonder if we changed it to index. Yeah, it changed it. It changed it up quite a bit. So now we have covering, obfuscation, duplicate, composite, and then it looks like altering schema. I don't even see it on here. So that that changed it up. That changed it up quite a bit. So bm25 is a good way to fine tune the ranking algorithm based on your specific data. And so what I'm not telling you is that bm25 is better than rank. I don't know. I don't know your use case. I don't know your data. I am telling you bm25 exists and might be incredibly helpful for your application.

Full Course

SQLite for Production

Summary

Video Transcript