Video Transcript
(graphics clacking) On the road to practicality, I'm gonna give you one more tip. The last practical tip was sorting the results by rank or by BM25. This one is showing the user where the match actually occurred. So, you know, when you're on a website and you type something and a whole chunk of text comes up with your search term highlighted, that's what we're gonna do. Let's take a look. So if we run this, again, we're matching on index. And what if in our UI we want to bold, highlight, or somehow mark the match? We don't wanna do that on the application side because we don't wanna re-implement full-text search, right? So we have two options here and I'm gonna show you the first one first. Well, that makes enough sense. Highlight. The first thing you pass through to highlight is the name of the virtual table. And then we're gonna target, again, one of the columns we declared that table with. So we declared this table with title and then transcript. And so that is the order that we're talking about, and it is zero indexed. And so we want to hit the title column, which is index zero. And then you have two more arguments, which I'll tell you about in one second. And we're gonna call this as title... We'll just say that this is title_highlighted. So here you have the ability to say what the opening tag is and what the closing tag is. So just so that we can see it nice and clearly, let's just throw a bunch of square brackets on there. And if we run that, you'll see, look, there is the exact word index. There is index again. And so that's kind of useful. You might wanna do something like this if you're displaying it. If you're just throwing it straight back out to the front end, you could have marks like that, and then your CSS could hit that and do some sort of highlight. I don't super love this because "indexes" doesn't get highlighted.
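What was just described can be sketched roughly like this, using Python's built-in sqlite3 module. The table name videos_fts and the sample rows are assumptions made up for illustration (the video does not name them here), and the sketch assumes your SQLite build includes FTS5. It passes highlight() the table name, the zero-based column index, and the opening and closing markers.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table mirroring the video: title is column 0, transcript is column 1.
conn.execute("CREATE VIRTUAL TABLE videos_fts USING fts5(title, transcript)")
conn.executemany(
    "INSERT INTO videos_fts (title, transcript) VALUES (?, ?)",
    [
        ("Creating an index", "Let's create another index on this table."),
        ("Composite indexes", "Column order matters in a composite index."),
        ("Introduction", "Welcome to the course."),
    ],
)

# highlight(table, column_index, opening_marker, closing_marker)
rows = conn.execute(
    """
    SELECT title,
           highlight(videos_fts, 0, '[[', ']]') AS title_highlighted
    FROM videos_fts
    WHERE videos_fts MATCH 'index'
    ORDER BY rank
    """
).fetchall()

for title, title_highlighted in rows:
    print(title, "->", title_highlighted)
```

With these sample rows, the title "Creating an index" comes back with the word index wrapped in square brackets, while "Composite indexes" comes back unmarked: that row matched through its transcript, and the token "indexes" is not an exact match for "index".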
We can use one of our search tricks from earlier and say the token must start with index, but then after that, let's hit a few other suffixes. So we're gonna say, this is the prefix followed by a wildcard suffix, and then that looks a whole lot better. I'm gonna change this back from the one you would actually use to one that is easier to see at a glance here. And now you can see all of those things. Indexed, indexes, index, and indexing are all being highlighted. This would be a pretty good user experience in my opinion. However, here's the problem. This is returning the entire content of the column with those marks added. If we were to do that for the transcript, that's too much text. That's too much text if we have a Command + K search bar that pops up and we start typing. And then you see 6,000 words of a transcript. Terrible idea. There is another function called snippet, and this function is very, very similar, except that it extracts some portion around the match and just returns that. So let's take these same arguments here. We're gonna take all of these guys. And instead of hitting column one, which is title, we're gonna hit column two. Well, instead of hitting index zero, we're gonna hit index one. And we're gonna say as... We'll call this transcript_snippet. But we have one more thing, in fact two. We have to pass in the ellipsis. So whatever you want the missing content to turn into, that's what you put here. I'm gonna put the ellipsis first and then we'll say, let's limit it to 64 tokens. Now if we run that, you'll see exactly what you could put in your UI. So you'll see dot, dot, dot. Meaning there's something before. Let's create another index, big highlight. And here we see, so your table itself is an index, and if we... Let's just drop this down to, like, 14 and see what happens. You'll see... I'm wondering if you see dot, dot, dot at the end. There you go. So dot, dot, dot at the beginning and dot, dot, dot at the end.
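Here is a minimal sketch of the prefix query combined with snippet(), again in Python's sqlite3 and again assuming a hypothetical videos_fts table with invented sample text. The helper name transcript_snippet is made up for illustration. The last argument to snippet() is a limit on the number of tokens returned, which FTS5 requires to be between 1 and 64.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: title is column 0, transcript is column 1.
conn.execute("CREATE VIRTUAL TABLE videos_fts USING fts5(title, transcript)")
conn.execute(
    "INSERT INTO videos_fts (title, transcript) VALUES (?, ?)",
    (
        "Indexing basics",
        "In the last video we looked at tables and today we are going to see "
        "how indexes speed up reads and why your table itself is an index "
        "when you declare a primary key on it",
    ),
)

def transcript_snippet(max_tokens):
    """Return a marked-up excerpt of the transcript column (column 1)."""
    # snippet(table, column_index, opening, closing, ellipsis, max_tokens)
    return conn.execute(
        """
        SELECT snippet(videos_fts, 1, '[[', ']]', '...', ?) AS transcript_snippet
        FROM videos_fts
        WHERE videos_fts MATCH 'index*'
        """,
        (max_tokens,),
    ).fetchone()[0]

full = transcript_snippet(64)   # more tokens than the sample transcript has
short = transcript_snippet(5)   # forces truncation around a match

print(full)
print(short)
```

Because the query is the prefix index*, both "indexes" and "index" are marked in the longer result. The shorter result keeps only a few tokens around a match and uses the ellipsis string wherever text was cut.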
You could change this to whatever you want your ellipsis to be. You could just change that. And you could then look for that on the front end and change it to be styled however you want. So the snippet. The difference between the highlight and the snippet is the highlight will pass all of the data back with the matches surrounded by your start and end tags. The snippet extracts a portion of data as defined by you. The length is defined by you, and then it will highlight the matching tokens and then also truncate the rest of the data and put in your value for the ellipsis for the missing data. So you could use these in any combination that you see fit. I think this is a pretty good example of where to use highlight versus snippet. The titles are always gonna be really short, so we might as well just highlight them and return the full titles. Also, I am imagining a search results page and I want to see the full title. I don't wanna see the truncated title. I care, as the user, about the title. Then in the body, I wanna see just the portion of the transcript that is relevant. So this is a pretty good use case, but again, neither is better than the other. They're just suited to different needs. So I would encourage you to use both of them where you see fit.
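To make that search results page concrete, here is one way the two functions might be combined: highlight() for the full title, snippet() for a short excerpt of the transcript, sorted by rank. This is only a sketch. The search function, the videos_fts table, the sample rows, and the choice of mark tags are all assumptions for illustration, not something shown in the video.

```python
import sqlite3

def search(conn, term, max_tokens=14):
    """Return (title_highlighted, transcript_snippet) pairs, best match first."""
    # Quote the user's term and append * so it is treated as a prefix query,
    # e.g. index becomes "index"* and also matches indexes and indexing.
    query = '"' + term.replace('"', '""') + '"*'
    return conn.execute(
        """
        SELECT highlight(videos_fts, 0, '<mark>', '</mark>') AS title_highlighted,
               snippet(videos_fts, 1, '<mark>', '</mark>', '...', ?) AS transcript_snippet
        FROM videos_fts
        WHERE videos_fts MATCH ?
        ORDER BY rank
        """,
        (max_tokens, query),
    ).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE VIRTUAL TABLE videos_fts USING fts5(title, transcript)")
conn.executemany(
    "INSERT INTO videos_fts (title, transcript) VALUES (?, ?)",
    [
        ("Creating an index", "Let's create another index on this table."),
        ("Composite indexes", "Column order matters in a composite index."),
        ("Introduction", "Welcome to the course."),
    ],
)

results = search(conn, "index")
for title_highlighted, transcript_snippet in results:
    print(title_highlighted, "|", transcript_snippet)
```

One thing to keep in mind with HTML tags as markers: neither function escapes the stored text, so if the content could itself contain HTML, the front end would need to handle that before rendering.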