Video Transcript
The last thing I wanna show you about JSON is not a JSON function. We've covered most of those. But I wanna show you how you can index JSON. So in the broader database world, there are kind of two ways to index JSON. The first is on the Postgres side, they have what's called GIN indexes, G-I-N, and those are general inverted, I don't know what it stands for. I just know that they're called GIN indexes, and you can put an index over an entire blob. On the other hand, in the MySQL world, GIN indexes don't exist, and you have to index particular keys. SQLite falls on the? MySQL side. So we don't have GIN indexes here, but you can put an index on a particular part of a JSON blob. And in my experience, that's actually what you're after. You shove a bunch of data into a JSON column, and then realize, "Ooh, I kind of need to query on this particular key," this works great here. So let me show you a terrible, terrible, terrible table. This is the schema that I told you not to do when we started talking about JSON, which is like, "Hey, I've got an ID and then a blob, "and I'm gonna shove everything in the blob." Please don't do that. However, if you find yourself in this situation and you're like, you're told, "All right, we need to query by name, "and we need that to be indexed," we can do that. We can do that. I would recommend fixing the schema, but we can do that here. Json_extract data, and it starts with a dollar, and we're going after name. So if we run that, that is what we're after. This is the column that we do want to index. Now, we have two options here. We have a generated column and we have a functional index. Now, neither of these is better or worse. They're just different. The first is we're gonna create a generated column and put an index on that column. The second is we're gonna just create a functional index and directly index this extraction. Let's start by creating a generated column, alter table users add column name generated always as, and then we're gonna put our function to extract the name in there. And we're just gonna call it virtual. Which means we're not actually gonna write it into the database, it's just going to be calculated at runtime. But when we add an index, the B-tree structure is persisted to disk. So if we were to create a stored column here, we would be storing this data three times instead of two. We'd be storing it in the JSON column, we'd be storing it in the generated column, and we'd be storing it in the index. Not totally necessary. So we're gonna declare this as virtual. And so now if we were to run this again, select star from users without, you'll see we have a name column here. And, importantly, this name is not in our control. We can't actually change this name because it is under the control of SQLite as a generated column. Now that we have that column that exists, we just go back to traditional SQLite, and we can say create index idx_name on users, and we'll call, what did we call that? We called that name. So if we run that, now we have an index on the name column, which itself is a virtual column which extracts data out of that blob. So select star from users where name equals Alice. And if we run that, yep, we got it. And if we hit explain on that, explain query plan on that, search using idx_name. So that is Method 1, and that totally works. This is a good method if you want the name to be shown as a top-level column. Maybe that's useful outside of the indexing. Maybe that's useful for you as a user to look at the database and be able to just quickly see the name. Maybe that's useful on the application side to pull it back and see the name. Maybe that's useful if you wanna do something like this and kind of hide the fact that that JSON blob exists. You can totally do that. So our second option is going to be creating a functional index. And this is very similar in that we're gonna use the JSON extract, but we're gonna skip that middle layer of creating the generated virtual column. So let me show you how to do that. For this one, I'm gonna add the index on the age attribute, just so we're not having a collision here. And so the functional, the functional approach is just create index idx_age age on users, and you put your little function in there. And instead we're gonna extract age out of there. And if we run that, it looks like it worked. Did it actually work? Who can say for sure? So if we look at this, let's find Alice whose age is 30. And so we'll say select from users where, and then we have to put our function in here. So this is maybe a drawback because we can't address it as this nice neat name. We have to actually put this in here and say where json_extract $.age equals 30. And if we run that, we see we got Alice again, explain query plan. You'll see we're using that expression, that idx_age expression. And so that's a functional index. This doesn't give you the top-level column that is nicely, neatly addressable. But you might not need that. If your application is already doing this query and working with the results as is, I don't know that you necessarily need the top-level column, and maybe you don't even want it 'cause it's gonna pollute your view. One interesting thing is if we go back to this and we say Alice. Now, remember we put an index on the name column, not on the name expression, right? So we went through this little jump of creating a virtual column and then indexing that virtual column. However, if we look at this, it's not smart enough to figure out... It's not smart enough to figure out that we have that index under the hood. And so if we were to switch this to, if we were to switch this to name equals Alice, then it hops back to using that index. So that's something you have to be careful of. In MySQL, it doesn't matter. It will see through that and know, "Hey, I've got that as an index," and here it looks like it's not doing that. Now, that might change in future versions of SQLite. Like I said, MySQL can see through the generated column, and you can address it as either the function or the generated column's name, and the index will work. SQLite, you have to address it by the generated column's name. As always, make sure you know what your access patterns are. Make sure you're using explain query plan to figure out if SQLite is using the index that you think it should be using. But now you know two ways that you can add an index to a JSON blob. You can extract it to its own virtual generated column and just throw an index on that, or you can put a functional index on the expression itself and skip the generated column.