Become a SQLite expert

In this video, I show how to use window functions in SQL to process data row by row, allowing us to do things like finding the first and last bookmarks for each user without using complex groupings. I explain how you can partition data by user ID and use functions like FIRST_VALUE and LAST_VALUE to get useful insights. By using window functions, I can easily retrieve specific records, such as the top three bookmarks for each user, making data analysis more efficient and powerful.

Window functions are very cool because it allows you to process a row at a time while you're generating the result set which is kinda awesome. When you when you have traditional aggregates, it just sums or groups or counts usually over the entire result set. But with window functions, we can do things on a row by row basis as the results are being built up. Let me show you that. I'm gonna start with select star from bookmarks where user ID is less than 10.

So less than or equal. We only wanna work with the first 10 users because, I don't know, we don't want a million, actually 5,000,000. We don't want 5,000,000 showing up. And I wanna partition by user, and so I don't wanna just put an arbitrary limit on it. So we're gonna be working with these 59 rows.

We haven't done anything window y yet, but hang on. Here we go. Row number over something and then what in the world? As row num. So what did we actually get here?

Well, we got this, ever increasing number that just goes up and up and up. While interesting, not exactly what we're after. So let's imagine we are after the user's first and last bookmarks. We wanna figure out what was their 1st bookmark, what was their last bookmark, and maybe what was the time between or something else like that. What we need to do is in this set of parenthesis, we need to introduce the concept of a partition.

So right now, this result set is just one massive result set 1 through 59. I want to partition by user. So I want the numbers to be bounded by the user and reset when we get to a different user. Easy peasy. Partition, if you spell it right, partition by user ID.

So now if we run that, we see 1 through 16 right there and then we see 1 through oh, not that many. 1 through 6 right there. Because when the user ID changes, a new partition begins and our whole counting starts over. So let us continue. What if we wanted to append as a column the first ID of their bookmark?

So we can do first value and then we're gonna put ID in here. And then we can just go ahead and copy this, but importantly, we're gonna say order order by created at ascending as first value. So if we run this, we'll see that the first value, for user ID number 1 is indeed 3 17288. That is the first row number and that looks like that is the earliest bookmark. And so you can see the first value is appended on every single row here here, which is that's kind of great.

That's not possible using just general or, traditional aggregate functions. Now, if we were to do this, last value, and let's take a look here. If we ran last value, what do you think is going to show up? Something kind of interesting. We see, those are the exact same.

So these are the same, but then when we move down one, it changes and it just keeps changing. And what we're seeing is not necessarily the last value of that partition, we're seeing the last value that SQLite has processed for that partition. So it's not necessarily that it takes the whole partition and looks at it all at once and says the last value's down here. It's process row 1, last value. 2, last value.

3, last value. And so not exactly what we're after, but what we can do is there's no reason these, these partitions have to be the same. We can swap that around to descending and change it to first value. And now within this partition, which is kinda crazy, within this partition it is being sorted 2 different ways. So if we look at this, now we see first value and last value both right there together.

There are too many window functions to cover here. They're all kinda awesome and all super helpful. We'll look at a few more, and we'll look at a potential performance optimization. I will say, I don't know what the performance is gonna be like if you have Windows over 10,000,000 records and partitions that are ordered differently. I don't know, but this is another tool in your tool belt that you can reach for when you encounter some of the more gnarly, hairy problems that you're trying to solve with SQL.

So let's take a look at some more of these. The first thing I want to show you is you can name your windows. So if we call this w user as, we can either so we can do this. We can either put the entire thing in there and then say over w user and we should still get the same first value, last value. But we can also pull this out and just take that as the partition and change the ordering and then we'll be able to use it here as well.

And so now we get the same thing. It's a little bit cleaner. And if these were the same and we were to, extract the order by down here, it would potentially be more performant. I'm not sure exactly what the optimizations SQLite does for named windows, but I know in other databases named windows are a performance optimization so it's worth hoping. Okay.

So if we keep going here, in fact, let's just let's just drop this out as well. So we'll order by w user descending, and we'll create another one down here. And we'll call this w user ascending. You can name you can name as many windows as you want. So we can now have w user ascending and w user descending.

So if we run that we still get the same value which is super cool. Now, let's look at 1 of, 1 or 2 of the others. If we did something like leadidoverwuserascendingasnextbookmark, this is going to give us, for any particular row, this is going to give us the next bookmark that that user makes. So we're here we're here on 321 7288 and we're claiming that the next bookmark is 229-5376 when in fact it is 229-5376. And of course, there is a lag, as last bookmark.

Last is confusing actually. Let's call it previous. So if we look at this now, you'll see for the very first bookmark, the previous bookmark well is null. That makes enough sense. But then we get down into the first values, 321, 72888.

The next one is 593 and the previous one is 321-7288. And when we reach the end of the partition, the next bookmark is null because we're at the end of the partition. And when we go again to the next partition, the previous bookmark is null. Window functions are a great way to do something like, hey, I wanna see the top three somethings per user. And instead of having to do some just hideous SQL, we can just use a window function.

So let me show you how we can do that. If we wanna see the top 3 or rather, let's go with the first three, the first three bookmarks that the user has ever made. We have our we've got our little function here, or our query here. We can Oh, do I want to? Yeah.

Let's get rid of that and we'll just do partition by user ID, so we don't need these named windows. So we've got this right here. Now, what we can do is we can say with, ranked, we'll call them ranked bookmarks as, so we're gonna open up a CTE, and then we're gonna say select star from ranked bookmarks where row num is less than or equal to 3. And just because we can let's indent that a little bit. And if we run that, now we have the first three bookmarks the first three bookmarks from every user, which is is kinda cool to me.

We can do the last 3 bookmarks. We could do In fact, we could, something that we could do here is so if we run this again, we've got all of this. And let's just say we want the first value and the last value. We want the first value and the last value for each user. We want to know what their first bookmark is and what their last bookmark is.

How can we get there using a window function? Well, the last value, first value, last value is set on every, it's set on every row. So we don't need previous, we don't need previous or next. First and last are set on every row. So we could we could just wrap all this up.

We could wrap all this up and say with ranked again, with ranked as this. Then we want to select user ID, d, first value, last value, where row number equals 1. Whoops, Select, I gotta do a from, don't I? Last value from ranked where row number equals 1. So now we've selected the 1st row in every partition, but we know that the 1st row contains the first value and the last value.

And so now we're able to calculate this result set pretty easily using window functions. This can be a great way to map a user journey through, let's say like an analytics pipeline. You sort by created at and then you put a row number and you can see every step that that person took in, through your website or something like that. Window functions are totally underused. They're underused.

I don't know that many of the ORMs support them, which is a pity. It's too bad. But window functions are incredibly powerful. So anytime you need to, have some sort of processing on the result set as it goes and you're like, man, I could do this with a horrible group by or you couldn't even do it with a group by in most cases, remember that window functions exist and remember that there are tons and tons of functions that operate within those windows, which makes this really powerful.

Full Course

SQLite for Production

Summary

Video Transcript