Become a SQLite expert

In this video, I explain how to use unions in SQL to combine two result sets into one, making the result set longer rather than wider. I differentiate between "union" and "union all," noting that "union" removes duplicates while "union all" keeps them, which is faster if duplicates are not a concern. I use an example of combining active and archived user tables to show how unions can help manage and query data from multiple sources as if they are a single table.

We've talked about joins where you take data from 2 different tables and you put them together side by side. We've talked about sub queries where you have data in 1 table and you can filter it based on data in another table or you can join it in based on a subquery that is derived from another table. Now we're gonna talk about unions which is where you can take 2 result sets, whether it's from tables or not, and put them over under into the same result set. So instead of making the result set wider, you're making it longer. Let's take a look.

The most basic union looks like this. Select, we'll just do select 1, union, select 2. And if we run that, great. Not super interesting, but it is kind of result set here. And we're saying, great.

Take both of those result sets and put them together. Remember when I said that there is a or by union optimization? There's a multi index optimization. This is the union. So, that or by union optimization that I talked about was when SQLite scans 2 indexes, comes up with a set of row IDs, and then unions them and then takes it back to the table.

This is a union. Now, a few interesting things about unions. So if we were to do a bunch of twos, you see, that's interesting. We only got 1 2 back even though we have, you know, what is that, 4 different twos in there. And that is because by default, it's going to eliminate duplicates.

So if we were to do union all, if we're to do union all here, then we get all of our rows back. If you, if you don't need the duplicate weed out, you must put union all. Even if you know that there are no duplicates, especially if you know that there are no duplicates, put union all because it's it's a lot faster. It doesn't have to do the duplicate weed out. And when your rows are really long, it's doing it by the entire the entire row to see if there's a duplicate in there.

So if you know based on application logic or your powerful brain that there are no duplicates or if you don't care about duplicates, make sure you set it to union all. Okay. Let's keep going. You'll notice here that the column took on the name of the first value. So we can say, we can just give it a column alias and that will set it for the whole table.

You'll also notice that, we've just selected single numbers. So if we did 1 as n and a as, we'll just call it character, that is the problem. They do not have the same number of columns. So every query that you run must have the same number of columns, but that doesn't prevent you from doing something silly like this, where you accidentally switch which column is. So you have to be super careful that you're always selecting stuff within the right column order.

SQLite will prevent you from selecting the wrong number of columns because it doesn't know how to line up the result set if you if it's not the same number of columns. But you have to take great care to make sure that you're selecting them in the right order. So if you do it in the right order, you get a nice clean result set here. Let's take a look at pulling from the users table and I'm gonna show you an example where the deleted users get shuffled off into an archive table, but we might wanna search across both the users table and the users archive table. So, to create the users archive table, what we're gonna do is we're gonna create table user archive as select star from users where deleted at is not null.

And so if we run this we should get a user's archive table. So this is our users archive table and now we can go in here and say delete from users where deleted at is not null. So now in our in our users table, we should have only active people. And in our archive table we should have only deleted people. This is a totally viable strategy.

This is not 1 that I do very often but you can imagine you wanna shuffle people off into an archive table to keep your hot working set of data as small as possible. But you need to keep these people around in case, I don't know, I don't know what GDPR is, but maybe this is GDPR, maybe this is anti GDPR. But whatever, you want to keep it around. So then in this case, you you suddenly you want to include a search deleted users in your UI, and you're like, shoot. That's on a different table now.

Fear not. Select star from users where email equals aaron dot Francis at example dot com, union all, select star from users archive. So there is your table. Looks like 1 came from 1 came from the archive table. 1 came from the regular table.

And then what you can do is you can just do select, you know star from all of that. Select star from that result set as temp where first name equals Aaron. And so you can what did I do wrong here? I forgot the from. Select star from that where first name equals Aaron.

So now this is as if you've put together the users table again. And is this the most performant thing in the world? Probably not. Probably keeping it all in the users table would be more performant for this particular query But most of the time I would guess in your application you're not searching trashed users or deleted users. This is hopefully more of a 1 off thing where you can reconstruct it if necessary.

So this is a good example of how you can take 2 different tables and pretend that they are 1 table by putting them 1 over the other and then using that result set to further filter, refine, or just return to return to the end user. So unions can be very powerful. They're gonna be even more powerful when we start looking at recursive CTEs. So we can use unions to build up tables recursively over and over and over again and do some pretty cool stuff with that which we'll look at in future videos.

Full Course

SQLite for Production

Summary

Video Transcript