At the PASS Summit 2015, I was giving a presentation about Query Plan Operators, and Kalen Delaney (@sqlqueen) was in the audience. She's kind of a big deal in the SQL world – I still remember the first time I met her. It was 2007 and she came up to me and said "I read your blog". I was a little star-struck, but we've been good friends ever since.
In that presentation, I was explaining Seeks and Scans, as I often seem to, and was reminded about the times I wander round the supermarket holding a list of the things I need to get. Because what I'm doing is essentially a Join between my list and the stock in the supermarket. And the way that I implement that join highlights some important ideas in the database world.
Kalen seemed to like my analogy. So much so that over a year later she casually mentioned it on Twitter.
I figured that it was about time that I explained more about this.
When I'm sent to the supermarket to pick up a lettuce, I know where I'm going. It's in the fruit and vegetables section. I'm good with that. I'll go straight there, pick up the lettuce, and I'm out. I'm not going to wander around – I'm not going wander down the confectionary aisle – I'm just grabbing the lettuce and leaving. This is somewhat like a Seek.
In fact, it's more like a Seek with TOP 1, because there are probably lots of lettuces, and I'm only going to get a single one. That's taking the analogy a little further, but it still works. It's one of the nice things about good analogies, and I totally think this is one of those. If I want to get a lettuce that is a particular quality of lettuce, then I might have to check a few of them before grabbing one (because the supermarket doesn't sort the lettuces into the good ones v the ones that look like they've been there a while), and that's like having to deal with a Residual Predicate. The more fussy I am, the more I might have to look through, and I risk getting no lettuce even if they have some fairly ordinary ones. If I want to specifically get the best lettuce they have (even if it's awful), then I need to do a Top N Sort on all the lettuces. That might be an expensive operation if there are a lot of lettuces.
I mentioned a minute ago that I wasn't going to go down the confectionary aisle. Good thing too, if there's a problem there. I'm sure we can all imagine the times when there's a problem down a particular aisle… analogous to a page corruption in a database, but if I didn't have to go there, then I can still do what I need to without being affected.
What if there's some sort of a crisis going on and I need to buy all I can get of something (I'm not meaning like all the toilet paper – in a crisis, other people might need some too). Like all the Ham & Pineapple Pizzas, because we've been asked to cater for a classroom of kids, and those kids don't understand the world yet. But the supermarket understands the world and only ever stocks like, three of them. I'm totally fine with grabbing all three pizzas and putting them in my shopping basket.
But what if that day they have over-ordered and they have fifty? Suddenly I'm needing more memory – I mean, a bigger basket – and I might need to do something differently. I kinda hope that never happens.
Back to when I have a shopping list, rather than a single item. At this point, I'm wanting to join between everything on my list with the things that match in the supermarket. If it's a short list, it might be best to find one thing, then the next, then the next, and so on. Even if I grab a lettuce and then grab a cabbage, which is right next to the lettuces! If my list is short enough, then that's fine.
When my list is quite long, I'm going to use a different strategy. There comes a time when it's going to be quicker to just walk through the aisles looking for things that are on my list. At first glance that sounds like the "tipping point" with a Seek+Lookup turning into a Scan, but I want to point out that this means we're anticipating having a bunch of rows being pulled into a Nested Loop operator and then doing a Lookup for each one, and that's a Join. Sure, we might decide not to do the join, but I'm looking at the join part for my supermarket analogy.
So if I have a long list I might not want to grab each item individually. Let's think about other options.
One option is to sort the list in my hand into aisle order, which is essentially "section". I know the sort order of the supermarket, so this is fine. I can start with aisle 1, and walk through, keeping my eye out for the things in my list in order. Brilliant. This is a Merge Join. It really is.
And this works pretty well, except that I need to order my shopping list first. That's one of the drawbacks of a Merge Join.
Plus, there are times when I might have picked something up, gone to move to the next section of the supermarket, but then I need to grab something else from that section. So if my sort wasn't down to the point where it's unique in the list, I might need to backtrack, which is really annoying and takes time. Now I'm basically doing a many-to-many join, and a whole ton of efficiency is lost.
Another option is to make sure I can see my whole shopping list, and walk up and down going "Do I need this? Is this on my list?" for every item I come across. At this point I'm doing a Hash Match. It can work, but I need to have that shopping list spread out, and I'm asking myself that question (creating the hash value and doing the probe) about everything.
One nice thing though, is that scenario where I don't know how long the list is because I'm getting text messages as I'm walking in. So I can start spreading out the list, thinking that a Hash Match might work out well, bracing myself for a long walk up and down all the aisles, and then when it turns out the list is short, I can decide to go to each item individually. That's Adaptive, and it's really handy when you don't know how much data you're going to be dealing with.
Shopping in a supermarket is obviously very different to querying a database. But the underlying concepts behind how we pull the right goods from the shelves definitely have some strong similarities, as I hope I've shown here. Analogies can help you learn principles by hanging them on concepts you already know. Maybe next time you go to the supermarket, you'll get a little better at understanding how your queries run.