The power of T-SQL’s APPLY operator

I could go on all day about APPLY – it really is an incredible part of T-SQL. It helps solves problems were frustratingly painful to solve otherwise. It empowers the query writer in a way that is matched by few other features. This makes it a worthy topic for this month’s T-SQL Tuesday, hosted by Matt Velic.

TSQL2sDay150x150Yes, I’ve already written a post for T-SQL Tuesday this month, but I’m on a flight for a couple of hours, and I do like the APPLY operator. This post will probably be more like what other people are writing this month, looking at some of the inner workings of the APPLY operator. So if you’re reading this, go and read some of the other posts about APPLY, to have things explained in a slightly different way.

APPLY, in both its inner (CROSS APPLY) and outer (OUTER APPLY) forms, allows a correlated sub-query or table-valued function to be part of the FROM clause. This is different to a regular JOIN, which is between two sets that are independent of each other, but able to be joined according to some predicate that lives in the ON clause. Using APPLY, that restriction disappears, and the predicate can be built into the sub-query (or TVF) itself.

These two queries are exactly the same. They have identical execution plans, and produce the same result.

image

You may look at this and figure that the first option is entirely adequate, and see no reason to switch. There isn’t a reason to switch for most situations, and I don’t actually want you to change the way you write regular joins. But I do want you to consider the power of having a predicate as part of a correlated table-sub-query.

One common scenario is being able to hook easily into the first row in a particular join. It’s easy – you just add TOP (1) to the sub-query, and put an ORDER BY clause in there. I do this happily in BI solutions, where I want to grab a record from a dimension table that is current at the time that an event occurred.

I know, you can do this very easily using a correlated sub-query in the SELECT clause, since the sub-query returns a single value (yes, it’s scalar). But what if we needed to return multiple columns from a sub-query, or needed to return the best three records for a particular category. Here I’m finding the three best-selling products for each subcategory.

This could be done with a Common Table Expression (CTE) and the ROW_NUMBER() functdion, but that solution isn’t nearly as elegant as this. This is the kind of solution that was designed for APPLY. One of those times when a problem and solution are nicely matched, made for each other.

Here’s an example that achieves the same using a CTE:

I’m sure you’ll appreciate that this feels a lot more contrived than using APPLY. The Query Optimizer should really break these two queries down to exactly the same, recognising that joining on the PARTITION BY column(s) is a correlation mechanism. It doesn’t do that yet, but one day, I’m sure it might.

(The CTE query runs quicker than the APPLY equivalent in this particular case on the AdventureWorks database, but it shouldn’t need to. The behaviour largely depends on a number of other factors, in the same way that multiple Seeks can sometimes be faster than a Scan, but not always.)

Let’s think about views for a moment. They’re sometimes referred to as “Virtual Tables” (yes, SQL Books Online, I’m looking at you!)

People who have heard my preaching presenting about writing good T-SQL will have heard me refer to views as “stored sub-queries”. I reject the notion that they’re “virtual tables”. That’s rubbish – they’re treated in by the Query Optimizer as a stored sub-query, and should be referred to as such. The natural extension of this is that an inline table-valued function is a parameterised view, or stored correlated table sub-query. Basically the same as what we see here.

So let’s create it.

Notice I’ve simply changed the correlation hook (s.ProductSubcategoryID) into the parameter. Also note that I’m NOT using BEGIN and END, which would render the function procedural and non-simplifiable. To hear me go on about that in more detail, have a look at the video at http://bit.ly/SimpleRob – a URL I have to thank Farmer Paul for, which will direct you to a presentation I did at SQLBits V, in late 2009. (I figure what Paul meant when he created “Simple Rob” is that I present simple truths that have a profound importance, and not suggesting that I’m ‘simple’ at all. He’s not suggesting that my IQ is low or anything like that. He wouldn’t do that, he’s far too nice a bloke, and my friend. I mean, I know he’s a Kiwi, and Kiwis have a friendly rivalry with Aussies – I think they’re jealous of our weather – but he wouldn’t pick on me like that. Besides, Paul and I are both English originally, he just moved to NZ instead of Australia. Maybe he just enjoys colder weather.)

So now my query can become:

This executes in exactly the same way, but we’ve contained our logic nicely by storing the sub-query in an inline TVF. Notice the join condition – it’s buried inside the function, abstracted away, and yet completely simplifiable by the Query Optimizer. It’s still very much a join – if you were to look at the execution plan, you’d see a Nested Loop (Left Outer Join), with a Seek Predicate being used further down. (I’ve created indexes to help, of course.) You should be able to click on the image to see it in more detail.

image

I really could keep going about APPLY, and talk about a bunch of other uses for it, but I’ll leave that for another time.

What I want to leave you with is the reminder that APPLY is just a more powerful way of doing a JOIN, because the set that you’re joining to doesn’t need to exist yet. You may have heard people talk about APPLY being bad, in the same breath as people will tell you that a correlated sub-query is bad. Unfortunately, there are times when this is true. But there are times when any join is bad, it largely depends on what indexes you have in place, and what kind of simplification the Query Optimizer can do in working out your plan. If you have considered issues such as SARGability and Residual Predicates, then you really shouldn’t have much of a problem.

Of course you understand joining to Tables, and to Views. If you aren’t used to joining to sub-queries, then try expanding a view yourself. Then think about what happens if you use APPLY and sneak the join condition into a predicate within the sub-query. All of a sudden you’ll find yourself able to write richer queries, and have a much stronger T-SQL arsenal.

SELECT and WHERE, but no FROM clause

SQL Books Online is wrong (or at least, on that page).

I say that because it lists the FROM clause without brackets, suggesting that it’s not optional. This post is going to look at a few situations where we leave out the FROM clause, showing that it’s really not something to be put off by.

image

There are other things I don’t like about this page – the lack of a semi-colon, for example, and the fact that it seems to suggest that the order_list should be specified before ASC or DESC (when ASC or DESC would apply to each member of the list).

But the thing that I want to look at is that the FROM clause should be an optional extra. (Also, because one of my examples is using the APPLY operator, it’s going to be part of this month’s T-SQL Tuesday, hosted by Matt Velic.TSQL2sDay150x150 Go look at that page for a bunch of other posts written today about APPLY.)

In its most simple form, the SELECT statement is simply that – a SELECT statement. Like:

Of course, we can fetch data out of a table if we want, but this certainly isn’t required – we may want to perform some simple calculation or something and just see the result. This is often considered a different kind of statement – not a real SELECT statement, but I’d like to argue otherwise.

A scenario we might recognise more freely could be for assigning something to a variable, for example:

A SELECT statement with no FROM clause will return a single row of data (making it a nice option for variables). Rows can be added using UNION ALL (but I consider this the joining of two SELECT statements), or filtered out using a WHERE clause (as in the following example, for which I’m assuming we have a variable called @filterout).

I’m sure you use this pattern already when preparing data for drop-down controls, such as for SSRS parameters*. Here you use a WHERE without a FROM, and it seems perfectly normal.

*Personally, I’d rather use an SSRS Filter to indicate whether that value should be included or not, instead using a query parameter. A filter would mean that the dataset doesn’t rely on the parameter, and can fetch the dataset without waiting for the @AllOptionAvailable value.

Let’s think about the variable assignment situation with a filter.

This is an odd one. If no rows are returned by the statement, no assignment is done. It doesn’t get set to NULL, it simply gets left as it was. It’s equivalent to:

And because this second one is clearer, I’m sure most of you would use that instead.

However, one scenario that I come across relatively often where using SELECT and WHERE without FROM is in sub-queries. A sub-query can access values from the context in which it runs, accessing column data in much in the same way as a normal query can access variables that are in scope. As far as the sub-query is concerned though, those values are constants, and treated that way.

A scalar sub-query within a SELECT clause, the sub-query can return only one row (and a single value in that row). If it doesn’t return that value, NULL is used instead. However, query writers are more likely to use a CASE expression instead of a sub-query that doesn’t have a WHERE clause. The second of these is both more natural, and will also (generally) perform slightly better.

However, not all sub-queries return a single value. This is where the APPLY operator comes in, in its two forms, CROSS APPLY and OUTER APPLY. OUTER APPLY leaves rows in even if they match nothing in the APPLY sub-query, much like an OUTER JOIN, but my examples here will work the same way whichever you use.

I’m sure you’re aware of the usefulness of APPLY when unpivotting. Brad Schulz has a post about it, and if you haven’t read this, I recommend you spend time looking through it.

Essentially, he demonstrates that you can replace the UNPIVOT operator very easily using APPLY:

instead of the trickier to remember:

(I use the ‘o’ sub-query here, because UNPIVOT doesn’t give me control over which columns are returned. I rarely use UNPIVOT myself, having been long-since converted to using APPLY).

However, I quite typically don’t use Brad’s method of VALUES. I prefer the SELECT … UNION ALL SELECT … method. It comes down to the flexibility I have from SELECT statements.

A SELECT statement lets me use WHERE, which means I can apply a lot more control over which rows project in which ways.

For example I might find myself wanting to add an extra row for Orders which shipped more than seven days after the order. I can do this very easily using the UNION ALL method:

Using a WHERE filter, I can easily make sure that this row gets inserted only when required, rather than applying the filter to all those rows that come out of the APPLY operation, but again, I have a WHERE clause without a FROM.

APPLY is definitely very useful, and I love it for unpivotting. Just this week I’ve used this same method for expanding unpivotting some rows, but only the ones that need it. Using APPLY with SELECT gives me the flexibility I need – so long as I’m happy to abandon some of those conventions such as SELECT statements needing FROM.

…and just for the record, this other page lists everything correctly.

A SQL story in 11 words or less

Success Quietly Looms.

Not quite 11 words, I know. I’ll put more at the end. Definitely SQL related though, right? (And it’s even an acrostic!) Tom LaRock started this, and Stacia Misner tagged me. Monday is over now though (heck, it’s well into Tuesday here), so I’m not going to tag anyone else.

I’m not talking about our success (even though I’m in the process of hiring my sixth employee), I’m talking about the impact of having a system that you’ve implemented well, and finding that people actually want to use it. It’s the hope of every project. You don’t want to implement something only to find that people use the old system.

And yet when you implement a system which becomes successful, there are unexpected problems.

Look at Twitter. A few guys sit around and think how useful it would be to send SMS messages to a central location where everyone can read it. It becomes a worldwide phenomenon, and has very serious (and very public) problems scaling. Brilliant idea, but success quietly loomed.

Scalability needs to be a forethought, not a reaction. If you do your job well, you will have underestimated the required workload. It’s a fact of life. Luckily, SQL easily lets you plan things nicely. Systems can be tuned when small, allowing for growth that doesn’t cripple.

The principle also applies on a personal level. If you do well at things, you’ll be given more, and need to work out how to scale. Right now my business is growing, and that’s a good thing. I’ve had to hire extra people to allow for the growth, and I’m trying to find time to get into CRM Online and various other systems. I’m needing to adjust to the challenges of having a company of six (nearly seven) people, having to lean on God more than ever, and (like Aaron) making sure I prioritise the things that are important. I’m not leaving the community – I see that as an important part of running my business – but I do need to make sure that my family doesn’t suffer too much simply because there is more that needs doing than ever before.

As for the other eight words in my short story (and acrostic):

Did Anybody Think About Backups And Saving Everything?