The power of T-SQL’s APPLY operator

April 13, 2011

I could go on all day about APPLY – it really is an incredible part of T-SQL. It helps solves problems were frustratingly painful to solve otherwise. It empowers the query writer in a way that is matched by few other features. This makes it a worthy topic for this month’s T-SQL Tuesday, hosted by Matt Velic.

Yes, I’ve already written a post for T-SQL Tuesday this month, but I’m on a flight for a couple of hours, and I do like the APPLY operator. This post will probably be more like what other people are writing this month, looking at some of the inner workings of the APPLY operator. So if you’re reading this, go and read some of the other posts about APPLY, to have things explained in a slightly different way.

APPLY, in both its inner (CROSS APPLY) and outer (OUTER APPLY) forms, allows a correlated sub-query or table-valued function to be part of the FROM clause. This is different to a regular JOIN, which is between two sets that are independent of each other, but able to be joined according to some predicate that lives in the ON clause. Using APPLY, that restriction disappears, and the predicate can be built into the sub-query (or TVF) itself.

These two queries are exactly the same. They have identical execution plans, and produce the same result.

SELECT p.Name, s.Name, s.ProductCategoryID       
FROM Production.Product AS p        
JOIN Production.ProductSubcategory AS s        
ON s.ProductSubcategoryID = p.ProductSubcategoryID;    
SELECT p.Name, s.Name, s.ProductCategoryID       
FROM Production.Product AS p        
CROSS APPLY (        
SELECT subcat.Name, subcat.ProductCategoryID         
FROM Production.ProductSubcategory AS subcat        
WHERE subcat.ProductSubcategoryID = p.ProductSubcategoryID) AS s;

SELECT p.Name, s.Name, s.ProductCategoryID

FROM Production.Product AS p

JOIN Production.ProductSubcategory AS s

ON s.ProductSubcategoryID = p.ProductSubcategoryID;

SELECT p.Name, s.Name, s.ProductCategoryID

FROM Production.Product AS p

CROSS APPLY (

SELECT subcat.Name, subcat.ProductCategoryID

FROM Production.ProductSubcategory AS subcat

WHERE subcat.ProductSubcategoryID = p.ProductSubcategoryID) AS s;

You may look at this and figure that the first option is entirely adequate, and see no reason to switch. There isn’t a reason to switch for most situations, and I don’t actually want you to change the way you write regular joins. But I do want you to consider the power of having a predicate as part of a correlated table-sub-query.

One common scenario is being able to hook easily into the first row in a particular join. It’s easy – you just add TOP (1) to the sub-query, and put an ORDER BY clause in there. I do this happily in BI solutions, where I want to grab a record from a dimension table that is current at the time that an event occurred.

OUTER APPLY (       
SELECT TOP (1) d.DimKey        
FROM dbo.DimensionTable AS d        
WHERE d.BusinessKey = e.DimensionCode        
AND d.StartDate < e.EventDate        
ORDER BY d.StartDate DESC) AS d

OUTER APPLY (

SELECT TOP (1) d.DimKey

FROM dbo.DimensionTable AS d

WHERE d.BusinessKey = e.DimensionCode

AND d.StartDate < e.EventDate

ORDER BY d.StartDate DESC) AS d

I know, you can do this very easily using a correlated sub-query in the SELECT clause, since the sub-query returns a single value (yes, it’s scalar). But what if we needed to return multiple columns from a sub-query, or needed to return the best three records for a particular category. Here I’m finding the three best-selling products for each subcategory.

SELECT s.Name, TopProds.*       
FROM Production.ProductSubcategory AS s        
OUTER APPLY (        
SELECT TOP (3) p.Name, SUM(o.OrderQty) AS NumOrdered        
FROM Production.Product AS p        
JOIN Sales.SalesOrderDetail AS o        
ON o.ProductID = p.ProductID        
WHERE p.ProductSubcategoryID = s.ProductSubcategoryID        
GROUP BY p.ProductID, p.Name        
ORDER BY SUM(o.OrderQty) DESC) as TopProds        
;

SELECT s.Name, TopProds.*

FROM Production.ProductSubcategory AS s

OUTER APPLY (

SELECT TOP (3) p.Name, SUM(o.OrderQty) AS NumOrdered

FROM Production.Product AS p

JOIN Sales.SalesOrderDetail AS o

ON o.ProductID = p.ProductID

WHERE p.ProductSubcategoryID = s.ProductSubcategoryID

GROUP BY p.ProductID, p.Name

ORDER BY SUM(o.OrderQty) DESC) as TopProds

;

This could be done with a Common Table Expression (CTE) and the ROW_NUMBER() functdion, but that solution isn’t nearly as elegant as this. This is the kind of solution that was designed for APPLY. One of those times when a problem and solution are nicely matched, made for each other.

Here’s an example that achieves the same using a CTE:

WITH OrderedProds AS       
(        
SELECT p.ProductSubcategoryID,  
p.Name,  
SUM(o.OrderQty) AS NumOrdered,  
ROW_NUMBER() OVER (PARTITION BY p.ProductSubcategoryID         
ORDER BY SUM(o.OrderQty) DESC) AS ProdOrder       
FROM Production.Product AS p        
JOIN Sales.SalesOrderDetail AS o        
ON o.ProductID = p.ProductID        
GROUP BY p.ProductID, p.Name, p.ProductSubcategoryID        
)        
SELECT s.Name, o.Name, o.NumOrdered        
FROM Production.ProductSubcategory s        
LEFT JOIN OrderedProds o        
ON o.ProductSubcategoryID = s.ProductSubcategoryID        
WHERE o.ProdOrder <= 3        
;

WITH OrderedProds AS

(

SELECT p.ProductSubcategoryID,

p.Name,

SUM(o.OrderQty) AS NumOrdered,

ROW_NUMBER() OVER (PARTITION BY p.ProductSubcategoryID

ORDER BY SUM(o.OrderQty) DESC) AS ProdOrder

FROM Production.Product AS p

JOIN Sales.SalesOrderDetail AS o

ON o.ProductID = p.ProductID

GROUP BY p.ProductID, p.Name, p.ProductSubcategoryID

)

SELECT s.Name, o.Name, o.NumOrdered

FROM Production.ProductSubcategory s

LEFT JOIN OrderedProds o

ON o.ProductSubcategoryID = s.ProductSubcategoryID

WHERE o.ProdOrder <= 3

;

I’m sure you’ll appreciate that this feels a lot more contrived than using APPLY. The Query Optimizer should really break these two queries down to exactly the same, recognising that joining on the PARTITION BY column(s) is a correlation mechanism. It doesn’t do that yet, but one day, I’m sure it might.

(The CTE query runs quicker than the APPLY equivalent in this particular case on the AdventureWorks database, but it shouldn’t need to. The behaviour largely depends on a number of other factors, in the same way that multiple Seeks can sometimes be faster than a Scan, but not always.)

Let’s think about views for a moment. They’re sometimes referred to as “Virtual Tables” (yes, SQL Books Online, I’m looking at you!)

People who have heard my ~~preaching~~ presenting about writing good T-SQL will have heard me refer to views as “stored sub-queries”. I reject the notion that they’re “virtual tables”. That’s rubbish – they’re treated in by the Query Optimizer as a stored sub-query, and should be referred to as such. The natural extension of this is that an inline table-valued function is a parameterised view, or stored correlated table sub-query. Basically the same as what we see here.

So let’s create it.

CREATE FUNCTION dbo.Top3ProductsBySubcategory(@subcat int) RETURNS TABLE       
AS RETURN(        
SELECT TOP (3) p.Name, SUM(o.OrderQty) AS NumOrdered        
FROM Production.Product AS p        
JOIN Sales.SalesOrderDetail AS o        
ON o.ProductID = p.ProductID        
WHERE p.ProductSubcategoryID = @subcat        
GROUP BY p.ProductID, p.Name        
ORDER BY SUM(o.OrderQty) DESC        
);

CREATE FUNCTION dbo.Top3ProductsBySubcategory(@subcat int) RETURNS TABLE

AS RETURN(

SELECT TOP (3) p.Name, SUM(o.OrderQty) AS NumOrdered

FROM Production.Product AS p

JOIN Sales.SalesOrderDetail AS o

ON o.ProductID = p.ProductID

WHERE p.ProductSubcategoryID = @subcat

GROUP BY p.ProductID, p.Name

ORDER BY SUM(o.OrderQty) DESC

);

Notice I’ve simply changed the correlation hook (s.ProductSubcategoryID) into the parameter. Also note that I’m NOT using BEGIN and END, which would render the function procedural and non-simplifiable. To hear me go on about that in more detail, have a look at the video at http://bit.ly/SimpleRob – a URL I have to thank Farmer Paul for, which will direct you to a presentation I did at SQLBits V, in late 2009. (I figure what Paul meant when he created “Simple Rob” is that I present simple truths that have a profound importance, and not suggesting that I’m ‘simple’ at all. He’s not suggesting that my IQ is low or anything like that. He wouldn’t do that, he’s far too nice a bloke, and my friend. I mean, I know he’s a Kiwi, and Kiwis have a friendly rivalry with Aussies – I think they’re jealous of our weather – but he wouldn’t pick on me like that. Besides, Paul and I are both English originally, he just moved to NZ instead of Australia. Maybe he just enjoys colder weather.)

So now my query can become:

SELECT s.Name, TopProds.*       
FROM Production.ProductSubcategory AS s        
OUTER APPLY dbo.Top3ProductsBySubcategory(s.ProductSubcategoryID) AS TopProds        
;

SELECT s.Name, TopProds.*

FROM Production.ProductSubcategory AS s

OUTER APPLY dbo.Top3ProductsBySubcategory(s.ProductSubcategoryID) AS TopProds

;

This executes in exactly the same way, but we’ve contained our logic nicely by storing the sub-query in an inline TVF. Notice the join condition – it’s buried inside the function, abstracted away, and yet completely simplifiable by the Query Optimizer. It’s still very much a join – if you were to look at the execution plan, you’d see a Nested Loop (Left Outer Join), with a Seek Predicate being used further down. (I’ve created indexes to help, of course.) You should be able to click on the image to see it in more detail.

I really could keep going about APPLY, and talk about a bunch of other uses for it, but I’ll leave that for another time.

What I want to leave you with is the reminder that APPLY is just a more powerful way of doing a JOIN, because the set that you’re joining to doesn’t need to exist yet. You may have heard people talk about APPLY being bad, in the same breath as people will tell you that a correlated sub-query is bad. Unfortunately, there are times when this is true. But there are times when any join is bad, it largely depends on what indexes you have in place, and what kind of simplification the Query Optimizer can do in working out your plan. If you have considered issues such as SARGability and Residual Predicates, then you really shouldn’t have much of a problem.

Of course you understand joining to Tables, and to Views. If you aren’t used to joining to sub-queries, then try expanding a view yourself. Then think about what happens if you use APPLY and sneak the join condition into a predicate within the sub-query. All of a sudden you’ll find yourself able to write richer queries, and have a much stronger T-SQL arsenal.

This Post Has 25 Comments

Paul Bell April 14, 2011 Reply

Excellent article, helped me simplify an overly complex view.
Julie Smith April 15, 2011 Reply

I love my t-sql advice peppered with humor 😉
Pankush April 15, 2011 Reply

Great article, i am already looking at improving a lot of my complex queries. thanks Rob
Matt Velic April 19, 2011 Reply

Rob, seriously, this is awesome stuff. Thanks for the two-fer!
jamiet July 20, 2011 Reply

"one day, I’m sure it might."
Talk about hedging your bets 🙂
Rob Farley July 20, 2011 Reply

Well, I want to feel like I can have a faint glimmer of semi-justified hope. 😉
Abraham August 1, 2012 Reply

Its cool, could you try to explain more about CTE’s later. I’d love to read that.
Dan November 22, 2012 Reply

Thanks – helpful article. Also very interesting about the BEGIN and END of a function hurting the optimization. I hadn’t heard of that before
bran December 7, 2012 Reply

I used apply but some how it is taking too long. please help
Select shipment_CaseId, shipment_code, P.SiteId, P.[Temp Container Removal]
into #shipment1
From RoseA.dbo.shipment1
outer apply dbo.fn_ShRo(shipment1.shipment_CaseId) [P]
where [P].ClientAbbreviation = (Select top 1 abbreviation from client where client = @iClient)
and [P].Datebil = @dDatebill
Rob Farley December 7, 2012 Reply

Bran: that’s really going to depend on what dbo.fn_ShRo is. Also though, if the client column in the client table isn’t unique, which abbreviation are you getting? The plan would tell me a lot here, of course.
Chandu March 16, 2013 Reply

Great Article Rob
Wayne April 26, 2013 Reply

Rob
i have a 5 table join where the driving table has foreign keys to these child tables. using a join the section of code is 66% of the total procedure cost. when i change all the joins to cross apply with correlated subqueries, the cost goes down to 7%. i have searched the net for an explanation for this and can’t find much. i feel there are probably hidden dangers which is why i am reluctant to implement this into a production environment before i know about them. your thoughts on this?
Rob Farley April 26, 2013 Reply

Obviously it’s impossible for me to reassure you completely without seeing the queries. However, I suspect you have nothing to worry about. Joins can be performed a number of different ways, and although CROSS APPLY will tend to persuade the database engine to prefer one in particular, if the logic of the query hasn’t changed, your results should remain correct.
Dave March 10, 2014 Reply

Bitly link not working 🙁
I was looking forward to the dulcet tones
Rob Farley March 10, 2014 Reply

Odd. Works for me, sends me to http://sqlbits.com/Sessions/Event5/Designing_for_simplification (which is just harder to remember).
Dave March 10, 2014 Reply

…and it works fine doh. Poor internet today
Andrew Watson June 6, 2014 Reply

One minor comment.. While top(3) works… I would humbly suggest that it won’t cope very well if the Business requirement, is "return the top 3 performers, by sales amount"… if the amounts are ever tied….
eg:if the product sums are:
One:1000
Two: 800
Three: 500
Four: 500
Would the business want to return the top 4 in this case (as there is a tie for third place), or have a random selection of Three/Four?.
If they would want to see both, then the With query with a RANK (rather than Row_number) function is the more appropriate choice…. definitely more adjustable to business rules (rank = top performers by amount, dense_rank = top amounts and products with those amounts, row_number = top three by amount [and some other ordering criteria])… don’t fight it. 🙂
Rob Farley June 6, 2014 Reply

You can always use TOP (3) WITH TIES…
Valeriy Khvalabov June 14, 2014 Reply

Thank you Rob,
A great article!
You made the explanation simple and straight-forward.
Jean Joseph October 16, 2014 Reply

Cool article
Johnny October 3, 2015 Reply

Great article.
Jodie June 29, 2016 Reply

I know I’m a few years late, but just wanted to say thanks for such a well-explained article and the highly useful examples. I will definitely keep Apply in mind for future SQL conundrums! Thank you 🙂
Chris February 6, 2018 Reply

I’m intrigued by the aside about using begin/end rendering the sql “non-simplifiable.” Unfortunately, while the shortcut URL http://bit.ly/SimpleRob correctly links to http://sqlbits.com/Sessions/Event5/Designing_for_simplification , the presentation on the destination page is not functioning, nor is the link to the presentation wmv file (the links to ppt and demo sql work fine, though).

Specifically, I’m not seeing why APPLY with a table function is simpler than – or preferable to – an inline view. Most correlated subqueries I’ve come across can be converted to a join to an inline view in the main body of the procedure without using a correlation in the where clause. This is likely covered in the broken presentation link.
Rob Farley March 2, 2018 Reply

Hi Chris,

I don’t have much control over the SQLBits site. I can try to ask them maybe…

But I’m not sure I do cover the idea of using a view instead of APPLY. Both are useful in their way, but APPLY is certainly very useful for the fact that it provides correlation in a much easier way.
Rob Farley March 2, 2018 Reply

Chris – try grabbing it from https://1drv.ms/v/s!AoK2mL8SvaMytzvfE5N1LOrdIJgj