Dangers of BEGIN and END

December 5, 2009

I’ve presented this material at three conferences recently, so it’s about time I wrote a blog post on it…

As programmers, we love modularisation — even in the SQL space. We make stored procedures, views, and functions to encapsulate our code. This improves maintainability, simplifies the development experience, and is generally useful.

But there’s a time when it’s a bad thing for SQL Server.

There’s an amazing component of SQL Server called the Query Optimizer (I always want to write Optimiser, but I’m assuming it’s a proper noun and putting up with the US spelling). When we write queries in T-SQL, it’s the Query Optimizer that works out how to actually run the query. It works out what indexes can be used to improve performance, what order tables (well, indexes and heaps) should be accessed, how to perform the joins, and so on. I find that a rough appreciation of the power of the Query Optimizer can really help query writers.

For example, the Query Optimizer will translate a correlated sub-query in the SELECT clause into a LEFT OUTER JOIN, so that you don’t have to. It will also work out when joins can be rendered pointless and thereby removed from the plan altogether. If you let these principles help you in your query design, you can see significant benefits. It also helps you write queries that are easier to maintain, as there’s little point in trying to be clever by writing a query in a different way if the Query Optimizer will handle it in the same way as before.

If you use a view in another query, the definition of the view is used in the query as if you had written it with a sub-query. A view is simply that — a stored sub-query. They are sometimes referred to as ‘virtual tables’, but I disagree. They are stored sub-queries. Sure, the analogy falls down when you start considering indexed views, but on the whole, a view should be seen as a stored sub-query. The Query Optimizer takes the view definition, applies it in the second query, simplifies it where possible, and works out the best way of executing it. If you’re only interested in a couple of columns out of the view, the Query Optimizer has an opportunity to take that into consideration.

Stored procedures are different. You can’t use a stored procedure in an outer query. The closest you can get to this is to use OPENROWSET to consume the results of a stored procedure in an outer query, but still the whole procedure runs. After all, it’s a procedure. A set of T-SQL commands, not a set of queries. I see the clue to this as the BEGIN and END that stored procedures generally use. I like stored procedures, but I do get frustrated if they’re returning more information than I need, since I have no way of letting the system know that maybe it doesn’t need to do as much work.

Functions are in between, and come in two varieties. A function can be inline, or it can be procedural. I don’t think you find this differentiation in many places — and normally people talk about this particular drawback as being associated with Scalar Functions as compared to Table-Valued Functions, but the problem is actually one of simplification.

An inline function must be a table-valued function at this point in time. It takes the form:

CREATE FUNCTION dbo.fnFunctionName(<paramlist>) RETURNS TABLE AS
RETURN
( SELECT …. );

It is always this form, with a sub-query enclosed in a RETURN statement. It can return many columns and many rows, but the definition of the table is implied by the SELECT clause. This is essentially a view that can take parameters.

The other form is one that involves BEGIN and END. Scalar functions (unfortunately) require this (but hopefully one day will not).

CREATE FUNCTION dbo.fnFunctionName(<paramlist>) RETURNS int AS
BEGIN
RETURN ( … )
END;

As the RETURN statement is enclosed between a BEGIN and END, it can be preceded by other statements, used in working out what value should be returned.

Table-valued functions can use BEGIN and END, when multiple lines are required to calculate the rows in the table being returned.

CREATE FUNCTION dbo.fnFunctionName(<paramlist>) RETURNS @table TABLE (<fields>) AS
BEGIN
…
RETURN
END;

In this kind of function, the table variable is populated with data, and returned to the outer query when the RETURN command is reached.

But when the Query Optimizer comes across a procedural function, it cannot simplify it out and executes the function in a different context.

The execution plan will report that the cost of running the function is zero. But it’s lying. The way to see the impact of the function is to look in SQL Profiler, where you’ll see potentially many calls to the function, as it needs to work out the result for each different set of parameters it’s passed. The pain can be quite great, and you will never have noticed if you just look at the Execution Plans.

The moral of the story is to make sure that your functions are able to be simplified out by the Query Optimizer. Use inline table-valued functions even in place of scalar functions. You can always hook into them using CROSS/OUTER APPLY in your FROM clause, or even use them in your SELECT clause (not “SELECT Claws” — that would make it related to my company LobsterPot Solutions, and “SELECT Claus” is just a bit Christmassy) using a construct like SELECT (SELECT field FROM dbo.fnMyTVF(someParam)) …

Consider the Query Optimizer your friend. Study Execution Plans well to look at how the Query Optimizer is simplifying your query. And stay away from BEGIN and END if possible.