Not-so-dirty SQL hacks

September 10, 2013

Using a hammer to push in a screw isn’t a good idea, no matter how good the hammer is. We all know that. and yet there are times when we get frustrated at the ‘right tool’ and opt for the one that will work. Unfortunately, there are plenty of examples in the IT space – the topic of which is this month’s T-SQL Tuesday, hosted by Rick Krueger (@DataOgre).

There are many ways to use SQL. Having seen plenty of awful code over the years, and written plenty of it myself, I know that there are lots of ways to achieve the same thing, and there are ways to persuade the Query Optimizer to take one particular approach rather than another. These efforts might seem quite ‘hacky’ at first glance, but if there is good reason for them, then perhaps they belong.

For example, a couple of years ago (and then again a couple of weeks ago) I wrote about using views as a way of passing sets of values to a stored procedure (where the stored procedure is masquerading as an instead-of trigger). At the time, people told me it was quite hacky, and that they’d never use it. However, I’ve heard of other people (including Greg Low, who also blogs at sqlblog.com) using this same method for getting data into Azure using SSIS, because it just works, and it has a number of advantages over other methods.

Many look at string concatenation as another example of this, although some get attached to the wrong solution.

When concatenating strings stored across a single row, there is no issue – simply use the + operator, or a function like CONCAT().

It’s when you need to consider data across multiple rows that you have more of a problem.

If there are a known number of rows in play, then performing a Pivot transformation can simplify the problem down to the single-line variety. I don’t mind whether you prefer to use the PIVOT operator or not. Personally, I prefer not to. In this example, I take a simple set of unpivotted data, and do a basic pivot transform, using GROUP BY and CASE. These then form useful values for the CONCAT function.

Frustratingly, pivoting requires the values to be known, producing a known set of columns. Dynamic Pivot requires Dynamic SQL, but if this is being done, there’s no problem to provide those values to a CONCAT function – though creating the Dynamic SQL probably requires a degree of concatenation in the first place!

There is a “Quirky Update” method for concatenation that has been touted a bit over the years, but I find myself completely unable to recommend it, based on the prerequisites listed to get reliable results. For me, I want the results of any query I write to be guaranteed, and to have a query which could end up producing the wrong results is simply too dangerous for my liking. I’d rather use a cursor than risk wrong-results, and that’s saying something!

The method I like to use for string concatenation has its roots in a hack of sorts, and to explain why, I’ll first describe the environment from which it has sprung – FOR XML PATH.

There are a few ways to generate XML from SQL. I find FOR XML PATH to be quite straight forward, although it can become overly complex when nesting comes into play (causing me to switch to one of the others). However, for simple examples, XML PATH is very effective.

The results of a simple query like this:

SELECT ProductID, Name, ListPrice
FROM Production.Product
ORDER BY ProductID;

can be turned into XML by simply adding a “FOR XML PATH” option at the end. Providing some additional parameters, we get results like the following:

You’ll notice that the column names have appeared within XML tags. The interesting thing here is that we can manipulate this behaviour by changing the name of the columns, with aliases. For example, putting an @ in front of the column name causes the element to be treated as an attribute instead (I remember it as “@ for ATtribute”).

Another trick with column names is that if there is none (or it’s [*]), then the element tag disappears. Naturally this is way better than having “<>” appear in the resultant XML.

…but this behaviour of making the empty element disappear is part of the appeal.

I’m told that it was around the time of the development of this ‘disappearing element’ bit that the penny dropped and using it for string concatenation became an option.

Consider what happens if we only have the Name column now.

If we had no ROOT element, we wouldn’t have legal XML…

…but if we used an empty string for the PATH element as well…

…now suddenly we have string concatenation (including the support for ordering), without having to resort to something which could fall apart at a moment’s notice.

If we want commas included, that’s easy, and if we want to convert it back into a string (instead of being XML, with XML-style quotes), then surrounding it with brackets (round ones, what Americans call parentheses) and putting .value(‘.’,’varchar(max)’) at the end will do the trick.

Here, we’ve taken XML generation and turned it into a string concatenation operation in a very simple way. The SQL Product Group realised this potential quite early on, and this is widely regarded as the best way of doing concatenation from SQL Server 2005 on.

There are lots of other things that I’ve done over the years that might seem quite hacky, but are actually quite legitimate – but string concatenation is probably my favourite.

@rob_farley

This Post Has 7 Comments

JDS September 13, 2013 Reply

I have always done something like this below for quick string concatenation. Do you prefer the XML based approach upon performance or some other criteria?
IF EXISTS ( SELECT *
FROM tempdb.dbo.sysobjects
WHERE ID = OBJECT_ID(N’tempdb..#source’) )
BEGIN
DROP TABLE #source
END
CREATE TABLE #source (
col1 CHAR(1)
)
INSERT INTO #source
(col1)
VALUES (‘a’)
, (‘b’)
, (‘c’)
, (‘d’)
, (‘e’)
, (‘f’)
, (‘g’)
/*******************
String Concatenation
********************/
DECLARE @string VARCHAR(MAX)
SELECT @string = ISNULL(@string,”) + ‘,’ + col1
FROM #source s
SELECT @string
Rob Farley September 13, 2013 Reply

Based on the fact that it’s not supported.
Based on the fact that it can be affected by non-clustered indexes, parallelism, and other good things, giving ‘incorrect’ results.
Based on the fact that it’s procedural and has no opportunity to be handled by the QO in a better way.
Based on the fact that you can’t order the data how you like without risking the results.
If you must use code that moves through a result set in order like that, you should use a cursor. And that’s saying something!!
merrillaldrich September 13, 2013 Reply

I’m crazy and self-destructive, I guess. I know the performance advantages. I get the reasoning.
AND … still I cringe when I see this XML concatenation trick. It’s my childish idealism.
Still, great explanation of how to do it :-).
Rob Farley September 13, 2013 Reply

How do you handle it then, Merrill?
Josías Hernández September 17, 2013 Reply

It is just AWESOME !!! I really enjoy it !!! Very smart the solution!!!
Thanks for the trick Rob.
Allan S. Hansen September 17, 2013 Reply

I’ve used XML for string concat in SQL Server for a couple of years now, and it’s always felt ‘hacky’ – but – yes, it really does the trick.
I rarely – however – put it into production code other than data migration or imports which aren’t that time sensitive.
In other situations, I’d properly rather handle it with a combination of database/code logic as string concatenation often is connection to a business logic problem needing to be solved, more than an actual data problem.
merrillaldrich September 18, 2013 Reply

Rob – I don’t have to code much. We buy everything. If I had to code this I would probably cringe and write it as you show here. It’s become sort of a best practice, but it’s still not pretty :-).