PASS Elections: Why I’m leaving the board

It’s PASS Election time, and this year, there are no serving board members on the ballot.

Two years ago, Adam Jorgensen, Denise McInerney and I were elected to the PASS board, serving two-year terms. Since then, Adam and Denise have both been elected as Vice-Presidents of PASS, and will remain on the board in that role. Of the previous VPs, Douglas McDowell is stepping off the board, while Tom LaRock has become President (the current President, Bill Graziano, becoming Immediate-Past President and the current IPP, Rushabh Mehta, stepping off the board).

So instead of the vote being to replace or re-elect Adam, Denise and me, we see Douglas and Rushabh leaving the board.

This leaves the question of me. I could run again this year, but instead have decided to take a break from the board.

Being on the PASS board is a great way to serve the SQL community, and I’m all about serving – it’s a large part of what drives me. In fact, I often think I’m better at serving than pursuing my own vision. Sometimes I think I’d make a great case study for psychologists. Other times, I think they’d just roll their eyes and say “No, we see people like you all the time, you’re a classic <insert term here>.”

I was asked to be on the board just over two years ago, when PASS gathered people from the UK, Nordic and Australian regions to discuss the globalisation of PASS. Greg Low and I were there from Australia, and when JRJ, Raoul Ilyes and I were asked to be on the board, I accepted. All three of us did. Because Raoul was ineligible to become a director (he’d recently taken a job with SolidQ, and PASS has rules about having more than two board members from the same company), we were put in as board advisors. As I was already involved, I ran for election to be a director at the end of the year (two years ago). JRJ was chosen by the board to take over Andy Warren’s position when he left the board mid-term. And so the story of PASS globalisation continued with JRJ and me as directors.

Being a board member from Australia has been tough, but if it was that alone, I’d make it work. I’ve got up plenty of early mornings to be in meetings. But there’s more. I run a business (LobsterPot Solutions) out of Adelaide, which is a tough market. We also have staff in Melbourne and now Canberra (Julie’s moved there) too, and take on clients all around Australia and overseas. That doesn’t mean I travel a lot, but it does take time and emotional investment.

I should travel more than I do. In fact, that’s one of the reasons why I’m leaving the board – my old back injury. It doesn’t normally affect me much, but this year it has affected me more than I would’ve liked. I trained for and completed a marathon earlier in the year. I ran because I could. I’d had nine years of not being able to run, and then after working with my physiotherapist to get it back, I wanted to see how far I could push it. It was hard, but I ran a marathon. I have a medal to prove it. It took a lot out of me, though. Every injury during training was exacerbated by my back condition, and I ended up not being able to fly to the other board meetings this year. I missed the first one because it was the first week of the school year and I chose to be here for my family, but the other two were because my back simply wasn’t up to it. I attended via Skype instead, but it wasn’t the same.

Essentially, I don’t feel that I’ve been able to be as effective on the board this year as I’d like. I see my role on the board as standing up for what is right for the community, and being an enabler. My portfolio this year has been SQLSaturday, and my focus has been on enabling Karla and Niko to do what they do more effectively, and on trying to make life easier for SQLSaturday organisers, attendees and sponsors. It’s been a year of change at PASS in many ways though, and with my energy levels lower this year than I’d have liked, I don’t know how well I’ve done that. I have high standards, of course.

I would happily remain on the board as an advisor, or take on a different role within PASS (although Australia already has two Regional Mentors), but I’m also really thrilled that the people running for the board this year are of such high quality. There are a few PASSion Award winners in there, and almost everyone on the list has been involved in growing the SQL community in significant ways. I would love to see Allen get back on the board, and all the others would have a positive effect too. I could happily write recommendations for every one of them! I’m pleased I’m not running against them, because I feel PASS deserves to have these people on the board more than it deserves to have me continue. I might run again in the future, but I’m happy to serve behind the scenes too.

So instead, the PASS membership gets to vote three new people onto the board! Exciting times.

@rob_farley

String length and SARGability

CONVERT_IMPLICIT isn’t the only problem with getting data types wrong. You might have the right type, but what if the length is wrong? This post looks at getting the type wrong, and at getting the length wrong too.

Let’s do some testing. We’ll need a table with indexes. I’d normally use one of the AdventureWorks versions for this, but as they have a tendency to use user-defined types (which I’m not actually a fan of), I’m going to create my own. Also, my example needs to leverage a composite index. I’m only putting a single row in the table, because the amount of data isn’t relevant to what I’m going to show. I’m using a Windows collation, which is the default on my machine. I’ll put a note in later to mention why.
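Here’s a minimal sketch of the kind of table I mean. The table and column names (StringLength, id, v50, n50, vmax, nmax) match the ones referred to later in the post, but the exact definitions and the sample row are my reconstruction:

CREATE TABLE dbo.StringLength
(
    id int NOT NULL PRIMARY KEY,     -- also the second key in the composite indexes below
    v50 varchar(50) NULL,            -- columns use my instance's default Windows collation
    n50 nvarchar(50) NULL,
    vmax varchar(max) NULL,
    nmax nvarchar(max) NULL
);

CREATE INDEX ix_StringLength_v50 ON dbo.StringLength (v50, id);
CREATE INDEX ix_StringLength_n50 ON dbo.StringLength (n50, id);

-- a single row is enough; the volume of data isn't the point
INSERT dbo.StringLength (id, v50, n50, vmax, nmax)
VALUES (1, 'abcdefghij', N'abcdefghij', 'abcdefghij', N'abcdefghij');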

I haven’t indexed the vmax and nmax fields, because you can’t use them as index keys. Of course, there’s plenty of argument for having those fields in your actual tables, but if you want to be able to search on that data, you might want to consider a full-text index. If the searching is always on the start of the string, you could consider another option, but we’ll come to that later.

Let’s look at what happens when we do it right, defining variables using the right type.
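Something along these lines, with the parameter typed to match the nvarchar(50) column (the values are just illustrative):

DECLARE @s nvarchar(50) = N'abc', @i int = 0;

SELECT id, n50
FROM dbo.StringLength
WHERE n50 = @s
AND id > @i;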

[image: execution plan with an Index Seek whose Seek Predicate has both a Prefix and a Start, and no residual Predicate]

Notice that there is no Predicate property of the Index Seek, only a Seek Predicate, which has both a Prefix and Start. The range of rows returned by the Seek Predicate has a start-point based on a combination of the Prefix and Start, and an end-point which is the end of the Prefix.

If the wrong types are used, we see that a conversion is needed.

Let’s start by using a varchar(50) parameter, and comparing it to the nvarchar(50) column.
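That is, something like this:

DECLARE @s varchar(50) = 'abc', @i int = 0;

SELECT id, n50
FROM dbo.StringLength
WHERE n50 = @s     -- varchar parameter against the nvarchar(50) column
AND id > @i;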

[image: execution plan where the Seek Predicate’s Prefix contains a CONVERT_IMPLICIT on the parameter]

We still see no Predicate here, but look at the Prefix. A CONVERT_IMPLICIT is needed because the types don’t match. In case you haven’t heard, this is bad.

But how bad is it? Actually, not very bad at all, because @s is converted into the correct type, and then used in the Seek. You should still avoid it by passing in the correct type, but the cost of converting a parameter to the correct type is not that bad, because it only needs to happen once.

What happens if we do it the other way around, passing in an nvarchar(50) parameter and comparing it to the varchar(50) column?
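In other words:

DECLARE @s nvarchar(50) = N'abc', @i int = 0;

SELECT id, v50
FROM dbo.StringLength
WHERE v50 = @s     -- nvarchar parameter against the varchar(50) column
AND id > @i;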

[image: execution plan with a different shape – no Prefix in the Seek Predicate, plus a residual Predicate, but still an Index Seek]

Oh!

Straight away, you’ll notice that there’s a different shape to the execution plan, we’ve lost the Prefix in the Seek Predicate, and we have a Predicate (the residual one) property as well. But we do still have an Index Seek. It hasn’t resorted to using an Index Scan as would’ve been the case if we had used a number.

(Just for completeness, let’s truncate the table – to avoid a conversion error – and use a number for the comparison)
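TRUNCATE TABLE dbo.StringLength;   -- the existing row would fail the conversion to int
-- (re-insert the sample row afterwards if you want to keep following the string examples)

DECLARE @i int = 123;

SELECT id, v50
FROM dbo.StringLength
WHERE v50 = @i;    -- comparing the varchar(50) column to a number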

[image: execution plan showing an Index Scan with a CONVERT_IMPLICIT in its Predicate, and a warning on the SELECT operator]

Here we get an Index Scan. No Seek Predicate. The index on v50 is as good as useless because we’re comparing the column to a number. Look what’s going on in the Predicate – we’re converting the v50 field into an integer, and seeing if it’s equal to @i. That’s doing it for every row in the index.

Luckily, we now get a warning about this. See the yellow triangle with an exclamation mark in it on the SELECT operator? If I click on that, I see a warning that says:

Type conversion in expression (CONVERT_IMPLICIT(int,[tempdb].[dbo].[StringLength].[v50],0)) may affect "CardinalityEstimate" in query plan choice, Type conversion in expression (CONVERT_IMPLICIT(int,[tempdb].[dbo].[StringLength].[v50],0)=[@i]) may affect "SeekPlan" in query plan choice

It’s actually two warnings. One is a SeekPlan warning, and one is a CardinalityEstimate warning. It’s the SeekPlan one that has caused the Scan, while the CardinalityEstimate problem means the Query Optimizer has little idea about how many rows to expect out of the Scan operator. (And no, there’s no full stop / period at the end of those warnings. Go figure…)

Anyway, that was just an aside, because I figure there are plenty of posts already out there about this CONVERT_IMPLICIT issue leading to a Scan instead of a Seek. Let’s go back to our situation, where we were dealing with nvarchar and varchar strings, and still had a Seek. This one:

[image: the nvarchar-parameter-against-varchar-column plan again – still an Index Seek, with a residual Predicate]

Clearly this is a different situation to a regular CONVERT_IMPLICIT. It’s less harmful, although there is still a big impact, and it’s potentially much more commonplace, as people don’t tend to care quite as much if they see an Index Seek already in play.

Oh, and this behaviour doesn’t happen with SQL collations. If you have a SQL collation, the sort order between varchar and nvarchar is different, and it has to do a Scan, just like when I used a number.

The thing that’s happening here is the same as when you have a datetime column that you’re casting to a date, or when you’re using a LIKE comparison with a fixed start. The Query Optimizer uses a function called GetRangeThroughConvert (in the Compute Scalar – you can see it in the XML), which is then used to create a Dynamic Seek. I’ve presented on this before, such as in the SARGability and Residualiciousness talks at SQLBits in 2010 and the PASS Summit in 2011. Paul White (@sql_kiwi) expanded on the GetRangeThroughConvert function in a post from 2012. The seek can’t guarantee to catch everything though, so it uses a Predicate (the residual one) to make sure the value matches exactly.

So why is this so bad? In my presentations I’ve talked about the GetRangeThroughConvert behaviour as being a good thing. More on that further down in the post.

What if we pass in the correct type, but make it too long or too short?
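For example (the lengths 1000 and 5 are the ones I refer to below; the batches are separated so @s can be re-declared):

DECLARE @s varchar(1000) = 'abcdefghij', @i int = 0;
SELECT id, v50 FROM dbo.StringLength WHERE v50 = @s AND id > @i;   -- too long
GO
DECLARE @s varchar(5) = 'abcdefghij', @i int = 0;                  -- @s is silently truncated to 'abcde'
SELECT id, v50 FROM dbo.StringLength WHERE v50 = @s AND id > @i;   -- too short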

[image: execution plan for the over-long parameter – the same shape as with the correctly-sized parameter]

When it’s too long (though we haven’t considered ‘max’ yet), we get the same behaviour as if it were the right length. Interestingly, if you search the XML version of this plan for either 1000 or 50 (except for in the column name), you don’t find it anywhere. It’s as if we had passed in the correct length. The same happens if you pass in a string that is too short, but here you need to consider whether you might be silently truncating the parameter.

In this situation, my query didn’t return the same results, because @s is only ‘abcde’. But it does this without any kind of warning – you can populate a varchar(5) variable with a longer string and it won’t complain at all.

[image: the query and results for the varchar(5) parameter, silently truncated to ‘abcde’ and returning different results]

But max is done differently.

Let’s see what happens when we pass in a varchar(max) or nvarchar(max) parameter, and compare it to the limited-length string.
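Something like:

DECLARE @s varchar(max) = 'abcdefghij', @i int = 0;

SELECT id, v50
FROM dbo.StringLength
WHERE v50 = @s     -- max-typed parameter against the varchar(50) column
AND id > @i;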

[image: execution plan for the max-typed parameter – a dynamic Seek, with a separate Filter operator further to the left]

We’re comparing the varchar column to a varchar parameter, but because the parameter is defined as max, we get the GetRangeThroughConvert functionality, just like in the nvarchar / varchar scenario. But it’s more complicated still – despite the fact that we have a Range, our Residual Predicate doesn’t include the equality check. That check has actually been pulled further left in the plan, into that new Filter operator you see there.

You see, checking a max type is expensive, and involves memory allocation (that parameter is potentially up to 2GB in size), so the Filter is moved as far to the left of the plan as possible (SQL won’t ever do a max comparison in a Seek/Scan operator because of the memory allocation). By applying all the other filters (whether they’re done as Joins, Seeks or Residuals) before the max filter, the number of rows that reach the max filter is kept as small as possible. The Seek will be close to the correct amount, thanks to the GetRangeThroughConvert part, but that final check does still need to take place. It’s a good thing that the check is pulled left, but you should avoid passing in a max parameter so that this check can be done in the Seek Predicate.

So what about the other way around? What if we have a max column, and a limited-length parameter?

In some ways, this is more trivial because you can’t index a max column. This is one of the things that carried over from the days of text and ntext, although you couldn’t define a variable as text or ntext either, so you were less likely to try.

If you can’t index the column, but still want to be able to perform searches that would be index-like, what can you do (apart from using a Full-Text Index)?

Well, you could create a computed column that matches the length of your parameter, and index that. The non-clustered index persists the computed value, even though it’s not persisted in the underlying heap / clustered index.

This is similar to what can be done to tune many queries to get around SARGability problems. But it won’t help here unless we change our query, because our query still uses the max column, and implicit conversion makes the shorter value longer, not the other way around. However, if we add an extra predicate to our query, we can achieve what we want:
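A sketch of the idea, assuming a varchar(50) parameter (the computed column and index names here are mine):

ALTER TABLE dbo.StringLength
ADD vmax_start AS CONVERT(varchar(50), vmax);   -- the first part of the potentially-long string

CREATE INDEX ix_StringLength_vmax_start ON dbo.StringLength (vmax_start, id);

DECLARE @s varchar(50) = 'abcdefghij', @i int = 0;

SELECT id, vmax
FROM dbo.StringLength
WHERE vmax = @s          -- the original comparison on the max column, handled by the Filter
AND vmax_start = @s      -- the extra predicate, which the Index Seek can use
AND id > @i;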

[image: execution plan seeking on the computed column, with the max comparison handled by the Filter]

The Filter here is still the max check, but now we at least have a more effective seek on the rest of it, thanks to making a column which contains the first part of the potentially-long string.

So you can see that GetRangeThroughConvert functionality is useful, and way better than the alternative of using a Scan.

But GetRangeThroughConvert is actually bad for your query. Honestly.

The reason why it’s bad is because of what has happened to the second predicate that we’re using, the “id > @i” bit. When the type matched properly, it was part of the Seek Predicate. When GetRangeThroughConvert is used, either through the wrong type or by passing in the right type but with the max length, this second predicate gets relegated to the Residual (see the image below). That means that every row that satisfies the string comparison must be checked against this second predicate. It’s like using the phone book and not appreciating that all the “Farley”s are sorted by first name. Even without a second predicate, there may be problems in the plan because of unnecessary sorts, or the avoidance of a Merge Join, simply because the data cannot leverage the second column in an index (or the CIX key if there are no more columns in the key). If you’re looking for Barry Smith in the phone book, but are passing in ‘Smith’ via a varchar(max) parameter, you’ll be scanning all the Smiths looking for Barry. If you’d passed it in using varchar(100), then you could’ve found all the Barry Smiths with your Seek.

[image: Seek Predicate and residual Predicate properties showing “id > @i” relegated to the Residual]

It won’t give you a warning, but you’ll find your indexes aren’t being used as effectively as you might like.

@rob_farley

Not-so-dirty SQL hacks

Using a hammer to push in a screw isn’t a good idea, no matter how good the hammer is. We all know that. And yet there are times when we get frustrated with the ‘right tool’ and opt for the one that will work. Unfortunately, there are plenty of examples in the IT space – the topic of which is this month’s T-SQL Tuesday, hosted by Rick Krueger (@DataOgre).

[image: T-SQL Tuesday logo]

There are many ways to use SQL. Having seen plenty of awful code over the years, and written plenty of it myself, I know that there are lots of ways to achieve the same thing, and there are ways to persuade the Query Optimizer to take one particular approach rather than another. These efforts might seem quite ‘hacky’ at first glance, but if there is good reason for them, then perhaps they belong.

For example, a couple of years ago (and then again a couple of weeks ago) I wrote about using views as a way of passing sets of values to a stored procedure (where the stored procedure is masquerading as an instead-of trigger). At the time, people told me it was quite hacky, and that they’d never use it. However, I’ve heard of other people (including Greg Low, who also blogs at sqlblog.com) using this same method for getting data into Azure using SSIS, because it just works, and it has a number of advantages over other methods.
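To sketch roughly what that looks like (the table, view and column names here are made up for illustration – they’re not from those posts): the view exposes the ‘parameter’ shape, and an INSTEAD OF INSERT trigger plays the role of the stored procedure, receiving the whole set of rows in one go via the inserted table.

CREATE TABLE dbo.Orders (OrderID int IDENTITY PRIMARY KEY, CustomerCode nvarchar(20) NOT NULL);
GO
CREATE VIEW dbo.NewOrders
AS
SELECT CustomerCode
FROM dbo.Orders
WHERE 1 = 0;   -- the view isn't for reading; it just defines the shape being passed in
GO
CREATE TRIGGER dbo.NewOrders_Insert ON dbo.NewOrders
INSTEAD OF INSERT
AS
BEGIN
    -- 'inserted' contains every row passed in, so this body acts like a set-based stored procedure
    INSERT dbo.Orders (CustomerCode)
    SELECT CustomerCode FROM inserted;
END;
GO
-- 'calling' it: pass a set of values by inserting into the view
INSERT dbo.NewOrders (CustomerCode)
VALUES (N'CUST01'), (N'CUST02');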

Many look at string concatenation as another example of this, although some get attached to the wrong solution.

When concatenating strings stored within a single row, there is no issue – simply use the + operator, or a function like CONCAT().
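For example, against AdventureWorks (which the later examples use too), either of these works on values from the same row:

SELECT p.FirstName + N' ' + p.LastName AS FullName,
       CONCAT(p.FirstName, N' ', p.LastName) AS FullName2   -- CONCAT (2012+) also treats NULLs as empty strings
FROM Person.Person AS p;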

[image: the query and results, concatenating values from a single row with + and CONCAT()]

It’s when you need to consider data across multiple rows that you have more of a problem.

If there are a known number of rows in play, then performing a Pivot transformation can simplify the problem down to the single-line variety. I don’t mind whether you prefer to use the PIVOT operator or not. Personally, I prefer not to. In this example, I take a simple set of unpivoted data, and do a basic pivot transform, using GROUP BY and CASE. These then form useful values for the CONCAT function.
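A small illustration of what I mean, with made-up unpivoted data and three known positions:

WITH Src AS
(
    SELECT *
    FROM (VALUES (1, 1, N'red'), (1, 2, N'green'), (1, 3, N'blue'),
                 (2, 1, N'cyan'), (2, 2, N'magenta'), (2, 3, N'yellow')) AS v(GroupID, Pos, Word)
)
SELECT GroupID,
       CONCAT(MAX(CASE WHEN Pos = 1 THEN Word END), N', ',
              MAX(CASE WHEN Pos = 2 THEN Word END), N', ',
              MAX(CASE WHEN Pos = 3 THEN Word END)) AS WordList
FROM Src
GROUP BY GroupID;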

[image: the pivot-then-concatenate query and its results]

Frustratingly, pivoting requires the values to be known, producing a known set of columns. Dynamic Pivot requires Dynamic SQL, but if this is being done, there’s no problem providing those values to a CONCAT function – though creating the Dynamic SQL probably requires a degree of concatenation in the first place!

There is a “Quirky Update” method for concatenation that has been touted a bit over the years, but I find myself completely unable to recommend it, based on the prerequisites listed to get reliable results. For me, I want the results of any query I write to be guaranteed, and to have a query which could end up producing the wrong results is simply too dangerous for my liking. I’d rather use a cursor than risk wrong-results, and that’s saying something!

The method I like to use for string concatenation has its roots in a hack of sorts, and to explain why, I’ll first describe the environment from which it has sprung – FOR XML PATH.

There are a few ways to generate XML from SQL. I find FOR XML PATH to be quite straightforward, although it can become overly complex when nesting comes into play (causing me to switch to one of the others). However, for simple examples, XML PATH is very effective.

The results of a simple query like this:

SELECT ProductID, Name, ListPrice
FROM Production.Product
ORDER BY ProductID;

can be turned into XML by simply adding a “FOR XML PATH” option at the end. Providing some additional parameters, we get results like the following:
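The “additional parameters” I have in mind here are a row element name and a root element (a sketch; the names are my choice):

SELECT ProductID, Name, ListPrice
FROM Production.Product
ORDER BY ProductID
FOR XML PATH('Product'), ROOT('Products');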

[image: the query results as XML, with each column wrapped in an element named after the column]

You’ll notice that the column names have appeared within XML tags. The interesting thing here is that we can manipulate this behaviour by changing the name of the columns, with aliases. For example, putting an @ in front of the column name causes the element to be treated as an attribute instead (I remember it as “@ for ATtribute”).
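Like this:

SELECT ProductID AS [@ProductID], Name AS [@Name], ListPrice AS [@ListPrice]
FROM Production.Product
ORDER BY ProductID
FOR XML PATH('Product'), ROOT('Products');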

[image: the XML output with the columns rendered as attributes]

Another trick with column names is that if there is none (or it’s [*]), then the element tag disappears. Naturally this is way better than having “<>” appear in the resultant XML.

[image: XML output where the unnamed column’s value appears without an element tag]

…but this behaviour of making the empty element disappear is part of the appeal.

I’m told that it was around the time of the development of this ‘disappearing element’ bit that the penny dropped and using it for string concatenation became an option.

Consider what happens if we only have the Name column now.

[image: XML output containing only the Name elements]

If we had no ROOT element, we wouldn’t have legal XML…

[image: the output without a ROOT element – an XML fragment rather than a legal XML document]

…but if we used an empty string for the PATH element as well…

[image: the output with an empty string for PATH – the values run together as a single string]

…now suddenly we have string concatenation (including the support for ordering), without having to resort to something which could fall apart at a moment’s notice.

If we want commas included, that’s easy, and if we want to convert it back into a string (instead of being XML, with XML-style quotes), then surrounding it with brackets (round ones, what Americans call parentheses) and putting .value('.', 'varchar(max)') at the end will do the trick.
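Putting it all together, something like this – the TYPE directive gives us an xml value that .value() can be applied to, and STUFF is one common way to trim the leading comma:

SELECT STUFF(
    (
        SELECT N', ' + Name
        FROM Production.Product
        ORDER BY ProductID
        FOR XML PATH(''), TYPE
    ).value('.', 'varchar(max)'),
    1, 2, '');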

[image: the final comma-separated string, returned as a string via .value()]

Here, we’ve taken XML generation and turned it into a string concatenation operation in a very simple way. The SQL Product Group realised this potential quite early on, and this is widely regarded as the best way of doing concatenation from SQL Server 2005 on.

There are lots of other things that I’ve done over the years that might seem quite hacky, but are actually quite legitimate – but string concatenation is probably my favourite.

@rob_farley