T-SQL Tuesday – Query Cost

March 9, 2010

In SQL Server, the cost of a particular plan is based largely on I/O, which makes this post a good candidate for this month’s T-SQL Tuesday, hosted by Mike Walsh who, like me, walks a straight path.

In considering I/O – the movement of data generally In and Out of disk and memory – my thoughts come very much to query cost. If I run set statistics io on, I can see the reads and writes, both logical and physical, of the queries I run, and I can use this information to gauge the impact of my query on the system.
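As a minimal illustration (not code from the original post), turning the setting on for a session looks like this; the read and write counts then appear on the Messages tab for each query run:

    SET STATISTICS IO ON;

    -- Any query will do; logical and physical reads are then reported per table
    SELECT COUNT(*) FROM sys.objects;

    SET STATISTICS IO OFF;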

I can also see that the Estimated Subtree Cost shown in the Execution Plan seems to grow with the number of logical reads. This makes sense, as the logical reads required by a query can really make it costly. A logical read might only involve examining a page of data that is already in RAM, but it should always be remembered that if the required page is not already in RAM, it will need to be pulled in off disk first, which shows up as a physical read.

The standard line about measuring cost in terms of I/O comes down to the impact of a physical read on the system. When a physical read occurs, this involves getting data off the actual disk(s), which I liken to the process of picking a track on an old vinyl record player (the records were vinyl, not the record players). Of course, the disk controller lines up the tracks far quicker than I ever could, but the principle still applies. Compared to shifting bits in RAM, it’s incredibly slow. Much less so with Solid State Disks, but that’s another matter entirely.

But the impact of I/O is certainly not the only factor on query cost. I remember some time ago having a discussion about string splitting with a friend of mine. I had seen some posts discussing the different ways of splitting strings, with the conclusion being that CLR functions are the best, as SQL Server generally isn’t very good at string manipulation, but it led to an interesting discussion.

Take one of the standard approaches to splitting strings, such as using a table of numbers. For the purposes of the experiment, I have a table called dbo.nums (num int primary key), populated to at least 50.
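The query itself isn’t reproduced here, but a sketch of the kind of numbers-table split I mean would look something like this (the four sample strings are hypothetical stand-ins, and dbo.nums is assumed to cover the length of the longest string):

    -- Hypothetical sample data standing in for the original strings
    DECLARE @strings TABLE (string varchar(1000));
    INSERT INTO @strings (string)
    VALUES ('apple,banana,cherry'), ('dog,cat,bird'), ('red,green'), ('one,two,three,four');

    -- Each num marks a position immediately after a delimiter;
    -- SUBSTRING/CHARINDEX then carve out the item that starts there
    SELECT s.string,
           SUBSTRING(s.string, n.num,
                     CHARINDEX(',', s.string + ',', n.num) - n.num) AS item
    FROM @strings AS s
    JOIN dbo.nums AS n
      ON n.num <= LEN(s.string)
     AND SUBSTRING(',' + s.string, n.num, 1) = ','
    ORDER BY s.string, n.num;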

There are obviously other methods that can be used – this is just one. But now compare it to the following:
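Again, the original code isn’t shown here, but the idea can be sketched along these lines (same hypothetical sample strings; REPLACE turns each comma into a ' UNION ALL SELECT ' so the whole thing becomes one long chain of single-value SELECTs):

    -- Same hypothetical sample data as before
    DECLARE @strings TABLE (string varchar(1000));
    INSERT INTO @strings (string)
    VALUES ('apple,banana,cherry'), ('dog,cat,bird'), ('red,green'), ('one,two,three,four');

    DECLARE @sql nvarchar(max);

    -- Build: SELECT 'apple' UNION ALL SELECT 'banana' UNION ALL SELECT 'cherry' ...
    -- (apostrophes in the data would need doubling up; skipped for brevity)
    SELECT @sql = STUFF(
        (SELECT ' UNION ALL SELECT ''' + REPLACE(string, ',', ''' UNION ALL SELECT ''') + ''''
         FROM @strings
         FOR XML PATH('')),
        1, 11, '');   -- strip the leading ' UNION ALL '

    EXEC (@sql);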

Here I’m using dynamic SQL to split the strings, constructing a single statement that performs a UNION ALL across all the individual SELECTs. And when I look at the cost of a batch that runs both these methods (by asking Management Studio to show me the plans), I find that the second one is incredibly cheap. In fact, if I turn on statistics io, I can see that there is no I/O at all for the second method, whereas the first requires 12 logical reads – presumably to get at my table of numbers.

Clearly my dynamic SQL method of splitting strings is fantastically fast, and this is completely true. It is fantastic – a thing of fantasy. It’s actually really awful.

You see, as I mentioned earlier, SQL Server isn’t particularly good at string manipulation. So much so that creating this piece of dynamic SQL is very costly, won’t scale, and should never be implemented in a real environment. You can have a look at it to verify what I’m saying, but if you try to do this on a large set of data, you will very quickly find out just how poor it is.

To test it, I created a table called table_of_strings (string varchar(1000)) and populated it with those four strings. I then doubled its size ten times using:
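The statement I used isn’t shown here, but it would have been something along these lines (a sketch; GO 10 is the Management Studio trick of repeating the preceding batch ten times):

    -- Double the table by inserting a copy of itself, repeated ten times
    INSERT INTO dbo.table_of_strings (string)
    SELECT string FROM dbo.table_of_strings;
    GO 10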

Using the table of numbers to split the strings involved over 12,000 logical reads, but the elapsed time showed 289 milliseconds to provide the 13,312 rows of results.

The dynamic SQL method used a total of just 26 logical reads, but took 34 seconds to run on my (old) machine.

I’m not suggesting you ignore the I/O-based cost of execution plans, but you should most definitely consider more than just that – particularly as Solid State storage becomes more and more commonplace.

This Post Has 6 Comments

  1. AaronBertrand

    Bummer!  I was going to do a "Bad habits to kick : ignoring I/O" post for tomorrow, but my ideas were pretty much identical to yours.  Kudos.

  2. Rob Farley

    You should still write it though.

  3. Armando Prato

    I’d give SET STATISTICS TIME ON a shout out too.  I’ve found my own fair share of instances where a query that appears to run acceptably time-wise and I/O wise may be over utilizing CPU.

  4. AaronBertrand

    Yup, you’re really right.  I abandoned my original idea though, and published a more broad "bad habits" type of I/O post.

  5. Mladen Prajdic

    I’d run profiler to get the correct reads for your second method.
    i’ve seen STATS IO get totally screwed up when doing string manipulations.
    i have a query where the STATS IO show 30k reads while profiler shows 500k reads.

  6. Rob Farley

    Yes, I demonstrated that in my sqlbits.com talk, where a Compute Scalar had lots of hidden reads. stats io showed just 2 reads, when there were many in Profiler.
    But the problem with this query is the intense CPU needed. There are some hidden I/O costs, but that’s a small impact compared to the CPU.
