A new superpower for SQL query tuners – Number of Rows Read

There’s a technique that I’ve been using for some time now, and been looking for going even further back, which has become immediately available to everyone who can see a query plan (version permitting).

Some years back (ok, it was 2010), I started to present at conferences about SARGability and residual predicates. I had a session at SQLBits VII about it (one of the first times I’d presented with SQL MVPs from at least five different countries in the room), I presented at the 2011 PASS Summit on “The evils of residualiciousness”, and I even wrote a song with the line “my predicate’s residual, my seek just runs too slow”. I wrote blog posts about bad covering indexes, and TSA security probes (or something like that).

The point of all this is that an index seek, merge join, or hash match is not necessarily effective for quickly locating the rows that you care about. It all comes down to the predicates that are involved, and whether they are SARGable for the index you’re trying to use.

Over at my “Covering Schmuvvering” post, I describe an index and query on AdventureWorks like this:


CREATE INDEX rf_ix_Covering ON Production.Product(DaysToManufacture)        
INCLUDE (Name, ProductNumber, Size, ReorderPoint, Color);  

SELECT Name, ProductNumber        
FROM Production.Product         
WHERE DaysToManufacture < 4         
AND ReorderPoint < 100         
AND Color = 'Red';  

The plan gives me an Index Seek that returns a single row. Wohoo!


…but I explain how this is actually really bad, because all the work is being done in the Predicate, not the Seek Predicate.


You see, the “Predicate” is the Residual Predicate (my term – you’ll just see it as “Predicate” here, because it’s the same kind of thing that we see in a Scan, which doesn’t have a concept of the Seek Predicate and the Residual Predicate).

The Residual Predicate is the leftover one, that needs to be checked after the Seek Predicate. You might not have one, if every predicate in your query is handled already by the time the Seek is done. But if you do have one, there is extra checking to do.

So just like how a Scan will start on the first page of the index and keep running until it’s either reached the end or it doesn’t need to keep looking (because the operator on its left has stopped asking for more rows), a Seek will find the rows that satisfy the Seek Predicate, and then have to check each one to see if it satisfies any leftover predicates.

The same applies for Merge Joins, which find things that match using the sort order of each data stream, but then need to apply any residual predicates. Hash Matches can be even worse, as the Probe Residual will include the Probe Keys if the hash function could cause two different values to be assigned to the same bucket. Read more on probe residuals here.

Back to Index Seeks though…

When the Query Optimizer creates a plan that has a residual predicate in a Seek or Scan, one of the earlier iterations will have seen this predicate applied in a Filter operator to the left of the Index operation. But before the plan is created, the residual predicate is pushed down into the index operation.

If we apply trace flag 9130 (undocumented, so be wary), we can see the plan without the pushdown.


And that thick arrow there shows us 407 rows being produced by the Seek operation, despite the single row being produced in our original plan. That original plan did not contain that information.

But with Service Pack 3 of SQL Server 2012, all that changes.

From now on, Index Seeks and Index Scans have an additional property called “Actual Rows Read”, which shows through in SSMS as “Number of Rows Read”.


Please note that you need to have a recent-enough version of SSMS to see this come through. If you are running SSMS 2014, you’ll need to wait for the next service pack. Or if you’re running SSMS 2012 or SSMS 2016, make sure you’ve applied the latest service pack or CTP.

This feature is amazing though! A significant part of the query tuning I do is to look for ineffective Seek operations. Many people look for Scans, but don’t consider that a poor Seek is just as bad. Now, we can easily see that this Seek took 407 rows to produce just 1, and a different indexing strategy could be recommended. It wouldn’t surprise me to see warnings start to come through in 3rd party products like SQL Sentry’s Plan Explorer soon as well, because I think that a Seek with 0.25% effectiveness is worth a warning. And without having to use the trace flag either!

(Edit: They do now! Check out http://blogs.sqlsentry.com/aaronbertrand/sql-sentry-v10-index-analysis )

It’s like you’ve just been given X-ray vision on the Index operators!

Other experts in the space understand the significance of the residual predicate, such as Kendra Little (@Kendra_Little), and now you can see this too.


There are a few more things I would like to see to round this feature off. I’d like to see Estimated Rows Read (which I can see using 9130), and I’d like to see similar information for Merge Joins and Hash Matches. If you want to see these implemented, jump onto Connect and vote for these items: Estimated Rows Read and Merge Join / Hash Match info – and one to fix up the naming convention.

…and in the meantime, make sure your local copy of SSMS is up-to-date, and encourage people to upgrade their SQL 2012 boxes to SP3!


Does the paradigm fit the environment?

This month we see the 73rd T-SQL Tuesday come around – six years were completed last month (the first was in December 2009), and this is the start of year seven. It’s hosted this month by Bradley Ball (@sqlballs), and he asks a question about whether our environments are naughty or nice.

Now, I’m a consultant, and deal with a variety of customers. And I don’t talk about those customers. It’s a thing. I don’t ever want to have a tweet or post where I say “Seriously? I can’t believe my customer has a setup like this!” – because my customers wouldn’t want me to say those things in public, even if I’ve kept the names private.

Something that I see from time to time though, which really affects the ‘niceness’ of an environment is the paradigm that was used to solve the particular problem.

20-something years ago, when I was at university, I did a lot of subjects that haven’t obviously impacted my career. Most of them were interesting and fairly useful (although I haven’t really been able to apply the particular nuances of using matrices to rotate the camera in a 3D-modelling environment), but one that really stands out for me as being particularly useful was a subject on Programming Paradigms. We explored functional programming, logic programming, and a few others. I can’t actually tell you the full list of paradigms we explored – I didn’t even attend most of the classes (I had stuff going on which meant I ended up only scraping through – the lowest scoring semester of my whole degree). But the impact that it had on me was an appreciation that the problems we face today shouldn’t necessarily be approached with the same hat that we wore yesterday.

In the database space, we use a set-based programming paradigm. We apply relational theory to a schema design, and then write queries using set-based logic. This is a useful approach, but it can go too far. When you’re writing queries that you want to perform in particular ways, the focus could be something different. Perhaps you want to create a cursor, looping through each row of a resultset and doing some amount of processing on it. Iterative code, within a set-based environment. It’s a different paradigm, and can turn a nice system into a naughty one, or perhaps even turn a naughty system into a nice one.

Even within the database space, we have different paradigms to apply. I see data warehouses that try to stick to a normalised design like the underlying transactional environment. I see data warehouses that demand a purely star-schema design. I see parallel systems that haven’t considered distribution theory, and parallel systems which have pushed distribution theory to the nth degree. I see indexing strategies which help, and indexing strategies which don’t.

Usually, this comes down to the paradigm being applied. It’s generally not too hard to spot when the wrong paradigm has been used, or when a particular paradigm has been pushed too far, but it’s not always easy to quantify and measure empirically. My perspective is that the people involved should feel like things make sense. When sufficiently educated people (people who ask questions rather than blindly accept what they are told) are comfortable with the design decisions, it’s generally not a problem. When they find themselves trying to figure out what’s going on, and why a particular approach to a query has been taken, then there’s an issue. And I don’t care whether that’s a problem with a T-SQL query, or an MDX query, or a piece of ETL – I simply find that if there are experts in the place who wince a little when describing why something is the way it is, then that’s a sign that things aren’t quite right.

Now, I’ll happily help fight battles to get these things fixed. But as a consultant, I know there are battles worth fighting, and situations worth accepting. And I know that success can be achieved despite things which are less than ideal. But when I think about whether a particular environment is worth getting a lump of coal or a nice elf-created gift, then I often look at the paradigm that was used when the system was designed. Then at least, things will make more sense.

I hope you all have had a terrific 2015. Why not decide to write a few T-SQL Tuesday posts yourself in 2016?