Turning a Scan into a Seek With A Pointless Predicate

For those people who seek seeks, here’s a “Pro Tip”.

Suppose you have a query that involves a Scan. It can be a Clustered Index Scan or an Index Scan – it really doesn’t matter. If it’s a Table Scan, then you have a Heap in play, and this tip doesn’t apply.

I’m looking for the address in AdventureWorks that’s on Pitt St. I know it’s a tall building, and that the street is in AddressLine2.
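The query is a simple one – something along these lines (the exact predicate text is an assumption):

SELECT AddressLine1, AddressLine2
FROM Person.Address
WHERE AddressLine2 LIKE N'%Pitt St%';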

This query scans an index called IX_Address_AddressL…something (I’m reading it from the screen, and it’s a little cut-off). Here’s the plan, and I’ve included part of the tooltip so that you can see the scan finds a single row.

Now, I’ve heard that Scans are bad, and Seeks are better. I’m sure I can make this query seek, without even adding a new index!

First I run a simple query to look up the first key column in this index.
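A sketch of the kind of metadata query involved – it matches the index using the part of the name that was visible, and reports just the first key column:

SELECT c.name, c.is_nullable, t.name AS type_name, c.max_length
FROM sys.indexes AS i
JOIN sys.index_columns AS ic ON ic.object_id = i.object_id AND ic.index_id = i.index_id
JOIN sys.columns AS c ON c.object_id = ic.object_id AND c.column_id = ic.column_id
JOIN sys.types AS t ON t.user_type_id = c.user_type_id
WHERE i.object_id = OBJECT_ID('Person.Address')
AND i.name LIKE 'IX_Address_AddressL%'
AND ic.key_ordinal = 1; -- first key column only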

This is handy – it tells me the first key column is AddressLine1, a non-nullable nvarchar column. (Note that max_length here is reported in bytes – 120 – so the column is nvarchar(60).)

Now, every value in this column must sort greater than or equal to the empty string, so adding the predicate AddressLine1 >= N'' shouldn’t affect the results (it could if the column allowed NULLs, though, so be careful).
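So the query becomes the same sketch as before, plus the pointless predicate:

SELECT AddressLine1, AddressLine2
FROM Person.Address
WHERE AddressLine2 LIKE N'%Pitt St%'
AND AddressLine1 >= N''; -- true for every non-NULL value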

And hey presto, I’ve turned the Scan into a Seek! Success! Clearly my performance has been enhanced.

Of course it hasn’t.

In fact, it’s probably marginally worse, because not only am I having to check every row to see if it’s in Pitt Street, but I also have to work out where to start. I’m still starting at the beginning of the index and going through every record, but because we’re filtering on the key column of the index, the Query Optimizer produces a Seek. It’s just a Seek that has to examine every row of the table – which is what we’d generally consider a Scan, not a Seek.

If you’ve read some of my recent posts, you will realise that this is because of the Residual Predicate in play. Both tooltips are shown below for your reading pleasure, and you’ll notice that they’re very similar. They have the same Residual Predicate which is doing all the work, it’s just that the Seek is starting off with a Seek Predicate.

So you see, this Seek is never going to perform better than the Scan. It’s all well and good to recognise that effective use of indexes involves Seeks, but a Seek isn’t ideal just because it’s a Seek – it has to be seeking on something effective, as highlighted by a selective Seek Predicate.

…and of course, we should’ve done some far more creative indexing, such as introducing a full-text index on AddressLine2.

I’ve also submitted a Connect item to have more information shown in the query plan, at:
https://connect.microsoft.com/SQLServer/feedback/details/670391/more-information-from-seek-operations-in-plans

@rob_farley

Covering Index because it’s Covering up the truth?

Some people have missed the point of my last post, called Covering, Schmuvvering. If you haven’t read it, then do so. Have a think about what’s going on, and why the Covering Index really isn’t that good, and then come back here and see if you’ve understood it right.

My point is that often in a Seek, the Seek Predicate isn’t filtering out much of the data, and we’re left with a Seek which isn’t much better than a Scan.

The plan that I didn’t show in the last post is the one that follows. To make this, I’ve created a non-covering index, and forced the Query Optimizer to use it (by hinting – you know, one of those hints that husbands understand isn’t really a ‘hint’ at all).
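Something like this – the index name and its key column come from these posts, but the literal values in the query are assumptions:

CREATE INDEX rf_ix_NotCovering2 ON Production.Product (DaysToManufacture);

SELECT Name, ProductNumber
FROM Production.Product WITH (INDEX (rf_ix_NotCovering2))
WHERE DaysToManufacture >= 1
AND Color = 'Silver'
AND ReorderPoint < 100;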

The tooltip on this Seek shows that the Residual Predicate is no longer in play here. It can’t be – the included columns from the previous post simply aren’t there any more. (If you’re wondering what I’m talking about, then remember I did ask you to read the first post first.)

A covering index is simply there to remove the cost of having to perform the Key Lookup – and rightly so, those things are expensive! We really can eliminate a big chunk of the cost of this query by using a covering index. But it’s the size of that arrow that concerns me.

The Seek Predicate here, testing the DaysToManufacture field, is the same Seek Predicate used in the original Covering Index Seek operation (from the earlier post, shown below). It still returns 407 rows, which each need to be checked to see if they satisfy the Residual Predicate.

The Covering Index Seek operation feels very similar to the box I’ve drawn around some of the operations in the Lookup scenario. You’ll notice a Single Row arrow coming out of the box. But most importantly, you’ll notice the big thick arrow coming out of the Seek Predicate bit.

Yes, that Residual Predicate, testing Color and ReorderPoint needs to be performed 407 times – that’s how many times our Key Lookup is performed.

With our Covering Index plan, it’s much harder to see. In the diagram below, I’m showing the Tooltip from the original Covering Index plan, showing where features correlate to the plan with the Key Lookup. I’ve drawn arrows to show you what corresponds to what. Remember this is the tooltip from the Covering Index, with arrows pointing at the Non-Covering Index example. The green arrow shows the Seek Predicate, which is located in the Index Seek on rf_ix_NotCovering2. The pink arrow shows where the Residual Predicate is being applied. The blue arrow shows us where we see a single row being returned. What we DON’T see in the tooltip is that the Seek Predicate is satisfied by 407 rows.

[image: the Covering Index tooltip, with arrows mapping its sections onto the Key Lookup plan]

The Covering Index Seek shows us that a Seek is being done, and it shows us that a single row is output. What it doesn’t show us is that our Seek could pretty much be a Scan.

In fact, I can easily demonstrate a Seek which is no better at all than a Scan.
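Here’s a sketch of the sort of query I mean – assuming the same Production.Product table, where ProductID is the clustered index key:

SELECT Name
FROM Production.Product
WHERE ProductID >= 0; -- every ProductID qualifies, so the range is the whole table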

This is a Seek. It says it’s a Seek. There’s even a Seek Predicate involved, which is doing a Seek.

[image: execution plan showing a Clustered Index Seek]

But of course, all the rows in the Clustered Index are returned. It’s the same as a Scan. The Estimated Subtree Cost is exactly the same as a Scan, because it’s performing a Range Scan where the Range just happens to be the whole table. I could even put in a Residual Predicate (like AND REVERSE(Name) = 'niahC'), to make it look like it’s an effective Seek.

[image: execution plan for the query with the Residual Predicate added]

[image: the Seek tooltip, showing the Seek Predicate alongside the Residual Predicate]

Don’t fool yourself into thinking that this CIX Seek is good – this is still scanning the entire table, checking every row to see if the name spells 'niahC' backwards. It’s just that the plan doesn’t tell us how many rows satisfy the Seek Predicate, and therefore have to be checked by the Residual.

I hope you see why a Covering Index can often hide the truth. If you just look on the surface, things can seem rosy. A Seek returning a small number of rows is good. But included columns can’t be involved in a Seek Predicate, and if your Seek is hardly better than a Scan, you could well be failing to realise the true benefit of a Covering Index.

Use Covering Indexes the way they were intended – to include things that are being returned by your query, not things that need to be tested. And try forcing a non-covering index to be used, to get a feel for just how thick that arrow is – the one that a Covering Index covers up.

@rob_farley

New hires, new site, new location

Exciting times for LobsterPot – Martin & Heidi joining the team, a new website (lobsterpot.com.au), and a Melbourne presence!

In March, Martin Cairney joined the team. Martin is a long-time member of the local Adelaide SQL Server community, and has been a friend of mine for some years now. He has spoken at the Adelaide SQL user group before, and when 2007 saw him move to the UK for a while, I wished him all the best. Martin went on to speak at the first SQLBits event, and became a regular at those conferences until he came back to Adelaide last year. He also spoke at SQL community events around the UK, establishing a good profile there. It was definitely good to get him on board a couple of months ago.

This month, Heidi Hasting brings the number of people in the company to seven. Heidi has worked with some of our staff before, and I see her having a massive future with the company.

You can see the new website for yourself! It’s clearly much better than the old site, which was little more than a placeholder. The new one has the attention of a proper web designer (instead of just being a bit of HTML thrown together by me), runs on the WordPress platform, and features a picture of Glenelg jetty, taken by my lovely wife. Most Adelaide beaches are just amazing, and it’s nice to be able to show one off on the website.

But perhaps the biggest news – we have a Melbourne presence now!

Martin has moved to Melbourne, and we’re picking up clients there. I imagine I’ll make some trips over there myself in coming weeks/months, hopefully speak at the odd user group, visit clients, and maybe see some friends and family as well.

Expanding out of Adelaide has been on my mind for some time. While Adelaide is a great place to do business, and everyone should consider moving here, it’s small compared to many places (only the fifth largest city in Australia). Establishing LobsterPot in other places makes good sense. Melbourne is a great city – I used to live there, and my brothers and their families still do. We’ve had the occasional Melbourne-based client before, but never enough to consider having a consultant based there. We do have a consultant there now, and we are well-and-truly open for Melbourne business.

PowerShell script to help uninstall SQL Server 2008 R2 Evaluation Edition

If you haven’t ever been caught out by a client (or even yourself) installing SQL 2008 R2 Evaluation Edition and then letting it expire, you might not realise the pain that this causes. To say it’s frustrating is a serious understatement.

Fellow SQL MVP and SQLBlog blogger Aaron Bertrand (@aaronbertrand) has an excellent post on the matter, which he put together back in October 2010. I had cause to use it recently, but got somewhat put off by searching through the registry.

So I put a line of PowerShell together (which I’ve split across five lines for easier reading):
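What follows is a sketch of it rather than the exact script – in particular, the wmic-based output format is an assumption about the batch file that Aaron’s post describes:

# Walk the Uninstall branch of the registry, pull out the DisplayName
# values, keep the SQL Server 2008 R2 entries, and write a batch file
# of uninstall commands.
Get-ChildItem "HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\Uninstall" |
    ForEach-Object { $_.GetValue("DisplayName") } |
    Where-Object { $_ -like "*SQL Server 2008 R2*" } |
    ForEach-Object { "wmic product where name=`"$_`" call uninstall" } |
    Out-File -Encoding ascii uninstall_sql.cmd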

If you run this, it will produce the file that Aaron recommended. You can look down it for anything you don’t want to get rid of, and maybe reorder the odd thing, and then run it. Then you should be able to install Developer Edition, and breathe much easier.

The script is quite straightforward – it just lists everything in the Uninstall branch of the registry, gets the DisplayName values out, filters them, and outputs the results as a few lines for the batch file. Nothing amazingly complicated, but useful for getting through the list quickly.

(Big thanks to Aaron for working out what was required for the uninstall, and Aaron Nelson (@sqlvariant) for answering a quick question when I was putting the final touches on the script)

Covering, schmuvvering – when a covering index is actually rubbish

Take a look at this query plan.

[image: execution plan showing an Index Seek returning a single row]

Yes, that arrow indicates a single row. This is an Index Seek, returning a single row. And yet it’s rubbish. That’s right – it’s rubbish!

In fact, I had to provide a hint for it to use this index. Scanning the whole table (via its clustered index) would’ve been better, and this is what happens without the index hint.

[image: execution plan without the hint, using a Clustered Index Scan]

Let’s look at the query. I can promise you there’s no bug here.
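It was along these lines – the index name and the literal values are placeholders, but the shape matches the tooltips below:

SELECT Name, ProductNumber
FROM Production.Product WITH (INDEX (rf_ix_Covering))
WHERE DaysToManufacture >= 1
AND Color = 'Silver'
AND ReorderPoint < 100;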

And the covering index is defined as follows:
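Something like this – the name is a placeholder, the FILLFACTOR is mentioned below, and the key and INCLUDE columns follow from the tooltips:

CREATE INDEX rf_ix_Covering ON Production.Product (DaysToManufacture)
INCLUDE (Color, ReorderPoint, Name, ProductNumber)
WITH (FILLFACTOR = 30);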

Try it yourself on AdventureWorks – you should see the same.

Of course, there’s a bit of trickery going on. For example, I purposely spaced out the NCIX using FILLFACTOR=30. But despite that, most DBAs would tell you that the first plan is better than the second. But they’d be wrong.

Let’s look at the tooltip of each of those Index operations. First, the Clustered Index Scan.

[image: tooltip for the Clustered Index Scan]

There’s no real surprise here. We know there’s only one row being returned – I told you that before. There’s a predicate listed which is tested on every row, and Name and ProductNumber are the two fields being output. The Estimated Subtree Cost is 0.0127253.

How about that Seek then, which the Query Optimizer only chose when its hand was forced by a hint?

[image: tooltip for the Index Seek]

You’ll probably notice immediately that the Estimated Subtree Cost is higher! 0.0141 instead of 0.0127. The reason why is in the Predicate and Seek Predicate sections. Sure, it can do a Seek – it can seek on the first key column of the index, DaysToManufacture. But having found those records, the data is still very much in DaysToManufacture order, so the other predicates (on Color and ReorderPoint) can’t be searched easily at all, and we’re left with them being handled as a Residual Predicate.

To use a phone-book example, this is like searching for people called “Rob F.” – I can easily find people whose last name starts with F, but would then have to go through all of them looking for people called Rob. That residual check could be painful. Actually, this particular example is more like saying “Find me anyone called Rob whose surname is < Z”, as the Seek Predicate involved returns most of the table.

Unfortunately, the query plan doesn’t show us how many rows go through the Residual Predicate check. If it did, we’d be able to see that it’s over 400 rows (in a table of 504 rows), and we might rethink our indexing strategy.

In fact, I can easily make an index which causes the same query to perform a full Index Scan, plus Lookup, and which is half the cost of either of the plans we’ve seen so far.

[image: execution plan using the filtered index – an Index Scan plus Key Lookup]

Of course, now I’m really cheating, by using a Filtered Index which would only contain a single row. And I could’ve eliminated the Lookup by including the other columns, halving the cost yet again.
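For reference, the pair of filtered indexes would look something like this – hypothetical names, with the filters assumed to mirror the query’s predicates:

CREATE INDEX rf_ix_Filtered ON Production.Product (ProductID)
WHERE DaysToManufacture >= 1 AND Color = 'Silver' AND ReorderPoint < 100;
-- still needs a Key Lookup to fetch Name and ProductNumber

CREATE INDEX rf_ix_FilteredCovering ON Production.Product (ProductID)
INCLUDE (Name, ProductNumber)
WHERE DaysToManufacture >= 1 AND Color = 'Silver' AND ReorderPoint < 100;
-- covers the query, eliminating the Lookup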

[image: execution plan using the covering filtered index]

Interestingly, the tooltips for both of these Filtered Index operations don’t show the predicates – the Query Optimizer knows that the index has already done the necessary filtering, and in this case, it doesn’t need to apply any further predicates, whether Seek or Residual.

[image: tooltips for the Filtered Index operations, showing no Predicate sections]

So… just because an index covers a query doesn’t mean it’s necessarily the right choice. The better question is “Is this index good for this query?”

(Incidentally, this is the kind of thing that I’d go through in my pre-conference seminar at SQLPASS if chosen, and in the talk on “Joins, SARGability and the Evils of Residualiciousness” if that one gets picked)

@rob_farley

New PASS Summit speakers that deserve votes

I’m not going to ask you to vote for the abstracts that I submitted for the PASS Summit. I’m not even going to mention the abstracts that Roger and Ashley submitted. I figure if you’re reading this, you may have already read the post I wrote about them, back before SQLPASS had said that there’d be voting.

Instead, I’m going to pick a few people that I recommend you vote for.

The people I’m going to pick are Paul White, Chris Testa-O’Neill and Erin Stellato. Three very different people, even from different countries. But I don’t think any of them have spoken at the PASS Summit before, and it would be good if they got accepted.

Paul White (NZ) is a fellow blogger at sqlblog.com. If you haven’t read his stuff, you’re seriously missing out. He spends an inordinate amount of time researching the behaviour of SQL Server, and has discovered all kinds of amazing things. He recently gave a presentation at the Boston SQL Saturday, and there was a lot of good feedback about it.

Chris Testa-O’Neill (UK) is one of the organisers of the SQLBits conferences in the UK, and has also featured in much of Microsoft’s eLearning training material. He hails from Manchester but has family living in Adelaide. He visited over Christmas and did an excellent job of speaking at the Adelaide SQL Server User Group.

Erin Stellato (US) blogs at erinstellato.com (I guess that makes sense). I met her briefly at the last PASS Summit, but over recent months have had a few conversations with her about some of the things she writes about on her blog. She submitted an abstract for the 24 Hours of PASS event, but didn’t manage to get enough votes to speak. Despite this, I know she knows her stuff, and would do an excellent job presenting.

So there are three people that I recommend you vote for. You can easily see the abstracts they’ve submitted on the site, to help in your decision. There are plenty of other people you should vote for too, such as the Scottish Jen Stirrup, the Aussie Darren Gosbell, or Seattle’s own Kendra Little (I think they haven’t spoken at the PASS Summit before either) – but please have a look through the names and pick some that are not the usual suspects.

@rob_farley

CTEs – query-scoped views

This T-SQL Tuesday is on the topic of CTEs, and is hosted by Bob Pusateri (@SQLBob). For a bunch more posts on the topic, follow the link to his Invitation Post, and you’ll see a pile of them, all about CTEs. If you’re reading this and it’s still May 10th 2011 (GMT), then why not put a post together yourself!

I’m sure there will be plenty of posts about how CTEs can be used to wrap ranking functions, and about recursion – useful features both – but I want to explain how I see CTEs.

CTEs are Common Table Expressions. The clue is in the name – they’re table expressions, and can be referred to in the FROM clause, fulfilling the same role that a table might. Not a table in the sense of storing data, but in the sense that a table refers to a set of data, ready to be transformed into a result set.

There are a few things that can be used as a table expression in a FROM clause. Obviously a table, but others include views, table-valued functions (whether inline or procedural), and table-valued sub-queries. Those of you who regularly read my blog (and if you’re not reading this at sqlblog.com, where are you reading it?) will know that I like to call views ‘stored sub-queries’, and TVFs ‘stored correlated sub-queries’ or ‘parameterised views’. So perhaps it’s no surprise that these are all table expressions too.

But this post is about CTEs. They’re halfway between views and sub-queries. They exist only within a single query, and contain a table expression. A stored sub-query if you like, but only scoped to a particular query. They’re like views, but don’t persist.

So a CTE is a view for when you don’t really want a view. It’s a query-scoped view. Very useful really – and you can even use them for UPDATEs, INSERTs and DELETEs, so long as you understand that this is subject to the same rules as views (but obviously you can’t put triggers on them).
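As a quick illustration, the CTE here behaves like a view that exists only for this one statement (the table and columns are made up):

WITH RecentOrders AS
(
    SELECT OrderID, CustomerID, OrderDate
    FROM dbo.Orders
    WHERE OrderDate >= '20110101'
)
SELECT CustomerID, COUNT(*) AS NumOrders
FROM RecentOrders -- used just as a table or view would be
GROUP BY CustomerID;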

But this query-scoping can feel like a problem. What if I want my view to persist for a bit longer? That could be quite handy, couldn’t it? What if we wanted a view to be created at the start of a stored procedure, and be scoped there? We can do this with tables – we just put a hash in front of the name, as in CREATE TABLE #blah (id int), and the scope is set.

Unfortunately when we try to do this with views, we get a nice error.
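The attempt looks like this (the view itself is just a placeholder):

CREATE VIEW #TempView AS
SELECT ProductID, Name
FROM Production.Product;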

[image: error message – “Temporary views are not allowed.”]

That’s right: “Temporary views are not allowed.” It knows what we’re trying to do; it just doesn’t let us do it. It’ll give similar errors for temporary functions too.

Luckily, you can vote for this on the Connect site. There are a couple of different suggestions on how it could be done, so I’ll let you choose the one that you prefer.

http://connect.microsoft.com/SQLServer/feedback/details/640863 (Please allow creation of temporary views)

http://connect.microsoft.com/SQLServer/feedback/details/343067 (Module-level table expressions)

Maybe at some point we’ll be able to have a wider range of options over the scope of our ‘common’ table-expressions.

@rob_farley

LobsterPot submissions for SQLPASS

My guys are great! When PASS started accepting abstract submissions for their Summit (in October this year), some of the LobsterPot employees immediately started looking into ideas for talks they could do. We rate communication as one of our key values at LobsterPot, and all my staff are keen presenters.

Roger Noble was at the PASS Summit with me last year, and has since spoken at both the Adelaide SQL Server User Group and Adelaide SharePoint User Group. Considering the work he’s done in data visualisation with PivotViewer over the past year, he was keen to be able to submit a session on that. This technology is seriously cool stuff – quite a few of our clients have been very interested in it and are now using PivotViewer to get at their data in new ways. You can see examples of Roger’s work at http://pivot.lobsterpot.com.au

Get even more from PivotViewer (Roger Noble)

With the release of the Silverlight PivotViewer control from Microsoft in June 2010, we saw the beginning of a new direction for data visualisation and interactivity, allowing data to be browsed and filtered in ways that highlight information that could easily have been missed. This session will show you how to take the PivotViewer control and enhance it even further to provide even more ways to display your data, including placing information on maps, and showing extra information in the PivotViewer tiles according to the zoom level. From sourcing data from PowerPivot and SharePoint 2010, to using Visual Studio 2010 to add new functionality, to improvements in future versions, this session will show the range of ways that PivotViewer can be used effectively in your organisation.

Like most of the team, Ashley Sewell has been doing a bunch of work with clients implementing cubes and reports. The talk he’s put in reflects a very common emotion that he gets from clients when they first start talking about Business Intelligence. They want to know that they’re not just getting their data in a different format, but that they’re going to be able to reach into the data themselves and realise that ‘Analysis’ aspect of SSAS. Ashley used to be a tertiary lecturer, and understands the importance of giving presentations that are useful as well as pitched at an appropriate technical level. This talk will be excellent, and I really hope it gets picked.

So you’ve got a Cube. What’s Next? (Ashley Sewell)

Did you ever get to the end of an Analysis Services session thinking "Cubes sound great but what can I show the analysts and execs back at work to woo them?". If your answer is yes then this session is for you. You will be taken through some of the Business Intelligence reporting and dashboarding available using a combination of PerformancePoint Services 2010 and Reporting Services 2008 R2 with particular emphasis on combining the best of each offering to maximise the impact of your dashboards. You can expect to leave this session with a deep enough understanding (and a list of gotchas) to enable you to create your own dashboards and data visualizations that bring the data within your cube to life on the web.

As expected, I’ve put in a few submissions – a pre-conference seminar and two regular sessions.

The pre-con is an enhanced version of the one I did at SQLBits 7. In it, I go through a bunch of queries that could have been fixed using T-SQL constructs that most people aren’t aware of. For example – many people would shy away from something like ORDER BY MAX(OrderDate) DESC, but if you understand what’s going on there, when it’s good and when it’s bad, then it can be just fine. This pre-con got excellent feedback at SQLBits, and I think it will please PASS Summit delegates as well.
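For illustration, the kind of query where that construct shows up – an AdventureWorks sketch, not one of the seminar’s actual examples:

SELECT TOP (10) CustomerID, MAX(OrderDate) AS LastOrder
FROM Sales.SalesOrderHeader
GROUP BY CustomerID
ORDER BY MAX(OrderDate) DESC;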

Fixing Queries With Advanced T-SQL Constructs (Rob Farley)

Have you inherited queries that are not your own, and are finding that performance isn’t so great? Removing cursors in favour of set-based queries is useful, but even set-based queries can perform poorly. Understanding the impact that various constructs can have on a query plan could be key to resolving many of these issues. In this seminar, irrepressible SQL MVP Rob Farley will take a look at some real-life queries and take the audience through examples of constructs that can have significant effects on tuning. This will include complex nested joins, join simplification, procedural functions, SARGability v residuality with predicates, better execution plan reading, start-up parameters, force hints, complex sorting, ORs, effective Dynamic SQL, GROUP BY v DISTINCT, unique indexes, temporary tables, APPLY considerations, and more. You’ll discover profoundness in things you thought you knew, and you’ll even see when a covering index that returns a single row can be a bad thing. This will be a day spent in Management Studio, not PowerPoint. If you want to know how to persuade the Query Optimizer to do a better job of running your query, this day is for you. The examples will apply to a variety of versions, with most of it being useful even in a SQL 2005 environment.

Another talk that I’ve done in the past is one called “Understanding SARGability (to make your queries run faster)”. In fact, last year this talk was a ‘stand-by’ for the PASS Summit. I also gave it at SQLBits 7, with Brent Ozar (@BrentO) and Buck Woody (@BuckWoody) heckling me from the back. Brent tweeted “Okay, wow, @robfarley is a seriously good presenter”, and although he got my Twitter handle wrong (it’s @rob_farley), I was very flattered. It’s still one of my favourite tweets.

Over recent times, I’ve found that people really don’t seem to understand the significance of having predicates fall into the category of “Residual”. I wrote about it recently, and was quite interested to see some of the reactions that people had when they talked about it with me. Jack Li wrote a post last week about a query which took longer than expected because a Hash Match was putting a lot of data into a single bucket. This is a common problem that gets missed, because of the impact of having the selective predicate treated as residual. I’m going to write more posts on that in the coming weeks, and a lot of that will be covered in one of the talks I’ve submitted this year. Residualiciousness isn’t a real word, but I figure that shouldn’t stop me.

Joins, SARGability and the Evils of Residualiciousness (Rob Farley)

You wouldn’t believe how often people just aren’t using their indexes effectively, whether it be searching for data, or joining tables. Quite often, this comes down to predicates becoming residual. Yes, residuality is a problem, and once a predicate has become residualicious, you may as well be scanning instead of seeking. There’s so much more to SARGability than people think, and people can often miss out on significant performance benefits by not appreciating this – particularly with new and improved query hints becoming available in SQL 2008 R2 SP1 and Denali. SARGable means Search ARGument Able and relates to the ability to search through an index for a value. Unfortunately many database professionals don’t really understand it – especially in regard to joins – leading to queries which don’t run as well as they should. In this talk, you’ll learn how to tell whether a predicate is being used correctly, and to evaluate what’s really going on in your Seek or Join. You’ll even learn to use new features in SQL 2008 R2 SP1 and Denali to affect the residuality of your predicates. This is a talk involving lots of demos, showing plenty of queries and execution plans.

The other talk I’ve submitted was inspired by a conversation with my friend Jamie Thomson (@jamiet), who unfortunately won’t be at the PASS Summit this year (he does have an excellent reason though). I happened to mention something which I considered an important consideration about queries used for SSIS, and he told me I had to write a blog post about it. I eventually did, and it got me thinking about a bunch of things that SSIS tuning talks (like those that Jamie gives) cover – things that are very relevant to tuning T-SQL, yet almost never get mentioned in standard talks. Some of them get covered in the pre-con seminar too, and I’m sure I’ll have to get blog posts written on some of these things over coming months.

Tuning T-SQL Using Lessons Learned From SSIS (Rob Farley)

We see presentations telling us how to tune T-SQL, looking at things like how a covering index can help avoid an expensive lookup, and the importance of set-based thinking. But there can be a lot more to finding bottlenecks in an execution plan, and there are significant parallels with the kind of concerns we have when tuning SSIS Data Flows. This session will look at some of the things that SSIS gurus explain when showing how to make SSIS run faster, and draw strong parallels to things that many query tuners don’t realise. If only they paid attention to the SSIS world! There will be a lot of examples in this session, explaining what’s happening in query plans and the ways that you can persuade your queries to run more like your SSIS packages, and vice-versa.

Looking at the long list of abstract submissions this year (and right now there are about three hours left for more to come in), I think it’s going to be a great event. With buzz around Denali and a stronger community than ever, I think it’s going to be huge.

There will be a handful of LobsterPot employees there, and I hope you will be too!

@rob_farley