Why isn’t my filtered index being used?

Quite often, people have filtered indexes but find they’re not being used as often as they’d like. I was reminded of this recently when I read a good post by Kendra Little from brentozar.com about filtered indexes. In it, Kendra talks about how the WHERE clause of a filtered index allows an IN clause, but not an OR clause (to quote someone from Microsoft “We currently do not support the "OR" clause due to the optimizer matching difficulties for such clauses”). Going back a few years, Tim Chapman wrote a post about the pains of filtered indexes.

Anyway, both of these posts hint that filtered indexes aren’t always used. Tim addresses it directly, and Kendra mentions needing a hint to make sure the index is used. I thought I’d explore the question a little more. I’m confident that both Tim and Kendra know this information – they are two of the top index experts in the world. This post isn’t for them, but for those people who are trying to find out why their filtered indexes aren’t being used.

To be used, the filtered index must be able to satisfy the query.

This should be fairly obvious, but it goes a little deeper than you might think on first glance. Let’s explore Tim’s examples to show what I mean.

Tim creates a filtered index:

…and then shows that it’s used successfully for the query:

(The image here is from Tim’s blog post, and belongs to Microsoft)

No surprise here – if the system knows that ProductID is 870, then it can use an index which only includes rows that satisfy that.

Tim then tries to use a variable instead of 870 – although he still passes in the value of 870.

(Image from Tim’s blog again)

No luck – the system doesn’t use the filtered index. He can’t even use a hint to force it to use it – SQL replies saying that it can’t create a plan for it using that index hint.

So what’s going on?

The problem is not with the Query Optimizer seeing that the value is going to be 870 – the problem is with the plan cache. You see, when SQL runs a query, it figures it won’t be in isolation and it puts it into the cache. But the version that goes into the cache is a general one, that doesn’t consider the values that are passed in. Because it needs a plan that will work regardless of what the parameter is set to, using the filtered index here would be inappropriate.

Tim shows one way around this, and ‘hints’ at another in an edit, although sadly you tend to find that in blog post edits, you can miss the key a little.

The way that Tim gets around this is to use Dynamic SQL, but I’m not a fan.

Yeah – I’m not a fan of this. Go read my post on SQL Injection for why.

Tim does mention a better method in the ‘edit’ bit in his post, and I want to look at that. He says: “In many cases, a way to get around the local variable problem is to recompile the statement.” – but I want to make it very clear that the point is not actually to recompile the statement, but to use a statement that isn’t going to get put into the cache (which you do by using the OPTION (RECOMIPLE) hint, which makes it sound like you’re recompiling the statement).

When you use OPTION (RECOMPILE), the main impact is not that it recompiles, but that the query doesn’t get cached. Because it doesn’t get cached, it won’t find the query in the cache beforehand either. This means that it doesn’t need to consider the generalised version – it has the confidence to know that it doesn’t need to cater for future uses, so it can use the filtered index!

image

So the better option than using Dynamic SQL is to use OPTION (RECOMPILE).

So that’s one reason why your filtered index might not be used – but there’s another too:

Using the filtered index might be too much work.

Let’s consider Kendra’s example. She had a query that used IN. Her example was:

But as I don’t have a Votes table, I’m going to use an equivalent with AdventureWorks:

Kendra uses SELECT COUNT(*) queries to test it. Similarly to Kendra’s example, this works well when IN is used, and when OR is used, but when just one of the options is used, it needs a hint to use the index, and then it needs a lookup to satisfy it.

image

What Kendra doesn’t point out is why the filtered index needed the hint to be used in the single-option examples (which is fair enough – because her post was about OR v IN, not about why the filtered index wasn’t being used).

The reason why is because of the Lookup that’s needed. This is so expensive, it’s cheaper for the Query Optimiser to do a clustered index scan instead. It’s standard ‘tipping point’ stuff, but we don’t normally see this when we have COUNT(*), because COUNT(*) is just counting rows, not returning extra columns.

except that there is a column that isn’t included in the filtered index, which our query needs – Color.

Subtly, even though our filtered index only contains rows that have the Color either Red or Black (or in Kendra’s example, VoteTypeID either 1 or 2), it doesn’t store which rows are Red and which rows are Black. We know that every row in the index is either Red or Black, so we can use this index as a starting point, but we would need to do a Lookup to get the actual Color value.

To fix this, we should INCLUDE the Color column in the filtered index.

Now we don’t need to hint at the index to use, and we see a residual predicate being used to make sure that we only pull Red rows out of the index.

image

So we see two reasons here for filtered indexes not being used, and two ways to help encourage them to be used more often. The first is to consider using OPTION (RECOMPILE) to help persuade the Query Optimizer not to consider generalising the query across different parameter values, and the second is to INCLUDE the columns that are used in the filter, in case the database engine needs those values later in the query.

Do try to use filtered indexes, and be understanding about those times when they’re not used. There’s quite probably a simple explanation.

@rob_farley

SQL Injection – the golden rule

The problem with SQL Injection is that most people don’t realise the fundamental concept which makes SQL Injection vulnerability not only easy to spot, but also easy to prevent. And it’s the thing that SQL Injection has in common with countless other hacking mechanisms that have been around since the early days of computing.

The simple truth is this: You are vulnerable if you confuse DATA with COMMAND.

SQL Injection occurs when a piece of user-entered data makes its way into the command that is being executed. Not simply as a piece of data, such as the value that a column is being compared to, but when it becomes something which actually gets executed. Buffer overflow attacks worked in the same way – although the method of getting the data to be executed is different.

So you should never put yourself in a situation where something that a user has entered gets executed. It doesn’t matter what sanitisation you have done, you should never do it. Creative hackers will be able to find ways around however you do your sanitisation. Just don’t run something that has come from the user – pass it in using parameters instead.

Code within a stored procedure is trustworthy – SQL code written in a stored procedure can be trusted. But…

Code within a web application is trustworthy – SQL code written in a web application can be trusted. But…

…but all this can be undermined if your developers tell user-entered code to be executed, rather than being treated as parameters.

(Of course, if your server is compromised and someone can tell rogue commands to run, then all bets are off – I’m just looking at SQL Injection here)

What happens with SQL Injection is this: a COMMAND is constructed that includes user-entered DATA, and is then executed. It doesn’t matter whether this is done within .NET code, or within perl code, or within a stored procedure – you just don’t ever execute data.

There are plenty of guides to avoiding SQL Injection. They say to avoid using dynamic SQL; to avoid ad hoc SQL; to only use stored procedures. Most of them miss the point. They’re not bad, but following these rules doesn’t make you injection-proof. You simply don’t ever execute anything that came from the user.

Dynamic SQL is fine. You can construct a query based on user input – so long as you don’t execute something that came from the user. If the user chooses a particular option, the dynamically-generated SQL can include or exclude some predicate, or can bring a new parameter into the mix. This gives the freedom of dynamic SQL without needing to become vulnerable to SQL Injection.

Ad hoc SQL is also fine, and you don’t need to use stored procedures. It doesn’t matter whether your SQL code is stored in a stored procedure within the database, or whether your SQL code is stored within your application.

Ad hoc code or dynamic SQL like this is okay:

but code like this is not:

…no matter what the Sanitise functionality is. The only way to protect against SQL Injection is to only use controlled commands, and never anything else.

But what about generic procedures, that developers use to be able to be able to query whichever objects they want?

It’s easy to simply “Just say no”, to procedures like this. After all – it’s not healthy for all kinds of reasons, such as the plan cache, such as the effective use of indexes, and more. But if you have a stored procedure like:

…with an application that uses it, then you may be somewhat worried (and legitimately so) about SQL Injection, and be wondering how you’re going refactor all the code that uses this procedure.

So how do you make this safe from SQL Injection, so that the calls to the procedure aren’t susceptible, while all the redevelopment is done?

Hopefully you know the answer – you need to control the commands.

The issue is not with the use of @filterval – that’s being passed in as data, and is being used as data. It’s never part of the command itself. The problem is that @tablename, @colname and @filtercol are all passed in from the calling application, potentially even from the user, and then used as part of the command. This is what needs to be fixed.

Luckily, we have access to a bunch of things that we know can be used here – actual table names and actual column names in the database’s catalog views. So therefore, we can safely use those things. They are controlled.

I’m not saying this is good. I’m saying this is SAFE. We don’t execute anything that we don’t already control. The only tables we can query are ones that we know actually exist. The only columns that we can use are ones that must exist. You can always add more logic, to make sure that people can’t query system objects, and things like that – but they definitely can’t execute anything overly malicious. They can’t make calls to sp_configure or to xp_cmdshell. They can’t drop tables or update values. If you need to restrict the columns, you can easily filter the queries on all_columns, so that the calling code can’t access so much.

SQL Injection is serious. It’s a huge problem for people who deal with databases – and not just SQL Server, but all databases. The solution is easy, but requires some level of discipline.

Don’t execute user input, and don’t let your developers do it either (and if that means forcing them to use stored procedures that you can audit, then fine – do whatever it takes).

Oh, and as this month’s T-SQL Tuesday (hosted by Kenneth Fisher@sqlstudent144) is on the topic of security, that’s what this post is on.

TSQL2sDay150x150

@rob_farley