Missing Index in SQL Server 2008 – should try harder!

Ok, maybe I’m being a little harsh, but I just feel like it should be better.

Let me show you the niceness of the way that missing indexes are handled in SQL Server 2008.

Using AdventureWorks (not AdventureWorks2008) on a SQL Server 2008 install, if I show the Execution Plan from this simple query, I get a nice suggestion. My query…

select productid, orderqty
from sales.salesorderdetail
where carriertrackingnumber = ‘FB88-4B92-82’;

…could be improved through better indexing. It uses 1240 reads to get this data, which seems awful. The system shows me that it could be improved, and suggests an index.

image

It’s there, in green. It says:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>] ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber])

(that’s right, no semi-colon on the end, but I’m fine about that)

What I’m not fine with is the fact that this index isn’t actually so ideal. If I create it (supplying a name, of course), we see it’s being used, and it should be clear that a better index ought to be better.

image

This query uses 42 reads to get the required information, which is significantly better than 1740, but still not brilliant. In fact, 42 is about 2.4% of 1740, so it’s hardly the 99.6512% improvement that Management Studio suggested would be seen.

My preference would be to consider that Key Lookup as well. it’s taking 92% of this improved query. We can avoid the Key Lookup by creating an index which INCLUDEs the columns we’re interested in. Like this:

CREATE NONCLUSTERED INDEX [MyNewIndex2] ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber]) INCLUDE (ProductID, OrderQty);

If I create this index, we see that the execution plan becomes just the Index Seek (on my new index), and the number of reads drops to just 3. Yes 3. That’s 0.17% of the original query, and only 7.1% of the reads of the improved query! 99.83% of the original reads have been eliminated – much more like the figures promised by my Missing Index suggestion, except it got it wrong.

image

I like the idea of detecting Missing Indexes, and I love the fact that it suggests these in Execution Plan viewer… I just want it to be slightly better by considering INCLUDEd columns.

I’ve suggested this be improved on the Connect site at https://connect.microsoft.com/SQLServer/feedback/ViewFeedback.aspx?FeedbackID=375024

12 thoughts on “Missing Index in SQL Server 2008 – should try harder!”

  1. I don’t claim to know what’s in the mind of the people who wrote the algorithm that suggests missing indexes. But here’s my guess.

    The best index for any query, any simple select query at all, will be a covering non-clustered index (like you suggested).

    It’s always a bit of an art and a science isn’t it? A bit of a balancing act.

    We can make every select statement we analyze superfast by implementing the right index.
    But the risks are: Expensive updates, too much space. etc…

    My guess is that the database engine people deliberately took a step back from recommending non-clustered covering indexes for everything and just stuck with the non-clustered part. Giving up included columns and accepting lookups.

    But that’s just my guess.

  2. But why accept lookups? If there were a semi-ideal index in place, I understand the decision to let things be and NOT suggest the better one. But if a suggestion is made at all, it ought to be the one that is most ideal.

    Rob

  3. Kinda odd. When I run that query on Adventureworks2008 (SQL 2008 RTM x64) the missing index it lists is

    CREATE NONCLUSTERED INDEX []
    ON [Sales].[SalesOrderDetail] ([CarrierTrackingNumber])
    INCLUDE ([OrderQty],[ProductID])

  4. I would hope it’s not a 32/64 issue! I’d be very worried if some of the core features (optimiser) behave differently on 32/64. Unfortunately I don’t have a 32-bit server to test on.

    Can you check the missing index DMV, see if the include columns are listed there? Also, maybe have a look in the exec plan’s XML at the missing index section.

    I’m curious as to whether it’s the optimiser not noticing that the include columns would be useful, or management studio just displaying the missing index details wrong.

    FWIW, most of the time I find the missing indexes to have too many include columns (up to the point of including the entire table), not too few.

    What build of 2008 are you running there?

  5. Gail,

    On my machine, the included columns aren’t there in the XML or in the DMV.

    @@version gives:
    Microsoft SQL Server 2008 (RTM) – 10.0.1600.22 (Intel X86) Jul 9 2008 14:43:34 Copyright (c) 1988-2008 Microsoft Corporation Developer Edition on Windows NT 6.0 (Build 6001: Service Pack 1)

    And yes – whilst in the past I’ve found that there were too many included columns listed, I now find this!

    Anyway – we’ll have to see what other people find too.

    Rob

  6. I disagree here. I think the DTA goes way TOO far in INCLUDING columns to create covering indexes. I have seen it absolutely destroy databases (and database performance) with bloated indexes. I just cleaned out 60% of the databases a client had in their database because they ran DTA on a workload and accepted everything. Massive bloat, blocks and locks all over the place. The indexes I left caused just a small amount of read increases for select activity but made the DML stuff WAY faster.

  7. I think there’s a big difference between the results you can get from the DTA looking at a whole batch, compared to looking at a single query. For a single query, it should be suggesting the best index (perhaps with a disclaimer). For a batch of queries, a degree of compromise could be considered.

    All the suggestions should come with a bit of a warning about the fact that indexes should be considered carefully.

  8. Whether this index should include ProductID or not depends only on whether you decide its important for the query to complete in 3 reads vs 6 & that the overheads associated with having the extra column included are worth the minicsule saving in reads.

    Index maintenance overheads are often described in terms of DML overhead (updates etc). Less widely recognised is the extra space index columns take up, which have daily implications for disk space management, backups, index maintenance & log shipping. These are real problems which should be taken into consideration when deciding to simply throw an extra column onto an index.

    Adding extra columns onto indexes should be given similar consideration to throwing extra columns onto tables..

  9. Yes, I agree wholeheartedly – creating indexes should be considered very carefully.

    My point is simply that when considering one query in isolation (as we see with the Missing Indexes section of an execution plan now), it should suggest the best index for that query. If the administrator chooses to implement something else, then that’s fine.

    I definitely don’t advocate creating every index that the system suggests. I would simply prefer it to suggest the index that would see the most improvement for the query in the plan.

  10. I think it is kind of funny that when run Rob’s query and then look a the missing index stats the included_columns are not showing up when running the following query:

    SELECT *
    FROM sys.dm_db_missing_index_groups AS g
    join sys.dm_db_missing_index_group_stats AS gs ON gs.group_handle = g.index_group_handle
    join sys.dm_db_missing_index_details d ON g.index_handle = d.index_handle;

    It might be nice to know why the optimizer is not picking up these included columns.

    But now if you change Rob’s query to be an inequality query like so:

    select productid, orderqty
    from sales.salesorderdetail
    where carriertrackingnumber > ‘FB88-4B92-82’;

    Now the suggested missing index now has the appropriate INCLUDED columns.

    Oh, by the way all my testing was on a 32bit machine.

Leave a Reply

Your email address will not be published. Required fields are marked *