Table? No such thing…

September 14, 2010

No really – hear me out.

Of course you create tables, and you query tables, and we say that data is stored in tables. The table is (rightly) a fundamental part of relational theory. But I find that when I think about queries and how they run, I need to approach the system thinking about the indexes that I’m querying, not the tables.

When you decide to find someone’s phone number, you have to make a decision about where you’re going to look that up. Do you pick up the White Pages, the Yellow Pages, your little black book, your kids’ school’s parent directory, or something else? In fact, this is a question that you ask yourself before you even consider whether the particular information is accessible or not. If you had no other option but to scan the entire White Pages (starting at page 1), you would probably just consider the information unobtainable.

And yet all these different sources of data are just different non-clustered indexes on the master set. They’re generally filtered, ordered in an appropriate way, and don’t contain all the information. The Yellow Pages is just businesses, ordered by business type. The White Pages doesn’t include the business type at all, and maybe the school’s parent directory doesn’t include the address. Each of them is designed for a particular type of use. The cost of maintaining them is considered (and presumably it’s understood that the benefits of having the index far outweighs of cost).

I know some of you will be telling me that not all tables have indexes – but every table in SQL Server is either a Heap or a Clustered Index, and if you consider that a Heap is simply an unordered index, then hopefully my points can stand.

When you write a query, you have the option of forcing a particular index to be used with a Hint. This is the kind of hint where you don’t get a choice. I’m sure you all know times when someone has “hinted” that a particular task needs doing, or that you should avoid telling a particular kind of joke, or etc. The kind of hint that if you can’t obey, an error situation will occur. So if you “hint” that a particular index should be used but that index isn’t there, you’ll get an error. So I try not to use index hints if possible.

But my dislike for index hints doesn’t stop me from wanting to consider indexes rather than tables. If I think about what data I want, I really ought to be thinking about the ideal index for my query, and expect to see that index being used in the execution plan. If it’s not being used, or it’s being used in a way that I didn’t expect, then I really ought to ask myself why and investigate the situation (and consider whether or not the way that the query is being run is acceptable or not – remember the analogy of not wanting to start on page one of the White Pages). Hopefully I’ll realise that the index I imagined wasn’t actually ideal when considering the statistics involved, or that the Query Optimizer has found some clever way to answer my query in a way that I didn’t expect, but sometimes you find that the way you’ve written the query is actually persuading the Query Optimizer to create a plan that’s not actually what you want. Correct, valid, but not necessarily ideal.

It’s very easy to just think in terms of tables. You need to reference tables in your query, which ultimately is a logical construct – but the plan that gets produced is what determines the performance, the locking, the cache use, etc, and the plan doesn’t talk about tables (and if it does, the word Table means Heap). By all means you should remember that two indexes that are based on the same ‘table’ may refer to the same data, and a perfectly valid plan could use a different index to what you’re expecting – but this doesn’t mean you shouldn’t give thought to what you consider the ‘correct’ index to be.

So move on from tables, and think in terms of indexes. You’ll learn more about how your queries get executed, and your indexing techniques will improve wildly. More importantly though, your queries will almost certainly run faster.

But to complete the story, it’s also completely valid to think of every index as a table. Each non-clustered index could be thought of as a table which the key is like the table’s clustered index, and the NCIX’s included columns are like the rest of the columns in the table. This helps you think about the cost associated with maintaining your indexes, and also helps you plan them. I’m not saying that you should create different tables instead of indexes by any means, but when you consider the idea that a table is an index, also consider that each index is a table – just to complete the picture.

This Post Has 5 Comments

mjswart September 15, 2010 Reply

That’s awesome! Spoofing the Matrix:
"Don’t try to look up information in the table. That’s impossible. Only try to realize the truth."
"What’s that?"
"There is no table."
Rob Farley September 15, 2010 Reply

Yes – exactly. 🙂
mjswart September 17, 2010 Reply

While heaps don’t have indexes, I think your points can stand after we stop calling them "heaps" and start calling them "piles" or "messes" (and not tables).
Rob Farley September 17, 2010 Reply

No – heaps aren’t all bad. They’re just indexes ordered by the rowid.
Mike S September 21, 2010 Reply

I am in favor of "piles" or "messes" over the word "Heap".
When I first started learning database internals, the word "heap" reminded me of my analysis of algorithms days, where the word "heap" refers to a tree-based data structure used for heapsort…
Why they SQL Server would store unordered data as those types of "heaps" was very confusing to me.
"Messes" is better. If you want to start a campaign, I’ll be the first one to sign the petition