Filegroups and Non-Clustered Indexes

Let's start with some basics and then jump in a bit deeper, for this post to go with the 40th T-SQL Tuesday, hosted this month by Jen McCown.

SQL Server holds data, and that data is stored physically in files.

Of course, in the database world we think of the data as living in tables, so naturally there must be some sort of mapping between the data in a table and the data in a file. This is where filegroups come in.

When SQL objects are created within a database, there is an option to tell the system where to store them. But this isn't a file, it's a filegroup. (If it were a file and the disk volume that file was on filled up, we'd have big problems. Instead, we put the object in a filegroup, and can add extra files (on different volumes if so desired) to that filegroup.) Objects are stored within filegroups. Filegroups are groups of files, although many databases do end up with only a single file per filegroup.

Filegroups end up providing a large amount of flexibility for the storage of data. Rarely accessed data can be put in filegroups that have files on cheaper (but probably slower) disk, while data that is heavily written can be put on disk that's more appropriate for that, and so on. I'm sure you get the picture, and this is nothing overly thought-worthy.

You may even have already considered the idea around partitioning data across filegroups, moving data of a certain age (but potentially even from the same table) onto a different filegroup, so that queries that use different parts of tables can benefit from having some data on faster disk. Lots of potential. 🙂

Where it becomes interesting though, is when you start to consider the concept of non-clustered indexes.

You see, while a clustered index (or heap) provides the structure to store the data across all the columns of a table (although I'm simplifying it a little in regard to LOB data), we use copies of some of that data (in non-clustered indexes) to optimise access to the data.

So where are these copies stored? Many database people don't even think about this kind of thing, in the same way that they often don't think about including indexes in their database design. I'm going to figure you're not in that boat though, because I probably know you, and you're probably not one of those people.

Most people like to have a default filegroup that is NOT the primary filegroup. It means that when they create new objects, those new objects get put in the default filegroup, not the primary one. But it's not actually that simple.

Let's start by creating a database.


Notice the column data_space_id. This is the column which identifies each filegroup. We'll use it later.

Let's create a new filegroup and set it to be the default.
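A sketch of that step, using the same hypothetical database:

```sql
-- Create the new filegroup
ALTER DATABASE FGTest ADD FILEGROUP FG2;
GO
-- Check the catalog view again
SELECT name, data_space_id, is_default
FROM sys.filegroups;
```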


Cool – data_space_id 2 is created.
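Trying to make it the default at this point fails, though, because the filegroup is still empty. A sketch of the attempt:

```sql
-- Fails: an empty filegroup can't be made the default
ALTER DATABASE FGTest MODIFY FILEGROUP FG2 DEFAULT;
```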


I think this is a shame – but I guess it's fair enough. We have to have a file in the filegroup before we can make it the default. That's easy enough though, and probably what we'd be wanting to do before too long anyway.
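Adding a file and then making the filegroup the default might look like this (the logical file name and path are placeholders):

```sql
-- Give the filegroup a file, then it can become the default
ALTER DATABASE FGTest
ADD FILE (NAME = FG2_Data, FILENAME = 'C:\Data\FGTest_FG2.ndf')
TO FILEGROUP FG2;
GO
ALTER DATABASE FGTest MODIFY FILEGROUP FG2 DEFAULT;
```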


(I'm doing this on my laptop, which only has one physical drive – on a server it'd go somewhere more sensible of course)

Now let's add another filegroup. This is going to be for some data, but I don't want it to be my default.
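Sketched the same way (again with placeholder file details):

```sql
ALTER DATABASE FGTest ADD FILEGROUP FG3;
GO
ALTER DATABASE FGTest
ADD FILE (NAME = FG3_Data, FILENAME = 'C:\Data\FGTest_FG3.ndf')
TO FILEGROUP FG3;
-- Note: no MODIFY FILEGROUP ... DEFAULT this time
```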


Now, I want to create an object on FG3, which I'm going to pretend is my slower disk.
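The original table definition isn't shown here; based on the data used later (dates with order counts), a stand-in could be:

```sql
-- Table and column names are my own guesses, not the original script
CREATE TABLE dbo.Orders (
    TheDate date NOT NULL,
    NumOrders int NOT NULL,
    CONSTRAINT PK_Orders PRIMARY KEY CLUSTERED (TheDate)
) ON FG3;   -- explicitly placed on the slower filegroup
```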

Once created, I'm looking in sys.objects to confirm which filegroup the table is in, but there's no information there. Remember that a table is only metadata, and the things that matter are the indexes/heaps that are on it. Checking in sys.indexes shows me that indeed, it's in data_space_id 3.
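The check being described, against my hypothetical table:

```sql
-- The clustered index (index_id = 1) carries the filegroup information
SELECT name, index_id, data_space_id
FROM sys.indexes
WHERE object_id = OBJECT_ID('dbo.Orders');
```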


For completeness' sake, I'm going to put some data in there, using a query that I blogged about yesterday.
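A sketch of populating the table, wrapping that query in an INSERT (my table names again):

```sql
INSERT dbo.Orders (TheDate, NumOrders)
SELECT TOP (DATEDIFF(DAY, '20010101', SYSDATETIME()))
    DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1,
            CAST('20010101' AS date)),
    ABS(CHECKSUM(NEWID())) % 100
FROM master.dbo.spt_values t1
CROSS JOIN master.dbo.spt_values t2
ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1));
```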


But the whole point of this was to see what happens with the non-clustered index, which I want to be on recent data only (ie, filtered), and I want it to be in the DEFAULT filegroup.

As I want it in the default group, I won't specify a filegroup for the index.
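A sketch of the filtered index, with no filegroup mentioned (index name and filter date are my own):

```sql
CREATE NONCLUSTERED INDEX ixRecentOrders
ON dbo.Orders (TheDate)
INCLUDE (NumOrders)
WHERE TheDate >= '20130101';   -- recent data only; no ON clause
```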


But look! It's put the non-clustered index in the same filegroup as the clustered index. This isn't what I wanted. In fact, it's almost never what I'd want, because even if the disk performance is identical, it can be nice for transactions that update both the clustered and non-clustered indexes to spread that work across different disks.

Instead, we have to specify it explicitly to tell it to use the filegroup we want.
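Something like this, with an explicit ON clause (ON [DEFAULT] also works if you want whatever the current default filegroup is; names are my placeholders):

```sql
DROP INDEX ixRecentOrders ON dbo.Orders;
GO
CREATE NONCLUSTERED INDEX ixRecentOrders
ON dbo.Orders (TheDate)
INCLUDE (NumOrders)
WHERE TheDate >= '20130101'
ON FG2;   -- explicitly on the default filegroup, not FG3
```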


It's a shame to have to be so explicit with these things, but when you create non-clustered indexes, you really should think about which filegroup they're going onto, because they may well not go onto the default one as you might expect.


A TOP Query

For the blog post that I'll publish tomorrow, I wrote a query that I thought needed a blog post all of its own. This is that post.

The query was this one. Its results aren't that interesting; it's just a list of dates with a random number between 0 and 99. Just some sample data that I thought I'd use.
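The query itself is missing from this copy of the post; reconstructed from the description that follows, it was along these lines:

```sql
SELECT TOP (DATEDIFF(DAY, '20010101', SYSDATETIME()))
    DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1,
            CAST('20010101' AS date)) AS TheDate,
    ABS(CHECKSUM(NEWID())) % 100 AS NumOrders
FROM master.dbo.spt_values t1
CROSS JOIN master.dbo.spt_values t2
ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1));
```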


So, let me give some background…

When teaching about T-SQL, I often point out that a nums table is tremendously useful. One of its uses is to make a dates table, which can be really handy in a bunch of other ways, like in data warehouses. The idea is simple, assuming you have a table of numbers which starts from either 0 or 1 (I don't really care, although I start mine from 1). Here I figure that you want to get dates from between 20010101, up to (but not including) today.
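With a nums table starting from 1, that's simply (the nums table name is assumed):

```sql
-- One row per date from 20010101 up to, but not including, today
SELECT DATEADD(DAY, num - 1, CAST('20010101' AS date)) AS TheDate
FROM dbo.nums
WHERE num <= DATEDIFF(DAY, '20010101', SYSDATETIME());
```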

I'm not going to show you the results, I figure that it's easy for you to picture a list of dates.

Oh, alright then.


Anyway, moving on.

In this case, I didn't have a nums table handy, and for that, I tend to use ROW_NUMBER() and the table master.dbo.spt_values. This table contains a bunch of rows of handy reference data. I wasn't interested in the contents though, I just wanted to have enough rows, and it's quick to reference that rather than writing a bunch of self-joining CTEs. There are over 2000 rows (2515 to be exact) in master.dbo.spt_values, and if I want to have up to 4 million, I just do a cross join to itself.
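The nums source, sketched:

```sql
-- 2515 x 2515 rows, numbered as they flow through the plan
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num
FROM master.dbo.spt_values t1
CROSS JOIN master.dbo.spt_values t2;
```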


This query gives me a lot of rows, of course… but if I use it in a sub-query (or CTE) and filter it, then the simplification work that the Query Optimizer does will mean that it doesn't try to work out all 4 million rows for me, it'll stop when it's seen enough.

As an example, I can use this in conjunction with my earlier query, which had a WHERE clause.
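A sketch of the CTE version with that WHERE clause:

```sql
WITH nums AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num
    FROM master.dbo.spt_values t1
    CROSS JOIN master.dbo.spt_values t2
)
SELECT DATEADD(DAY, num - 1, CAST('20010101' AS date)) AS TheDate
FROM nums
WHERE num <= DATEDIFF(DAY, '20010101', SYSDATETIME());
```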

This produces my rows in less than a second, giving the same results as when we had a physical nums table.

To include my random values, I'm using the commonly found randomising method of ABS(CHECKSUM(NEWID())) % 100. RAND() is no good, as it produces the same value for every row. NEWID() is much better, but it's not a number. CHECKSUM solves that, but can be negative. ABS fixes the sign, giving a random non-negative number in a large range, and the mod 100 brings it down to the 0 to 99 range I'm after.
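Putting that into the CTE version gives the pattern (sketched):

```sql
WITH nums AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num
    FROM master.dbo.spt_values t1
    CROSS JOIN master.dbo.spt_values t2
)
SELECT DATEADD(DAY, num - 1, CAST('20010101' AS date)) AS TheDate,
       ABS(CHECKSUM(NEWID())) % 100 AS NumOrders
FROM nums
WHERE num <= DATEDIFF(DAY, '20010101', SYSDATETIME());
```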


This works nicely, and is a common pattern for dealing with this kind of thing.

But it's not the query at the top of this post. That was done without a CTE, and used TOP instead, ordering by the ROW_NUMBER().

Let's have a look at it.

We can see that the FROM clause is the same as in our nums CTE. And the same randomising bit for NumOrders is in there.

But instead of using ROW_NUMBER() OVER (ORDER BY (SELECT 1)) to define a num column, we're using it within another function? Can we do that? Yes. ROW_NUMBER() can only be used in the SELECT clause and in the ORDER BY clause, and there are restrictions about using it within aggregate functions as you might expect, but here we're just using it as a value which changes with every row, and there's really no problem at all.

We don't have a filter though. In our CTE version, we used a filter to make sure we weren't using every row from our CROSS JOIN. Here, we're using TOP instead. But not TOP with some number – TOP with a function in it! This has been possible for a while, and it even supports sub-queries that produce numbers, in the form (including the double-bracket): SELECT TOP ((SELECT SomeVal … )) Col1, Col2…
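For example (an illustrative query of my own, not from the original post):

```sql
-- A sub-query inside TOP needs the double brackets
SELECT TOP ((SELECT COUNT(*) FROM sys.databases)) name
FROM sys.objects
ORDER BY name;
```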

TOP appears in this query because I needed to limit the number of rows coming back. I couldn't use the WHERE clause, because I didn't have anything to filter on. So I used TOP, and had to use a function in there.

So let's compare the plans.

To be fair, I'll use the fixed date in both queries (for now).
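The pair being compared, reconstructed with a fixed date that matches the row counts discussed below:

```sql
-- TOP version
SELECT TOP (DATEDIFF(DAY, '20010101', '20130312'))
    DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1,
            CAST('20010101' AS date)) AS TheDate,
    ABS(CHECKSUM(NEWID())) % 100 AS NumOrders
FROM master.dbo.spt_values t1
CROSS JOIN master.dbo.spt_values t2
ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1));

-- CTE/WHERE version
WITH nums AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num
    FROM master.dbo.spt_values t1
    CROSS JOIN master.dbo.spt_values t2
)
SELECT DATEADD(DAY, num - 1, CAST('20010101' AS date)) AS TheDate,
       ABS(CHECKSUM(NEWID())) % 100 AS NumOrders
FROM nums
WHERE num <= DATEDIFF(DAY, '20010101', '20130312');
```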


Oh, how disappointing! My TOP query is 79% of the batch, and the CTE version is 21%. Clearly my new one is 4 times as bad, and I should give up on it.

Actually, if you run these two queries on your own machine, you'll see the first one isn't 4 times as bad at all. It's actually FASTER than the second. Something's going on, and we should find out what.

The plans look remarkably similar. In fact, the second one is identical to the first, apart from an extra Filter operator. I don't have a Filter in the TOP one, but I didn't really expect that to make much difference.

Otherwise, the plans look pretty similar. They both use a Row Count Spool, have a Sequence Project to work out the ROW_NUMBER, and they both use a Top operator – even the second one which doesn't use TOP.

But you see, the Query Optimizer would have seen that I was filtering on a column that mapped to ROW_NUMBER(), and that I was doing a "less than" operation there. That's like doing a TOP, and the Query Optimizer sees benefit in this. It doesn't explain the "4 times worse" thing, though.

Let's examine some of the numbers. Interestingly, we see that the Nested Loop operator expects to do almost no work in the second plan, and 27% in the first. There's a warning on there – that's just because I'm using a CROSS JOIN, and I'm okay with that.

The word 'expect' was very important in that last paragraph. The percentages there are based on the work that is expected to be done. Let's look at the properties of the two Nested Loop operators.


Glancing down here, we see a lot of it is the same, but the Estimated Number of Rows in the first one is 4453 (which is correct), while the second one is only 100 (not quite so correct). The arrows on the upper side of the Nested Loops show the effect of this.


The second one figures it'll have seen all the rows it needs to see before it gets a second row from the first table, whereas the first one thinks it might need 1.77058 rows (being 4453/2515). No wonder the second query thinks it's quicker.

Let's see where this 100 estimate comes from though. Perhaps it's that filter?

Based on the estimates of the rows going into the filter and coming out of it, it expects that the filter will see 100 rows and return just 30. This is bad too, but it's not as bad as the 100 v 4453 impact.

This estimate actually comes from the way that the Query Optimizer has turned the WHERE clause into a Top operator. It figures there's some doubt there, and guesses that 100 rows is probably not a bad place to start. When we give an explicit value (even using the DATEDIFF function), it can figure out what's there and use it. The second query goes part way there and works out that the result of the DATEDIFF is 4453, but simply doesn't apply it fully to the Top operator, leaving us with that guess.


It'd be nice if it could tell that 4453 is never NULL, and simplify this out a bit more, but it simply doesn't do this.

To round the post off, let's consider what happens if we're using SYSDATETIME() instead of the constant.
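That is, swapping the literal back out for SYSDATETIME() in each (sketched):

```sql
-- TOP version
SELECT TOP (DATEDIFF(DAY, '20010101', SYSDATETIME()))
    DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1,
            CAST('20010101' AS date)) AS TheDate,
    ABS(CHECKSUM(NEWID())) % 100 AS NumOrders
FROM master.dbo.spt_values t1
CROSS JOIN master.dbo.spt_values t2
ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1));

-- CTE/WHERE version
WITH nums AS (
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT 1)) AS num
    FROM master.dbo.spt_values t1
    CROSS JOIN master.dbo.spt_values t2
)
SELECT DATEADD(DAY, num - 1, CAST('20010101' AS date)) AS TheDate,
       ABS(CHECKSUM(NEWID())) % 100 AS NumOrders
FROM nums
WHERE num <= DATEDIFF(DAY, '20010101', SYSDATETIME());
```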


Oh! Now the first one is simpler still, leaving out the Row Count Spool operator, and thinking it's going to be cheaper than the second one. Having not trusted that figure before, does this mean the first one is actually worse? Well, we have an idea about where to look – the estimates on some of the arrows, particularly near the Top operator.


Oh no! Our first query thinks there's now only one row going to be coming through. How awful! (Oh, and the number 4452 is fine, because I'm actually running this on March 11th, not March 12th, it's just that March 12th is T-SQL Tuesday this month, which is what I was writing the query for).

If you run this on your machine, hopefully you saw something different. Really.

You see, this problem has kinda been fixed, and if you enable the documented trace flag 4199, it should be better – for the first query at least.

By turning on trace flag 4199, and telling it to ignore what's in the cache, it will evaluate SYSDATETIME() for that particular execution, and therefore come up with the right value for the Top operator. It doesn't fix the WHERE clause version, but it does solve the TOP clause version.
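A sketch of the TOP version with the trace flag applied per-query (QUERYTRACEON needs appropriate permissions; the RECOMPILE hint is the "ignore what's in the cache" part):

```sql
SELECT TOP (DATEDIFF(DAY, '20010101', SYSDATETIME()))
    DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY (SELECT 1)) - 1,
            CAST('20010101' AS date)) AS TheDate,
    ABS(CHECKSUM(NEWID())) % 100 AS NumOrders
FROM master.dbo.spt_values t1
CROSS JOIN master.dbo.spt_values t2
ORDER BY ROW_NUMBER() OVER (ORDER BY (SELECT 1))
OPTION (RECOMPILE, QUERYTRACEON 4199);
```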


The reason I say this shouldn't have happened on your system is that you're probably already using trace flag 4199 as a start-up parameter.

So there you have it… a query which might seem strange at first glance, but is actually a really nice alternative. Don't be afraid to use expressions in your TOP clause – it's a very powerful mechanism, and Top is a great operator to see in your plans (because it means fewer rows being requested from your Scans and Seeks – remember how execution plans suck?). As is often the case, we're tripped up by estimates being out, but if you can see what's actually going on, you should be able to make good decisions about how to form your queries.