Following my recent post about 40,000 days, I got a couple of emails telling me that Excel disagrees about when the 40,000th day is. And this is true — Excel counts Day 40000 as July 6th 2009, not July 7th.
Unfortunately for Excel users, they're wrong. And it's down to the fact that Excel thinks that 1900 was a leap-year, when we all know it wasn't. You see, leap-years only fall on the century year if the year is divisible by 400. So 2000 was a leap-year, but 1900 wasn't, and 2100 won't be either.
What's more, this has been a problem for ages, and it was a conscious decision to introduce the bug. Essentially, Lotus 1-2-3 thought 1900 was a leap-year, and to be consistent with the market leader of the day, Microsoft treated 1900 as a leap-year too. And nowadays, we have a situation where backwards compatibility is more important. Today, if you have an application which knows that day 40000 is July 7th, then you'd better not treat your date that same way in Excel (or vice-versa).
The thing that I find really amazing with this is that Excel tells me that Feb 29, 1900 was a Wednesday. But it didn't exist — so what happened that week? Actually (and you can check other calendars for this, including Windows' one), Feb 28th was a Wednesday, and Excel gets the day of the week wrong for the first 59 days of its calendar.
I know you don't care, but perhaps you should — in case you ever write an application that needs to know what day of the week it is.
Working out the day of the week is really trivial. For instance, in SQL Server, you can generally ask for the DATEPART(dw,…) of the date in question, and get a number back, telling you what day of the week it is. It'll tell you 1 for the 1st day of the week, 2 for the 2nd, and so on.
Which is great, until you find that someone in your organisation says that Sunday is the first of the week, but someone else insists that it's Monday. In the movie industry, I think Thursday is the first day of the week. So then, when is the 5th day? In SQL we have @@DATEFIRST, which helps a lot, but a method I like to use is to count the number of days since a known Sunday (or whatever), and take the "mod 7". If that's zero, I'm an exact number of weeks since that known Sunday. It works nicely, and it's simple enough for everyone to understand (and it works regardless of location or other changeable settings).
But if you had picked your "known Sunday" in early 1900 using Excel, you'd've got it wrong, and your data might not work if you push your system out to SQL later (so pick something later — like 1901). I recently dealt with a date dimension that someone had put together in Excel and imported into SQL — if this data had gone back to 1900, then there would've certainly been errors in it (for a start, the import wouldn't've worked because SQL would've complained that Feb 29, 1900 wasn't a valid date).
My preference with date dimensions is to use a lot of computed columns, and only ever populate a single field. It works nicely, and it's almost no effort to extend the table to include extra dates when required.