I feel like every database project has major decisions now, which are remarkably fundamental to the direction that's going to be taken. And it's almost as if new options appear with ever-increasing frequently.
Consider a typical database project, involving a transactional system to support an application, with extracts into a data warehouse environment for reporting, possibly with an analytical layer on top for aggregations.
Not so long ago, the transactional system could be one of a small number of database systems, but if you were primarily in the Microsoft space you'd be looking at SQL Server, either Standard or Enterprise (but that decision would be relatively easy, based on the balance between cost and features), with extracts into another database, using Analysis Services for aggregations and Reporting Services for reports. Yes, there were plenty of decisions to make, but the space has definitely become more complex. If you're thinking about a BI solution, you need to work out whether you should leverage the SharePoint platform for report delivery, figure out whether you want to use the Tabular or Multidimensional models within SSAS, Project or Package within SSIS, and of course, cloud or 'on-premise'.
This month's T-SQL Tuesday topic, hosted by fellow MCM Jason Brimhall (@sqlrnnr) is on the times when a bet has had to be made, when you've had to make a decision about going one way rather than another, particularly when there's been an element of risk about it. These decisions aren't the kind of thing that could cause massive data loss, or cost someone their job, but nonetheless, they are significant decisions that need to be made, often before all the facts are known.
As I mentioned before, one of the biggest questions at the moment is: Cloud or "On-Premise"
I'm not going to get into the "on-premise" v "on-premises" argument. The way I look at it, "on-premise" has become an expression that simply means "not in the cloud", and doesn't mean it's actually on your premises at all. The question is not about whether you have a physical server that you can walk up to without leaving your office – plenty of organisations have servers hosted with an ISP, without being 'in the cloud'. It also doesn't mean that you're avoiding virtual machines completely.
So by 'cloud', I'm talking about a system like Windows Azure SQL Database. You've made the decision to adopt something like WASD, and are dealing with all the ramifications of such a system. Maintenance of it is being handled as part of your subscription. You're not making decisions about what operating system you're using, or what service accounts are being used. You're spinning up a database in the cloud, because you've made a decision to take the project that way.
WASD has a much smaller initial outlay than purchasing licenses, and the pricing model is obviously completely different – not only using a subscription basis, but considering data transfer (for outbound data) too. If you're comparing the cost of getting your system up and running, then the fact that you're not having to set up servers, install an operating system, have media for backups, and so on, means that choosing the cloud can seem very attractive.
But there are often caveats (how often are 'bets' made more risky because of a caveat that was ignored or at least devalued?).
For example, right now, the largest WASD database is limited to 150GB. That might seem a lot for your application, but you still need to have considered what might happens if that space runs out. You can't simply provision a new chunk of storage and tell the database to start using that as well.
You need to have considered what happens when the space runs out. Because it will.
I would like to think that every database system has asked this question, but too frequently, it doesn't get asked, or otherwise, the answer is disregarded. Many on-premise systems find it easy enough to throw extra storage at the problem, and this is a perfectly valid contingency plan. Other systems have a strict archiving procedure in place, which can also ensure that the storage stays small. But still, there are questions to ask, and a plan to work out.
To me, it feels a lot like what happened to Twitter in its early days. The concept of Twitter is very simple – it's like text messages sent to the world. But because the idea caught on, scaling become a bigger problem than they expected, much earlier than they expected. They were a victim of their own success. They worked things out, but there were definitely growing pains.
In the 1990s, many of us in the IT industry spent a decent amount of time fixing code that no one imagined would need to be still running in the futuristic 21st century. After the fact, many claimed that the problem had been over-exaggerated, but those of us who had worked on systems knew that a lot of things would have broken if we hadn't invested that effort. It's just that when a lot of software got written, no one expected it to be still be used in 2000. Those coders didn't expect to be so successful.
It's too easy to become a victim of success. I tell people that if they have done a good job with their database application, they will probably have underestimated its popularity, and will have also underestimated the storage requirements, and so on. I've seen many environments where storage volumes were undersized, and volumes which had been intended for one type of use now serve a variety (such as a drive for user database data files now containing tempdb or log files, even the occasional backup). As a consultant I never judge, because I understand that people design systems for what they know at the time, not necessarily the future. And storage is typically cheap to add.
But when it comes to Windows Azure SQL Databases, have a plan for what you do when you start to reach 150GB. Scaling out should be a question asked early, not late.