The BigData Legacy

October 10, 2017

Trends come along, and trends pass. Some hang around for quite a while, and then move on, and some seem to disappear quickly. Often we’re glad that they’ve gone, but we still bear scars. We live and work differently because they were there. In the world of IT, I feel like this is all too common.

When ORMs became trendy, people were saying that writing T-SQL would be a thing of the past. LINQ was another way in which people reassured the developer community that writing database queries would never again be needed. The trend of avoiding T-SQL through ORMs has hung around a bit, and many developers have recognised that ORMs don’t necessarily create the best database experiences.

And yet when we consider what’s happening with Azure SQL Data Warehouse (SQL DW), we find ourselves querying the data through an interface. Sure, that interface looks like another database, but it’s not where the data is (because the data is in the 60 databases that live in the back), and it has to translate our query into a series of other queries that actually run. And we’re fine with this. I don’t hear anyone complaining about the queries that appear in SQL DW’s explain plans.

When CLR came in, people said it was a T-SQL killer. I remember a colleague of mine telling me that he didn’t need to learn T-SQL, because CLR meant that he would be able to do it all in .NET. Over time, we’ve learned that CLR is excellent for all kinds of things, but it’s by no means a T-SQL killer. CLR stored procedures and functions have been great for things like string splitting and regular expressions, and we’ve learned its place now.

I don’t hear people talking about NoSQL like they once did, and it’s been folded somehow into BigData, but even that seems to have lost a little of its lustre from a year or two ago when it felt like it was ‘all the rage’. And yet we still have data which is “Big”. I don’t mean large, necessarily, just data that satisfies one of the three Vs – volume, velocity, variety.

Of these Vs, Volume seems to have felt like a misnomer. Everyone thinks what they have is big, but if they compared it to what others have, it probably wouldn’t actually be that big. Generally, if people are thinking “BigData” because they think their data is big, then they just need a reality check, and can then deal with it like all their regular data.

Velocity is interesting. If your system can’t respond to things quickly enough, then perhaps pushing your data through something like Stream Analytics could be reasonable, to pick up the alert conditions. But if your data is flowing through to a relational database, then is it really “BigData”?

And then we have Variety. This is about whether your data is structured or not. I’m going to suggest that your data probably is structured – and BigData solutions wouldn’t disagree with this. It’s just that you might not want to define the structure when the data is first arriving. To get data into a structured environment (such as a data table), types need to be tested, the data needs to be converted appropriately, and if you don’t have enough control over the data that’s coming in, the potential for something to break is high. Sorting out that mess when you need to query it back again means that you have a larger window to deal with it.
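To make that trade-off concrete, here is a minimal sketch in Python (not T-SQL, and not taken from any particular product). The sample records and the function names `load_strict`, `land_raw` and `query_amounts` are all hypothetical. The first function converts types on arrival, so a single badly typed row fails the load. The other two accept everything as it comes and only apply the structure when the data is queried back.

```python
import json

# Hypothetical incoming records: the second has a value that will not
# convert to a number, plus an extra field nobody planned for.
RAW_LINES = [
    '{"id": 1, "amount": "19.99"}',
    '{"id": 2, "amount": "n/a", "note": "late"}',
    '{"id": 3, "amount": 5}',
]


def load_strict(lines):
    """Define the structure on arrival: one bad row breaks the whole load."""
    rows = []
    for line in lines:
        record = json.loads(line)
        # float("n/a") raises ValueError, so the load stops here.
        rows.append((int(record["id"]), float(record["amount"])))
    return rows


def land_raw(lines):
    """Accept the data exactly as it arrives, with no type checks."""
    return list(lines)


def query_amounts(landed):
    """Apply the structure at query time.

    Rows that do not fit are set aside for later attention instead of
    stopping the data from arriving in the first place.
    """
    good, rejected = [], []
    for line in landed:
        try:
            record = json.loads(line)
            good.append((int(record["id"]), float(record["amount"])))
        except (ValueError, KeyError, TypeError):
            rejected.append(line)
    return good, rejected
```

Calling `query_amounts(land_raw(RAW_LINES))` returns the two rows that fit the structure and hands back the awkward one separately, whereas `load_strict(RAW_LINES)` raises a `ValueError` on the second row. The mess still has to be sorted out, but it no longer stops the data from landing.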

So this is where I think BigData is leaving its legacy – in the ability to accept data even if it doesn’t exactly fit the structure you have. I know plenty of systems that will break if the data arriving is in the wrong structure, which makes change and adaptability hard to achieve. A BigData solution can help mitigate that risk. Of course, there’s a price to pay, but for those times when the structure tends to change too often, BigData’s ideology can definitely help.

We see this through the adoption of JSON within SQL Server, which is much less structured even than XML. We see PolyBase’s external tables define structure separately to the collection of data. Concepts that were learned in a void of relational data have now become part of our relational databases.

Don’t dismiss fads that come through. Look into them, and try to spot those things which have more longevity. By adopting those principles, you might find yourself coming through as a stronger professional.

@rob_farley

This post was put together for T-SQL Tuesday 95, hosted by Derik Hammer (@sqlhammer). Thanks Derik!