Converting Points to a Path

Suppose your SQL table has a bunch of spatial points (geographies if you like) with an order in which they need to appear (such as time) and you want to convert them into a LineString, or path.

One option is to convert the points into text, and do a bunch of string manipulation. I’m not so keen on that, even though it’s relatively straightforward if you use FOR XML PATH to do the heavy lifting.
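For comparison, here's roughly what that string-based route could look like. This is a sketch, not the approach used in this post. It assumes a hypothetical table dbo.RoutePoints with a PointTime column for ordering and a Geo geography column. FOR XML PATH('') concatenates the coordinates in time order into Well-Known Text, which geography::STLineFromText then parses. Geography WKT lists longitude before latitude. The decimal(9, 6) conversion is my own assumption, there to keep the float-to-text step predictable.

```sql
-- String-based alternative (sketch): build Well-Known Text, then parse it.
-- dbo.RoutePoints (PointTime, Geo) is a hypothetical table.
-- Geography WKT lists longitude first, then latitude.
-- Needs at least two rows, since a LineString needs two points.
SELECT geography::STLineFromText(
           N'LINESTRING('
           + STUFF((SELECT ', '
                           + CONVERT(varchar(20), CONVERT(decimal(9, 6), p.Geo.Long))
                           + ' '
                           + CONVERT(varchar(20), CONVERT(decimal(9, 6), p.Geo.Lat))
                    FROM dbo.RoutePoints AS p
                    ORDER BY p.PointTime
                    FOR XML PATH('')), 1, 2, '')   -- STUFF strips the leading ", "
           + N')',
           4326) AS route;
```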

The way I’m going to show you today uses three features that were all introduced in SQL Server 2012. It makes life quite easy, and I think it’s quite elegant as well.

Let’s start by getting some points. I’ve plotted some points around Adelaide. To help, I’m going to use Report Builder to show you the results of the queries – that way, I can put them on a map and you can get a feel for what’s going on, instead of just seeing a list of co-ordinates.

First let’s populate our data, creating an index that will be helpful later on:
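Here's a minimal sketch of that setup. The table name dbo.RoutePoints, its columns (PointTime for the ordering, Geo for the location) and the coordinates are all hypothetical: nine made-up points in central Adelaide, not the ones plotted in this post. The clustered index on PointTime plays the "helpful later on" role, since it should let queries ordered by PointTime read the rows already in order rather than sorting them.

```sql
-- Hypothetical table: one row per recorded position.
CREATE TABLE dbo.RoutePoints
(
    PointTime time(0)   NOT NULL,  -- the order the points need to appear in
    Geo       geography NOT NULL   -- the point itself
);

-- Clustered on the ordering column, so window functions ordered by
-- PointTime can read the rows already sorted.
CREATE UNIQUE CLUSTERED INDEX cixRoutePoints_PointTime
    ON dbo.RoutePoints (PointTime);

-- Made-up coordinates (latitude, longitude) using SRID 4326 (WGS 84).
INSERT dbo.RoutePoints (PointTime, Geo)
SELECT CAST(v.PointTime AS time(0)),
       geography::Point(v.Lat, v.Lon, 4326)
FROM (VALUES
        ('09:00', -34.9245, 138.5999),
        ('09:02', -34.9245, 138.5985),
        ('09:04', -34.9235, 138.5985),
        ('09:06', -34.9228, 138.5985),
        ('09:08', -34.9228, 138.5999),
        ('09:10', -34.9212, 138.5999),
        ('09:12', -34.9212, 138.6020),
        ('09:14', -34.9230, 138.6020),
        ('09:16', -34.9245, 138.6020)
     ) AS v (PointTime, Lat, Lon);
```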

Great. Starting at the corner of Currie and King William Streets, we wander through the streets, including Leigh St, where the LobsterPot Solutions office is (roughly where the ‘e’ is).

I’ve labelled the points with the times, but it’s still not great viewing. Frankly, it’s a bit hard to see what route was taken.

What we really want is to draw lines between each of them. For this, I’m going to find the next point in the set, using LEAD(), and use the spatial function ShortestLineTo to get the path from our current point to the next one.
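A sketch of that step, against the same hypothetical dbo.RoutePoints (PointTime, Geo) table. LEAD() fetches the next point in time order, and ShortestLineTo draws the segment from the current point to it. The CTE and the CASE guard are my own choices: they keep the method call off the last row, which has no next point.

```sql
WITH Ordered AS
(
    SELECT PointTime,
           Geo,
           -- The next point in time order; NULL on the last row.
           LEAD(Geo) OVER (ORDER BY PointTime) AS nextGeo
    FROM dbo.RoutePoints
)
SELECT PointTime,
       Geo,
       nextGeo,
       -- Segment from this row's point to the following row's point.
       CASE WHEN nextGeo IS NOT NULL
            THEN Geo.ShortestLineTo(nextGeo)
       END AS lineToNext
FROM Ordered
ORDER BY PointTime;
```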

I didn’t need to pull back the fields geo and nextGeo, but I figure that the lineToNext column might be confusing at first glance, since it uses the subsequent row’s position as an argument in a function on the current row’s position. Anyway, hopefully you get the gist. Here’s what it looks like.

This is way better – you can see the path that was taken, and can easily tell that the route didn’t just go straight up North Tce, it ducked down Gawler Place instead.

What’s more – with each part of the journey still being a separate row, I can colour each part differently. You know, in case I don’t like the “Tomato” colour in my last example (yes, that colour is called “Tomato”, no matter whether you say it “tomato”, or “tomato”, or even “tomato”).

To colour it differently, I’m going to throw in an extra field, which is just the number of minutes since we started. I’ll use the old-fashioned OVER clause for that, to count the number of minutes since the earliest time.
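One way to compute that field, again sketched against the hypothetical dbo.RoutePoints (PointTime, Geo) table. MIN() with an empty OVER () looks across the whole set, so it returns the first time recorded, and DATEDIFF counts the minutes from there. The colour mapping itself happens in the report and isn't shown here.

```sql
WITH Ordered AS
(
    SELECT PointTime,
           Geo,
           LEAD(Geo) OVER (ORDER BY PointTime) AS nextGeo,
           -- OVER () spans every row, so MIN() is the starting time.
           DATEDIFF(minute, MIN(PointTime) OVER (), PointTime) AS minutesSinceStart
    FROM dbo.RoutePoints
)
SELECT PointTime,
       minutesSinceStart,   -- 0 for the first point; drives the colour
       CASE WHEN nextGeo IS NOT NULL
            THEN Geo.ShortestLineTo(nextGeo)
       END AS lineToNext
FROM Ordered
ORDER BY PointTime;
```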

Cool – now I can easily tell which end it started at (the more tomatoey end), and where it ended (the paler end). Each segment is a single colour along its length, but that’s okay.

Now, I said I’d use three SQL 2012 features, and so far the only new ones have been LEAD and ShortestLineTo. But remember I still have several rows, and each section of the route is a separate line. Well, to join them up, I’m going to use 2012’s UnionAggregate function. To use this, I need to use a sub-query (I’ll go with a CTE), because I can’t put an OVER clause inside an aggregate function.
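A sketch of the combined query, still assuming the hypothetical dbo.RoutePoints (PointTime, Geo) table. The CTE works out each point's successor with LEAD(), and the outer query builds each segment with ShortestLineTo and merges them with the static geography::UnionAggregate aggregate. Filtering out the last row (which has no successor) is my own addition, to keep a NULL segment out of the aggregate.

```sql
-- LEAD() lives in the CTE because a windowed function can't be nested
-- inside the UnionAggregate call.
WITH Ordered AS
(
    SELECT Geo,
           LEAD(Geo) OVER (ORDER BY PointTime) AS nextGeo
    FROM dbo.RoutePoints
)
SELECT geography::UnionAggregate(Geo.ShortestLineTo(nextGeo)) AS route
FROM Ordered
WHERE nextGeo IS NOT NULL;   -- skip the last point, which has no segment
```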

Now I have my solution! I’ve converted points into lines, in the right order.

You may be wondering how this performs – what kind of execution plan is going to appear.

Well it’s this:

Look at this – there are Stream Aggregates (which just watch the data as it comes through, popping rows out when needed, but never holding onto anything except the aggregate as it grows), a Spool (which is used to do a bit of the windowing trickery, but also holds onto very little), and the Sequence Project and Segment operators, which generate a row_number as a marker for the LEAD function. You might be interested to know what the right-most Stream Aggregate has in its “Defined Value” property.

For each group (which is defined as the row), it uses the LAST_VALUE of geo, and ANY of geo. ANY is the current one, and LAST_VALUE is the row after it. It’s the last row, because the Spool gives up two rows for each ‘window’ – the current row and the lead row. In this scenario, with 9 rows of data in the index, the Spool pulls in (from the right) 9 rows, and serves up (to the left) 17. That’s two per original row, except the last which doesn’t have a lead row.
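If you want to check those row counts on your own data, one option is to ask for the actual execution plan as XML alongside the results, then look at the actual row counts on each operator. This is a sketch against the hypothetical dbo.RoutePoints (PointTime, Geo) table, reduced to just the LEAD() part; the exact operators you see may differ from the ones described here depending on how the query is written and on your SQL Server version.

```sql
-- Return the actual plan (as XML) with the results, so rows read and
-- produced by each operator, including the Spool, can be inspected.
SET STATISTICS XML ON;

SELECT PointTime,
       LEAD(Geo) OVER (ORDER BY PointTime) AS nextGeo
FROM dbo.RoutePoints;

SET STATISTICS XML OFF;
```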

So the overhead on making this work is remarkably small. With an index in the right order, the amount of work to do is not much more than scanning over the ordered data.

Finally, if I had wanted to do this for several routes, I could have put a RouteID field in the table, used PARTITION BY RouteID in each OVER clause, and GROUP BY RouteID in the final query. If you do this, you should make RouteID the first key column in your index. That way, the execution plan can be almost identical to before (just with slightly more explicit grouping, but with identical performance characteristics).
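A sketch of that multi-route version, assuming a hypothetical dbo.RoutePointsByRoute table that adds a RouteID column. RouteID leads the index key so each route's points sit together and in time order; PARTITION BY RouteID stops LEAD() from joining the end of one route to the start of the next, and GROUP BY RouteID gives one path per route.

```sql
-- Hypothetical multi-route table.
CREATE TABLE dbo.RoutePointsByRoute
(
    RouteID   int       NOT NULL,
    PointTime time(0)   NOT NULL,
    Geo       geography NOT NULL
);

-- RouteID first, then the ordering column.
CREATE UNIQUE CLUSTERED INDEX cixRoutePointsByRoute
    ON dbo.RoutePointsByRoute (RouteID, PointTime);

WITH Ordered AS
(
    SELECT RouteID,
           Geo,
           -- Next point within the same route only.
           LEAD(Geo) OVER (PARTITION BY RouteID ORDER BY PointTime) AS nextGeo
    FROM dbo.RoutePointsByRoute
)
SELECT RouteID,
       geography::UnionAggregate(Geo.ShortestLineTo(nextGeo)) AS route
FROM Ordered
WHERE nextGeo IS NOT NULL   -- last point of each route has no segment
GROUP BY RouteID;
```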

But I don’t have a picture of that, because that wasn’t the query I was wanting.

Write-BlogPost

A couple of years ago I was going to write a song about automation, in reggae style, which could maybe have been used by the Trinidad SQL community – particularly Nigel Sammy (@nigelsammy). The theme was going to be around the fact that you need automation because the sun shines and the beach is calling.

But of course, automation is about so much more than freeing up time for the beach (even here in Adelaide, where every weekday this week is set to be over 40C). Automation helps you be consistent in what you do by removing manual steps, and lets you focus your attention on the things that require thought, rather than on the things that are the same every time.

This month’s T-SQL Tuesday is about automation, and I thought I’d write about how a few of my favourite applications help me massively in the quest for better automation. The host-post asks about what has changed since the last time automation was a topic, but that time I mainly looked at Policy Based Management, which is great for being able to make sure that things happen. This time, I want to look particularly at the things I use to develop repeatable commands, thereby reducing how much I have to do compared to how much can be done by the machine.

SQL Server Management Studio (SSMS)

The Script button in dialogs! Oh how I love it. In fact, I wish that there were no OK button on dialog boxes in SSMS. I would be perfectly fine with a “Script and Close” button instead. I know I could have an Extended Events session or Trace running to be able to pick up what has just been run on the SQL box, but that doesn’t quite cut it. When I hit the OK button, I don’t actually know what commands are going to be run. I’ll have a good idea, of course, but if I’ve been tabbing through options and accidentally changed something, I might not have noticed (ok, I’m sure I will have, no one ever makes that mistake in real life). Even more significantly though, I might want to be able to run exactly the same command against another server. The Script button is amazingly useful and should be used by EVERYONE.

gVim

While I was at university, I used Unix a lot. My PC at home ran Linux, and I shuddered whenever I’d find out I had to use a Microsoft environment. It’s okay – I got over it – but one thing that remains is my appreciation for the text editor vi. I was pretty much forced to use it for a long while, and for a good year or more, I think I learned a new way of doing things almost every day. Just about every time you’d sit with someone else and work with them, you’d see something they’d do and go “Oh, how did you do that?” Of course, they’d reply with “Oh, that’s just pressing star”, or something like that. It was a good time, and I developed an appreciation for vi (and later, vim, and its Windows client gVim), which has stayed with me. I still find myself opening Visual Studio and filling a row with ‘j’s as I hope to scroll down through the code.

From an automation perspective, gVim is great. The whole environment is based on keystrokes, so there’s never any reliance on putting the mouse cursor somewhere and clicking. Furthermore, I can hit ‘q’ and then record a macro, playing it back with @ (ok, it’s actually q followed by another letter, in which you store the macro, and @ followed by the letter for the macro of interest). This makes it great not just for writing code, but editing all kinds of text. I like Excel for being able to use formulas which can be repeated across each row, but I also find myself leveraging gVim’s macros for doing things even more easily – and navigating multiple lines.

PowerShell

I so wish that Windows had the macro-recording concept of gVim, or the Script button of SSMS. It would be really nice to be able to go to some spot in the Registry, or some Control Panel dialog, make some change, and say “And please give me a Script for what I’ve just done!” (If someone knows how to do this, PLEASE let me know)

But even so, PowerShell is tremendously useful. In my Linux days I would control everything through a shell environment (I preferred tcsh for some reason – I forget why – bash was good too, of course), and as such I could look back at what I’d just done, store scripts to repeat things another time, and so on. I don’t get that feeling with Windows, but PowerShell helps. I feel comfortable loading up a piece of XML in PowerShell (even an Execution Plan), and I love how easily I can move around XML in PowerShell.

Of course, every month I write a post for T-SQL Tuesday, and it would be quite neat to have a script that would automate that for me. But there are plenty of things that I don’t have automated (and may never have), and putting blog posts together is probably going to remain one of those. I can’t see myself creating a fully-automated Write-BlogPost cmdlet any time soon.