PASS Board elections coming up

It’s now four years since I ran for election to the PASS Board, for a two-year term which ended two years ago.

The two other people who got elected when I did were Denise McInerney and Adam Jorgensen. Two years ago, Denise and Adam became the two Vice Presidents, and Adam is about to begin a stint as President. As for me, I didn’t run a second time. Two years ago, Jen Stirrup, Amy Lewis, and Tim Ford got elected when Rushabh Mehta, Douglas McDowell, and I left the board.

And so we come around to election time again, and Jen, Amy, and Tim are all about to finish their first two years on the board. Amy is not re-running, but Jen and Tim are, along with two new candidates – Argenis Fernandez and Ryan Adams.

I’m excited, because PASS can’t lose here. The current board members seem to be doing an excellent job (although I didn’t get to serve alongside either Jen or Tim, I know them both well and am sure they’re doing great), but I also know that Argenis and Ryan will do great.

Argenis is a passionate guy who will stand for important things. I’ve had many conversations with him, and know him to be wise, caring, and dedicated to seeing good things happen. He raises money for charity, and is not afraid to put his reputation on the line for important things. He is one of my most excellent friends.

Ryan is also a great guy. He works hard for PASS, and has almost won the PASSion award for his dedication to the community. I haven’t had as many conversations with him as I have the other candidates, but I know that he will work tirelessly for the PASS community – that word ‘tireless’ seems to describe Ryan more than anyone else I can think of in the community, and I am a little jealous of his energy levels.

I can’t tell you who to vote for (except for Jen – as the only person running for the EMEA seat I expect everyone to vote for her), but do take the time to get to know the candidates and make your vote count – voting opens later this week.

@rob_farley

Superhero powers for managing the enterprise

This month’s T-SQL Tuesday is hosted by Jen McCown of @midnightdba fame. She wants us to write about strategies for managing the enterprise, and as a database consultant, I find myself giving advice on this kind of thing to customers remarkably often.

No, I’m not going to do stories about LobsterPot customers. We don’t do that. What happens between us and the customer stays between us and the customer. However, I can talk about the kinds of things that we look for when we talk to a customer.

The thing that I want to look at in this post is that twitch you get when something doesn’t feel right: the ‘spider-sense’ feeling that Peter Parker gets when there’s something that’s not quite right.

Experience is what helps people know what ‘normal’ looks like. I’ve heard stories that people who are trained to spot counterfeit bank notes don’t spend time with fake notes, they spend time with the real ones (on loan, presumably). They learn what ‘normal’ looks like, and get really familiar with it. That way, when something isn’t quite right, they can spot it and raise an alarm.

DBAs taking care of an environment, whether small or large, will have learned what ‘normal’ looks like. They should have benchmarks that tell them how various metrics normally behave, and for those things that they don’t have formal metrics on, they should still be familiar enough to recognise when something isn’t right, even if they can’t tell exactly what. Their ‘spider-sense’ should tingle.

If you don’t have benchmarks, then get them. Every time you find something that doesn’t seem right, you will wish you had a benchmark on that thing to be able to quantify your suspicion. Feeling like something isn’t right is great, but it won’t be long until someone asks “How do you know this thing needs attention?”, and “I just know” probably won’t cut it. If you’re a consultant, you can probably get away with “In my experience…”, because that’s what they’re paying you for, but having supporting evidence – actual numbers – can help, particularly if you’re the main person responsible and need to persuade someone to find the money for a CapEx.
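Something like this is enough to get started (a minimal sketch; the table name and the choice of counters are just placeholders for whatever matters in your environment):

CREATE TABLE dbo.PerfBaseline
(
    CaptureTime datetime2 NOT NULL DEFAULT SYSDATETIME(),
    CounterName nvarchar(128) NOT NULL,
    InstanceName nvarchar(128) NOT NULL,
    CounterValue bigint NOT NULL
);

-- Run on a schedule (an Agent job, say) to build up a picture of 'normal'.
-- Some counters are point-in-time values (like Page life expectancy); others,
-- like Batch Requests/sec, are cumulative, so the rate comes from the
-- difference between two snapshots divided by the elapsed time.
INSERT dbo.PerfBaseline (CounterName, InstanceName, CounterValue)
SELECT counter_name, instance_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Batch Requests/sec', N'Page life expectancy');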

Having the numbers handy is useful for a lot of situations, but there is also a whole bunch of tools available to look at. A while back I wrote about how DBAs could use the same kinds of tools that other industries hire data professionals to provide, in a post called “DBAs and the Internet of Things”. If you take this kind of approach and start analysing the readings from all kinds of things that affect your database, then you can get ahead of the game. Feed this stuff into something like Azure ML for predictive analytics, and you might be able to have an even better spider-sense, where the prediction isn’t just based on your own opinions, but on what has caused failures in the past.

Too often, the significant thing is some small detail that most people wouldn’t notice, but because of your experience and expertise, you can spot it and work out whether it’s significant or not. Then if you don’t have that particular thing benchmarked, or analysed by other tools, you can include it to see what’s going on.

…and develop superhero powers for managing your enterprise. It’s something we regularly recommend to DBAs.

@rob_farley

SQL 2014 queries under a strobe light

I just wrote a post about Live Query Statistics. Let me show you how this technology in SQL Server 2014 can be used for some amazingly cool (nerdy cool, at least) stuff.

Behind the scenes, LQS uses a DMV called sys.dm_exec_query_profiles. When you run a query with SET STATISTICS PROFILE ON, the engine puts data into this DMV for every operator in the plan that produces data (which doesn’t include the SELECT operator, or Compute Scalar, for example). What SSMS does while you’re watching a long-running query is poll this DMV over and over to get the progress data, so that you can see it in those moments between kicking off the query and its completion.
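If you want to see what SSMS is seeing, you can poll the DMV yourself while a query runs. A minimal sketch (with the column list trimmed to a few of the interesting ones):

-- In the window running the query you care about:
SET STATISTICS PROFILE ON;
-- ...then run the query...

-- In a second window, while the query is still running:
SELECT session_id, node_id, physical_operator_name, row_count, estimate_row_count
FROM sys.dm_exec_query_profiles
ORDER BY session_id, node_id;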

When you use LQS on a relatively quick query, say, one that completes in a single second, you can’t exactly watch this information come through. You might be able to use a debugger, and pause operation of your server for a moment while you step through it, but this is far from ideal. And yet a query that completes in a single second might actually need some work. What if this query needs to be able to run many times per second, and you’re looking for strategies to tune every last bit out of it?

Clearly LQS is going to be no use.

But the workings behind it… that’s another matter. Let me show you.

When people want to study what happens in the flight of an insect, they take lots of photographs, often using a strobe light to capture a moment with as little blur as possible. It allows for incredibly detailed images, like the ones you can see in this article from the Daily Mail in the UK.

(Image only linked from source – please let me know if it’s not there, but also go and look at the amazing pictures that this guy takes)

I don’t know how many times this insect flapped its wings between each image that was captured, but I think you’ll agree that with enough images, it would be possible to learn a lot about the way that the wing-flapping takes place.

This is the same as what I’m doing with queries in SQL Server 2014.

Suppose I run a query over and over, with statistics profile turned on. The code here runs my sub-second query over and over for six minutes.
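Something along these lines (a sketch only; reconstructed against AdventureWorks to match the plan shown below, so the details may differ from the exact query I ran):

SET STATISTICS PROFILE ON;

DECLARE @endtime datetime = DATEADD(MINUTE, 6, GETDATE());
WHILE GETDATE() < @endtime
BEGIN
    -- A sub-second aggregate-and-join over AdventureWorks
    SELECT p.Name, SUM(d.OrderQty) AS TotalQty
    FROM Sales.SalesOrderDetail AS d
    JOIN Production.Product AS p ON p.ProductID = d.ProductID
    GROUP BY p.Name;
END;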

I used an SSMS window for this, and told SSMS to discard the results – I only needed to run the query once to see what it returns, and here I just care about the profile stats. Incidentally, it returns 266 rows.

Now, I know that the plan that’s being used here is:

[image: the query’s execution plan – a Hash Match Join over a Hash Match Aggregate, with Clustered Index Scans on SalesOrderDetail and Product]

The Node_IDs of these operators are 1 (the Join), 2 (the Aggregate), 3 (the Scan on SalesOrderDetail), and 7 (the Scan on Product). These numbers have gaps just because of the inner workings of the Query Optimizer. And they go from left to right because that’s how a plan runs. The SELECT calls Node 1, which calls Nodes 2 and 7, and Node 2 calls Node 3.

So during five of the six minutes that my query was running over and over and over, I went into a different window and polled the DMV every half a second.

I had made a table which matched the DMV, with a couple of extra columns thrown in: one to record when I captured the moment (so that I could differentiate between moments), and one (called ‘comment’) to note which query I was monitoring. The DMV contains the sql_handle and plan_handle, so I could’ve differentiated between the queries later, but I wanted to be able to do it more easily than that.
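Something like this does the job (a sketch rather than the exact script; the table name dbo.QueryStrobe and the comment value are just illustrative):

-- A table matching the DMV, with the two extra columns at the front
SELECT TOP (0)
    SYSDATETIME() AS myquerytime,
    CAST(N'' AS nvarchar(100)) AS comment,
    qp.*
INTO dbo.QueryStrobe
FROM sys.dm_exec_query_profiles AS qp;

-- Poll the DMV every half a second for five minutes
DECLARE @endtime datetime = DATEADD(MINUTE, 5, GETDATE());
WHILE GETDATE() < @endtime
BEGIN
    INSERT dbo.QueryStrobe
    SELECT SYSDATETIME(), N'Hash Match Join', qp.*
    FROM sys.dm_exec_query_profiles AS qp;
    -- (could filter on session_id if other profiled queries might be running)

    WAITFOR DELAY '00:00:00.500';
END;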

This data gave me 597 different values for ‘myquerytime’, 597 different moments captured. I don’t know how many different times my query ran in that time – probably far more than 597, although I wouldn’t have really cared if it were less. These 597 moments each had up to 4 rows, showing how each operator was going in its cycle.

There are columns in the DMV for all kinds of attributes, but the one that I was most interested in was row_count, which tells me how many rows the operator has produced. There are columns about CPU, but as I’m looking at a query which runs in a small number of milliseconds, I’m more interested in the number of rows it’s produced.

Here’s some of the data that’s in my table.

[image: sample rows from my table – for each polled moment, the row_count that each node_id had produced]

You can see that when I polled the DMV at 19:28:20.303, Node 3 (the Clustered Index Scan on SalesOrderDetail) had pushed 25241 rows, and no rows had been outputted by the other three. Half a second later, the snapshot showed 91044 rows from that node. Another half a second, and it was 54279, and in the poll at 19:28:21.807, the scan had served up all its rows, and there was output from the other three operators.

Each of these is like a strobe image, capturing a moment in the life of the query.

To recreate how the query runs, we need to piece them together. To do this, I’m going to assume that the rows are produced in the same order each time (which is a reasonable assumption when I’m running the query over and over in quick succession, with no changing parameters or conditions). By summing the row_count across all the operators in each poll, I can order the polls. A quick bit of pivoting…
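The pivot was something like this (a sketch over the table above, assuming the illustrative ‘Hash Match Join’ comment value, and ordering the strobe images by the total row_count across the operators):

SELECT myquerytime,
       [1] AS JoinRows,
       [2] AS AggregateRows,
       [3] AS SalesOrderDetailRows,
       [7] AS ProductRows
FROM
(
    SELECT myquerytime, node_id, row_count
    FROM dbo.QueryStrobe
    WHERE comment = N'Hash Match Join'
) AS src
PIVOT (SUM(row_count) FOR node_id IN ([1], [2], [3], [7])) AS pvt
ORDER BY ISNULL([1],0) + ISNULL([2],0) + ISNULL([3],0) + ISNULL([7],0), myquerytime;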

…and I can see the query start:

[image: pivoted results at the start of the query – only the Clustered Index Scan on SalesOrderDetail has produced rows]

…and the moment when the Clustered Index Scan stops outputting more rows:

[image: pivoted results around the point where the Clustered Index Scan has produced all of its rows]

…and the end of the query.

[image: pivoted results at the end of the query, with all four operators having produced their rows]

Notice that almost all the time we polled the DMV to see how the query was going, data was still being pulled out of SalesOrderDetail. It was only 96.1% (574/597) of the way into the query that data started to be outputted from the blocking Hash Aggregate. And because a Hash Match Join blocks until the hash table has been completed, we have to wait even longer before we eventually start pulling data from the Product table, when we pull up to 504 rows to find the 266 rows of interest.

I’m sure you can get a feel for how this query is running from this information.

But let’s compare this to what happens if I force a Nested Loop join instead of the Hash Match Join. This plan:

[image: the execution plan with a Nested Loop join forced – a Hash Match Aggregate over the SalesOrderDetail scan, with a Seek on Product on the inner side]

I repeated the collection of data, running the same query over and over but with OPTION (LOOP JOIN), and a different comment in the polled data. I also decided to use dynamic SQL to query my strobe table to save rewriting the pivot for each plan.
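The dynamic SQL was roughly this shape (a sketch rather than the exact code; the ‘Loop Join’ comment value and the variable names are illustrative, and the node_id column list is derived from whatever the polling captured):

-- Same collection loop as before, but the query in it gets OPTION (LOOP JOIN)
-- and the polling INSERT uses a different comment, e.g. N'Loop Join'.

DECLARE @comment nvarchar(100) = N'Loop Join',
        @cols nvarchar(max),
        @sql nvarchar(max);

-- Build the pivot column list from the node_ids captured for this comment
SELECT @cols = STUFF((
    SELECT DISTINCT N',' + QUOTENAME(CONVERT(varchar(10), node_id))
    FROM dbo.QueryStrobe
    WHERE comment = @comment
    FOR XML PATH('')), 1, 1, N'');

SET @sql = N'
SELECT myquerytime, ' + @cols + N'
FROM
(
    SELECT myquerytime, node_id, row_count,
           SUM(row_count) OVER (PARTITION BY myquerytime) AS total_rows
    FROM dbo.QueryStrobe
    WHERE comment = @comment
) AS src
PIVOT (SUM(row_count) FOR node_id IN (' + @cols + N')) AS pvt
ORDER BY total_rows, myquerytime;';

EXEC sp_executesql @sql, N'@comment nvarchar(100)', @comment = @comment;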

It started very similarly, but was very different towards the end.

The Scan seemed to go for a longer portion of the plan – 98.5% (587/596), but as the Hash Match Aggregate started producing rows, the Nested Loop was pulling the row from the Seek and returning it to the SELECT operator before pulling the next row in. You can see the row_count going up equally across the three operators, which is very different to what we saw with the Hash Match Join.

[image: pivoted results for the Nested Loop plan – row_count climbing equally across the Aggregate, the Join, and the Seek]

A factor I hadn’t considered before became evident at the end. In the Hash Match Join example, we saw a bunch of moments when all the operators had produced their data, which aren’t there in the Nested Loop example. The result of the query is no different, but the tear-down time is much quicker with the Nested Loop – presumably because the Hash Table used for the join doesn’t need to be dropped. I didn’t expect this to be as significant as it seems to be, but we certainly managed to catch six images – about 1% of them – when the operators had all stopped returning data, but the query was still hanging around as far as the DMV was concerned. With everything else being identical, I can only assume it’s down to the Hash Table having more to do to shut down than the Nested Loop.

Just for fun, I tested the pivot query itself, giving 582 strobe images.

[image: the pivot query’s execution plan]

[image: strobe results for the pivot query]

The first thing to notice is that the Compute Scalar operators didn’t report anything, as expected.

[image: strobe results showing nothing reported for the Compute Scalar operators]

The next is that the right-most Sort was blocking, and had a significant pause after the Scan finished – about 35 strobe images, or 6% of the query time.

[image: strobe results around the pause between the Scan finishing and the right-most Sort producing rows]

The Stream Aggregate doesn’t block, but the left-most Sort, which calls the Stream Aggregate (via two Compute Scalars), does. Following that there’s another pause (smaller – fewer strobe images), after which the Segment and Sequence Project operators don’t block.

[image: strobe results showing the left-most Sort blocking, then the Segment and Sequence Project passing rows through]

At the end of the query we have about 80 strobe images – well over 10% of the query time – after the Sequence Project has outputted its last row.

[images: the final strobe results for the pivot query, after the Sequence Project has produced its last row]

Now – there is still more for me to learn about this, and I have been making some assumptions about whether the DMV polling gives a sufficiently random moment. But from the things I’ve seen, there is definitely information here about queries that I haven’t seen before, and which requires some further research.

Finally, I spent a bit of time looking at visualisation options for this. I immediately thought of the Power View play axis that’s on scatter charts, but sadly I didn’t have much luck coming up with an effective visualisation quickly. I had felt like Hans Rosling with his scatter chart about birth rates, and used LAG and MAX() OVER() to come up with a speed of row production compared to the total, but I kept getting the “representative sample” message, which didn’t help. I’m sure it won’t be long before this kind of visualisation is easy, and for the purposes of analysis, I was more interested in exploring the data than making it look pretty.
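For the curious, the ‘speed of row production’ calculation was along these lines (a sketch only; the column names are illustrative, and I’m ordering the strobe images by their total row_count as before):

WITH polls AS
(
    SELECT myquerytime, node_id, row_count,
           SUM(row_count) OVER (PARTITION BY myquerytime) AS total_rows
    FROM dbo.QueryStrobe
    WHERE comment = N'Hash Match Join'
)
SELECT myquerytime, node_id, row_count,
       row_count - LAG(row_count, 1, 0)
           OVER (PARTITION BY node_id ORDER BY total_rows, myquerytime) AS rows_since_last_strobe,
       1.0 * row_count
           / NULLIF(MAX(row_count) OVER (PARTITION BY node_id), 0) AS proportion_of_total
FROM polls
ORDER BY total_rows, myquerytime, node_id;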

Strobe photography is very impressive. I just never thought it would apply to T-SQL queries.

@rob_farley