How I would re-imagine the PASS organisation

Maybe it's not a re-imagination. Maybe it's a return to what it once was. I don't know.

I'm writing this on my blog because I doubt I'm going to get an audience with the PASS Board of Directors any time soon. I've been relatively vocal about these thoughts for a while, but have never written them up.

And before I start, I want to mention a few things for context. I used to be on the PASS Board of Directors. I served for about six months in the second half of 2011 as an invitee, and then ran for election and served as an elected director for two years, 2012-2013. At that point I didn't re-run. I had discovered that the toll of being on the board from such an incompatible time zone was harder than I wanted it to be, and I didn't feel I was being effective when I had a hard time being physically present at meetings and events. The cost was great, and I didn't feel like my presence on the board was having the impact that the community might have wanted.

One thing I did learn from when I was invited to be on the board was that Microsoft saw PASS as a vehicle to reach the data community. Along with JRJ from the UK and Raoul from Denmark, I had been brought onto the board to help it be more globally aware: to find ways to increase the organisation's global reach, rather than having it just centred on North America. I don't feel like I was effective in helping with that, for reasons I'm not going to go into in this post. Instead I want to focus on that thing that I learned – that Microsoft saw PASS as a vehicle to reach the data community.

Let's be clear – I'm not in the room for the discussions about the future of PASS. I'm not a director, and I'm not running for re-election (because I don't want to have a two-year commitment to those people who might elect me). I offered to be part of discussions, but that hasn't amounted to anything. I do hope those discussions are actually happening, but I'm not in the room.

This post is about making my opinion known in the wider community and opening it up for comment. Maybe some people on the PASS Board will notice and see your comments too.

I think right now, most people associate PASS with their annual event the PASS Summit. But I don't think the PASS Summit is PASS's raison d'être. It's become the main focus of the organisation over the last ten years or so, maybe because it's the primary source of revenue, but I think the reason PASS exists should still be as a vehicle for Microsoft to reach the data community, and for the data community to reach Microsoft.

I've heard from people involved in other data community events – ones that are not PASS-branded – that PASS wouldn't give support because they weren't PASS-branded events. If the goal is to get people along to the PASS Summit, I understand that. If the goal is to reach the wider data community, then it's wrong. From the perspective of building PASS revenue, protecting the PASS brand is good. From the perspective of being a vehicle between Microsoft and the data community, it's bad. Sadly, this felt consistent with the experience that I had while on the board, and continue to have as a PASS group leader – that PASS's goals are about promoting Summit, not about the community.

Also, about 13 years ago, Microsoft Australia gathered lots of user group leaders together. Leaders from all the various technologies. One of the things that was communicated that day was that we shouldn't see Microsoft as a monolithic whale, but rather as a pod of dolphins, where each group is doing its own thing, communicating in its own way, but understanding the general direction of the pod. Once upon a time, the SQL Server Product Group might've been a single one of these dolphins – but now there are lots of different groups that might want to interact with the data community. And there are lots of data community groups that want to interact with Microsoft.

So here's how I imagine PASS could be.

PASS kinda wants to be like in this diagram. Microsoft's way of reaching the data community.

And if this is accurate, if you remove PASS from the world, it looks like this:

And nobody really minds.

If the PASS organisation restricts its definition of "Data Community" to PASS-run events, then that's limiting the reach, and Microsoft will simply go directly to all the other events that run. Events like SQLBits, DPS, DataGrillen, 8kb, GroupBy, all of them. And that's what's been happening over the last dozen years or more.

Let's consider that pod-of-dolphins view of Microsoft. Let's also acknowledge that the Data Community actually means all the different events where people gather and connect and share and learn about data in the Microsoft world.

So now imagine that PASS was a vehicle between all the different data groups within Microsoft and all the different data groups within the community. Now it looks something more like this.

PASS becomes the "Enterprise Service Bus" (to draw on an analogy that's about as old as PASS) to serve as a vehicle between Microsoft and the data community. The various groups within Microsoft that want to reach the community can talk to PASS. The various groups within the data community that want to reach Microsoft can talk to PASS. PASS can be a facilitator, an enabler, a vehicle. Those events want something like the SQL Clinic? Talk to PASS. Those events want a bunch of Microsoft speakers? Talk to PASS. Microsoft wants to get some messaging out about some new thing? Talk to PASS.

In this model, if you remove PASS, it looks like this.

…which is actually what it kind of feels like now. When I run my local user group, I have to figure out who to approach to get to speak. I know quite a lot of people, but if I didn't, I would really struggle. SQLBits, DPS, and all the others have worked hard to establish relationships when a different model of PASS might've enabled it better. And what about new groups that are created within Microsoft? The community doesn't know about those groups, and those groups don't have the relationships with the people that run all these different events.

A model of PASS like this means that PASS is no longer a "Professional Association" of anything. It's about PASS-through communication. It's the Service Broker, enabling conversation. PASS hasn't been a professional association for a very long time, but it can still be a vehicle like this. Money would come from sponsors, particularly Microsoft, rather than events because PASS would be making the logistics between Microsoft and the community smoother. It would provide an actual service to both Microsoft and the community, one that would be paid for by Microsoft and by others who want to be part of the Microsoft + data community conversations. And this service doesn't disappear because of an interruption to the event calendar such as a pandemic, volcanic eruption, or terrorist attack – all things which have been problems before.

This approach also provides a way of letting the community know about events that are coming up that they might want to attend or speak at, because PASS could provide that centralised communication. It could be a central vehicle for other sponsors to reach event organisers (and vice-versa). And it could provide assistance for group leaders to run their groups – not by trying to control everything, but by offering advice. They could offer advice and maybe negotiate discounts for using tools like Sessionize and EventBrite rather than trying to provide all of those services themselves.

PASS would be how you the community reach Microsoft, and how Microsoft reaches you. No matter where in the world you are, and which events you're attending.

Please let me know what you think. Hopefully PASS and Microsoft are watching.

@rob_farley

Supermarket Seeks and Scans

At the PASS Summit 2015, I was giving a presentation about Query Plan Operators, and Kalen Delaney (@sqlqueen) was in the audience. She's kind of a big deal in the SQL world – I still remember the first time I met her. It was 2007 and she came up to me and said "I read your blog". I was a little star-struck, but we've been good friends ever since.

In that presentation, I was explaining Seeks and Scans, as I often seem to, and was reminded about the times I wander round the supermarket holding a list of the things I need to get. Because what I'm doing is essentially a Join between my list and the stock in the supermarket. And the way that I implement that join highlights some important ideas in the database world.

Kalen seemed to like my analogy. So much so that over a year later she casually mentioned it on Twitter.

I figured that it was about time that I explained more about this.

Plus, as the topic for this month's T-SQL Tuesday is analogies, hosted by Rob Volk (@sql_r), it's definitely a good time to write about it.

When I'm sent to the supermarket to pick up a lettuce, I know where I'm going. It's in the fruit and vegetables section. I'm good with that. I'll go straight there, pick up the lettuce, and I'm out. I'm not going to wander around – I'm not going to wander down the confectionery aisle – I'm just grabbing the lettuce and leaving. This is somewhat like a Seek.

In fact, it's more like a Seek with TOP 1, because there are probably lots of lettuces, and I'm only going to get a single one. That's taking the analogy a little further, but it still works. It's one of the nice things about good analogies, and I totally think this is one of those. If I want to get a lettuce of a particular quality, then I might have to check a few of them before grabbing one (because the supermarket doesn't sort the lettuces into the good ones versus the ones that look like they've been there a while), and that's like having to deal with a Residual Predicate. The more fussy I am, the more I might have to look through, and I risk getting no lettuce even if they have some fairly ordinary ones. If I want to specifically get the best lettuce they have (even if it's awful), then I need to do a Top N Sort on all the lettuces. That might be an expensive operation if there are a lot of lettuces.
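For anyone who thinks better in code than in vegetables, here's a rough Python sketch of those three ideas. It's only an illustration of the analogy, not how SQL Server implements anything: the stock list, the quality scores, and the function names (seek, top_1, first_good_enough, best) are all made up for this post.

```python
import bisect
import heapq

# Hypothetical stock rows: (section, item, quality). All values are made up.
STOCK = [
    ("fruit and veg", "lettuce", 4),
    ("confectionery", "chocolate", 7),
    ("fruit and veg", "lettuce", 8),
    ("frozen", "pizza", 3),
    ("fruit and veg", "cabbage", 6),
    ("fruit and veg", "lettuce", 5),
]
# The "index" orders rows by (section, item) only. Quality is NOT sorted,
# just like the supermarket doesn't sort its lettuces.
STOCK.sort(key=lambda row: row[:2])
KEYS = [row[:2] for row in STOCK]


def seek(section, item):
    """Go straight to the matching rows and read only those."""
    start = bisect.bisect_left(KEYS, (section, item))
    for row in STOCK[start:]:
        if row[:2] != (section, item):
            break  # walked past the last match, so stop
        yield row


def top_1(section, item):
    """Seek with TOP 1: grab the first match and leave."""
    return next(seek(section, item), None)


def first_good_enough(section, item, min_quality):
    """Seek plus a residual predicate: check matches until one passes."""
    return next((row for row in seek(section, item) if row[2] >= min_quality), None)


def best(section, item):
    """Top N Sort (N = 1): has to look at every matching row."""
    found = heapq.nlargest(1, seek(section, item), key=lambda row: row[2])
    return found[0] if found else None
```

With this data, top_1("fruit and veg", "lettuce") hands back the quality-4 lettuce, first_good_enough(..., 5) has to reject that one before settling on the quality-8 lettuce, and asking for quality 9 leaves me with no lettuce at all.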

I mentioned a minute ago that I wasn't going to go down the confectionery aisle. Good thing too, if there's a problem there. I'm sure we can all imagine the times when there's a problem down a particular aisle… analogous to a page corruption in a database. But if I don't have to go there, then I can still do what I need to without being affected.

What if there's some sort of a crisis going on and I need to buy all I can get of something? (I don't mean all the toilet paper – in a crisis, other people might need some too.) I mean something like all the Ham & Pineapple Pizzas, because we've been asked to cater for a classroom of kids, and those kids don't understand the world yet. But the supermarket understands the world and only ever stocks like, three of them. I'm totally fine with grabbing all three pizzas and putting them in my shopping basket.

But what if that day they have over-ordered and they have fifty? Suddenly I'm needing more memory – I mean, a bigger basket – and I might need to do something differently. I kinda hope that never happens.

Back to when I have a shopping list, rather than a single item. At this point, I'm wanting to join everything on my list with the things that match in the supermarket. If it's a short list, it might be best to find one thing, then the next, then the next, and so on. Even if I grab a lettuce and then grab a cabbage, which is right next to the lettuces! If my list is short enough, then that's fine.
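That short-list strategy is a Nested Loop with a Seek on the inside. Here's a minimal Python sketch of the idea, assuming a made-up shelf index of (item, aisle) pairs; the names SHELVES, seek and nested_loops_join are mine, purely for illustration.

```python
import bisect

# Hypothetical shelf "index": (item, aisle) pairs sorted by item name.
SHELVES = sorted([
    ("bread", 2),
    ("cabbage", 1),
    ("lettuce", 1),
    ("milk", 5),
    ("pizza", 7),
])
ITEM_NAMES = [item for item, _ in SHELVES]


def seek(item):
    """Go directly to one item, or return None if it isn't stocked."""
    pos = bisect.bisect_left(ITEM_NAMES, item)
    if pos < len(SHELVES) and ITEM_NAMES[pos] == item:
        return SHELVES[pos]
    return None


def nested_loops_join(shopping_list):
    """One separate trip (seek) per item on the list, in list order."""
    basket = []
    for wanted in shopping_list:   # outer input: my list
        found = seek(wanted)       # inner input: a seek for each item
        if found is not None:
            basket.append(found)
    return basket
```

Calling nested_loops_join(["lettuce", "cabbage", "caviar"]) makes three trips, even though the lettuce and cabbage sit in the same aisle, and comes back with just the lettuce and the cabbage. The cost grows with the length of the list, which is why it only suits short ones.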

When my list is quite long, I'm going to use a different strategy. There comes a time when it's going to be quicker to just walk through the aisles looking for things that are on my list. At first glance that sounds like the "tipping point" with a Seek+Lookup turning into a Scan, but I want to point out that this means we're anticipating having a bunch of rows being pulled into a Nested Loop operator and then doing a Lookup for each one, and that's a Join. Sure, we might decide not to do the join, but I'm looking at the join part for my supermarket analogy.

So if I have a long list I might not want to grab each item individually. Let's think about other options.

One option is to sort the list in my hand into aisle order, which is essentially "section". I know the sort order of the supermarket, so this is fine. I can start with aisle 1, and walk through, keeping my eye out for the things in my list in order. Brilliant. This is a Merge Join. It really is.

And this works pretty well, except that I need to order my shopping list first. That's one of the drawbacks of a Merge Join.

Plus, there are times when I might have picked something up, gone to move to the next section of the supermarket, but then I need to grab something else from that section. So if my sort wasn't down to the point where it's unique in the list, I might need to backtrack, which is really annoying and takes time. Now I'm basically doing a many-to-many join, and a whole ton of efficiency is lost.
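Here's a rough Python sketch of the well-behaved version of that walk, where my list is sorted all the way down to something unique. It assumes I know the aisle for each thing on my list and that neither side has duplicates, so it never has to backtrack. The function name and data are invented for illustration.

```python
def merge_join(shopping_list, stock):
    """Walk my sorted list and the (already sorted) shelves together, once.

    Both inputs are (aisle, item) pairs. This sketch assumes the pairs are
    unique on each side, so there is no many-to-many backtracking.
    """
    wanted = sorted(shopping_list)  # the up-front sort is the price I pay
    basket = []
    i = j = 0
    while i < len(wanted) and j < len(stock):
        if wanted[i] == stock[j]:
            basket.append(stock[j])  # it's on my list and on the shelf
            i += 1
            j += 1
        elif wanted[i] < stock[j]:
            i += 1  # already walked past where it would be: not stocked
        else:
            j += 1  # shelf item I don't need, so keep walking
    return basket


# Hypothetical supermarket, already in aisle order.
STOCK = [(1, "cabbage"), (1, "lettuce"), (2, "bread"), (5, "milk"), (7, "pizza")]
MY_LIST = [(5, "milk"), (1, "lettuce"), (3, "caviar")]
```

Here merge_join(MY_LIST, STOCK) returns the lettuce and then the milk, in aisle order, having passed each shelf only once. If I had sorted only by aisle, two items in the same aisle could turn up in the wrong order and I'd be back to the backtracking problem.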

Another option is to make sure I can see my whole shopping list, and walk up and down going "Do I need this? Is this on my list?" for every item I come across. At this point I'm doing a Hash Match. It can work, but I need to have that shopping list spread out, and I'm asking myself that question (creating the hash value and doing the probe) about everything.
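In code, spreading the list out is the build phase and asking the question about every shelf item is the probe phase. A minimal Python sketch, with made-up data and a made-up function name, might look like this:

```python
def hash_match(shopping_list, stock):
    """Build a hash structure from my list, then probe it for every shelf item."""
    wanted = set(shopping_list)  # build: the list spread out where I can see it
    basket = []
    for aisle, item in stock:    # walk every aisle, in whatever order
        if item in wanted:       # probe: "Is this on my list?"
            basket.append((aisle, item))
    return basket


# Hypothetical shelves; no particular order is needed for this strategy.
STOCK = [(7, "pizza"), (1, "lettuce"), (5, "milk"), (2, "bread"), (1, "cabbage")]
```

Calling hash_match(["milk", "lettuce", "caviar"], STOCK) returns the lettuce and the milk in the order I happened to pass them. Nothing needed sorting, but I had to hold the whole list in view and look at every single item on the shelves.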

One nice thing though, is that scenario where I don't know how long the list is because I'm getting text messages as I'm walking in. So I can start spreading out the list, thinking that a Hash Match might work out well, bracing myself for a long walk up and down all the aisles, and then when it turns out the list is short, I can decide to go to each item individually. That's Adaptive, and it's really handy when you don't know how much data you're going to be dealing with.
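A loose Python sketch of that decision is below. The threshold value is invented for illustration (the real one is worked out by the optimizer), and the dictionary lookup is just my stand-in for an index seek.

```python
ADAPTIVE_THRESHOLD = 3  # made-up number, purely for illustration


def adaptive_join(incoming_items, stock):
    """Collect the list first, then pick the strategy based on its length.

    `stock` is a hypothetical dict of item -> aisle. Looking an item up in it
    stands in for a seek; iterating over all of it stands in for a scan.
    """
    wanted = list(incoming_items)  # the text messages as they arrive
    if len(wanted) < ADAPTIVE_THRESHOLD:
        # Short list after all: go to each item individually.
        basket = [(item, stock[item]) for item in wanted if item in stock]
        return "nested loops", basket
    # Long list: walk every aisle, checking each item against the list.
    wanted_set = set(wanted)
    basket = [(item, aisle) for item, aisle in stock.items() if item in wanted_set]
    return "hash match", basket


STOCK = {"bread": 2, "cabbage": 1, "lettuce": 1, "milk": 5, "pizza": 7}
```

With two items on the list, adaptive_join(["milk", "bread"], STOCK) picks "nested loops"; with four, it picks "hash match". The point is that the choice is made after the list has arrived, not guessed at the door.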

Shopping in a supermarket is obviously very different to querying a database. But the underlying concepts behind how we pull the right goods from the shelves definitely have some strong similarities, as I hope I've shown here. Analogies can help you learn principles by hanging them on concepts you already know. Maybe next time you go to the supermarket, you'll get a little better at understanding how your queries run.

@rob_farley

Why the PASS Virtual Summit 2020 is ridiculously good value for money

As a user group leader, I've probably mentioned to the people in my user group over a hundred times that the PASS Summit is excellent value, even if you have to pay to fly to America from Australia, stay in a hotel, and lose a week of billable time. The benefits you can get from spending time with the biggest names in the Microsoft data community are huge.

Obviously it's harder to spend time with people from the community when you're just interacting with them through a computer screen (although why not get used to that – if you can get the hang of chatting to these people through your screen, that can carry on all year!), but this is only part of the story I give as to why the PASS Summit is such good value.

The main reason why it's excellent value is the SQL Clinic (known these days as the Azure Data Clinic).

The clinic was always a great reason to have the PASS Summit in Seattle – it was simply easier for Microsoft to have a bunch of the people that already live in Seattle don white coats and hang out at the Summit around whiteboards, just so that attendees could wander up and get free consulting time (okay, they still need to pay to be at the Summit, but at no extra cost). I remember seeing former clients of mine there, who flew to Seattle from Sydney to attend the Summit and didn't sit in a single session because they (two of them) spent the whole three days in front of a whiteboard working through problems with two or three people from the CAT team and product group.

For two people to fly to Seattle from Sydney, stay in hotels, and pay for the Summit entrance fee, the cost would've been several thousand dollars. But the value of the consulting they got for that would've been significantly more.

Fast forward to 2020, and the Summit is virtual. So there are no flights to buy. No hotels to use. And the entrance fee is much lower.

But the clinic is still happening. It's mentioned right there on the "Microsoft at Summit" page.

The biggest pain might be the time zone, because I'm guessing those Microsoft people might not be available around the clock. But if I want that free consulting, I'm going to sacrifice the wee small hours of the morning (sorry, there's an instrumental cover version of that song playing while I write this) for it. These opportunities don't happen every week, and it's going to be worth sacrificing some sleep if I have some stuff to solve.

I've heard people complaining that the cost of the PASS Virtual Summit is really high, considering that it's an online event. But I don't think those people have noticed what you can get for the US$599.

I think the conversation goes like this: "Hey boss – for US$599 I can get access to Microsoft people for three days, as well as being able to attend all the conference sessions."

I suspect your boss will have stopped listening before you reach the "as well as…" bit of that.

So… I'll see you at the Summit?

@rob_farley