My first technical job

Technically my first job was in newspaper delivery, but Ken Fisher (@sqlstudent144)'s challenge is to write about my first technical job, which came much later. And no, I'm not planning to give out any information which could be used for guessing passwords and the like. I don't think any of my passwords are guessable and none of yours should be either.

So my first technical job was a long way away from databases. If you've come here to find out how to get into databases, then that's a different story. That particular story probably involves the fact that I did a half-major at university in pure mathematics, with subjects like Number Theory, Set Theory, and Logic, so that once I got a job that did involve databases, a lot of things felt very familiar.

But my first technical job was while I was at university still. I was juggling my studies with work that I had in the Japanese Department (formatting textbooks using Microsoft Word with software called Twinbridge), and the head of the Computer Science faculty John Crossley offered me work as a programmer, writing software that would hopefully prove one of his theories. He was a professor in both pure mathematics and computer science, and he figured I might be a good fit, as I was one of a very small group of students who was taking subjects in both those areas.

The idea was that if you could come up with a mathematical proof that for any value 'x' there must exist a 'y' with a particular set of values, then that proof should be able to be converted into a computer program. As mathematical proofs needed to use very very basic concepts to be accepted by the academic community, it should've be possible to represent that proof using markup, and then create a compiler to turn the proof into a program to find a value which satisfies said proof.

I don't think it ever got anywhere near completion. I worked on it for several months (although not full time), and I think most of the time was in breaking down proofs into the simple components that we knew worked. There was a bunch of time writing code in Caml (this was before Ocaml was a thing), but probably more time deciphering scientific whitepapers about mathematical theory and trying to figure out what John wanted me to focus on.

I think about this way more than I should, probably because it didn't really get anywhere. It seemed like an interesting concept, but it was purely academic and I eventually left that work to sit under a different professor who specialised in artificial intelligence.

Prior to this time working within the university I'd had jobs in supermarkets and video stores, plus the newspaper delivery stuff that I mentioned at the start. While the supermarkets and video stores gave me some interaction with customers, the work at the university was more similar to the consulting I moved into after finishing my degrees, as I became practised at discovering what my clients needed and breaking the work into manageable chunks. And being able to juggle hours around classes without commute time was amazing.

These days I always encourage students to get to know their lecturers, to find out what they are interested in, and to see if there's the chance to get involved. It might not lead to much, but it's good practice for the real world, and the convenience of working on campus is huge.

And for myself (and hopefully everyone within LobsterPot Solutions), we still take the time to discover what our clients need and break the work down into pieces to figure out the key components.


Advice about T-SQL for beginners

Following on from my last post… what advice would I give about T-SQL to my younger-self?

Well, for that I'm going back a long way. To when I had learned about queries, but still had some way to go.

It's the same advice I give to everyone who's learning to write T-SQL, even the most basic of beginners. And that is to understand that the queries you write get turned into execution plans, and it's them that actually run.

Your query is only part of the story. It also depends on everything else the system knows about the environment your query is running in. It looks at the indexes (even ones that aren't being used for your query), statistics, the parameters you are passing in, whether or not the query has been run before… you get the picture.

So when someone says that their query isn't running the way they'd like – it's quite often because of other factors. What might look like an excellent query might be being spoiled because of an atypical parameter that was used to create a plan that's now in cache. Or that index that would be amazing might have been disabled. Or there might be a unique index that has been created, so that the system thinks there's a different way to run the query, or the system might think there are no rows in the table for today.

Know about the plan. Seek to understand it. Get your head around the different ways that your query could run, and the things that will cause the Query Optimizer to choose a different plan.

It took me a few years to work this one out. These days I teach it to everyone who gets into T-SQL. Right from day one, I teach them that even the simplest of queries could be turned into an unexpected plan.


Advice, wasted on the young

In 1997, Baz Lurhmann released a song called "Wear Sunscreen", based on text written by Mary Schmich (although I had to look that up). It was a hypothetical speech that would be given to a graduating class and while it contained a bunch of good and interesting advice, the title of Schmich's essay was "Advice, like youth, probably just wasted on the young".

This month, the T-SQL Tuesday writing challenge is to comment on what advice we'd give to a younger version of ourselves – invited by Camila Henrique – so it's probably worth highlighting that I probably would've ignored it. I'm sure I wasn't good at taking advice. I'm not sure I'm any better now, despite not being young any more.

Advice to our younger selves tends to imply regret. Things that we'd like to change. Advice to make different decisions or to establish different habits. Advice to cherish things before we lose them. I suspect Schmich suffered from things prevented by sunscreen, or lost friends and relatives. Or maybe she just wanted to have something that would be a good hook for a callback at the end, for the line "But trust me on the sunscreen".

And the advice I would give depends massively on what age me I'm giving the advice to. There are things that I would tell me at 4 or 5, that are different to what I would tell myself at age 9 a year before my dad died. If I'm talking to myself during my high school years, I would give difference advice compared to what I would tell myself as my kids were growing and going through their own stuff. Maybe I'd warn myself about failed projects, and lost data, and I'd say yes to that thing in 1999.

I would want to say that things will work out. Except that they don't. When people die, it doesn't feel like things are working out. When friendships break down, they're often not going to get resolved. When employees quit or clients leave, well I'm sure you get the idea.

But time keeps on going. The things in the past will remain in the past. We can't always get them back. Things will hurt. Life will suck. But there will also be times so full of joy that all the shadows disappear for a moment.

I'd tell my former self that. That there will always be good times in the future.

In fact, I'll tell current-self that. And maybe my future-self too.


When should your user group meet?

This is a tough question. No matter what kind it is, finding the right time for a group of people to meet is hard. Even churches don't have it easy – the traditional Sunday morning doesn't work for many people, and while churches try to figure out how to cater for as large a demographic as possible, they generally figure that most people will turn up on a Sunday if they want to, and they run extra meetings at other times to try to provide alternatives.

A technical user group is a very different beast, so it's good that Rie Merritt (@IrishSQL) has asked us to write about user groups for this month's T-SQL Tuesday. I mean, if someone wants to go to church, there are plenty of choose from. In most parts of the world, there aren't exactly an abundance of Microsoft Data Platform user groups to choose from, so hopefully the time that the current one meets works for the people who want to go.

In the new "virtual world" since the start of Covid, the availability of user-group style content online has expanded massively. Presentations are more likely to be delivered from the presenters home nowadays, with the audience members in their own home. At that point, what's to stop the group members being spread out around the world? Nothing. Except that people typically attend the meetings that they feel a connection to, and a group that is hosted from your local area is more likely to hold that connection to you. They speak your language, are experiencing the same part of the day, all that. And that's just about the host, regardless of who the presenter is. My group can host a presenter from Canada, but most of the attendees will be from Adelaide because they feel a connection to the group as a whole, and to me the host. The presenter is a guest for the month. It's different.

So if it's just content you want, the time and day of meetings isn't relevant. Churches know this too, and many have seen attendance drop as the congregation members that just want content have become happy to tune into the streaming services from any church, not just their own. I think they miss the concept of a 'congregation' being 'those who congregate'. They've become an audience at that point. They listen, without meeting.

Not that there's anything wrong with that.

If your group is simply trying to provide content, then fine. There are a lot of virtual groups and virtual conferences with live presentations so that the attendees can talk to the presenter, ask questions and have them answered. This isn't entirely about the content, but it's still quite different to in-person meetings, especially when considering when is best to meet.

An in-person meeting doesn't just need attendees to block out the time of the meeting. They will need to travel there, maybe find a park, and so on. The timing is definitely harder to work out.

And so I would encourage you to look into how your city works. Do people typically try to rush off home at the end of the day? Do many people stagger their days so that they start early and also end early? Is it hard for people to get into town? Do people gather down the pub after work, and delay going home until later? Could that happen at a user group venue?

These are the types of questions that mean I run the Adelaide Data & Analytics User Group at lunchtime. Adelaide is a pretty small city on the whole – it's easy to drive around, and most people who work in the suburbs do tend to drive. But many also try to get in early and leave early – the peak traffic seems to start around 4pm. Restaurants in the city seem to close early during the week. And so I don't try to host a user group in the evening. I do it during the day. On people's company time, so their bosses can be aware of the professional development event they're attending. Not in family time, which is even harder for people who are single parents or who have other responsibilities in the evening.

I'm sure there are plenty of people who can't come along to the meetings because I host them at lunchtime. And I hope that there aren't particular segments of society that I'm discriminating against. If I ask the group if it's a good time for them, the answer will be skewed because the people who can't make them may have unsubscribed and not provide an answer. If I change when it is, there's every chance I'll lose more people than I gain.

So research what works in your city. Explore the culture, come up with an idea about what might work, and then go for it. If your group is worth being at, people will work out how to get there. Do make sure there's no discriminating factors though.


Upgrading SQL

Normally I get these monthly posts done early on T-SQL Tuesday (the second of the month), but that didn't manage to happen this month. Maybe it's me who needs an upgrade…

Upgrades are the topic du jour, hosted by Steve Jones (@way0utwest) over at his blog. He asks about our strategies for upgrades, how we test, how long it takes, all that. So I figured I'd wax lyrical about upgrades for a bit.

Azure SQL DB is a whole nother matter when compared to SQL Server, of course. With Azure SQL DB, the environment is frequently upgrade behind the scenes, hopefully without our noticing ("us noticing" sounds better there, but it's just not correct… hmm…). We can set the compatibility level, which is when we choose to adopt an updated feature set, such as moving from SQL 2017 to SQL 2019, but the basic hot-fixes and the like are done for us behind scenes.

And yet we don't tend to take this approach with on-premises installations.

Largely because we've been burned before. And (to mix metaphors) once bitten…

Annoyingly there are times when a cumulative update gets released a little prematurely. They're rare, but they do happen from time to time. That's slightly terrifying for someone who wants to protect their server from vulnerabilities. Particularly the one that that cumulative update is designed to fix.

This creates a dilemma. Should the update be applied the moment it becomes available, or should you wait for a couple of days, in a potentially vulnerable state? There's arguments on both sides. If it was Azure SQL DB, the decision wouldn't be yours. And if they found a problem with the update, Microsoft would be getting that fixed for you as soon as possible too.

A significant upgrade, such as between major versions of on-premises SQL Server, is definitely more effort. I like to think that backwards compatibility is a well-understood thing these days, and that SQL 2022 won't break anything that was being done in SQL 2016. But with no absolute guarantee of that, there's always a bunch of work to test things. Less if you have automated testing. But a lot of systems (I'm a consultant – I see plenty of bad as well as good) just aren't set up like that.

And then there are the vendors who don't want to support a later version. And who end up writing code that is only compatible with SQL Server v.Old, because they haven't been able to persuade all their customers to move onto a newer version yet. And that means that other applications don't get upgraded… it's like a recursive CTE of old versions of applications and platforms, and you wonder when you'll ever get to upgrade.

The key to all of this is DevOps. Automated testing so that you know that what's being deployed is going to work. Even if the thing that's being deployed includes a newer version of SQL Server.

I don't envy application vendors who don't have a strong DevOps story. They need to get it in place so that they can scale, but it doesn't happen overnight. The road to good deployment practice is long and is threatened by all kinds of things. Code coverage is rarely complete, and problems seem to find those places that don't have good testing in place yet (typically because problems are avoided in the areas that do have good testing). All this is so much easier when a project is starting from scratch, and not the culmination of a decade or more of development needing to be compatible with the last four versions of SQL Server and Windows.

That backward compatibility I mentioned… for the most part, I find that code does survive major version upgrades as far as correctness goes. And even the most covering test environments may say things are okay. But performance changes, such as in the Query Optimizer, mean that upgrades will almost always present some amount of pain.

It's worth it though. Running the latest version means you're getting the most attention in the way of fixes. It means you can attract the best employees to work on it. And you can take advantage of the new features.

So get your DevOps story happening. Include tests around performance (or maybe execution plan shapes), and get used to the process.

Upgrades are worthwhile. Get them done.


Preconceived notions

Andy Yun (@sqlbek) invites us to write about preconceived notions.

There are plenty that I see and challenge, but I'm going to describe how I changed my thinking about databases back in the 90s.

At university I studied Computer Science, which felt like it was mostly about algorithms and paradigms. It covered how to approach particular kinds of problems, what languages suited what problems and why, and how to model things. The answer to a lot of things was "C', whether it was a multiple choice question, or the question about which language would be used to solve something.

I skipped the database subject. Everyone said it was overly basic, easy marks, and not particularly interesting. I wasn't interested in it. Not when there were subjects like Machine Learning where we'd implement genetic algorithms in LISP to find ways to battle other algorithms in solving the prisoner's dilemma. Or the subject where we'd create creatures (in software) that would follow each other in a flocking motion around a cityscape. Everything I heard about databases was that they were largely of no importance.

I landed a job with about five months in my degree remaining. I was due to start just after New Year. A weekend and a public holiday after my final submission was due. When they offered me the job they handed me a book about PL/SQL, and it made sense. I had been right to skip the database subject – it seemed simple and of little consequence. Just somewhere to store the values that the application needed. I was more interest in the algorithms and internal data structures that I'd be coding against the data.

However, once I had got into the job, i started to see that the paradigm I was dealing with was slightly different to the ones I'd learned about at uni. The difference was that most people didn't really see it. The paradigm was that we were dealing with database applications. The databases were central to everything that our applications did. Sure, the applications would involve user interfaces, which is what most people saw as the critical bit, but I saw that the bit that really mattered was the database aspect. And the code that ran on the database was different again. I heard a lot about set-based algorithms, and quickly saw how to marry the concept of a set-based query with the row-based impact I had in my head. But even then I had got the paradigm wrong.

The misconception that I had was that the paradigm about database applications was the database-centric approach. I thought I had been able to discern this where others had thought that the key was the user interface or some other aspect. I knew that the data was the most important piece of the puzzle. That you needed to be able to protect and preserve that above all. That success was about making sure that the right stuff was happening in the database.

But I was wrong.

Yes, the database is important. It really is. I know that even if the user interface is gone, having the data can keep the business alive.

But the most important piece is the people.

Whichever people they are – users, customers, suppliers, operators, managers, whoever. They are the most important piece. The data needs to serve them. Sure I can make the database be quick, and that can make life less frustrating for the people. I can make the data tell stories and give insight, that can make life more interesting for the people. But the data is not what it's about. The data is there to serve the people.

I think this is why I like being a consultant. I get to help people with their stuff. What I do for our customers doesn't even always involve the database. Sometimes it's solving problems without the database. Sometimes it involves doing data things that make me wince, because I know that I would never choose to do it that way if it wasn't for the impact on the people. But that's when I have to remind myself that my paradigm – the way I approach problems – should always be people-first. Not data-first.


The thing I miss about conferences is the people

Today has sucked.

I got home from being out last night, and saw a Facebook post from Euan Garden's wife saying that he'd died. Euan was a good friend who worked at Microsoft in Seattle. We chatted online every few months. The last time I saw him was in 2019 at a conference in Seattle, as I was helping him into an ambulance. He had a congenital heart problem, which finally beat him. And I'm sad. I'm sad that I didn't get to him more often. I'm sad that we didn't talk more than we could've. I'm sad that he leaves his wife and daughter to live on without him. I miss my friend.

And then when I woke up this morning, I saw a post from another friend, saying that Dean Varga had died too. I only knew Dean as Stacia's husband, but he embraced the SQL community and we embraced him. I'd known Stacia for many years before she married Dean, and I knew that when she was with him she was the happiest I'd ever seen. We loved Dean because Stacia loved him. I'm heartbroken for Stacia too today.

Today being T-SQL Tuesday, I'm responding to a post. It's Xavier Morera's question about conferences. He asks about our favourite conferences, our favourite presenters, and all that. Right now it's just about the people that I haven't seen for ages. The distance is always large, because I live in Australia. So I don't really care where things resume. I just want to see my friends again. The best conferences are the ones where people are at.


Being sure of your data

Victoria Holt (@victoria_holt) says "Data governance is really about 'data erudition', showing an interest in learning about the data we have, improving the quality, and creating a more productive and trusted data asset."

This is so much part of what I do as a consultant. What my team do as consultants. This isn't a post to spruik about LobsterPot Solutions though, it's a response to Victoria's T-SQL Tuesday invitation. This is month 144, 12 years, a gross. But nevertheless, I feel like it's worth talking about one of my favourite things we often do with our customers.

And that is to help them identify situations in their data that should not occur, so that they can fix it.

All database professionals should understand the significance of foreign key constraints, making sure that the thing (product, case, offer, location, etc) you're referring to actually exists. Or about check constraints to make sure that a delivery date isn't before an order date.

But the checks that we do are more about things that the database can allow, but are business scenarios that should never happen.

Plenty of businesses seem to recognise these scenarios all too well, and can point them out when they come across them. You hear phrases like "Oh, we know that's not right, it should be XYZ instead". And they become reasons why they don't really trust their data. It's a data quality issue, and every time someone comes across a data quality issue, they trust the data a little less.

Pretty soon this distrust means they become reluctant to automate anything, feeling they need to eyeball the data first. And the paralysis doesn't stop there – it seeps into just about any piece of information about the data. "Are we sure about this?" gets asked about every report.

Data governance can help this. Approved dictionaries for definitions is a start. Documenting processes is excellent. But also you need to discover which situations cause people not to trust the data, and develop ways to alert that they have occurred. This not only gives an opportunity to fix those situations, but to see that (ideally) they're happening less often. And eventually to develop trust that it's solved.

I've spoken before about the relationship between data quality and trust. That improved data quality can lead to trust. That trust with poor data quality puts you on dangerous ground. Our T-SQL Tuesday host Victoria commented about how data governance includes improving the data quality and developing a more trusted data asset. Data quality can lead to the trust, but only when it has been demonstrated repeatedly over time. Trust must be earned.

When we help our customers discover data quality issues, address those issues, and flag when they occur, they start to develop that trust. That trust can then form a strong foundation for so much more in data.

Never ignore the importance of data governance. Of developing trust that the data quality is strong. Go through the process of documenting everything and tracking everything, but also remember that the goal of data governance should be that trusted data asset.


Go-to scripts

Actual John McCormack (@actualjohn) asks about our go-to handy short scripts. It's a nice question, because some of the first programs I ever wrote were short and featured the GOTO command.

10 PRINT "Rob was here"
20 GOTO 10

It was a Commodore machine in a department store. I don't remember if it was a Vic20 or a Commodore 64 – I think it was probably the former, because we got a Commodore 64 at some point later. I had seen another kid do similar, and I guess I learned three commands that day and the principle of line numbers. It wasn't until years later that I understood what Edsger Djikstra meant.

Of course, this isn't the GOTO script that John was asking about. He wants to know about the tools we use, the ones that are scripts that we turn to regularly. I had to think about this, to think about what I find myself typing at client sites, and when, and why.

As a consultant, the kind of work that I do from customer to customer can change a bit. Sometimes I'm reviewing people's environments; sometimes I'm performance tuning; sometimes I'm developing code or reports or cubes; sometimes I'm writing T-SQL, but it's often DAX or PowerShell.

But often, I simply need to figure out where things are being referenced. The "View Dependencies" can be handy, and gives me a pretty tree with branches that I can expand, but just isn't as useful as running:

  , object_name(object_id)
from sys.all_sql_modules
where definition like '%TheThing%'

, because this lets me see the definition if I choose to show it, or do something with the results. Plus if I run it in Azure Data Studio rather than SSMS, I can easily save the results to a file, even in Excel.

And yet this still isn't my ideal.

My ideal scenario for this kind of situation is to look through the source control repository, where it'll list the files, show me how often TheThing is mentioned, show me where it is, still be easily selectable, and I don't even have to connect to a database to do it.

And maybe that's the thing… many of my useful scripts get replaced by other systems which are more useful. The ones that don't tend to be about data rather than the database, and then it's less about having a go-to script, but about being fluent in the languages, having a good speed for typing, and getting used to the business logic at my customers.


Infrastructure as code

For T-SQL Tuesday this month we've been asked by Frank Geisler (@FrankGeisler) to write about how we use scripting to ensure consistency in our provisioned Azure platforms.

Of course! This is definitely important. Whenever something needs to happen multiple times in a consistent way, scripts are key to this. But let me wax lyrical for a moment about how this isn't new.

You see, back in the days when all our data was on-prem, we would run through the SQL Server installer, choosing the various options and hitting Next. But the best thing about this installer was that it would create a file called Configuration.ini, which could be used to perform another installation with the same configuration.

This was critical to being able to spin up a separate server. And because these files remained on the server even if someone had just used the installation wizard, when an additional machine needed to be stood up, we could just grab the ini file and run an unattended installation, confident that we had the same collation settings and everything. (Don't get me started about the kind of pain you'd have by picking the wrong collation in an install.)

If you're reading this and the idea of using scripts is alien to you, please change your habits.

When you see a dialog box in SSMS, use it of course, but then use the Script button and hit Cancel. Apply your change by running the script, ideally through a proper deployment process. This is such a good habit to get into, I often wonder whether we'll one day get a setting in SSMS to disable the "Ok" button on dialogs. I would certainly be enabling such a thing at my clients, to reduce the likelihood of ad hoc (and even accidental) changes.