Being sure of your data

Victoria Holt (@victoria_holt) says "Data governance is really about 'data erudition', showing an interest in learning about the data we have, improving the quality, and creating a more productive and trusted data asset."

This is so much a part of what I do as a consultant, and of what my team does as consultants. This isn't a post to spruik LobsterPot Solutions though; it's a response to Victoria's T-SQL Tuesday invitation (month 144, which makes 12 years, a gross). Nevertheless, I feel like it's worth talking about one of my favourite things we often do with our customers.

And that is to help them identify situations in their data that should not occur, so that they can fix them.

All database professionals should understand the significance of foreign key constraints, which make sure that the thing (product, case, offer, location, etc.) you're referring to actually exists. The same goes for check constraints, which can make sure that a delivery date isn't before an order date.
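To make those two kinds of constraint concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module rather than SQL Server, purely so it can run anywhere, and the products and orders tables (and their columns) are made up for illustration.

```python
import sqlite3

def build_db():
    """Create an in-memory database with one foreign key and one check constraint."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE products (
            product_id INTEGER PRIMARY KEY,
            name       TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id      INTEGER PRIMARY KEY,
            -- Foreign key: the product being ordered must actually exist.
            product_id    INTEGER NOT NULL REFERENCES products(product_id),
            order_date    TEXT NOT NULL,   -- ISO dates, so text comparison works
            delivery_date TEXT,
            -- Check constraint: delivery can't come before the order.
            CHECK (delivery_date IS NULL OR delivery_date >= order_date)
        );
    """)
    conn.execute("INSERT INTO products VALUES (1, 'Widget')")
    return conn

def try_insert(conn, row):
    """Attempt to insert an order; report whether the database accepted it."""
    try:
        conn.execute(
            "INSERT INTO orders (order_id, product_id, order_date, delivery_date) "
            "VALUES (?, ?, ?, ?)",
            row,
        )
        return "accepted"
    except sqlite3.IntegrityError as exc:
        return f"rejected: {exc}"

if __name__ == "__main__":
    conn = build_db()
    print(try_insert(conn, (1, 1, "2022-01-10", "2022-01-12")))   # valid order
    print(try_insert(conn, (2, 99, "2022-01-10", "2022-01-12")))  # no such product
    print(try_insert(conn, (3, 1, "2022-01-10", "2022-01-05")))   # delivered before ordered
```

The first insert should go through, and the other two should be turned away by the database itself. That is the point of a constraint: the bad row never gets in.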

But the checks that we do are more about things that the database allows, yet are business scenarios that should never happen.
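As a rough illustration of that kind of check, the sketch below keeps a named list of queries, each returning the rows that break a business rule. The two rules (a shipped order with no ship date, and a customer with more than one active account), the table names and the sample data are all hypothetical, and Python's sqlite3 stands in for whatever database you actually use.

```python
import sqlite3

# Hypothetical business rules. Each query returns the keys of offending rows.
BUSINESS_RULE_CHECKS = {
    "shipped_without_ship_date":
        "SELECT order_id FROM orders "
        "WHERE status = 'shipped' AND shipped_date IS NULL "
        "ORDER BY order_id",
    "customer_with_two_active_accounts":
        "SELECT customer_id FROM accounts WHERE status = 'active' "
        "GROUP BY customer_id HAVING COUNT(*) > 1 "
        "ORDER BY customer_id",
}

def make_demo_db():
    """Build a small in-memory database containing one violation of each rule."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY, status TEXT, shipped_date TEXT);
        CREATE TABLE accounts (
            account_id INTEGER PRIMARY KEY, customer_id INTEGER, status TEXT);
        INSERT INTO orders VALUES
            (1, 'shipped', '2022-01-05'),
            (2, 'shipped', NULL),          -- shipped, but when?
            (3, 'open',    NULL);
        INSERT INTO accounts VALUES
            (10, 1, 'active'),
            (11, 1, 'active'),             -- second active account for customer 1
            (12, 2, 'active'),
            (13, 2, 'closed');
    """)
    return conn

def run_checks(conn):
    """Run every check and return {check name: list of offending keys}."""
    return {
        name: [row[0] for row in conn.execute(sql)]
        for name, sql in BUSINESS_RULE_CHECKS.items()
    }

if __name__ == "__main__":
    for name, offenders in run_checks(make_demo_db()).items():
        print(name, offenders)
```

Nothing in the schema stops either situation from occurring, which is exactly why a query has to go looking for it.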

Plenty of businesses seem to recognise these scenarios all too well, and can point them out when they come across them. You hear phrases like "Oh, we know that's not right, it should be XYZ instead". And they become reasons why they don't really trust their data. It's a data quality issue, and every time someone comes across a data quality issue, they trust the data a little less.

Pretty soon this distrust means they become reluctant to automate anything, feeling they need to eyeball the data first. And the paralysis doesn't stop there – it seeps into just about any piece of information about the data. "Are we sure about this?" gets asked about every report.

Data governance can help with this. Approved dictionaries for definitions are a start. Documenting processes is excellent. But you also need to discover which situations cause people not to trust the data, and develop ways to alert when they have occurred. This not only gives an opportunity to fix those situations, but also to see that (ideally) they're happening less often. And eventually to develop trust that it's solved.
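One way to both alert on these situations and see whether they're happening less often is to log the violation count for each check every time the checks run. The sketch below assumes you already have a count per check; the check_log table, the function names and the sample numbers are invented for illustration, and how you deliver the alert (email, dashboard, ticket) is left open.

```python
import sqlite3

def open_log(path=":memory:"):
    """Open (or create) the log that stores one row per check per run."""
    log = sqlite3.connect(path)
    log.execute("""
        CREATE TABLE IF NOT EXISTS check_log (
            run_date   TEXT NOT NULL,
            check_name TEXT NOT NULL,
            violations INTEGER NOT NULL,
            PRIMARY KEY (run_date, check_name)
        )
    """)
    return log

def record_run(log, run_date, counts):
    """Store the violation counts ({check name: count}) for one run."""
    log.executemany(
        "INSERT INTO check_log (run_date, check_name, violations) VALUES (?, ?, ?)",
        [(run_date, name, count) for name, count in counts.items()],
    )
    log.commit()

def checks_to_alert_on(counts):
    """Return the names of checks that found at least one violation."""
    return sorted(name for name, count in counts.items() if count > 0)

def history(log, check_name):
    """Return (run_date, violations) pairs for one check, oldest first."""
    rows = log.execute(
        "SELECT run_date, violations FROM check_log "
        "WHERE check_name = ? ORDER BY run_date",
        (check_name,),
    )
    return [(run_date, violations) for run_date, violations in rows]

if __name__ == "__main__":
    log = open_log()
    # Made-up counts from two weekly runs.
    record_run(log, "2022-01-04", {"shipped_without_ship_date": 5, "duplicate_active": 0})
    record_run(log, "2022-01-11", {"shipped_without_ship_date": 2, "duplicate_active": 0})
    print(checks_to_alert_on({"shipped_without_ship_date": 2, "duplicate_active": 0}))
    print(history(log, "shipped_without_ship_date"))
```

A falling count over successive runs, and eventually a steady zero, is the kind of evidence that lets people start trusting the data again.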

I've spoken before about the relationship between data quality and trust. That improved data quality can lead to trust. That trust with poor data quality puts you on dangerous ground. Our T-SQL Tuesday host Victoria commented about how data governance includes improving the data quality and developing a more trusted data asset. Data quality can lead to that trust, but only when it has been demonstrated repeatedly over time. Trust must be earned.

When we help our customers discover data quality issues, address those issues, and flag when they occur, they start to develop that trust. That trust can then form a strong foundation for so much more in data.

Never ignore the importance of data governance. Of developing trust that the data quality is strong. Go through the process of documenting everything and tracking everything, but also remember that the goal of data governance should be that trusted data asset.

@rob_farley
