What’s driving your data model?

November 10, 2015

If data modelling were easier, I doubt there would be as many books on the subject, and we wouldn’t have multiple methodologies to consider.

I’m not going to explore the different methodologies here – that’s almost a religious argument these days, and I am more than happy to let you adopt whichever method you like. Instead, I want to challenge you to think about what is driving your design, and what makes you consider whether it suits your business or not.

Time and time again I see companies that use software to help them run their business. Sometimes this is an off-the-shelf system or a cloud-based solution; sometimes it’s a bespoke system built by software developers. I’m definitely in favour of using software, and wonder how people operate without it these days.

…but how much is your business driven by the software? I see a lot of businesses being led by their software, rather than having the software adapt to the business. For the most part, I’m fine with either. There is a lot to be gained by using systems developed by similar businesses, and taking advantage of lessons learned by others. Letting that software help guide internal processes can be very useful.

But I don’t think that applies to data models – you should at least consider how much it does.

I don’t like to write about specific customer situations, so I’m not going to describe a particular anecdote in great detail here. But I want to say that I frequently see environments where the models used within data warehouses don’t describe the business that’s going on – they describe the software that’s used.

Many data professionals look at a data warehouse as a platform for reporting, built according to the available data sources. I disagree with this.

The models within a data warehouse should describe the business. If it doesn’t, it’s a data model gone wrong.

What is the central thing that your business does? What is the main interest point? What do you need to look after? For me, this forms the core of the warehouse.

The core of the warehouse is not necessarily the main fact table, but could be one of the main dimensions. If you’re a store, do you care about sales, or do you care about customers? The difference is subtle, but might drive some important design elements.

Two clothing stores might use the same back-end systems for their point-of-sales systems, and might have the same loyalty system set up to persuade people to keep coming back. But one store might have a focus of getting customers back, driving brand loyalty which leads to dedicated fans and word-of-mouth sales. The other store might be more about piquing interest from people walking past the door, and trying to get them to come in and pick up a bargain. Of course, there will be an element of both in both stores, but the culture amongst the staff will be slightly different, as the first tries to identify the customer, tries to make sure that the customer feels cared for, and tries to form a relationship with the customer. It’s less important that the customer buys something, so long as they are going to return. The second sees the customer as a way to get a sale, while the first sees the sale (or even the lack of a sale!) as a way to get a customer. I’m sure you can think of stores in each category.

It would be very easy to create the same data warehouse for both stores, using a standard retail environment. But are the needs of the stores adequately met?

There is no doubt that both stores need sales to stay afloat – the retail business requires it. But if your business culture has slightly different concerns to the industry standard, then the data model should cater for that. Perhaps you need a way of scoring customer loyalty, and some path analysis to see what helps a customer reach a particular level of engagement. Perhaps you need to start collecting extra data. Maybe the stores could consider awarding points for simply visiting the store, even if no sales are actually made. Is the person who works from a cafe and buys just one cup of coffee all morning good for business, or bad for business? Can your data model help explore this, or are you designing a system which only handles the data in your transactional system?

I like to come back to the description of a data warehouse being the single source of truth for an organisation. Many people consider this an issue for data quality – that once data is in the warehouse, it’s trusted and can be used for business analytics. But it should go beyond that. The data warehouse should have transformed the data as kept by the various software packages into data which describes the business, becoming the source of truth about the business. The reports and dashboards across this data should help identify the culture of the organisation, by highlighting the its values and ideals.

The starting point for a data warehouse design should not be “What are the facts we need to measure?” but rather “What are we about as a business?” – often similar, but occasionally not. Ask what success looks like and what questions will address that.

Don’t ignore the bottom line, but also don’t ignore what’s really important to the business.

@rob_farley 

This post was prompted by the seventy-second monthly T-SQL Tuesday, hosted this month by Mickey Stuewe (@sqlmickey).

TSQL2sDay150x150

This Post Has 2 Comments

  1. Greg B

    I agree with the thrust of your post entirely, and recently put together a "45 minute lightning talk" on this sort of topic – how you look at your business drives your model design.
    I would like to highlight this point you made:
    > The core of the warehouse is not necessarily the main fact table, but could be one of the main dimensions. If you’re a store, do you care about sales, or do you care about customers?
    I think the primary fact table still is the core of the data warehouse. If the customer relation is the primary concern, then there should be multiple fact tables around customer engagement and relationship. The measures and KPIs from these fact tables should be presented more prominently than those from the sales fact table.
    Or, for another example. Say you have a slowly changing sales hierarchy (Sales Manager > Sales Team > Sales Rep). If all you do is report point-in-time sales according to that hierarchy, then it is a slowly changing dimension (and there’s likely a better model than including all those attributes in a single sales rep dimension).
    If you report on headcount frequently and length of time that a sales rep exists on a specific team, and the number of team transitions a rep makes, you can do this from the same SCD table (but again there’s a better model for this specific question), but at that point it becomes not a dimension, but a fact table.
    I think, ultimately, we probably agree pretty much on this point, but I’m just making the point that if you care about something, you should measure it, and if you’re measuring it, it’s now one of your fact tables. Even if you have the same physical table acting as a dimension to the sales fact and its own headcount fact, the logical separation is there.

  2. Rob Farley

    I think we do agree.
    I’m not suggesting that facts not be measured. I’m suggesting that the most important part of your model might not be the obvious fact table. Additional fact tables would be introduced to give a better picture of the information about the core concept – even to the point of turning a traditional dimension table into a ‘fact dimension’ table.
    The gist is around where your design starts and where the focus should be. I see too many organisations go through the motions because of the standard for their industry or because of the software they use. And I just wish they would design their models based on their individual core values.

Leave a Reply

LobsterPot Blogs

Blog posts by Rob Farley and other LobsterPot Solutions team members.

Search