In my previous post I mentioned that there is less interpretation for the designer in data vault modeling than in dimensional modeling.
Let me elaborate on that with an example.
The question that I received in the workshop was what to do with the age of the customer at time of the transaction. Is it a fact or a dimensional attribute?
The age is something that could be calculated by using the customer’s birthdate and the date of the transaction. In this case, the dimensional attribute would be birthdate. The fact table should hold a reference to the time dimension representing the transaction date.
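To make that calculation concrete, here is a minimal sketch in Python. The function name age_at and its parameters are my own, chosen for illustration, and it assumes that age means completed years on the transaction date.

```python
from datetime import date


def age_at(birthdate: date, on: date) -> int:
    """Return the number of completed years between birthdate and `on`."""
    years = on.year - birthdate.year
    # Subtract one year if the birthday has not yet occurred in the year of `on`.
    if (on.month, on.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years
```

The birthdate would come from the customer dimension and the second argument from the time dimension row that the fact record references.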
Depending on the number of fact records, however, calculating the age on the fly could hurt performance, although the impact is probably negligible.
But what if the age is supposed to be used to determine whether it falls in a certain range? What if these ranges are variable and only known at query time? And what about the average age of the customers buying a particular product within a certain time frame?
Could the age be a fact measure in those cases?
Some designers would argue that it is a derived measure that doesn’t need to be stored in your fact table. I agree, but it does require your query or reporting tool to be able to handle the derivation, including the range filters and the averages.
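To give an idea of what the derived-measure approach asks of the query side, here is a rough sketch in Python rather than in any particular reporting tool. The sample data, the product names and all function names are made up for illustration; a real tool would do this in SQL or in its own expression language.

```python
from datetime import date
from statistics import mean

# Hypothetical customer dimension: customer id -> birthdate.
customers = {
    1: date(1980, 6, 15),
    2: date(1995, 1, 3),
    3: date(1970, 12, 24),
}

# Hypothetical fact rows: (customer id, product, transaction date).
sales = [
    (1, "bike", date(2010, 6, 14)),
    (2, "bike", date(2015, 1, 3)),
    (3, "helmet", date(2011, 1, 1)),
    (3, "bike", date(2010, 12, 24)),
]


def age_at(birthdate, on):
    """Completed years between birthdate and the transaction date."""
    years = on.year - birthdate.year
    if (on.month, on.day) < (birthdate.month, birthdate.day):
        years -= 1
    return years


def ages_for(product, start, end):
    """Derive the age per fact row for one product within a time frame."""
    return [
        age_at(customers[cust], day)
        for cust, prod, day in sales
        if prod == product and start <= day <= end
    ]


def average_age(product, start, end):
    """Average customer age over the matching fact rows, or None if there are none."""
    ages = ages_for(product, start, end)
    return mean(ages) if ages else None


def count_in_range(product, start, end, low, high):
    """Count fact rows whose derived age falls in a range supplied at query time."""
    return sum(low <= age <= high for age in ages_for(product, start, end))
```

Nothing is stored here except the birthdate and the transaction date. The price is that every query has to repeat the derivation, which is exactly the capability the reporting tool needs to have.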
Others would argue that the age is a measure that is better stored in the fact table. It wouldn’t exactly be an additive measure, but at least you could answer the “average age” question somewhat more easily.
You could even argue that it is both dimensional and factual. And that’s true as well. The birthdate of the customer is dimensional and the age could be used for filtering facts.
It should be clear that it depends on the context. If there is no business question yet about the age, I would at least store the birthdate as a dimensional attribute of the customer. Adapting the fact table can be done later, when the question arises, but it does require reengineering.
Data vault modeling
In data vault modeling you don’t suffer from this interpretation problem. The birthdate of the customer is a satellite attribute linked to your customer hub. There is no question about it.
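As a rough illustration of that structure, here is a sketch in Python. The class and field names are hypothetical, and the load date and record source columns are assumptions based on common data vault practice, not something prescribed in this post.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class CustomerHub:
    customer_key: str      # surrogate or hash key (hypothetical name)
    customer_number: str   # business key of the customer
    load_date: datetime
    record_source: str


@dataclass(frozen=True)
class CustomerSatellite:
    customer_key: str      # points back to the customer hub
    load_date: datetime
    record_source: str
    birthdate: date        # descriptive attribute, stored as delivered


def current_birthdate(satellite_rows, customer_key):
    """Return the birthdate from the most recently loaded satellite row, if any."""
    rows = [row for row in satellite_rows if row.customer_key == customer_key]
    if not rows:
        return None
    return max(rows, key=lambda row: row.load_date).birthdate
```

Note that there is no age anywhere in this sketch. The satellite only records what the source delivered, and anything derived from it is left to the presentation layer.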
You divide and conquer.
The interpretation will only be needed once you get the business question. Based on that, you can determine how to model it in the presentation layer, whether it be dimensional or any other form needed.
Wait, did you hear me mention something about Agile BI here? Well, sort of 😉