In part 1 of this series, I showed three approaches to model the business concepts. The third attempt, shown below, is the better one and I’ll explain why.
The business key issues
After discussions with the business and by looking at the sources, the business keys of the concepts turned out to be as follows for a “study site”:
- Study number
- Region abbreviation
- Site sequence number
This is a legacy business key however, because the concept of “master site” was introduced later. But it does pose some issues with the first and second attempt:
- Region isn’t foreseen anywhere but could potentially be modeled as a hub on its own;
- The site sequence number is meaningless on its own and can’t be modeled as a hub as such, resulting in a degenerate column in the “study site” link as it’s still part of the key.
The second issue is still valid according to Dan’s book1, but will require a slightly different pattern for loading (and thus automation) than a standard link that is the intersection of actual hubs.
If the region isn’t actually that important to the business to be modeled as its own hub, it would also need to be modeled as a degenerate column, bringing us back by the second issue.
The extensibility issues
Our model isn’t complete yet. During discussions with the business, it turns out there are “subjects” and “findings”.
A subject is a person (but not identifiable as such) taking part in a study at a particular study site. The same person taking part in another study (probably at another site), is a different subject.
The business key is nothing more than a sequence number (within the “study site”).
A finding is an side effect that a subject exposed during the study. Although not synonymous, think of it as a symptom.
The business key is nothing more than a sequence number or a date within the subject.
Do you see the problem here? Everything seems to have some kind of hierarchical relationship. Using the denormalization technique, you would end up with the following model:
To me this looks a bit messy, having too many dependencies. I know this can be just subjective again, but if you need to extend even further, it will become even more messy.
The alternative extension
If we however follow the idea2 of the third attempt in the previous post, subject and finding are just new hubs. Adding those to the model is easy, see below:
This model is much cleaner and easy to extend. You can argue however that some information is lost, because you can’t immediately see the dependencies that lie within the business keys. But what the question is whether these are really business keys or just a technical solution in the source that the business adopted as business keys (because they had not other choice).
Even if the composition of these business keys change, the definition of each of the hubs remains the same. The relationships between these hubs most likely don’t change either. And if they do, you can easily create other links between the hubs, without further impact on the rest of the model.
Furthermore, but this is only in latest developments in the DataVault community (not necessarily supported by Dan Linstedt and DV2.0), there is no necessity for link satellites that contain descriptive attributes.
In reality there is no conclusion. It’s partially a matter of preference. If you want to stick to pure DV2.0 recommendations, go for the denormalization technique as link-to-link is still not recommended.
But if you truly understand the modeling part of DataVault, go for the alternative2. It’s more flexible and easier to extend. And the DV2.01 book doesn’t not forbid you to model an event or transaction as a hub either so you don’t have to worry about that.