In this two-part series I discuss different approaches to modeling business concepts with DataVault. It is based on earlier discussions at one of my clients in the pharmacy industry.
The backbone of a DataVault
The backbone of a DataVault consists of hubs and links. Satellites are also a core construct, but they matter less for this series of articles.
Let me start by giving the definitions of hubs and links, as stated in the literature on this subject.
“Hubs are defined using a unique list of business keys and provide a soft-integration point of raw data that is not altered from the source system, but is supposed to have the same semantic meaning.”¹
“The Hub represents the key of a Core Business Concept and is established the first time a new instance of that concept’s Business Key is introduced to the data warehouse.”²
“The link entity type is responsible for modeling transactions, associations, hierarchies, and redefinitions of business terms. The next sections of this chapter define Data Vault links more formally. A link connects business keys; therefore links are modeled between hubs. Links capture and record the past, present, and future relationships between data elements at the lowest possible granularity.”¹
“A Link represents an association between core concepts and is established the first time this new unique association is presented to the EDW. Just as a Hub is based on a core business concept, the Link is based on a natural business relationship between business concepts.”²
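To make these definitions a little more concrete, here is a minimal Python sketch using an in-memory SQLite database. It shows a hub row being created only the first time a business key arrives, and a link row only the first time an association arrives. Everything named here is an assumption made for illustration: the table and column names, the `hash_key` helper, the use of MD5, the sample keys and the record source `crm`. It borrows the healthcare facility and healthcare professional concepts that are defined further on in this article.

```python
import hashlib
import sqlite3


def hash_key(*business_keys):
    """Illustrative helper: hash one or more business keys into a surrogate key."""
    normalized = "|".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE hub_facility (
        facility_hk   TEXT PRIMARY KEY,      -- hash of the business key
        facility_bk   TEXT NOT NULL UNIQUE,  -- business key as delivered by the source
        load_date     TEXT NOT NULL,
        record_source TEXT NOT NULL
    );
    CREATE TABLE hub_professional (
        professional_hk TEXT PRIMARY KEY,
        professional_bk TEXT NOT NULL UNIQUE,
        load_date       TEXT NOT NULL,
        record_source   TEXT NOT NULL
    );
    CREATE TABLE link_facility_professional (
        link_hk         TEXT PRIMARY KEY,    -- hash of the combined business keys
        facility_hk     TEXT NOT NULL REFERENCES hub_facility (facility_hk),
        professional_hk TEXT NOT NULL REFERENCES hub_professional (professional_hk),
        load_date       TEXT NOT NULL,
        record_source   TEXT NOT NULL
    );
    """
)


def load_association(facility_bk, professional_bk, load_date, source):
    """Insert hub and link rows only the first time a key or association is seen."""
    f_hk, p_hk = hash_key(facility_bk), hash_key(professional_bk)
    conn.execute(
        "INSERT OR IGNORE INTO hub_facility VALUES (?, ?, ?, ?)",
        (f_hk, facility_bk, load_date, source),
    )
    conn.execute(
        "INSERT OR IGNORE INTO hub_professional VALUES (?, ?, ?, ?)",
        (p_hk, professional_bk, load_date, source),
    )
    conn.execute(
        "INSERT OR IGNORE INTO link_facility_professional VALUES (?, ?, ?, ?, ?)",
        (hash_key(facility_bk, professional_bk), f_hk, p_hk, load_date, source),
    )


def row_count(table):
    """Count the rows in one of the sketch tables."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


load_association("FAC-001", "HCP-042", "2024-01-01", "crm")
load_association("FAC-001", "HCP-042", "2024-02-01", "crm")  # same keys again: no new rows
load_association("FAC-001", "HCP-077", "2024-02-01", "crm")  # new professional, new link
```

After these three loads the facility hub holds one row, the professional hub two, and the link two. The repeated delivery of the same keys changes nothing, which is what "established the first time" amounts to in this sketch.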
Now that the core constructs are defined, let’s define the business concepts that form the basis of this article and the data model.
Healthcare Facility: “A place where healthcare is being practiced. This can be a hospital, a department of a hospital, a laboratory or another place.”
Healthcare Professional: “A person that has followed some form of medical studies and practices healthcare.”
Master Site: “A Master Site is the assignment of a Healthcare Professional to a Healthcare Facility.”
Study: “A Study is a formally followed research process in the development of medicine.”
Study Site: “A Study Site is the assignment of a Master Site to a Study.”
A first attempt to model the business concepts
The following data model uses the “colors” of the DataVault as introduced in “Modeling the agile data warehouse with DataVault”².
- Hubs are blue
- Links are green
- Satellites are yellow
Based on the business definitions given above, there is probably no doubt that “healthcare facility”, “healthcare professional” and “study” should be represented as hubs.
In this first model, both “master site” and “study site” are modeled as links. The reason is that both definitions describe a kind of association, and links are used to represent associations.
But this also poses an immediate problem. We now have a link-to-link relation in the model. This is not recommended practice: “This dependency does not scale nor perform well in a high volume, high velocity (big data) situation. The problem with link-to-link entities is that a change to the parent link requires changes to all dependent child links.”¹
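The dependency can be made visible with a small Python sketch that describes the first attempt as metadata and lists every link that references another link. The `Entity` class, the entity names and the `link_to_link_dependencies` function are assumptions made for illustration, not part of any DataVault tooling.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    name: str
    kind: str               # "hub" or "link"
    references: tuple = ()  # names of the entities this entity points to


# First attempt: master site and study site are both modeled as links.
first_attempt = [
    Entity("healthcare_facility", "hub"),
    Entity("healthcare_professional", "hub"),
    Entity("study", "hub"),
    Entity("master_site", "link", ("healthcare_facility", "healthcare_professional")),
    Entity("study_site", "link", ("master_site", "study")),
]


def link_to_link_dependencies(model):
    """Return (child, parent) pairs where a link references another link."""
    kinds = {entity.name: entity.kind for entity in model}
    return [
        (entity.name, parent)
        for entity in model
        if entity.kind == "link"
        for parent in entity.references
        if kinds[parent] == "link"
    ]


print(link_to_link_dependencies(first_attempt))
# prints [('study_site', 'master_site')]
```

The single pair it reports is exactly the dependency the quote warns about: any change to the master site link ripples into the study site link.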
A second attempt to model the business concepts
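One way to get rid of the link-to-link relation is to use (a kind of) denormalization¹: the child link no longer references the parent link, but references the hubs of the parent link directly.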
If you apply that principle, you’ll get this:
Even though this is a correct approach, I have two problems with it:
- It starts to look like a dimensional model rather than a DataVault model, which is more fractal-like. This is of course very subjective, but it just doesn’t feel right to me;
- The model is harder to extend than with other approaches.
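As a rough illustration of what such a denormalized link could hold, the Python sketch below builds one row for a master site link and one for a study site link. It assumes the master site link is kept alongside the study site link, and that the study site link carries the hash keys of all three hubs instead of a reference to the master site link. The `hash_key` helper, the column names and the sample business keys are hypothetical.

```python
import hashlib


def hash_key(*business_keys):
    """Illustrative helper: hash one or more business keys into a surrogate key."""
    normalized = "|".join(key.strip().upper() for key in business_keys)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def master_site_link_row(facility_bk, professional_bk):
    """Link between the healthcare facility and healthcare professional hubs."""
    return {
        "master_site_hk": hash_key(facility_bk, professional_bk),
        "facility_hk": hash_key(facility_bk),
        "professional_hk": hash_key(professional_bk),
    }


def study_site_link_row(facility_bk, professional_bk, study_bk):
    """Denormalized link: references all three hubs, not the master site link."""
    return {
        "study_site_hk": hash_key(facility_bk, professional_bk, study_bk),
        "facility_hk": hash_key(facility_bk),
        "professional_hk": hash_key(professional_bk),
        "study_hk": hash_key(study_bk),
    }


master_site = master_site_link_row("FAC-001", "HCP-042")
study_site = study_site_link_row("FAC-001", "HCP-042", "STUDY-7")
```

In this sketch the study site row has no `master_site_hk` column at all. The three hub keys sitting around one central row are also what gives the model its dimensional, star-like look.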
A third attempt to model the business concepts
Another approach is to take a closer look at the following statement: “This understanding is crucial to data vault modeling. A Link – by itself – cannot represent a business concept. So a Link – by itself – cannot represent an Event, a Transaction, or a Sale. Each of these event-oriented business concepts must be entire data vault constellations – including a Hub, Satellite(s) and Link(s).”²
Now think about that for a minute…
Both “master site” and “study site” are assignments, which are a kind of event. But both are business concepts too. In fact, each of these business concepts has its own (composite) business key. So according to the statement, they should be modeled as (keyed instance) hubs, not as links.
Let’s try again:
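One possible reading of this third attempt is sketched below in Python as plain metadata. It assumes that each keyed instance hub is tied to its related hubs by a single link, that the master site key is composed of the facility and professional keys, and that the study site key adds the study key to those. The `Hub` and `Link` classes and all entity, link and column names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hub:
    name: str
    business_key: tuple  # column(s) that together form the business key


@dataclass(frozen=True)
class Link:
    name: str
    hubs: tuple          # names of the hubs this link connects


hubs = [
    Hub("healthcare_facility", ("facility_bk",)),
    Hub("healthcare_professional", ("professional_bk",)),
    Hub("study", ("study_bk",)),
    # Keyed instance hubs with composite business keys (assumed composition).
    Hub("master_site", ("facility_bk", "professional_bk")),
    Hub("study_site", ("facility_bk", "professional_bk", "study_bk")),
]

links = [
    Link("master_site_assignment",
         ("master_site", "healthcare_facility", "healthcare_professional")),
    Link("study_site_assignment", ("study_site", "master_site", "study")),
]

# Every link should now connect hubs only, so no link-to-link relation remains.
hub_names = {hub.name for hub in hubs}
links_between_hubs_only = all(set(link.hubs) <= hub_names for link in links)
print(links_between_hubs_only)
# prints True
```

Under these assumptions the study site still depends on the master site, but through a hub rather than through a link.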
In part 2 I will elaborate on why this third attempt is the better option.