As a data modeler I’m always searching for ways to learn more about data modeling. The more I read and learn about it, on top of what I have already learned and used, the more sense it all makes and the easier it gets. It’s a bit like learning foreign languages. The more languages you’ve learned, the easier it gets to learn a new one, especially when that language has the same kind of origin.
As such, I have been studying Concept Maps and Graph Data Modeling. It’s surprising (well, it isn’t) how closely these two are related. And graph databases are “hot” (at the time of this writing).
I’m not going into all the details, because it makes no sense to repeat everything that has already been written by others who are far more knowledgeable about these subjects.
A concept map or conceptual diagram is a diagram that depicts suggested relationships between concepts. It is a graphical tool that instructional designers, engineers, technical writers, and others use to organize and structure knowledge.
This definition was taken from Wikipedia and there are more interesting links from that article that you should read, such as “The Theory Underlying Concept Maps and How to Construct and Use Them”.
The main idea is that you start with a “focus question”. This is the question you try to answer with the concept map. Without it, your concept map could include a lot more than is necessary or relevant. And you’re not trying to model the entire universe…
What I like about concept maps is their simplicity. There is not really anything technical about them and they are easy to read for almost everyone. Of course, from a modeling perspective they don’t cover all concerns you eventually want to cover, but it is a nice starting point for brainstorming over a particular knowledge domain.
Surely, there are other, more advanced modeling techniques (covering more concerns), such as FCO-IM, but these have a steeper learning curve as well.
An example concept map
I created the following concept map myself, using CmapTools. It tries to answer the following focus question:
What data is relevant to a recruiting company?
Note that I have no particular domain knowledge regarding recruitment and I didn’t consult anyone. I just had a look at my own CV.
While working on it, I discovered more concepts than I originally included in my first sketch, and more importantly, more relations between those concepts as well. Each of those relations must have a “linking phrase” so that a proposition can be formed, such as:
- Person has Hobby
- Hobby requires Skill
- Education teaches Skill
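To make the idea of a proposition a bit more tangible, here is a minimal sketch in Python that stores the three propositions above as (concept, linking phrase, concept) triples. The names `Proposition`, `concepts`, `outgoing` and `as_sentence` are my own illustrative choices and are not part of CmapTools or any other tool.

```python
from collections import defaultdict
from typing import NamedTuple


class Proposition(NamedTuple):
    """One concept-map proposition: source concept, linking phrase, target concept."""
    source: str
    linking_phrase: str
    target: str


# The three example propositions listed above.
PROPOSITIONS = [
    Proposition("Person", "has", "Hobby"),
    Proposition("Hobby", "requires", "Skill"),
    Proposition("Education", "teaches", "Skill"),
]


def concepts(propositions):
    """Return every concept that appears in at least one proposition."""
    found = set()
    for p in propositions:
        found.add(p.source)
        found.add(p.target)
    return found


def outgoing(propositions):
    """Group propositions by their source concept (each relation points one way)."""
    grouped = defaultdict(list)
    for p in propositions:
        grouped[p.source].append(p)
    return dict(grouped)


def as_sentence(p):
    """Read a proposition back as a plain sentence."""
    return f"{p.source} {p.linking_phrase} {p.target}"
```

Because each triple is stored in one direction only, a concept such as Skill has no outgoing propositions here, which mirrors the uni-directional nature of the relationships in the map.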
Of course there are concerns not being addressed here, such as:
- None of the concepts has an actual definition (and you should have those!)
- There is no explicit view on the cardinality between the concepts (can a person have just one or multiple hobbies?)
- There are no attributes defined for the concepts or relationships
And there are a few flaws as well in my example:
- A relationship in a concept map is uni-directional, but some of them are bi-directional in the real world, so the “way back” is missing
- It’s not complete (but that was not the intention)
- Even though you know in which countries an organisation is located and which assignment belongs to which organisation (well, the latter is actually missing due to the uni-directional relationship drawn from the organisation to the assignment), you don’t know for sure in which country the assignment took place
Graph Data Modeling
That example concept map looks “surprisingly” well suited to be the structure of a Graph Data Model. This is no coincidence. Thomas Frisendal has written an excellent book on this subject called “Graph Data Modeling for NoSQL and SQL: Visualize Structure and Meaning”. Visit his website that accompanies the book.
Taken from that website:
In the graph world the “property graph” style of graphing makes it possible to rethink the representation of data models. Graph Data Modeling sets a new standard for visualization of data models based on the property graph approach. Property graphs are graph data models consisting of nodes and relationships. The properties can reside with the nodes and / or the relationships. Accordingly the property graph model consists of just 3 simple types, as laid out in this property graph representation of the meta model:
This diagramming style is very close to what people – by intuition – draw on whiteboards. Rather than modeling data architecture on complex mathematics, we should focus on the psychology of the end user. If we do so, then engineering could be replaced with relevant business processes. In short, to achieve more logical and efficient results, we need to return to data modeling’s roots.
Even a ten-year-old can spot the resemblance between a graph data model and a concept map (I actually checked this with my son and he confirmed it, even though he didn’t like the fact that I disturbed him during his Fortnite game, in which he got shot because I distracted him). Even more interesting is the fact that concept maps were originally created by Joseph D. Novak to assist in the teaching of children.
So your ten-year-old might be a data modeler without knowing it yet! Maybe (s)he will be interested when you sit down together to build a concept map around Fortnite…
Anyway, the graph data modeling technique as Thomas Frisendal presents it in his book addresses at least one more concern: properties or attributes of concepts and relationships (technically speaking these could be part of a concept map too).
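As a rough illustration of the property graph idea (nodes, relationships, and properties that can live on either), here is a small sketch in plain Python. It is not taken from the book or from any graph database API; the class names and the sample values (“Alice”, “Chess”, `since=2015`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Node:
    """A node with a label and optional properties."""
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Relationship:
    """A directed, typed relationship that can carry its own properties."""
    start: Node
    rel_type: str
    end: Node
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    """A toy in-memory property graph: just two lists."""
    nodes: List[Node] = field(default_factory=list)
    relationships: List[Relationship] = field(default_factory=list)

    def add_node(self, label, **properties):
        node = Node(label, dict(properties))
        self.nodes.append(node)
        return node

    def relate(self, start, rel_type, end, **properties):
        rel = Relationship(start, rel_type, end, dict(properties))
        self.relationships.append(rel)
        return rel


# Hypothetical sample data based on the "Person has Hobby" proposition.
graph = PropertyGraph()
person = graph.add_node("Person", name="Alice")
hobby = graph.add_node("Hobby", name="Chess")
has = graph.relate(person, "has", hobby, since=2015)
```

The point of the sketch is only that a property such as `since` sits on the relationship itself, something the concept map example did not capture.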
Databases that support property graphs, such as Neo4j, can translate these models basically one-on-one [1]. These graph databases are one category of the so-called NoSQL databases and are extremely good at answering questions about data that is highly connected through a plethora of relationships.
However, graph databases are also schema-less. That is, Neo4j is able to show you a schema – consisting of nodes and relationships, not properties of them – but doesn’t enforce it. So the application developer can go wild and do anything. As far as I can tell, Neo4j bases the schema on the labels you assign to the nodes and relationships. But it does that based on actual instances. If you don’t have an instance of a particular relationship between two nodes in your data, you won’t see that it could exist when looking at the schema. That’s where your concept map can be used as a reference when developing.
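The effect of a schema that is derived from instances can be mimicked in a few lines of Python. This is only a sketch of the idea, not how Neo4j works internally: the reference model is the set of three propositions from my concept map, and the instance data (including the odd “Hobby requires Hobby” pattern) is made up for illustration.

```python
def observed_schema(relationship_instances):
    """Collect the distinct (start label, type, end label) patterns that occur in the data."""
    return {(start, rel_type, end) for start, rel_type, end in relationship_instances}


# Reference model: the three propositions from the concept map.
CONCEPT_MAP = {
    ("Person", "has", "Hobby"),
    ("Hobby", "requires", "Skill"),
    ("Education", "teaches", "Skill"),
}

# Hypothetical instance data: nobody has stored an "Education teaches Skill" yet,
# and one relationship follows a pattern that the concept map never planned for.
INSTANCES = [
    ("Person", "has", "Hobby"),
    ("Person", "has", "Hobby"),
    ("Hobby", "requires", "Skill"),
    ("Hobby", "requires", "Hobby"),
]


def compare(reference, instances):
    """Report patterns that are modeled but not yet seen, and seen but not modeled."""
    seen = observed_schema(instances)
    return {
        "not_yet_seen": reference - seen,
        "unexpected": seen - reference,
    }


report = compare(CONCEPT_MAP, INSTANCES)
```

Looking only at the observed schema, you would never learn that Education can teach a Skill; the comparison with the concept map is what brings that to light.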
Another thing to consider: because properties are not part of the schema and there is no schema enforcement, a particular instance of a node label or relationship label (i.e. the type) can have totally different properties than another instance of that same node or relationship label. That flexibility has its downside as well.
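A simple way to spot this kind of drift is to collect the property keys per label and flag labels whose instances do not agree. The sketch below uses plain Python tuples rather than a real database, and all node data in it is hypothetical.

```python
from collections import defaultdict


def property_shapes_by_label(nodes):
    """Map each label to the set of distinct property-key combinations seen for it."""
    shapes = defaultdict(set)
    for label, properties in nodes:
        # frozenset over a dict yields its keys, so each shape is a set of property names.
        shapes[label].add(frozenset(properties))
    return shapes


def inconsistent_labels(nodes):
    """Return the labels whose instances use more than one set of property keys."""
    shapes = property_shapes_by_label(nodes)
    return sorted(label for label, seen in shapes.items() if len(seen) > 1)


# Hypothetical nodes: the two Person instances share no property keys at all.
NODES = [
    ("Person", {"name": "Alice", "born": 1980}),
    ("Person", {"full_name": "Bob"}),
    ("Skill", {"name": "SQL"}),
    ("Skill", {"name": "Python"}),
]
```

Here `inconsistent_labels(NODES)` flags only Person, since both Skill nodes use the same single key. Whether such a difference is a mistake or intended flexibility is still a modeling decision.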
An example of a graph database
Based on the concept map example earlier, I started to play around with Neo4j on my desktop. Looking at my own CV, I created a few nodes and relationships, but by no means all of those covered in the concept map.
When I queried the database to return what I had created, the following was shown:
Quite impressive already, even though there is hardly any data present. I checked the schema against the concept map and only saw one thing that is probably just a glitch in Neo4j. There was a relationship label used_in that was self-referencing the Organisation node label, but I couldn’t find any data that actually does that.
The example also shows that having the data only, without a schema, concept map or graph data model made upfront, quickly results in a situation where you can’t see the forest for the trees (pun intended, as a tree is a special kind of graph).
I offer no conclusions; you are on your own here to draw them. Further reading is recommended if you see potential in this as part of your “data modeler toolbox” or want a deeper understanding of data modeling.
[1] Not entirely one-on-one. Can you spot which part of the concept map can’t be translated as such? Hint: look at the relationship from Vacancy towards Education.