Data modeling for multi-structured data, nothing new

In today’s world there seems to be a lot of focus on the technology to handle multi-structured data. And for sure, the advances made in technology support various aspects of data management. However, this technology is only there to support the higher goals.

Schema-less and schema-on-read are terms that have often been advocated in the recent past. The headaches of that approach have also surfaced quite a bit; just search Google for relevant articles.

The fact is, there is no such thing as unstructured. Rick van der Lans has also repeated multiple times that there is always some structure present, albeit not known at first. He therefore prefers the term semi-structured or multi-structured.

Let’s be honest, without some kind of structure we – humans – are not able to make sense out of data. The fact that we may not know the structure beforehand doesn’t matter. But once we know the structure, things start to get interesting. For some reason, even though we can start making sense out of our data, many fail to properly document this (multi-)structure. Why would you anyway, right?

Well, why wouldn’t you? Documenting the structure in a “data” model brings many benefits:

  • Communication with others that need to deal with the data in some way becomes a lot easier
  • Regulatory requirements may oblige you to do it, and doing so saves you from fines
  • You get a better overview and more insight into how the data relates to each other and where the gaps are
  • You are able to model and implement validations that are needed
  • Etc.

And you know what, data modeling has been around for at least half a century now. A lot of its principles are old but still apply. The younger technology-focused generation just seems to have either forgotten them (best case) or never learned them (worst case).

Data modeling happens at many different levels, not just the physical database. That is just one possible end point. While digging deeper into the subject, I came across a book called “Data Model Patterns – Conventions of Thought”, written by David C. Hay... in 1995. Yes, you read that correctly, 1995. This book is a must-read for anyone who deals with data. It’s old and still extremely valid. Apart from the many patterns that apply to a lot of organizations, it also shows abstraction and generalization. Even better, it contains examples of how to deal with multi-structured data.

The following data model is taken from chapter 4 of the book and gives a perfect example of how to model multi-structured data. In this case it relates to “products” that may have many but varying descriptive attributes.

Now compare this with the “key-value pair” databases around nowadays. Ring a bell? [1] Remember the book was written in 1995, long before the internet was really a hype. Long before we started to talk about big data. Long before specialized databases supported this particular kind of data.
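To make the comparison concrete, here is a minimal Python sketch of the general idea: products whose descriptive attributes vary per product, with each attribute defined once and its values stored per product. This is not the diagram from the book; the names AttributeType, Product, describe and as_key_value_pairs are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass(frozen=True)
class AttributeType:
    """A kind of descriptive attribute (hypothetical examples: color, voltage)."""
    name: str
    unit: Optional[str] = None


@dataclass
class Product:
    """A product described by any number of attribute values."""
    name: str
    # One entry per attribute that applies to this product.
    # Different products do not need to share the same attributes.
    values: Dict[AttributeType, str] = field(default_factory=dict)

    def describe(self, attribute: AttributeType, value: str) -> None:
        """Record the value of one attribute for this product."""
        self.values[attribute] = value

    def as_key_value_pairs(self) -> Dict[str, str]:
        """Flatten the modeled structure into plain key-value pairs."""
        return {attribute.name: value for attribute, value in self.values.items()}


# Attribute types are defined once and reused across products.
color = AttributeType("color")
voltage = AttributeType("voltage", unit="V")

lamp = Product("desk lamp")
lamp.describe(color, "black")
lamp.describe(voltage, "230")

shirt = Product("t-shirt")
shirt.describe(color, "blue")  # a shirt has no voltage, and that is fine
```

Calling lamp.as_key_value_pairs() gives exactly what a key-value store would hold for that product. The difference is that the attribute types are documented in one place instead of being implied by whatever keys happen to show up.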

As said in the title of this post, there is nothing new here. Same old wine, just in a new bottle…


  1. If it doesn’t ring a bell, please read the book and start looking for training on data modeling.
