Is the traditional data warehouse still alive? Who cares?

Over the years and especially recently when I check my Twitter or LinkedIn timeline, I see questions like:

Is the traditional data warehouse dead?

I think it is a non-question, badly formulated and posed in a way that the answer is likely to be no, even without having read the actual article. It’s just marketing bullshit, clickbait.

First of all, what does traditional mean in this context? Secondly, what is meant by data warehouse? And last but not least, what is meant by dead?

Let’s start with the last one. If dead means deceased, no longer existing, then obviously the answer would be no to the original question, no matter what traditional data warehouse really means. We all know that these are still around. If dead means “no longer created in the same way”, then the answer could be yes, but is still probably no in lots of cases.

And that’s exactly where we come to traditional. What is that? Something that has been around for ages or decades? Data warehouses exist for decades already, not for ages. Even traditions are known to evolve, albeit slowly, very slowly sometimes. To be honest, I have no clue what it means in this context. If you find that strange, let me ask you another question: Is the traditional car dead? Same problem here. What is that? A Ford T for example? Yes, those are no longer made or in use (although I’m not sure about the latter part, maybe for special occasions these are still used). On the other hand, a Tesla still has four wheels, a steering wheel, an engine (or two) etc. It’s still a car, just different technology.

Finally, what is a data warehouse? Is it a technical thing? Certainly not! It’s a concept that addresses particular concerns regarding – but not limited to – the way data is gathered, stored, processed, managed, governed, made available etc. Sure, you can also take the definition given by Barry Devlin, Bill Inmon or Ralph Kimball. And when they coined the term and definition – decades ago -, there were certain technological restrictions on how to build and implement it. Let’s be honest, even though that all has evolved, we still face technological restrictions. If you don’t believe me, let’s look back in 15 years from now (maybe even shorter).

So is the traditional data warehouse dead? Or is it still alive? Or doesn’t the “question” really matter at all? My answers: no, yes, no. In no particular order 😉

The “catch” with data warehouse automation tools

During my evaluation of several data warehouse automation tools such as BIReady, Quipu and RapidAce, I have come to some sort of conclusion that is crucial to the success of using these tools.

As most of these tools take the source data models as a starting point, you better make sure it is correctly modeled. Even with BIReady that takes a “business” model, you need to have a good model. By correctly modeled, I mean that preferably your source is modeled according to 3NF. When reverse engineering an existing database model, make sure primary keys and foreign keys are defined.

If not, you can be sure that the resulting generated data warehouse (datavault) models are pretty worthless.

I noticed this when using some of the tools on a source model that I have at hand from a client. This model is basically based on flat wide files loaded into (flat wide) tables. Primary keys are sometimes not defined. Foreign keys almost do not exist at all. Normalization is not done.

You can argue whether this is a true source model. It is not, that is true. But it is all we have. A situation that you will probably encounter very often.

BIReady evaluation continues…

The issue that I had with the ODBC connection has been solved. I was using a 64-bit driver and should have been using the 32-bit driver for MySQL.

Thanks to Jan Vos of BIReady for helping me out! I will now continue my evaluation and post an update soon.

However, I’m under NDA, so I need to check what I can and cannot post here.