The “catch” with data warehouse automation tools

While evaluating several data warehouse automation tools, such as BIReady, Quipu and RapidAce, I have reached a conclusion that is crucial to using these tools successfully.

As most of these tools take the source data models as their starting point, you had better make sure those models are modeled correctly. Even with BIReady, which takes a “business” model as input, you need a good model. By correctly modeled, I mean that your source is preferably modeled in third normal form (3NF). When reverse engineering an existing database model, make sure primary keys and foreign keys are defined.

If they are not, you can be sure that the generated data warehouse (Data Vault) models will be pretty much worthless.

I noticed this when using some of the tools on a source model I have at hand from a client. That model is essentially a set of flat, wide files loaded into equally flat, wide tables. Primary keys are sometimes missing, foreign keys are almost nonexistent, and no normalization has been done.
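
Before pointing an automation tool at such a source, it pays to check the metadata first. Below is a minimal sketch, assuming a MySQL source and the PyMySQL package, that lists the tables lacking a primary key or any foreign keys; the schema name and credentials are placeholders.

    import pymysql

    SCHEMA = "my_source_db"  # hypothetical schema name

    # Tables in the schema that have no constraint of the given type at all.
    NO_CONSTRAINT_SQL = """
        SELECT t.table_name
        FROM information_schema.tables t
        LEFT JOIN information_schema.table_constraints c
               ON c.table_schema = t.table_schema
              AND c.table_name = t.table_name
              AND c.constraint_type = %s
        WHERE t.table_schema = %s
          AND t.table_type = 'BASE TABLE'
          AND c.constraint_name IS NULL
    """

    conn = pymysql.connect(host="localhost", user="user", password="secret")
    with conn.cursor() as cur:
        for ctype in ("PRIMARY KEY", "FOREIGN KEY"):
            cur.execute(NO_CONSTRAINT_SQL, (ctype, SCHEMA))
            missing = sorted(row[0] for row in cur.fetchall())
            print(f"Tables without a {ctype}:", missing)
    conn.close()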

You can argue whether this is a true source model; it is not. But it is all we have, and it is a situation you will probably encounter very often.

Mac OS Automator workflow for getting a direct link to @CloudApp uploads

BIReady evaluation continues…

The issue that I had with the ODBC connection has been solved. I was using a 64-bit driver and should have been using the 32-bit driver for MySQL.
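
For reference, a quick way to confirm that the driver and the client bitness match is to test the DSN from a small script. Below is a minimal sketch using pyodbc; the DSN name and credentials are hypothetical, and the key point is that a 32-bit ODBC driver can only be reached from a 32-bit client process.

    import struct

    import pyodbc

    # A 32-bit ODBC driver is only visible to 32-bit clients (and vice versa),
    # so first check the bitness of the Python build doing the test.
    print("Client bitness:", struct.calcsize("P") * 8, "bit")

    # Hypothetical DSN pointing at the 32-bit MySQL ODBC driver.
    conn = pyodbc.connect("DSN=mysql32_dwh;UID=user;PWD=secret")
    print(conn.cursor().execute("SELECT VERSION()").fetchone()[0])
    conn.close()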

Thanks to Jan Vos of BIReady for helping me out! I will now continue my evaluation and post an update soon.

However, I’m under NDA, so I need to check what I can and cannot post here.

BIReady evaluation – first update

Last weekend I received the demo license for BIReady and tried to play with it.

However, I wanted to make it a real case study and not use the included demo databases, which had already been shown to me.

And there I stumbled on the first issue: BIReady’s repository must be an MS Access database, something that will be addressed in a future version according to an e-mail I received from their support.

So I decided to go with the MS Access repository, but then I got stuck adding a MySQL database as the DWH. Something seems to be wrong with the ODBC connection, which is strange, as the connection tests just fine using other tools.

Anyway, I’m waiting for BIReady support before I can continue.

Until further notice…

Impression BIReady demo – DWH automation

This afternoon Gertjan Vlug gave me a demo of BIReady, a product that automates the generation and loading of a data warehouse. In the remainder of this post, I’ll give my impression of its capabilities.

What does it do?

BIReady generates data models and ETL code for:

  • Staging Area
  • Enterprise Data Warehouse (EDW)
  • Data Marts

For the EDW it uses Data Vault modeling; for the Data Marts it uses Ralph Kimball’s star schemas.
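
To make that concrete, here is a rough sketch of the kind of hub and satellite DDL such a tool could derive from a single 3NF entity. The naming conventions, column types and helper function are purely illustrative and not BIReady’s actual templates.

    # Illustrative only: derive hub + satellite DDL for one 3NF entity.
    # Naming conventions and types are hypothetical, not BIReady's output.
    def datavault_ddl(entity, business_key, attributes):
        hub = (
            f"CREATE TABLE hub_{entity} (\n"
            f"    {entity}_hkey  CHAR(32)     NOT NULL PRIMARY KEY,  -- hash of the business key\n"
            f"    {business_key} VARCHAR(100) NOT NULL,\n"
            f"    load_dts       TIMESTAMP    NOT NULL,\n"
            f"    record_source  VARCHAR(50)  NOT NULL\n"
            f");"
        )
        sat_cols = ",\n".join(f"    {a} VARCHAR(100)" for a in attributes)
        sat = (
            f"CREATE TABLE sat_{entity} (\n"
            f"    {entity}_hkey  CHAR(32)  NOT NULL REFERENCES hub_{entity},\n"
            f"    load_dts       TIMESTAMP NOT NULL,\n"
            f"{sat_cols},\n"
            f"    PRIMARY KEY ({entity}_hkey, load_dts)\n"
            f");"
        )
        return hub + "\n\n" + sat

    # Example: the descriptive attributes of a customer end up in the satellite.
    print(datavault_ddl("customer", "customer_id", ["company_name", "city"]))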

The ETL code depends on the target database, but is essentially ANSI SQL that runs on the database itself. This means it executes as fast as your database engine can run it; the tool itself only orchestrates the parallelism of the instructions that need to be executed.
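
The orchestration idea is roughly the following (my own sketch, not BIReady’s engine): the SQL does the heavy lifting inside the database, while the client only runs the statements of each dependency level concurrently before moving on to the next level.

    from concurrent.futures import ThreadPoolExecutor

    def run_sql(statement):
        # Placeholder: send the statement to the target database via any
        # DB-API connection; here we only print it.
        print("executing:", statement)

    # Generated statements grouped by dependency level: statements within a
    # level are independent and may run in parallel; levels run in order.
    levels = [
        ["INSERT INTO stg_customers ...", "INSERT INTO stg_orders ..."],
        ["INSERT INTO hub_customer ...", "INSERT INTO hub_order ..."],
        ["INSERT INTO lnk_customer_order ..."],
    ]

    with ThreadPoolExecutor(max_workers=4) as pool:
        for level in levels:
            # Wait for the whole level to finish before starting the next one.
            list(pool.map(run_sql, level))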

What doesn’t it do?

BIReady doesn’t do custom integration, cleansing and other things that cannot be automated easily. However, you can still use existing ETL, data quality and cleansing tools to handle that part.

The starting point: a business data model

Unlike several other competing products[1], BIReady uses a business data model as the starting point for generating the data warehouse. This is basically an Entity Relationship Diagram (ERD) in third normal form that reflects the business (data) model. You should not confuse this with a Business Process Model (modeled in BPMN, for example).

The business data model can be imported from CA ERwin or PowerDesigner. BIReady also has some built-in modeling facilities, but those are of course limited compared to the aforementioned data modeling tools.

When a business data model is not available, you can start by reverse engineering one or more source data models, just like many of the competing products do.

I like the fact that the business data model is taken as the starting point, because it makes it much more likely that the data will actually be integrated than when you start from source data models. By using a business data model, you are bridging the semantic gap that Ronald Damhof referred to in the presentation he gave at the Data Vault Automation conference last year.

Demo on Northwind database

We are all familiar with Microsoft’s Northwind database, which is used in many, many examples; Gertjan used it for his demo. The good thing about it is that it is a well-documented and properly designed (business) data model. Gertjan explained the steps and showed the most important options of the tool, and one and a half hours later the staging area, EDW and data mart were generated and loaded. The only reason it took that long was that I kept interrupting him with questions…

Conclusion

I was very impressed by the ease of use and the speed. I will get a demo license of the product to play with it a bit, and based on that I will probably write another post with some more details. Contact BIReady for a demo if you want to know more.


  1. Most of the competing products use the source data models as a starting point.