Because most of these tools take the source data models as a starting point, you had better make sure those models are correct. Even with BIReady, which takes a “business” model, you need a good model. By correctly modeled, I mean that your source is preferably modeled in third normal form (3NF). When reverse engineering an existing database model, make sure primary keys and foreign keys are defined.
If they are not, you can be sure that the generated data warehouse (Data Vault) models will be pretty worthless.
I noticed this when using some of the tools on a source model I have at hand from a client. This model basically consists of flat, wide files loaded into (flat, wide) tables. Primary keys are sometimes missing, foreign keys hardly exist at all, and no normalization has been done.
You can argue whether this is a true source model. Admittedly, it is not. But it is all we have, and that is a situation you will probably encounter very often.
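Before feeding a source model to any of these tools, it is worth checking which tables lack declared keys. The sketch below is only an illustration, assuming the source is reachable through Python's built-in `sqlite3` module; other databases expose the same information through their catalog views (for example MySQL's `INFORMATION_SCHEMA.TABLE_CONSTRAINTS`). The function `audit_keys` and the example tables are hypothetical.

```python
import sqlite3

def audit_keys(conn: sqlite3.Connection) -> dict[str, dict[str, object]]:
    """Per table: is a primary key declared, and how many foreign-key constraints exist?"""
    report = {}
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type = 'table' AND name NOT LIKE 'sqlite_%' ORDER BY name")]
    for table in tables:
        # pragma_table_info: the "pk" column is > 0 for every primary-key column
        (pk_cols,) = conn.execute(
            "SELECT COUNT(*) FROM pragma_table_info(?) WHERE pk > 0", (table,)).fetchone()
        # pragma_foreign_key_list: one row per column mapping; "id" identifies the constraint
        (fk_count,) = conn.execute(
            "SELECT COUNT(DISTINCT id) FROM pragma_foreign_key_list(?)", (table,)).fetchone()
        report[table] = {"primary_key": pk_cols > 0, "foreign_keys": fk_count}
    return report

# Hypothetical source: two properly keyed tables and one flat, wide export table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER REFERENCES customer (customer_id),
        order_date TEXT
    );
    CREATE TABLE flat_export (customer_name TEXT, order_date TEXT, amount REAL);
""")
report = audit_keys(conn)
for table, keys in report.items():
    if not keys["primary_key"]:
        print(f"{table}: no primary key declared; fix this before reverse engineering")
```

A table such as `flat_export` is exactly the kind of flat, wide table described above: a reverse-engineering tool has nothing to derive business keys or relationships from.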
The issue I had with the ODBC connection has been solved. I was using the 64-bit MySQL ODBC driver, but should have been using the 32-bit one.
Thanks to Jan Vos of BIReady for helping me out! I will now continue my evaluation and post an update soon.
However, I’m under NDA, so I need to check what I can and cannot post here.
Last weekend I received the demo license for BIReady and started trying it out.
However, I wanted to make it a real case study rather than use the included demo databases, which I had already been shown.
That is where I ran into the first issue: BIReady’s repository must be an MS Access database. According to an email I received from their support, this will be solved in a future version.
So I went with the MS Access repository, but then got stuck adding a MySQL database as the DWH. Something seems to be wrong with the ODBC connection, which is strange, as the connection tests fine in other tools.
Anyway, I’m waiting for BIReady support before I can continue.
Until further notice…
This afternoon Gertjan Vlug gave me a demo of BIReady, a product for automating the generation and loading of a data warehouse. In the remainder of this post, I’ll give my impressions of its capabilities.
What does it do?
BIReady generates data models and ETL code for:
- Staging Area
- Enterprise Data Warehouse (EDW)
- Data Marts
The ETL code depends on the target database, but is essentially ANSI SQL that runs on the database itself. This means it executes as fast as your database engine can run it; the tool itself only handles the parallel execution of the instructions.
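I wasn’t shown how BIReady schedules its statements internally, so the following is only a generic sketch of what “handling the parallelism” can mean: steps that don’t depend on each other are grouped into waves and run concurrently, and a wave starts only after the previous one has finished. The step names, `plan_waves`, `run_parallel` and the `execute` callback are all hypothetical; in a real tool, each step would be a generated SQL statement sent to the target database.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable

def plan_waves(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group steps into waves; each step depends only on steps in earlier waves.

    Every dependency must itself be a key of `deps`.
    """
    remaining = {step: set(d) for step, d in deps.items()}
    waves = []
    while remaining:
        ready = sorted(step for step, d in remaining.items() if not d)
        if not ready:
            raise ValueError(f"cyclic dependency among: {sorted(remaining)}")
        waves.append(ready)
        for step in ready:
            del remaining[step]
        for d in remaining.values():
            d.difference_update(ready)
    return waves

def run_parallel(deps: dict[str, set[str]], execute: Callable[[str], None],
                 max_workers: int = 4) -> None:
    """Run each wave concurrently; the next wave starts only after the previous one finishes."""
    for wave in plan_waves(deps):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # list() waits for all results and re-raises the first exception, if any
            list(pool.map(execute, wave))

# Hypothetical load steps: staging first, then hubs, then link and satellite.
deps = {
    "stg_customer": set(),
    "stg_order": set(),
    "hub_customer": {"stg_customer"},
    "hub_order": {"stg_order"},
    "sat_customer": {"stg_customer", "hub_customer"},
    "link_customer_order": {"stg_order", "hub_customer", "hub_order"},
}
print(plan_waves(deps))
run_parallel(deps, lambda step: print("executing", step))
```

Because the heavy lifting happens inside the database, the orchestrator only needs to know the dependencies between statements, not how to transform the data.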
What doesn’t it do?
BIReady doesn’t handle custom integration, cleansing and other tasks that cannot easily be automated. However, you can still use existing ETL, data quality and cleansing tools for this part.
The starting point: a business data model
Unlike several competing products¹, BIReady uses a business data model as the starting point for generating the data warehouse. This is basically an Entity Relationship Diagram (ERD) in third normal form that reflects the business (data) model. You should not confuse this with a business process model expressed in BPMN.
The business data model can be imported from CA ERwin or PowerDesigner. BIReady also has some built-in modeling facilities, but those are, of course, limited compared to the aforementioned data modeling tools.
When a business data model is not available, you can start by reverse engineering one or more source data models, just like many of the competing products do.
I like the fact that the business data model is taken as the starting point, because it is much more likely to integrate the data than an approach based on source data models. By using a business data model, you bridge the semantic gap that Ronald Damhof referred to in the presentation he gave at the Data Vault Automation conference last year.
Demo on Northwind database
We are all familiar with Microsoft’s Northwind database, which is used in many, many examples. Gertjan used it for his demo. The good thing about it is that it is a well-documented and properly designed (business) data model. Gertjan explained the steps and showed the most important options of the tool, and one and a half hours later the staging area, EDW and data mart had been generated and loaded. The only reason it took that long was that I kept interrupting him with questions…
I was very impressed by the ease of use and speed. I will get a demo license to try the product myself, and will probably write another post with more details based on that. Contact BIReady for a demo if you want to know more.
¹ Most of the competing products use the source data models as a starting point.
Yesterday I attended the DWH Automation conference in Leuven (Belgium), hosted by BI-Community.
The presentations given by Ronald Damhof and Tom Breur were largely the same as last year at the Data Vault Automation conference in Utrecht (The Netherlands). Both focused on Agile BI and the importance of DWH automation in Agile BI. In that context, Data Vault modeling for the Enterprise Data Warehouse (EDW) seems to be the only methodology that truly supports Agile BI, and it is also the easiest to automate thanks to its patterns of hubs, links and satellites.
Hans Hultgren from Genesee Academy gave a very interesting presentation about the meaning of Data Warehousing. Nowadays there are a lot of different terms in Data Warehousing, and some of them mean different things depending on who you ask to define them. He stressed the importance of discussing the meaning behind a term rather than the term itself. Depending on that meaning, several layers can be defined in a Data Warehouse solution, each with a specific purpose that can be fully automated, partially automated, or not automated at all.
Frederik Naessens from K25 gave a short presentation on how to use the ERwin data modeling tool to generate the various models, such as a Data Vault model for the EDW. It is a poor man’s solution, meant mainly to raise awareness of the need for DWH automation tools.
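The regular hub/link/satellite patterns are what make Data Vault easy to automate. As a rough illustration, and not how any of the tools mentioned here actually work, this sketch maps a simplified 3NF model onto Data Vault structures under naive assumptions: every entity becomes a hub, every foreign-key relationship becomes a link, and the non-key attributes go into a satellite. The `Entity` class, the `_hk` hash-key columns and the `load_date`/`record_source` metadata columns are my own illustrative conventions; real generators also deal with many-to-many tables, hash computation and satellite splitting.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """Simplified 3NF entity: business key, descriptive attributes, referenced entities."""
    name: str
    business_key: list[str]
    attributes: list[str]
    references: list[str] = field(default_factory=list)

# Technical columns every Data Vault table gets in this sketch (an assumption).
META = ["load_date", "record_source"]

def to_data_vault(entities: list[Entity]) -> dict[str, dict[str, list[str]]]:
    """Map entities to hubs, foreign keys to links, attributes to satellites."""
    hubs, links, satellites = {}, {}, {}
    for e in entities:
        # Hub: hash key plus the business key that identifies the entity.
        hubs[f"hub_{e.name}"] = [f"{e.name}_hk", *e.business_key, *META]
        # Satellite: descriptive attributes, historized per load date.
        if e.attributes:
            satellites[f"sat_{e.name}"] = [f"{e.name}_hk", *META, *e.attributes]
        # Link: one per relationship, joining the hash keys of both hubs.
        for ref in e.references:
            links[f"link_{e.name}_{ref}"] = [
                f"{e.name}_{ref}_hk", f"{e.name}_hk", f"{ref}_hk", *META]
    return {"hubs": hubs, "links": links, "satellites": satellites}

model = [
    Entity("customer", ["customer_number"], ["name", "city"]),
    Entity("order", ["order_number"], ["order_date", "amount"], references=["customer"]),
]
for kind, tables in to_data_vault(model).items():
    for table, columns in tables.items():
        print(kind, table, columns)
```

Because every entity and relationship is handled by the same few rules, adding a new entity to the model only adds new tables; existing hubs, links and satellites stay untouched, which is what makes the approach attractive for Agile BI.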
The following companies presented their DWH Automation solutions with “SlideWare”²:
- TripWire Solutions
The other companies gave a live demo of their products.
Dirk Vermeiren from TripWire Solutions, which focuses on Oracle, presented their accelerators. He gave a complete overview of the layers they implement in a Data Warehouse solution and which of those layers can be automated. Data Vault modeling is used for the layer that contains the EDW.
My 2 cents: looks promising for a specific market (Oracle).
DWhite presented a solution that is not yet commercially available because it is still a work in progress, although it is already in use at one client. It currently focuses on Microsoft BI, but should eventually support “everything”, regardless of the modeling approach used.
My 2 cents: I got the impression it is a one-man show and the goal is set pretty high, so I don’t think this will make it.
Gertjan Vlug from BIReady, the last presenter of the day, gave a kind of wrap-up of the day and showed where their product fits in. BIReady is one of the pioneers, and its product uses a business model, rather than source models, as the basis from which the rest can be automated. BIReady can handle any type of modeling and can use Data Vault, but does not require it.
My 2 cents: I want to see a demo. It looks very promising, and I really like the fact that it starts with a business model instead of a source model.
Robert gave a stunning Star Wars introduction that grabbed everyone’s attention. It was funny, but still to the point. After that, Terry took over and gave a short demo of WhereScape 3D and WhereScape RED. They had already been demoing at their booth, so it was kept short.
WhereScape RED is a stunning product. It really seems to do it all: it also takes care of the ETL itself, scheduling, and more.
My 2 cents: WhereScape really knows its business and has a great product.
Jeroen Klep from Qosqo gave a demo of their Quipu product. It is open source and can be adapted to your needs by changing the templates. It is not meant to replace everything, but to complement investments already made. Quipu is still young, but also looks very promising. It can automate the design of the staging area, EDW and data marts. The EDW is based on Data Vault.
My 2 cents: Quipu has to be taken seriously and could become a true competitor to the other players, such as WhereScape and BIReady.
Agile BI is not only “hot” but also necessary in a changing world, and the need to automate large parts of it is inevitable. There are some great players in the market that can help you with that.