Datawarehouse Automation

Yesterday I attended the DWH Automation conference in Leuven (Belgium), hosted by BI-Community.

Organizers

It was organized by Ronald Damhof and Tom Breur.

Sponsors

Among the sponsors were Centennium, WhereScape, Qosqo, BIReady, Genesee Academy, timeXtender and TripWire Solutions.1

General presentations

The presentations given by [RonaldDamhof] and [TomBreur] were largely the same as last year at the Data Vault Automation conference in Utrecht (The Netherlands). They both focus on Agile BI and the importance of DWH automation in Agile BI. For that matter, the use of Data Vault modeling for the Enterprise Data Warehouse (EDW) seems the only methodology that truly supports Agile BI and is the one that is the most easy to automate due to its patterns of hubs, links and satellites.

Hans Hultgren from Genesee Academy gave a very interesting presentation about the meaning of Data Warehousing. Nowadays there are a lot of different terms in Data Warehousing and some of these have different meanings depending on who you’re asking to define the term. He focused on the importance to talk about the meaning of the term, instead of the term itself. Depending on the meaning, several layers can be defined in a Data Warehouse solution, each of which has a specific purpose that can be (partially) automated or not.

Frederik Naessens from K25 gave a small presentation on how to use the ERWin data modeling tool to generate the various models, such as a Data Vault model for the EDW. It’s a poor man’s solution focused on being able to create awareness of the need of DWH automation tools.

Product presentations

The following companies presented their DWH Automation solution with “SlideWare”2:

  • TripWire Solutions
  • DWhite
  • BIReady

While the other companies gave a live demo of their products:

  • WhereScape
  • Qosqo

TripWire Solutions

Dirk Vermeiren from TripWire Solutions focuses on Oracle and presented their accelerators. They gave a complete overview of the layers they implement in a Data Warehouse solution and which of those layers can be automated. Data Vault modeling is used for the layer that contains the EDW.

My 2 cents: looks promising for a specific market (Oracle).

DWhite

DWhite presented a solution that is not yet commercialized, because it is still in the works, but already used at a particular client. It is focused on Microsoft BI at the moment, but should support “everything” independent of the modeling used.

My 2 cents: I got the impression it is a one-man show and the goal is set pretty high, so I don’t think this will make it.

BIReady

Gertjan Vlug from BIReady, as the last presenter of the day, decided to give some kind of wrap-up of the day and picked in where their product fits in. They are one of the pioneers and their product focuses on using a business model from which the rest can be automated, instead of using source models. BIReady can handle any type of modeling and also uses (but not necessarily) Data Vault.

My 2 cents: I want to see a demo, looks very promising and I really like the fact that it starts with a business model instead of a source model.

WhereScape

Robert gave a stunning Star Wars introduction that made sure that he got everyone’s attention. It was funny but still hit the spot. After that Terry took over and gave a small demo of WhereScape 3D and WhereScape RED. They had been demoing already at their booth, so it was kept short.

WhereScape RED is a stunning product. It really seems to do it all. It also takes care of the ETL itself, scheduling etc.

My 2 cents: WhereScape really knows its business and has a great product.

Qosqo

Jeroen Klep from Qosqo gave a demo of their Quipu product. It is open source and can be adapted to your needs by changing the templates. It is not meant to replace it all, but to be complementary to investments already made. Quipu is still young, but also looks very promising. It can automate design of staging, EDW and data marts. The EDW is based on Data Vault.

My 2 cents: Quipu has to be taken seriously and could become a true competitor for the other players such as WhereScape and BIReady.

Conclusion

While Agile BI is not only “hot”, but also necessary in a changing world, the need for being able to automate large parts of it is inevitable. There are some great players in the market that can help you with that.


  1. Don’t shoot me if I forgot one of the sponsors 

  2. They did not give a live demo, but only presented the solution using slides. 

Interpretation in dimensional modeling vs data vault

In my previous post I mentioned that there is less interpretation for the designer in data vault modeling than in dimensional modeling.

Let me elaborate on that with an example.

Dimensional modeling

The question that I received in the workshop was what to do with the age of the customer at time of the transaction. Is it a fact or a dimensional attribute?

The age is something that could be calculated by using the customer’s birthdate and the date of the transaction. In this case, the dimensional attribute would be birthdate. The fact table should hold a reference to the time dimension representing the transaction date.

Depending on the number of fact records however, there could be a negative impact on performance when calculating the age on the fly, but this can probably be neglected.

But what if the age is supposed to be used to used to determine if it falls in a certain range? What if these ranges are variable and only known at query time? And what about the average age of the customers buying a particular product within a certain time frame?

Could the age be a fact measure in those cases?

Some designers would argue that it is a derived measure that doesn’t need to be stored in your fact table. I agree, but it does require your query reporting tool to be able to handle it all.

Others would argue that the age is a measure that you better store in the fact table. It wouldn’t be exactly an additive measure, but at least you could answer the “average age” somewhat easier.

You could even argue that it is both something dimensional and factual. And that’s true as well. The birthdate of the customer is dimensional and the age could be used for filtering facts.

It should be clear that it depends on the context. If there is no business question yet about the age, I would at least store the birthdate as a dimensional attribute of the customer. Adaption to the fact table when the question arises can be done later, but does require reengineering.

Data vault modeling

In data vault modeling you don’t suffer from this interpretation problem. The birthdate of the customer is a satellite attribute linked to your customer hub. There is no question about it.

You divide and conquer.

The interpretation will only be needed once you get the business question. Based on that, you can determine how to model it in the presentation layer, whether it be dimensional or any other form needed.

Wait, did you hear me mention something about Agile BI here? Well, sort of 😉

The Art of Data Modeling – follow up

In my previous post on data modeling, I wrote that it is an art and not pure science. In this post, I’ll elaborate on that.

Today I gave a small workshop on dimensional modeling for the business intelligence team that I am currently part of. The main objectives of this workshop were to get the team in a standard way of working and to refresh their memory on dimensional modeling.

As preparation I had written a small summary about dimensional modeling, mostly based on Dr. Ralph Kimball’s book The Data Warehouse Toolkit.

Before I got into presenting some example models for the project I’m working in, I had a small Q&A session about the preparation material that I had provided.

Well, I thought it would be small…

I received many design questions that I could not answer… in the way they thought I would answer it, i.e. as a straight answer that would set it for once and for all.

That is exactly what designing a data model is all about, especially when doing dimensional modeling. To quote Ronald Damhof:

it all depends on the context

That is also exactly why there are some drawbacks in dimensional modeling compared to other modeling techniques such as data vault.

Is a particular attribute dimensional or a measure that belongs in the fact table? It just depends in how it will be used. There is no scientific answer to it.

That’s why I say, data modeling is an art. Art represents the artist’s interpretation or view.

It is also the reason why I prefer data vault over dimensional modeling for the EDW. There is less left up to the interpretation. But that’s another story…

The Art of Data Modeling

For reasons still not quite clear to me, I come across a lot of organizations that perform pretty poor regarding the data modeling of their business intelligence solutions.

In some cases the organizations don’t even have tools for data modeling. Strangely enough a lot of these have an Oracle database, so why not use the Oracle SQL data modeler, which comes for free.

Of course the tools are just a means to an end and don’t guarantee that your data model design is correct. But bringing some kind of structure to your data that is visualized in a data model will help you and the stakeholders to determine its correctness.

Data modeling is an art, not pure science. In a follow up post, I’ll elaborate on that.

OmniFocus and Basecamp

OmniFocus and Basecamp