What’s your opinion on feeding a data warehouse from a message queue?

There are organizations that have decided that their data warehouse(s) should be fed with data in the same way as their operational systems that are part of their Enterprise Application Integration (EAI) strategy.

Depending on your information needs, this might be a good idea. Especially when you have a (near) realtime data warehouse. Short burst of transactional data can be processed quickly and your data warehouse will be up to date.

But what if your information needs are different? What if you only need a snapshot of the data at the end of the day and just don’t need all changes that happened during the day in your transactional systems?

In a batch oriented data warehouse that only needs data once a day, or maybe only a few times per day but not anywhere close to (near) realtime, is having a message queue that feeds it really the way to go?

In one of my previous projects we had a batch oriented data warehouse that was fed by a message queue. The amount of “data” pushed to it consisted of 75% overhead just because of all the XML tags that were needed. 25% was the data we really needed. In the end, it was decided to make a special XML schema for us with shorter tags to get rid of the overhead.

Secondly, a buffer area needed to be developed with just one service: create a consolidated daily snapshot of all changes received via the message queue. Without going into all the details of the problems we faced, it was a messy solution and just didn’t feel right.

Do you recognize this or have any experience with it? Then please leave your opinion and thoughts in the comment section.

Much appreciated and have a nice day.