INFOR Data Vault Framework development effort has been cancelled

The development of the INFOR Data Vault Framework, largely inspired by Edwin Weber’s PDI Data Vault Framework has been cancelled for various reasons.

We have a “working” version that is good enough to help us do some of the initial automation of the loading of our Data Vault that is under construction, but we have evaluated solutions that are proven to work and chose a partner to implement this in due time.

It has been a fun ride and I want to thank my team for the effort they have put into it. Also a big thanks to Edwin Weber for his inspiring work.

INFOR Data Vault framework – meta data generation – part deux

INFOR Data Vault framework – meta data generation

This documents describes the possibility of generating meta data out of the Oracle SDDM (SQL Developer Data Modeler) for loading the data vault (using the in house developed framework).

Model

We only do relational modeling and no logical modeling, because the relational model is basically the same. The data vault model adheres to a few standards for the scripts to work, as also partially described in the JEDI architecture document:

  • Hubs
    • Names start with H_
    • Business key
      • Has a unique constraint / index
      • Single column, always called BK (but this can be different, the code doesn’t check for the name)
  • Links
    • Names start with L_
  • Satellites
    • Names start with S_

Meta data

We define meta data for:

  • Hubs
  • Links
    • Maximum 15 hubs can be tied together in a link at the moment
  • Satellites
    • Related to Hubs
    • Related to Links

Mappings

When defining the mappings in the model, we use the following restrictions:

  • No soft business rules (data is not changed)
  • 1:1 copy of source
  • Entered in model column notes:
    • Format: SYSTEM.TABLE.COLUMN (separator is a dot)
    • Multiple sources entered on separate lines delimited by line feed (just hit enter after each source, but not after the last one)

Sample:

Reporting schema

In order to generate the meta data needed from the model design, we need to export it first to a Reporting Schema. Note: when you export a design to the Reporting Schema, versions are created. The queries below don’t take this into account, so for the moment you have to delete old designs from the Reporting Schema (this option can be found on the maintenance tab of the dialog you get when you export).

The following steps need to be executed to export the design to the Reporting Schema:

  • Make sure you have the JEDI warehouse design open
  • Choose File | Export | To Reporting Schema

  • Choose the JEDI_REPOSITORY for the connection
  • Choose Delete Designs to remove existing designs

  • In the pop-up screen, mark all existing designs and click Delete Selected

  • Click OK to confirm that designs have been deleted
  • Click OK to export

When you export a model design to an Oracle Reporting Schema (residing in a database) from the Oracle SDDM, a lot of tables are created that contain information about the design. The following tables from the Reporting Schema are used for the generation of the meta data:

  • DMRS_TABLES
    Contains information about the tables
  • DMRS_COLUMNS
    Contains information about the columns
  • DMRS_FOREIGNKEYS
    Contains information about the foreign keys
  • DMRS_INDEXES
    Contains information about the indexes
  • DMRS_CONSTR_INDEX_COLUMNS
    Contains information about the index columns
  • DMRS_LARGE_TEXT
    Contains information about the comments and the notes
  • DMRS_PK_OID_COLUMNS
    Contains information about the primary key columns

Note that there is a confirmed bug in the Reporting Schema that prevents you from joining the DMRS_CONSTR_INDEX_COLUMNS with DMRS_FOREIGNKEYS based on the keys of these tables.

We have created some functions (for which I have to thank Andreas Delmelle), that are not part of the Reporting schema:

  • Fn_instr_t
    searches for a given string occurring multiple times and returns surrounding positions, so that you can take substrings
  • Fn_udf_split
    splits a string delimited by another string and returns it as separate rows

Types:

create or replace type str_table_type is table of varchar2 (255);
create or replace TYPE nums_table_type AS TABLE OF NUMBER;

Fn_instr_t:

create or replace 
FUNCTION fn_instr_t (sbase IN VARCHAR2, slookup IN VARCHAR2)
    RETURN nums_table_type PIPELINED DETERMINISTIC
IS
    idx   NUMBER := COALESCE (INSTR (sbase, slookup), 0);
BEGIN
    LOOP
        EXIT WHEN idx = 0;
        PIPE ROW (idx);
        idx := INSTR (sbase, slookup, idx + 1);
    END LOOP;
END fn_instr_t;

Fn_udf_split:

create or replace function fn_udf_split ( p_str_in in varchar2, p_sep in varchar2 )   
return str_table_type pipelined deterministic   
is   
cursor cur_instr_t (p_str_in varchar2, p_sep varchar2)   
is   
        select rownum rn, max (rownum) over () max_rn, t.column_value pos   
from table (fn_instr_t (p_str_in, p_sep)) t;   

    str_tmp varchar2 (32767);   
    pos_prev number := 1;   

begin   
    if coalesce (instr (p_str_in, p_sep), 0) = 0 then   
        pipe row (p_str_in);   
    else   
        for i in cur_instr_t (p_str_in, p_sep)   
            loop pipe row (substr (p_str_in, pos_prev, i.pos pos_prev));  
            if i.rn = i.max_rn and length (p_str_in) > i.pos then   
                pipe row (substr (p_str_in, i.pos + length (p_sep)));   
            end if;   
            pos_prev := i.pos + length (p_sep);   
        end loop;   
    end if;   
end fn_udf_split;   

Queries / Views

The following queries can be implemented as views to generate the meta data we need.

Hubs

First we have a generic view VW_HUBS that joins some information together and that can be reused by other views:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_HUBS" ("HUB_OVID", "HUB_NAME", "HUB_BSN_KEY_OVID", "HUB_BSN_KEY", "HUB_SCHEMA", "HUB_UPD_VIEW", "SRC") AS 
  SELECT a.ovid   AS hub_ovid,
    A.TABLE_NAME  AS HUB_NAME,
    a.column_ovid as hub_bsn_key_ovid,
    a.column_name AS hub_bsn_key,
    a.schema_name AS hub_schema,
    CASE
      WHEN LENGTH(concat(concat('VW_', a.table_name), '_UPD')) <= 30
      THEN concat(concat('VW_', a.table_name), '_UPD')
      ELSE concat(concat('VW_', regexp_replace(a.table_name, '[AEUYOI]+', '')), '_UPD')
    END AS hub_upd_view,
    src
  FROM
    (SELECT
      CASE
        WHEN a.schema_name IS NOT NULL
        THEN a.schema_name
        ELSE 'JEDI_WAREHOUSE'
      END AS schema_name,
      a.ovid,
      A.TABLE_NAME,
      ic.column_ovid,
      ic.column_name,
      c.text as src
    FROM dmrs_tables a
    INNER JOIN dmrs_indexes i
    ON a.ovid = i.container_ovid
    AND a.table_name LIKE 'H_%'
    AND a.table_name != 'HUB_TEMPLATE'
    AND i.state       = 'Unique Constraint'
    INNER JOIN dmrs_constr_index_columns ic
    ON i.index_name = ic.index_name
    INNER JOIN dmrs_large_text c
    ON c.ovid  = ic.column_ovid
    AND c.type = 'Note'
    ) a
  ORDER BY 1;

The generate the meta data for loading the hubs, we use the view VW_GEN_META_HUB:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_GEN_META_HUB" ("HUB_NAME", "HUB_BSN_KEY_1", "HUB_BSN_KEY_2", "SRC_ENV", "SRC_TBL", "SRC_BSN_KEY_1", "HUB_SCHEMA", "HUB_UPD_VIEW", "ACTIVE_FLAG", "HUB_BSN_KEY_3", "HUB_BSN_KEY_4", "SRC_BSN_KEY_3", "SRC_BSN_KEY_4") AS 
  SELECT a.hub_name,
  a.hub_bsn_key                                      AS hub_bsn_key_1,
  NULL                                               AS hub_bsn_key_2,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 1) AS src_env ,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 2) AS src_tbl ,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 3) AS src_bsn_key_1,
  a.hub_schema,
  a.hub_upd_view,
  1    AS active_flag,
  NULL AS hub_bsn_key_3,
  NULL AS hub_bsn_key_4,
  NULL AS src_bsn_key_3,
  NULL AS src_bsn_key_4
FROM vw_hubs a,
  TABLE (FN_UDF_SPLIT (A.SRC, CHR(10))) B
ORDER BY 1;

Links

First we have a generic view VW_LINKS that joins some information together and that can be reused by other views:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_LINKS" ("LINK_OVID", "LINK_NAME", "LNK_HUB_OVID", "LNK_HUB", "LNK_HUB_SQN_OVID", "LNK_HUB_SQN", "HUB_BSN_KEY_OVID", "HUB_BSN_KEY", "DM_VIEW_NAME", "SCHEMA_NAME", "LINK_UPD_VIEW") AS 
  SELECT DISTINCT
  f.child_table_ovid AS link_ovid,
  f.child_table_name AS link_name,
  F.REFERRED_TABLE_OVID AS LNK_HUB_OVID,
  F.REFERRED_TABLE_NAME AS LNK_HUB,
  ic.column_ovid as lnk_hub_sqn_ovid,
  IC.COLUMN_NAME AS LNK_HUB_SQN,
  P.COLUMN_OVID AS HUB_BSN_KEY_OVID,
  p.column_name as hub_bsn_key,
  CASE
    WHEN LENGTH(concat(concat('VW_', f.child_table_name), '_DM')) <= 30
    THEN concat(concat('VW_', f.child_table_name), '_DM')
    ELSE concat(concat('VW_', regexp_replace(f.child_table_name, '[AEUYOI]+', '')), '_DM')
  END AS dm_view_name,
  CASE
    WHEN t.schema_name IS NOT NULL
    THEN t.schema_name
    ELSE 'JEDI_WAREHOUSE'
  END AS schema_name,
  CASE
    WHEN LENGTH(concat(concat('VW_', f.child_table_name), '_UPD')) <= 30
    THEN concat(concat('VW_', f.child_table_name), '_UPD')
    ELSE concat(concat('VW_', regexp_replace(f.child_table_name, '[AEUYOI]+', '')), '_UPD')
  END AS link_upd_view
FROM dmrs_foreignkeys f
INNER JOIN dmrs_tables t
ON f.child_table_ovid = t.ovid
AND t.table_name LIKE 'L_%'
INNER JOIN dmrs_constr_index_columns ic
ON f.fk_name = ic.index_name
INNER JOIN dmrs_pk_oid_columns p
ON p.table_ovid = f.referred_table_ovid
ORDER BY 2;

The generate the meta data for loading the hubs, we use the view VW_GEN_META_LINK:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_GEN_META_LINK" ("LINK_NAME", "HUB_1_UPD_VIEW", "HUB_2_UPD_VIEW", "HUB_3_UPD_VIEW", "HUB_4_UPD_VIEW", "HUB_5_UPD_VIEW", "HUB_6_UPD_VIEW", "HUB_7_UPD_VIEW", "HUB_8_UPD_VIEW", "HUB_9_UPD_VIEW", "HUB_10_UPD_VIEW", "HUB_11_UPD_VIEW", "HUB_12_UPD_VIEW", "HUB_13_UPD_VIEW", "HUB_14_UPD_VIEW", "HUB_15_UPD_VIEW", "LNK_HUB_1_SQN", "LNK_HUB_2_SQN", "LNK_HUB_3_SQN", "LNK_HUB_4_SQN", "LNK_HUB_5_SQN", "LNK_HUB_6_SQN", "LNK_HUB_7_SQN", "LNK_HUB_8_SQN", "LNK_HUB_9_SQN", "LNK_HUB_10_SQN", "LNK_HUB_11_SQN", "LNK_HUB_12_SQN", "LNK_HUB_13_SQN", "LNK_HUB_14_SQN", "LNK_HUB_15_SQN", "DM_VIEW_NAME", "SCHEMA_NAME", "LINK_UPD_VIEW", "SRC_ENV", "SRC_TBL", "SRC_BSN_KEY_1", "SRC_BSN_KEY_2", "SRC_BSN_KEY_3", "SRC_BSN_KEY_4", "SRC_BSN_KEY_5", "SRC_BSN_KEY_6", "SRC_BSN_KEY_7", "SRC_BSN_KEY_8", "SRC_BSN_KEY_9", "SRC_BSN_KEY_10", "SRC_BSN_KEY_11", "SRC_BSN_KEY_12", "SRC_BSN_KEY_13", "SRC_BSN_KEY_14", "SRC_BSN_KEY_15") AS 
  SELECT link_name,
  MAX(
  CASE
    WHEN rn = 1
    THEN hub_upd_view
  END) AS hub_1_upd_view,
  MAX(
  CASE
    WHEN rn = 2
    THEN hub_upd_view
  END) AS hub_2_upd_view,
  MAX(
  CASE
    WHEN rn = 3
    THEN hub_upd_view
  END) AS hub_3_upd_view,
  MAX(
  CASE
    WHEN rn = 4
    THEN hub_upd_view
  END) AS hub_4_upd_view,
  MAX(
  CASE
    WHEN rn = 5
    THEN hub_upd_view
  END) AS hub_5_upd_view,
  MAX(
  CASE
    WHEN rn = 6
    THEN hub_upd_view
  END) AS hub_6_upd_view,
  MAX(
  CASE
    WHEN rn = 7
    THEN hub_upd_view
  END) AS hub_7_upd_view,
  MAX(
  CASE
    WHEN rn = 8
    THEN hub_upd_view
  END) AS hub_8_upd_view,
  MAX(
  CASE
    WHEN rn = 9
    THEN hub_upd_view
  END) AS hub_9_upd_view,
  MAX(
  CASE
    WHEN rn = 10
    THEN hub_upd_view
  END) AS hub_10_upd_view,
  MAX(
  CASE
    WHEN rn = 11
    THEN hub_upd_view
  END) AS hub_11_upd_view,
  MAX(
  CASE
    WHEN rn = 12
    THEN hub_upd_view
  END) AS hub_12_upd_view,
  MAX(
  CASE
    WHEN rn = 13
    THEN hub_upd_view
  END) AS hub_13_upd_view,
  MAX(
  CASE
    WHEN rn = 14
    THEN hub_upd_view
  END) AS hub_14_upd_view,
  MAX(
  CASE
    WHEN rn = 15
    THEN hub_upd_view
  END) AS hub_15_upd_view,
  MAX(
  CASE
    WHEN rn = 1
    THEN lnk_hub_sqn
  END) AS lnk_hub_1_sqn,
  MAX(
  CASE
    WHEN rn = 2
    THEN lnk_hub_sqn
  END) AS lnk_hub_2_sqn,
  MAX(
  CASE
    WHEN rn = 3
    THEN lnk_hub_sqn
  END) AS lnk_hub_3_sqn,
  MAX(
  CASE
    WHEN rn = 4
    THEN lnk_hub_sqn
  END) AS lnk_hub_4_sqn,
  MAX(
  CASE
    WHEN rn = 5
    THEN lnk_hub_sqn
  END) AS lnk_hub_5_sqn,
  MAX(
  CASE
    WHEN rn = 6
    THEN lnk_hub_sqn
  END) AS lnk_hub_6_sqn,
  MAX(
  CASE
    WHEN rn = 7
    THEN lnk_hub_sqn
  END) AS lnk_hub_7_sqn,
  MAX(
  CASE
    WHEN rn = 8
    THEN lnk_hub_sqn
  END) AS lnk_hub_8_sqn,
  MAX(
  CASE
    WHEN rn = 9
    THEN lnk_hub_sqn
  END) AS lnk_hub_9_sqn,
  MAX(
  CASE
    WHEN rn = 10
    THEN lnk_hub_sqn
  END) AS lnk_hub_10_sqn,
  MAX(
  CASE
    WHEN rn = 11
    THEN lnk_hub_sqn
  END) AS lnk_hub_11_sqn,
  MAX(
  CASE
    WHEN rn = 12
    THEN lnk_hub_sqn
  END) AS lnk_hub_12_sqn,
  MAX(
  CASE
    WHEN rn = 13
    THEN lnk_hub_sqn
  END) AS lnk_hub_13_sqn,
  MAX(
  CASE
    WHEN rn = 14
    THEN lnk_hub_sqn
  END) AS lnk_hub_14_sqn,
  MAX(
  CASE
    WHEN rn = 15
    THEN lnk_hub_sqn
  END) AS lnk_hub_15_sqn,
  dm_view_name,
  schema_name,
  link_upd_view,
  src_env,
  src_tbl,
  MAX(
  CASE
    WHEN rn = 1
    THEN src_bsn_key
  END) AS src_bsn_key_1,
  MAX(
  CASE
    WHEN rn = 2
    THEN src_bsn_key
  END) AS src_bsn_key_2,
  MAX(
  CASE
    WHEN rn = 3
    THEN src_bsn_key
  END) AS src_bsn_key_3,
  MAX(
  CASE
    WHEN rn = 4
    THEN src_bsn_key
  END) AS src_bsn_key_4,
  MAX(
  CASE
    WHEN rn = 5
    THEN src_bsn_key
  END) AS src_bsn_key_5,
  MAX(
  CASE
    WHEN rn = 6
    THEN src_bsn_key
  END) AS src_bsn_key_6,
  MAX(
  CASE
    WHEN rn = 7
    THEN src_bsn_key
  END) AS src_bsn_key_7,
  MAX(
  CASE
    WHEN rn = 8
    THEN src_bsn_key
  END) AS src_bsn_key_8,
  MAX(
  CASE
    WHEN rn = 9
    THEN src_bsn_key
  END) AS src_bsn_key_9,
  MAX(
  CASE
    WHEN rn = 10
    THEN src_bsn_key
  END) AS src_bsn_key_10,
  MAX(
  CASE
    WHEN rn = 11
    THEN src_bsn_key
  END) AS src_bsn_key_11,
  MAX(
  CASE
    WHEN rn = 12
    THEN src_bsn_key
  END) AS src_bsn_key_12,
  MAX(
  CASE
    WHEN rn = 13
    THEN src_bsn_key
  END) AS src_bsn_key_13,
  MAX(
  CASE
    WHEN rn = 14
    THEN src_bsn_key
  END) AS src_bsn_key_14,
  MAX(
  CASE
    WHEN rn = 15
    THEN src_bsn_key
  END) AS src_bsn_key_15
FROM
  (SELECT a.link_name,
    a.hub_upd_view,
    a.lnk_hub_sqn,
    a.dm_view_name,
    a.schema_name,
    a.link_upd_view,
    regexp_substr(trim(b.column_value), '[^.]+', 1, 1) AS src_env ,
    regexp_substr(trim(b.column_value), '[^.]+', 1, 2) AS src_tbl ,
    regexp_substr(trim(b.column_value), '[^.]+', 1, 3) AS src_bsn_key,
    row_number() OVER ( partition BY a.link_name order by rownum) rn
  FROM
    ( SELECT DISTINCT  
      L.LINK_NAME,
      l.LNK_HUB_SQN,
      L.DM_VIEW_NAME,
      L.LINK_UPD_VIEW,
      H.HUB_SCHEMA AS SCHEMA_NAME,
      h.hub_upd_view,
      TO_CHAR(NOTES.TEXT) AS TEXT
    FROM VW_LINKS L
    INNER JOIN VW_HUBS H
    ON L.LNK_HUB_OVID = H.HUB_OVID
    INNER JOIN DMRS_LARGE_TEXT NOTES
    ON l.lnk_hub_sqn_ovid = notes.ovid
    AND notes.type = 'Note'
    ) a,
    TABLE (fn_udf_split (a.text, chr(10))) b
  )
GROUP BY link_name,
  dm_view_name,
  schema_name,
  link_upd_view,
  src_env,
  src_tbl;

Satellites

Hubs

For generating the meta data to load satellites related to hubs, we use the view VW_GEN_META_HUB_SAT:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_GEN_META_HUB_SAT" ("SAT_NAME", "SAT_HUB_TBL", "SAT_HUB_BSN_KEY", "SAT_COLS", "SRC_ENV", "SRC_TBL", "SRC_BSN_KEY", "SRC_COLS", "ACTIVE_FLAG", "SAT_SEQUENCE") AS 
  SELECT a.table_name                                  AS sat_name,
  a.hub_tbl                                          AS sat_hub_tbl,
  a.hub_bsn_key                                      AS sat_hub_bsn_key,
  a.column_name                                      AS sat_cols,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 1) AS src_env ,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 2) AS src_tbl ,
  regexp_substr(trim(a.src_bsn_key), '[^.]+', 1, 3)  AS src_bsn_key,
  regexp_substr(trim(a.text), '[^.]+', 1, 3)         AS src_cols,
  'Y'                                                AS active_flag,
  concat('SQN_', a.table_name)                       AS sat_sequence
FROM
  (SELECT A.TABLE_NAME,
    h.hub_name AS hub_tbl,
    c.column_name,
    c.sequence,
    NOTES.TEXT,
    h.hub_bsn_key,
    h.src    AS src_bsn_key
  FROM dmrs_tables a
  INNER JOIN dmrs_foreignkeys f
  ON a.ovid = f.child_table_ovid
  AND a.table_name LIKE 'S_%'
  INNER JOIN dmrs_columns c
  ON c.container_ovid = a.ovid
  AND c.pk_flag      IS NULL
  AND c.fk_flag      IS NULL
  AND C.COLUMN_NAME NOT LIKE 'ETL_%'
  INNER JOIN vw_hubs h
  ON F.REFERRED_TABLE_OVID = h.hub_OVID
  INNER JOIN dmrs_large_text notes
  ON notes.ovid  = c.ovid
  AND notes.type = 'Note'

  ORDER BY a.table_name,
    c.sequence
  ) a,
  TABLE (fn_udf_split (a.text, chr(10))) b;

Links

For generating the meta data to load satellites related to hubs, we use the view VW_GEN_META_LINK_SAT:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_GEN_META_LINK_SAT" ("SAT_NAME", "SAT_LNK_TBL", "SAT_HUB_TBL_1", "SAT_HUB_TBL_2", "SAT_HUB_TBL_3", "SAT_HUB_TBL_4", "SAT_HUB_TBL_5", "SAT_HUB_TBL_6", "SAT_HUB_TBL_7", "SAT_HUB_TBL_8", "SAT_HUB_TBL_9", "SAT_HUB_TBL_10", "SAT_HUB_TBL_11", "SAT_HUB_TBL_12", "SAT_HUB_TBL_13", "SAT_HUB_TBL_14", "SAT_HUB_TBL_15", "SAT_HUB_KEY_1", "SAT_HUB_KEY_2", "SAT_HUB_KEY_3", "SAT_HUB_KEY_4", "SAT_HUB_KEY_5", "SAT_HUB_KEY_6", "SAT_HUB_KEY_7", "SAT_HUB_KEY_8", "SAT_HUB_KEY_9", "SAT_HUB_KEY_10", "SAT_HUB_KEY_11", "SAT_HUB_KEY_12", "SAT_HUB_KEY_13", "SAT_HUB_KEY_14", "SAT_HUB_KEY_15", "SRC_ENV", "SRC_TBL", "SRC_BSN_KEY_1", "SRC_BSN_KEY_2", "SRC_BSN_KEY_3", "SRC_BSN_KEY_4", "SRC_BSN_KEY_5", "SRC_BSN_KEY_6", "SRC_BSN_KEY_7", "SRC_BSN_KEY_8", "SRC_BSN_KEY_9", "SRC_BSN_KEY_10", "SRC_BSN_KEY_11", "SRC_BSN_KEY_12", "SRC_BSN_KEY_13", "SRC_BSN_KEY_14", "SRC_BSN_KEY_15", "SAT_COLS", "SRC_COLS", "ACTIVE_FLAG", "SAT_SEQUENCE") AS 
  SELECT sat_name,
  sat_lnk_tbl,
  MAX(
  CASE
    WHEN rn = 1
    THEN SAT_HUB_NAME
  END) AS sat_hub_tbl_1,
  MAX(
  CASE
    WHEN rn = 2
    THEN sat_hub_name
  END) AS sat_hub_tbl_2,
  MAX(
  CASE
    WHEN rn = 3
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_3,
  MAX(
  CASE
    WHEN rn = 4
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_4,
  MAX(
  CASE
    WHEN rn = 5
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_5,
  MAX(
  CASE
    WHEN rn = 6
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_6,
  MAX(
  CASE
    WHEN rn = 7
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_7,
  MAX(
  CASE
    WHEN rn = 8
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_8,
  MAX(
  CASE
    WHEN rn = 9
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_9,
  MAX(
  CASE
    WHEN rn = 10
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_10,
  MAX(
  CASE
    WHEN rn = 11
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_11,
  MAX(
  CASE
    WHEN rn = 12
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_12,
  MAX(
  CASE
    WHEN rn = 13
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_13,
  MAX(
  CASE
    WHEN rn = 14
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_14,
  MAX(
  CASE
    WHEN rn = 15
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_15,
  MAX(
  CASE
    WHEN RN = 1
    THEN SAT_HUB_BSN_KEY
  END) AS sat_hub_key_1,
  MAX(
  CASE
    WHEN rn = 2
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_2,
  MAX(
  CASE
    WHEN rn = 3
    THEN SAT_HUB_BSN_KEY
  END) AS sat_hub_key_3,
  MAX(
  CASE
    WHEN rn = 4
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_4,
  MAX(
  CASE
    WHEN rn = 5
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_5,
  MAX(
  CASE
    WHEN rn = 6
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_6,
  MAX(
  CASE
    WHEN rn = 7
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_7,
  MAX(
  CASE
    WHEN rn = 8
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_8,
  MAX(
  CASE
    WHEN rn = 9
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_9,
  MAX(
  CASE
    WHEN rn = 10
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_10,
  MAX(
  CASE
    WHEN rn = 11
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_11,
  MAX(
  CASE
    WHEN rn = 12
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_12,
  MAX(
  CASE
    WHEN rn = 13
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_13,
  MAX(
  CASE
    WHEN rn = 14
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_14,
  MAX(
  CASE
    WHEN rn = 15
    THEN sat_hub_bsn_key
  END) AS SAT_HUB_KEY_15 ,
  SRC_ENV,
  SRC_TBL,
  MAX(
  CASE
    WHEN RN = 1
    THEN src_bsn_key
  END) AS src_bsn_key_1,
  MAX(
  CASE
    WHEN rn = 2
    THEN src_bsn_key
  END) AS src_bsn_key_2,
  MAX(
  CASE
    WHEN rn = 3
    THEN src_bsn_key
  END) AS src_bsn_key_3,
  MAX(
  CASE
    WHEN rn = 4
    THEN src_bsn_key
  END) AS src_bsn_key_4,
  MAX(
  CASE
    WHEN rn = 5
    THEN src_bsn_key
  END) AS src_bsn_key_5,
  MAX(
  CASE
    WHEN rn = 6
    THEN src_bsn_key
  END) AS src_bsn_key_6,
  MAX(
  CASE
    WHEN rn = 7
    THEN src_bsn_key
  END) AS src_bsn_key_7,
  MAX(
  CASE
    WHEN rn = 8
    THEN src_bsn_key
  END) AS src_bsn_key_8,
  MAX(
  CASE
    WHEN rn = 9
    THEN src_bsn_key
  END) AS src_bsn_key_9,
  MAX(
  CASE
    WHEN rn = 10
    THEN src_bsn_key
  END) AS src_bsn_key_10,
  MAX(
  CASE
    WHEN rn = 11
    THEN src_bsn_key
  END) AS src_bsn_key_11,
  MAX(
  CASE
    WHEN rn = 12
    THEN src_bsn_key
  END) AS src_bsn_key_12,
  MAX(
  CASE
    WHEN rn = 13
    THEN src_bsn_key
  END) AS src_bsn_key_13,
  MAX(
  CASE
    WHEN rn = 14
    THEN src_bsn_key
  END) AS src_bsn_key_14,
  MAX(
  CASE
    WHEN rn = 15
    THEN SRC_BSN_KEY
  END) AS src_bsn_key_15,
  sat_cols,
  src_cols,
  active_flag,
  sat_sequence
FROM
  (SELECT A.TABLE_NAME                                 AS SAT_NAME,
    A.LNK_TBL                                          AS SAT_LNK_TBL,
    a.hub_name                                         AS sat_hub_name,
    a.hub_bsn_key                                      AS sat_hub_bsn_key,
    A.COLUMN_NAME                                      AS SAT_COLS,
    REGEXP_SUBSTR(TRIM(B.COLUMN_VALUE), '[^.]+', 1, 1) AS SRC_ENV ,
    REGEXP_SUBSTR(TRIM(B.COLUMN_VALUE), '[^.]+', 1, 2) AS SRC_TBL ,
    REGEXP_SUBSTR(TRIM(B.COLUMN_VALUE), '[^.]+', 1, 3) AS SRC_COLS,
    REGEXP_SUBSTR(TRIM(A.SRC_BSN_KEY), '[^.]+', 1, 3)  AS SRC_BSN_KEY,
    'Y'                                                AS ACTIVE_FLAG,
    CONCAT('SQN_', A.TABLE_NAME)                       AS SAT_SEQUENCE,
    row_number() OVER ( partition BY a.table_name, a.column_name order by rownum) rn
  FROM
    (SELECT A.TABLE_NAME,
      L.LINK_NAME AS LNK_TBL,
      c.column_name,
      C.SEQUENCE,
      TO_CHAR(NOTES.TEXT) AS text,
      h.hub_name,
      H.HUB_BSN_KEY,
      TO_CHAR(h.src) AS src_bsn_key
    FROM dmrs_tables a
    INNER JOIN dmrs_foreignkeys f
    ON a.ovid = f.child_table_ovid
    AND a.table_name LIKE 'S_%'
    INNER JOIN dmrs_columns c
    ON c.container_ovid = a.ovid
    AND c.pk_flag      IS NULL
    AND c.fk_flag      IS NULL
    AND C.COLUMN_NAME NOT LIKE 'ETL_%'
    INNER JOIN vw_links l
    ON F.REFERRED_TABLE_OVID = L.LINK_OVID
    INNER JOIN VW_HUBS H
    ON h.hub_ovid = l.lnk_hub_ovid
    INNER JOIN dmrs_large_text notes
    ON notes.ovid  = c.ovid
    AND notes.type = 'Note'
    ORDER BY a.table_name,
      c.sequence
    ) A,
    TABLE (FN_UDF_SPLIT (A.TEXT, CHR(10))) B
  ) c
GROUP BY SAT_NAME,
  sat_lnk_tbl,
  SRC_ENV,
  SRC_TBL,
  sat_cols,
  src_cols,
  active_flag,
  sat_sequence;

INFOR Data Vault Framework – sort of modeling automation

This documents describes the possibility of generating meta data out of the Oracle SDDM (SQL Developer Data Modeler) for loading the data vault (using the in house developed framework).

Model

We only do relational modeling and no logical modeling, because the relational model is basically the same. The data vault model adheres to a few standards for the scripts to work, as also partially described in the JEDI architecture document:

  • Hubs
    • Names start with H_
    • Business key
      • Has a unique constraint / index
      • Single column, always called BK (but this can be different, the code doesn’t check for the name)
  • Links
    • Names start with L_
  • Satellites
    • Names start with S_

Meta data

We define meta data for:

  • Hubs
  • Links
    • Maximum 15 hubs can be tied together in a link at the moment
  • Satellites
    • Related to Hubs
    • Related to Links

Mappings

When defining the mappings in the model, we use the following restrictions:

  • No soft business rules (data is not changed)
  • 1:1 copy of source
  • Entered in model column notes:
    • Format: SYSTEM.TABLE.COLUMN (separator is a dot)
    • Multiple sources entered on separate lines delimited by line feed (just hit enter after each source, but not after the last one)

Sample:

Reporting schema

In order to generate the meta data needed from the model design, we need to export it first to a Reporting Schema. Note: when you export a design to the Reporting Schema, versions are created. The queries below don’t take this into account, so for the moment you have to delete old designs from the Reporting Schema (this option can be found on the maintenance tab of the dialog you get when you export).

The following steps need to be executed to export the design to the Reporting Schema:

  • Make sure you have the JEDI warehouse design open
  • Choose File | Export | To Reporting Schema

  • Choose the JEDI_REPOSITORY for the connection
  • Choose Delete Designs to remove existing designs

  • In the pop-up screen, mark all existing designs and click Delete Selected

  • Click OK to confirm that designs have been deleted
  • Click OK to export

When you export a model design to an Oracle Reporting Schema (residing in a database) from the Oracle SDDM, a lot of tables are created that contain information about the design. The following tables from the Reporting Schema are used for the generation of the meta data:

  • DMRS_TABLES
    Contains information about the tables
  • DMRS_COLUMNS
    Contains information about the columns
  • DMRS_FOREIGNKEYS
    Contains information about the foreign keys
  • DMRS_INDEXES
    Contains information about the indexes
  • DMRS_CONSTR_INDEX_COLUMNS
    Contains information about the index columns
  • DMRS_LARGE_TEXT
    Contains information about the comments and the notes
  • DMRS_PK_OID_COLUMNS
    Contains information about the primary key columns

Note that there is a confirmed bug in the Reporting Schema that prevents you from joining the DMRS_CONSTR_INDEX_COLUMNS with DMRS_FOREIGNKEYS based on the keys of these tables.

We have created some functions (for which I have to thank Andreas Delmelle), that are not part of the Reporting schema:

  • Fn_instr_t
    searches for a given string occurring multiple times and returns surrounding positions, so that you can take substrings
  • Fn_udf_split
    splits a string delimited by another string and returns it as separate rows

Types:

create or replace type str_table_type is table of varchar2 (255);
create or replace TYPE nums_table_type AS TABLE OF NUMBER;

Fn_instr_t:

create or replace 
FUNCTION fn_instr_t (sbase IN VARCHAR2, slookup IN VARCHAR2)
    RETURN nums_table_type PIPELINED DETERMINISTIC
IS
    idx   NUMBER := COALESCE (INSTR (sbase, slookup), 0);
BEGIN
    LOOP
        EXIT WHEN idx = 0;
        PIPE ROW (idx);
        idx := INSTR (sbase, slookup, idx + 1);
    END LOOP;
END fn_instr_t;

Fn_udf_split:

create or replace function fn_udf_split ( p_str_in in varchar2, p_sep in varchar2 )   
return str_table_type pipelined deterministic   
is   
cursor cur_instr_t (p_str_in varchar2, p_sep varchar2)   
is   
        select rownum rn, max (rownum) over () max_rn, t.column_value pos   
from table (fn_instr_t (p_str_in, p_sep)) t;   

    str_tmp varchar2 (32767);   
    pos_prev number := 1;   

begin   
    if coalesce (instr (p_str_in, p_sep), 0) = 0 then   
        pipe row (p_str_in);   
    else   
        for i in cur_instr_t (p_str_in, p_sep)   
            loop pipe row (substr (p_str_in, pos_prev, i.pos pos_prev));  
            if i.rn = i.max_rn and length (p_str_in) > i.pos then   
                pipe row (substr (p_str_in, i.pos + length (p_sep)));   
            end if;   
            pos_prev := i.pos + length (p_sep);   
        end loop;   
    end if;   
end fn_udf_split;   

Queries / Views

The following queries can be implemented as views to generate the meta data we need.

Hubs

First we have a generic view VW_HUBS that joins some information together and that can be reused by other views:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_HUBS" ("HUB_OVID", "HUB_NAME", "HUB_BSN_KEY_OVID", "HUB_BSN_KEY", "HUB_SCHEMA", "HUB_UPD_VIEW", "SRC") AS 
  SELECT a.ovid   AS hub_ovid,
    A.TABLE_NAME  AS HUB_NAME,
    a.column_ovid as hub_bsn_key_ovid,
    a.column_name AS hub_bsn_key,
    a.schema_name AS hub_schema,
    CASE
      WHEN LENGTH(concat(concat('VW_', a.table_name), '_UPD')) <= 30
      THEN concat(concat('VW_', a.table_name), '_UPD')
      ELSE concat(concat('VW_', regexp_replace(a.table_name, '[AEUYOI]+', '')), '_UPD')
    END AS hub_upd_view,
    src
  FROM
    (SELECT
      CASE
        WHEN a.schema_name IS NOT NULL
        THEN a.schema_name
        ELSE 'JEDI_WAREHOUSE'
      END AS schema_name,
      a.ovid,
      A.TABLE_NAME,
      ic.column_ovid,
      ic.column_name,
      c.text as src
    FROM dmrs_tables a
    INNER JOIN dmrs_indexes i
    ON a.ovid = i.container_ovid
    AND a.table_name LIKE 'H_%'
    AND a.table_name != 'HUB_TEMPLATE'
    AND i.state       = 'Unique Constraint'
    INNER JOIN dmrs_constr_index_columns ic
    ON i.index_name = ic.index_name
    INNER JOIN dmrs_large_text c
    ON c.ovid  = ic.column_ovid
    AND c.type = 'Note'
    ) a
  ORDER BY 1;

The generate the meta data for loading the hubs, we use the view VW_GEN_META_HUB:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_GEN_META_HUB" ("HUB_NAME", "HUB_BSN_KEY_1", "HUB_BSN_KEY_2", "SRC_ENV", "SRC_TBL", "SRC_BSN_KEY_1", "HUB_SCHEMA", "HUB_UPD_VIEW", "ACTIVE_FLAG", "HUB_BSN_KEY_3", "HUB_BSN_KEY_4", "SRC_BSN_KEY_3", "SRC_BSN_KEY_4") AS 
  SELECT a.hub_name,
  a.hub_bsn_key                                      AS hub_bsn_key_1,
  NULL                                               AS hub_bsn_key_2,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 1) AS src_env ,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 2) AS src_tbl ,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 3) AS src_bsn_key_1,
  a.hub_schema,
  a.hub_upd_view,
  1    AS active_flag,
  NULL AS hub_bsn_key_3,
  NULL AS hub_bsn_key_4,
  NULL AS src_bsn_key_3,
  NULL AS src_bsn_key_4
FROM vw_hubs a,
  TABLE (FN_UDF_SPLIT (A.SRC, CHR(10))) B
ORDER BY 1;

Links

First we have a generic view VW_LINKS that joins some information together and that can be reused by other views:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_LINKS" ("LINK_OVID", "LINK_NAME", "LNK_HUB_OVID", "LNK_HUB", "LNK_HUB_SQN_OVID", "LNK_HUB_SQN", "HUB_BSN_KEY_OVID", "HUB_BSN_KEY", "DM_VIEW_NAME", "SCHEMA_NAME", "LINK_UPD_VIEW") AS 
  SELECT DISTINCT
  f.child_table_ovid AS link_ovid,
  f.child_table_name AS link_name,
  F.REFERRED_TABLE_OVID AS LNK_HUB_OVID,
  F.REFERRED_TABLE_NAME AS LNK_HUB,
  ic.column_ovid as lnk_hub_sqn_ovid,
  IC.COLUMN_NAME AS LNK_HUB_SQN,
  P.COLUMN_OVID AS HUB_BSN_KEY_OVID,
  p.column_name as hub_bsn_key,
  CASE
    WHEN LENGTH(concat(concat('VW_', f.child_table_name), '_DM')) <= 30
    THEN concat(concat('VW_', f.child_table_name), '_DM')
    ELSE concat(concat('VW_', regexp_replace(f.child_table_name, '[AEUYOI]+', '')), '_DM')
  END AS dm_view_name,
  CASE
    WHEN t.schema_name IS NOT NULL
    THEN t.schema_name
    ELSE 'JEDI_WAREHOUSE'
  END AS schema_name,
  CASE
    WHEN LENGTH(concat(concat('VW_', f.child_table_name), '_UPD')) <= 30
    THEN concat(concat('VW_', f.child_table_name), '_UPD')
    ELSE concat(concat('VW_', regexp_replace(f.child_table_name, '[AEUYOI]+', '')), '_UPD')
  END AS link_upd_view
FROM dmrs_foreignkeys f
INNER JOIN dmrs_tables t
ON f.child_table_ovid = t.ovid
AND t.table_name LIKE 'L_%'
INNER JOIN dmrs_constr_index_columns ic
ON f.fk_name = ic.index_name
INNER JOIN dmrs_pk_oid_columns p
ON p.table_ovid = f.referred_table_ovid
ORDER BY 2;

The generate the meta data for loading the hubs, we use the view VW_GEN_META_LINK:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_GEN_META_LINK" ("LINK_NAME", "HUB_1_UPD_VIEW", "HUB_2_UPD_VIEW", "HUB_3_UPD_VIEW", "HUB_4_UPD_VIEW", "HUB_5_UPD_VIEW", "HUB_6_UPD_VIEW", "HUB_7_UPD_VIEW", "HUB_8_UPD_VIEW", "HUB_9_UPD_VIEW", "HUB_10_UPD_VIEW", "HUB_11_UPD_VIEW", "HUB_12_UPD_VIEW", "HUB_13_UPD_VIEW", "HUB_14_UPD_VIEW", "HUB_15_UPD_VIEW", "LNK_HUB_1_SQN", "LNK_HUB_2_SQN", "LNK_HUB_3_SQN", "LNK_HUB_4_SQN", "LNK_HUB_5_SQN", "LNK_HUB_6_SQN", "LNK_HUB_7_SQN", "LNK_HUB_8_SQN", "LNK_HUB_9_SQN", "LNK_HUB_10_SQN", "LNK_HUB_11_SQN", "LNK_HUB_12_SQN", "LNK_HUB_13_SQN", "LNK_HUB_14_SQN", "LNK_HUB_15_SQN", "DM_VIEW_NAME", "SCHEMA_NAME", "LINK_UPD_VIEW", "SRC_ENV", "SRC_TBL", "SRC_BSN_KEY_1", "SRC_BSN_KEY_2", "SRC_BSN_KEY_3", "SRC_BSN_KEY_4", "SRC_BSN_KEY_5", "SRC_BSN_KEY_6", "SRC_BSN_KEY_7", "SRC_BSN_KEY_8", "SRC_BSN_KEY_9", "SRC_BSN_KEY_10", "SRC_BSN_KEY_11", "SRC_BSN_KEY_12", "SRC_BSN_KEY_13", "SRC_BSN_KEY_14", "SRC_BSN_KEY_15") AS 
  SELECT link_name,
  MAX(
  CASE
    WHEN rn = 1
    THEN hub_upd_view
  END) AS hub_1_upd_view,
  MAX(
  CASE
    WHEN rn = 2
    THEN hub_upd_view
  END) AS hub_2_upd_view,
  MAX(
  CASE
    WHEN rn = 3
    THEN hub_upd_view
  END) AS hub_3_upd_view,
  MAX(
  CASE
    WHEN rn = 4
    THEN hub_upd_view
  END) AS hub_4_upd_view,
  MAX(
  CASE
    WHEN rn = 5
    THEN hub_upd_view
  END) AS hub_5_upd_view,
  MAX(
  CASE
    WHEN rn = 6
    THEN hub_upd_view
  END) AS hub_6_upd_view,
  MAX(
  CASE
    WHEN rn = 7
    THEN hub_upd_view
  END) AS hub_7_upd_view,
  MAX(
  CASE
    WHEN rn = 8
    THEN hub_upd_view
  END) AS hub_8_upd_view,
  MAX(
  CASE
    WHEN rn = 9
    THEN hub_upd_view
  END) AS hub_9_upd_view,
  MAX(
  CASE
    WHEN rn = 10
    THEN hub_upd_view
  END) AS hub_10_upd_view,
  MAX(
  CASE
    WHEN rn = 11
    THEN hub_upd_view
  END) AS hub_11_upd_view,
  MAX(
  CASE
    WHEN rn = 12
    THEN hub_upd_view
  END) AS hub_12_upd_view,
  MAX(
  CASE
    WHEN rn = 13
    THEN hub_upd_view
  END) AS hub_13_upd_view,
  MAX(
  CASE
    WHEN rn = 14
    THEN hub_upd_view
  END) AS hub_14_upd_view,
  MAX(
  CASE
    WHEN rn = 15
    THEN hub_upd_view
  END) AS hub_15_upd_view,
  MAX(
  CASE
    WHEN rn = 1
    THEN lnk_hub_sqn
  END) AS lnk_hub_1_sqn,
  MAX(
  CASE
    WHEN rn = 2
    THEN lnk_hub_sqn
  END) AS lnk_hub_2_sqn,
  MAX(
  CASE
    WHEN rn = 3
    THEN lnk_hub_sqn
  END) AS lnk_hub_3_sqn,
  MAX(
  CASE
    WHEN rn = 4
    THEN lnk_hub_sqn
  END) AS lnk_hub_4_sqn,
  MAX(
  CASE
    WHEN rn = 5
    THEN lnk_hub_sqn
  END) AS lnk_hub_5_sqn,
  MAX(
  CASE
    WHEN rn = 6
    THEN lnk_hub_sqn
  END) AS lnk_hub_6_sqn,
  MAX(
  CASE
    WHEN rn = 7
    THEN lnk_hub_sqn
  END) AS lnk_hub_7_sqn,
  MAX(
  CASE
    WHEN rn = 8
    THEN lnk_hub_sqn
  END) AS lnk_hub_8_sqn,
  MAX(
  CASE
    WHEN rn = 9
    THEN lnk_hub_sqn
  END) AS lnk_hub_9_sqn,
  MAX(
  CASE
    WHEN rn = 10
    THEN lnk_hub_sqn
  END) AS lnk_hub_10_sqn,
  MAX(
  CASE
    WHEN rn = 11
    THEN lnk_hub_sqn
  END) AS lnk_hub_11_sqn,
  MAX(
  CASE
    WHEN rn = 12
    THEN lnk_hub_sqn
  END) AS lnk_hub_12_sqn,
  MAX(
  CASE
    WHEN rn = 13
    THEN lnk_hub_sqn
  END) AS lnk_hub_13_sqn,
  MAX(
  CASE
    WHEN rn = 14
    THEN lnk_hub_sqn
  END) AS lnk_hub_14_sqn,
  MAX(
  CASE
    WHEN rn = 15
    THEN lnk_hub_sqn
  END) AS lnk_hub_15_sqn,
  dm_view_name,
  schema_name,
  link_upd_view,
  src_env,
  src_tbl,
  MAX(
  CASE
    WHEN rn = 1
    THEN src_bsn_key
  END) AS src_bsn_key_1,
  MAX(
  CASE
    WHEN rn = 2
    THEN src_bsn_key
  END) AS src_bsn_key_2,
  MAX(
  CASE
    WHEN rn = 3
    THEN src_bsn_key
  END) AS src_bsn_key_3,
  MAX(
  CASE
    WHEN rn = 4
    THEN src_bsn_key
  END) AS src_bsn_key_4,
  MAX(
  CASE
    WHEN rn = 5
    THEN src_bsn_key
  END) AS src_bsn_key_5,
  MAX(
  CASE
    WHEN rn = 6
    THEN src_bsn_key
  END) AS src_bsn_key_6,
  MAX(
  CASE
    WHEN rn = 7
    THEN src_bsn_key
  END) AS src_bsn_key_7,
  MAX(
  CASE
    WHEN rn = 8
    THEN src_bsn_key
  END) AS src_bsn_key_8,
  MAX(
  CASE
    WHEN rn = 9
    THEN src_bsn_key
  END) AS src_bsn_key_9,
  MAX(
  CASE
    WHEN rn = 10
    THEN src_bsn_key
  END) AS src_bsn_key_10,
  MAX(
  CASE
    WHEN rn = 11
    THEN src_bsn_key
  END) AS src_bsn_key_11,
  MAX(
  CASE
    WHEN rn = 12
    THEN src_bsn_key
  END) AS src_bsn_key_12,
  MAX(
  CASE
    WHEN rn = 13
    THEN src_bsn_key
  END) AS src_bsn_key_13,
  MAX(
  CASE
    WHEN rn = 14
    THEN src_bsn_key
  END) AS src_bsn_key_14,
  MAX(
  CASE
    WHEN rn = 15
    THEN src_bsn_key
  END) AS src_bsn_key_15
FROM
  (SELECT a.link_name,
    a.hub_upd_view,
    a.lnk_hub_sqn,
    a.dm_view_name,
    a.schema_name,
    a.link_upd_view,
    regexp_substr(trim(b.column_value), '[^.]+', 1, 1) AS src_env ,
    regexp_substr(trim(b.column_value), '[^.]+', 1, 2) AS src_tbl ,
    regexp_substr(trim(b.column_value), '[^.]+', 1, 3) AS src_bsn_key,
    row_number() OVER ( partition BY a.link_name order by rownum) rn
  FROM
    ( SELECT DISTINCT  
      L.LINK_NAME,
      l.LNK_HUB_SQN,
      L.DM_VIEW_NAME,
      L.LINK_UPD_VIEW,
      H.HUB_SCHEMA AS SCHEMA_NAME,
      h.hub_upd_view,
      TO_CHAR(NOTES.TEXT) AS TEXT
    FROM VW_LINKS L
    INNER JOIN VW_HUBS H
    ON L.LNK_HUB_OVID = H.HUB_OVID
    INNER JOIN DMRS_LARGE_TEXT NOTES
    ON l.lnk_hub_sqn_ovid = notes.ovid
    AND notes.type = 'Note'
    ) a,
    TABLE (fn_udf_split (a.text, chr(10))) b
  )
GROUP BY link_name,
  dm_view_name,
  schema_name,
  link_upd_view,
  src_env,
  src_tbl;

Satellites

Hubs

For generating the meta data to load satellites related to hubs, we use the view VW_GEN_META_HUB_SAT:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_GEN_META_HUB_SAT" ("SAT_NAME", "SAT_HUB_TBL", "SAT_HUB_BSN_KEY", "SAT_COLS", "SRC_ENV", "SRC_TBL", "SRC_BSN_KEY", "SRC_COLS", "ACTIVE_FLAG", "SAT_SEQUENCE") AS 
  SELECT a.table_name                                  AS sat_name,
  a.hub_tbl                                          AS sat_hub_tbl,
  a.hub_bsn_key                                      AS sat_hub_bsn_key,
  a.column_name                                      AS sat_cols,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 1) AS src_env ,
  regexp_substr(trim(b.column_value), '[^.]+', 1, 2) AS src_tbl ,
  regexp_substr(trim(a.src_bsn_key), '[^.]+', 1, 3)  AS src_bsn_key,
  regexp_substr(trim(a.text), '[^.]+', 1, 3)         AS src_cols,
  'Y'                                                AS active_flag,
  concat('SQN_', a.table_name)                       AS sat_sequence
FROM
  (SELECT A.TABLE_NAME,
    h.hub_name AS hub_tbl,
    c.column_name,
    c.sequence,
    NOTES.TEXT,
    h.hub_bsn_key,
    h.src    AS src_bsn_key
  FROM dmrs_tables a
  INNER JOIN dmrs_foreignkeys f
  ON a.ovid = f.child_table_ovid
  AND a.table_name LIKE 'S_%'
  INNER JOIN dmrs_columns c
  ON c.container_ovid = a.ovid
  AND c.pk_flag      IS NULL
  AND c.fk_flag      IS NULL
  AND C.COLUMN_NAME NOT LIKE 'ETL_%'
  INNER JOIN vw_hubs h
  ON F.REFERRED_TABLE_OVID = h.hub_OVID
  INNER JOIN dmrs_large_text notes
  ON notes.ovid  = c.ovid
  AND notes.type = 'Note'

  ORDER BY a.table_name,
    c.sequence
  ) a,
  TABLE (fn_udf_split (a.text, chr(10))) b;

Links

For generating the meta data to load satellites related to hubs, we use the view VW_GEN_META_LINK_SAT:

CREATE OR REPLACE FORCE VIEW "JEDI_REPOSITORY"."VW_GEN_META_LINK_SAT" ("SAT_NAME", "SAT_LNK_TBL", "SAT_HUB_TBL_1", "SAT_HUB_TBL_2", "SAT_HUB_TBL_3", "SAT_HUB_TBL_4", "SAT_HUB_TBL_5", "SAT_HUB_TBL_6", "SAT_HUB_TBL_7", "SAT_HUB_TBL_8", "SAT_HUB_TBL_9", "SAT_HUB_TBL_10", "SAT_HUB_TBL_11", "SAT_HUB_TBL_12", "SAT_HUB_TBL_13", "SAT_HUB_TBL_14", "SAT_HUB_TBL_15", "SAT_HUB_KEY_1", "SAT_HUB_KEY_2", "SAT_HUB_KEY_3", "SAT_HUB_KEY_4", "SAT_HUB_KEY_5", "SAT_HUB_KEY_6", "SAT_HUB_KEY_7", "SAT_HUB_KEY_8", "SAT_HUB_KEY_9", "SAT_HUB_KEY_10", "SAT_HUB_KEY_11", "SAT_HUB_KEY_12", "SAT_HUB_KEY_13", "SAT_HUB_KEY_14", "SAT_HUB_KEY_15", "SRC_ENV", "SRC_TBL", "SRC_BSN_KEY_1", "SRC_BSN_KEY_2", "SRC_BSN_KEY_3", "SRC_BSN_KEY_4", "SRC_BSN_KEY_5", "SRC_BSN_KEY_6", "SRC_BSN_KEY_7", "SRC_BSN_KEY_8", "SRC_BSN_KEY_9", "SRC_BSN_KEY_10", "SRC_BSN_KEY_11", "SRC_BSN_KEY_12", "SRC_BSN_KEY_13", "SRC_BSN_KEY_14", "SRC_BSN_KEY_15", "SAT_COLS", "SRC_COLS", "ACTIVE_FLAG", "SAT_SEQUENCE") AS 
  SELECT sat_name,
  sat_lnk_tbl,
  MAX(
  CASE
    WHEN rn = 1
    THEN SAT_HUB_NAME
  END) AS sat_hub_tbl_1,
  MAX(
  CASE
    WHEN rn = 2
    THEN sat_hub_name
  END) AS sat_hub_tbl_2,
  MAX(
  CASE
    WHEN rn = 3
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_3,
  MAX(
  CASE
    WHEN rn = 4
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_4,
  MAX(
  CASE
    WHEN rn = 5
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_5,
  MAX(
  CASE
    WHEN rn = 6
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_6,
  MAX(
  CASE
    WHEN rn = 7
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_7,
  MAX(
  CASE
    WHEN rn = 8
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_8,
  MAX(
  CASE
    WHEN rn = 9
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_9,
  MAX(
  CASE
    WHEN rn = 10
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_10,
  MAX(
  CASE
    WHEN rn = 11
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_11,
  MAX(
  CASE
    WHEN rn = 12
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_12,
  MAX(
  CASE
    WHEN rn = 13
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_13,
  MAX(
  CASE
    WHEN rn = 14
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_14,
  MAX(
  CASE
    WHEN rn = 15
    THEN sat_hub_name
  END) AS SAT_HUB_TBL_15,
  MAX(
  CASE
    WHEN RN = 1
    THEN SAT_HUB_BSN_KEY
  END) AS sat_hub_key_1,
  MAX(
  CASE
    WHEN rn = 2
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_2,
  MAX(
  CASE
    WHEN rn = 3
    THEN SAT_HUB_BSN_KEY
  END) AS sat_hub_key_3,
  MAX(
  CASE
    WHEN rn = 4
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_4,
  MAX(
  CASE
    WHEN rn = 5
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_5,
  MAX(
  CASE
    WHEN rn = 6
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_6,
  MAX(
  CASE
    WHEN rn = 7
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_7,
  MAX(
  CASE
    WHEN rn = 8
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_8,
  MAX(
  CASE
    WHEN rn = 9
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_9,
  MAX(
  CASE
    WHEN rn = 10
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_10,
  MAX(
  CASE
    WHEN rn = 11
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_11,
  MAX(
  CASE
    WHEN rn = 12
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_12,
  MAX(
  CASE
    WHEN rn = 13
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_13,
  MAX(
  CASE
    WHEN rn = 14
    THEN sat_hub_bsn_key
  END) AS sat_hub_key_14,
  MAX(
  CASE
    WHEN rn = 15
    THEN sat_hub_bsn_key
  END) AS SAT_HUB_KEY_15 ,
  SRC_ENV,
  SRC_TBL,
  MAX(
  CASE
    WHEN RN = 1
    THEN src_bsn_key
  END) AS src_bsn_key_1,
  MAX(
  CASE
    WHEN rn = 2
    THEN src_bsn_key
  END) AS src_bsn_key_2,
  MAX(
  CASE
    WHEN rn = 3
    THEN src_bsn_key
  END) AS src_bsn_key_3,
  MAX(
  CASE
    WHEN rn = 4
    THEN src_bsn_key
  END) AS src_bsn_key_4,
  MAX(
  CASE
    WHEN rn = 5
    THEN src_bsn_key
  END) AS src_bsn_key_5,
  MAX(
  CASE
    WHEN rn = 6
    THEN src_bsn_key
  END) AS src_bsn_key_6,
  MAX(
  CASE
    WHEN rn = 7
    THEN src_bsn_key
  END) AS src_bsn_key_7,
  MAX(
  CASE
    WHEN rn = 8
    THEN src_bsn_key
  END) AS src_bsn_key_8,
  MAX(
  CASE
    WHEN rn = 9
    THEN src_bsn_key
  END) AS src_bsn_key_9,
  MAX(
  CASE
    WHEN rn = 10
    THEN src_bsn_key
  END) AS src_bsn_key_10,
  MAX(
  CASE
    WHEN rn = 11
    THEN src_bsn_key
  END) AS src_bsn_key_11,
  MAX(
  CASE
    WHEN rn = 12
    THEN src_bsn_key
  END) AS src_bsn_key_12,
  MAX(
  CASE
    WHEN rn = 13
    THEN src_bsn_key
  END) AS src_bsn_key_13,
  MAX(
  CASE
    WHEN rn = 14
    THEN src_bsn_key
  END) AS src_bsn_key_14,
  MAX(
  CASE
    WHEN rn = 15
    THEN SRC_BSN_KEY
  END) AS src_bsn_key_15,
  sat_cols,
  src_cols,
  active_flag,
  sat_sequence
FROM
  (SELECT A.TABLE_NAME                                 AS SAT_NAME,
    A.LNK_TBL                                          AS SAT_LNK_TBL,
    a.hub_name                                         AS sat_hub_name,
    a.hub_bsn_key                                      AS sat_hub_bsn_key,
    A.COLUMN_NAME                                      AS SAT_COLS,
    REGEXP_SUBSTR(TRIM(B.COLUMN_VALUE), '[^.]+', 1, 1) AS SRC_ENV ,
    REGEXP_SUBSTR(TRIM(B.COLUMN_VALUE), '[^.]+', 1, 2) AS SRC_TBL ,
    REGEXP_SUBSTR(TRIM(B.COLUMN_VALUE), '[^.]+', 1, 3) AS SRC_COLS,
    REGEXP_SUBSTR(TRIM(A.SRC_BSN_KEY), '[^.]+', 1, 3)  AS SRC_BSN_KEY,
    'Y'                                                AS ACTIVE_FLAG,
    CONCAT('SQN_', A.TABLE_NAME)                       AS SAT_SEQUENCE,
    row_number() OVER ( partition BY a.table_name, a.column_name order by rownum) rn
  FROM
    (SELECT A.TABLE_NAME,
      L.LINK_NAME AS LNK_TBL,
      c.column_name,
      C.SEQUENCE,
      TO_CHAR(NOTES.TEXT) AS text,
      h.hub_name,
      H.HUB_BSN_KEY,
      TO_CHAR(h.src) AS src_bsn_key
    FROM dmrs_tables a
    INNER JOIN dmrs_foreignkeys f
    ON a.ovid = f.child_table_ovid
    AND a.table_name LIKE 'S_%'
    INNER JOIN dmrs_columns c
    ON c.container_ovid = a.ovid
    AND c.pk_flag      IS NULL
    AND c.fk_flag      IS NULL
    AND C.COLUMN_NAME NOT LIKE 'ETL_%'
    INNER JOIN vw_links l
    ON F.REFERRED_TABLE_OVID = L.LINK_OVID
    INNER JOIN VW_HUBS H
    ON h.hub_ovid = l.lnk_hub_ovid
    INNER JOIN dmrs_large_text notes
    ON notes.ovid  = c.ovid
    AND notes.type = 'Note'
    ORDER BY a.table_name,
      c.sequence
    ) A,
    TABLE (FN_UDF_SPLIT (A.TEXT, CHR(10))) B
  ) c
GROUP BY SAT_NAME,
  sat_lnk_tbl,
  SRC_ENV,
  SRC_TBL,
  sat_cols,
  src_cols,
  active_flag,
  sat_sequence;

INFOR Data Vault Framework – modeling and metadata

As mentioned in my previous post, we are looking at a different way to produce the metadata needed for the Data Vault Framework. Excel isn’t our thing.

We are currently using Oracle for our database and do the modeling of the Data Vault by hand using Oracle SQL Developer Data Modeler. We believe that modeling by hand is better than a Data Vault model that is generated based on a source model.

In the modeler, we are able to specify the metadata we need for the automation framework. We use the “notes” field of tables and columns to specify the source(s).

Getting that metadata out again in a format we want, is quite a challenge. Up to now, we have three different possibilities:

  • use the reporting schema capabilities, which basically stores the model into a database schema with a lot of tables, and write SQL queries to combine the necessary information and load it directly into the metadata tables that are used by the automation framework
  • use transformation scripts within the modeler, written using java script, that take parts from the underlying design and “does something with it (still to be defined what exactly)
  • use a combination of both

To be continued…

INFOR Data Vault Framework

Some may know that I have been working on porting the concepts of the PDI Data Vault Framework by Edwin Weber (@edwin_weber) to Informatica and Oracle.

Together with my team we have made quite some progress already, but it could be that this effort will be stopped for reasons I can’t disclose yet.

However, whether we continue this effort or not, we may have found an alternative for using the Excel sheets as meta data entry for the framework to work. More on this will come soon, when we have worked it out in more detail.

Stay tuned.

The “catch” with data warehouse automation tools

During my evaluation of several data warehouse automation tools such as BIReady, Quipu and RapidAce, I have come to some sort of conclusion that is crucial to the success of using these tools.

As most of these tools take the source data models as a starting point, you better make sure it is correctly modeled. Even with BIReady that takes a “business” model, you need to have a good model. By correctly modeled, I mean that preferably your source is modeled according to 3NF. When reverse engineering an existing database model, make sure primary keys and foreign keys are defined.

If not, you can be sure that the resulting generated data warehouse (datavault) models are pretty worthless.

I noticed this when using some of the tools on a source model that I have at hand from a client. This model is basically based on flat wide files loaded into (flat wide) tables. Primary keys are sometimes not defined. Foreign keys almost do not exist at all. Normalization is not done.

You can argue whether this is a true source model. It is not, that is true. But it is all we have. A situation that you will probably encounter very often.

BIReady evaluation continues…

The issue that I had with the ODBC connection has been solved. I was using a 64-bit driver and should have been using the 32-bit driver for MySQL.

Thanks to Jan Vos of BIReady for helping me out! I will now continue my evaluation and post an update soon.

However, I’m under NDA, so I need to check what I can and cannot post here.