Many to Many relationship in Dimension Modelling - Part 2 (Arrays / Alternate Hierarchy)
Continuing to the business problem mentioned in last post, consider a retail firm.
They sell many products, one of it is Umbrella. According to business owner, Umbrella falls under two Product Groups: Fashion and Seasonal. So they want to capture the Umbrella sales under both Product Groups.
Now, say we create below schema to hold the data.
Sales
Invoice Prod_ID QTY
1 1 2
2 1 1
3 1 4
4 2 3
PRODUCT
Prod_ID Prod_Name Prod_Group
1 Umbrella Fashion
2 Paracetamol Medical
Notice we can not add the information that Umbrella also belongs to Seasonal as a new row. (Prod_ID will be primary key and we need one and only one entry in the dimension table. If we add multiple entries, the facts will be double counted)
So we are left with a option to create another column. Say, we call it Secondary_Prod_Group. So our table looks like this.
PRODUCT
Prod_ID Prod_Name Prod_Group Secondary_Prod_Group
1 Umbrella Fashion Seasonal
2 Paracetamol Medical Seasonal
3 Babyspoon Medical Other
Now, if business wants to report on Sales figure for FASHION, it is possible to generate it. Also, Sales figure for SEASONAL will be correct.
If you look at this model functionally, it is forcing the business to look at sales from one of the two perspectives. So, either classify the sales as Fashion, Medical, etc or classify them as Seasonal and Other.
This can be then modelled as 'alternate hierarchies' in the Product dimension.
Please note that it does not give a clear trend. Every time there is jump in Umbrella sales, they will be jump in Fashion sales and Seasonal sales. This is 'fundamental flaw' I was talking about. We will talk about it in my next post.
Also, in above approach, there is a limitation on number of Product Groups a Product can belong to. You would design the schema with two or three or whatever number of columns, and that becomes the upper limit. We will see how this problem can be solved with a different approach called 'Bridging'!!
They sell many products, one of it is Umbrella. According to business owner, Umbrella falls under two Product Groups: Fashion and Seasonal. So they want to capture the Umbrella sales under both Product Groups.
Now, say we create below schema to hold the data.
Sales
Invoice Prod_ID QTY
1 1 2
2 1 1
3 1 4
4 2 3
PRODUCT
Prod_ID Prod_Name Prod_Group
1 Umbrella Fashion
2 Paracetamol Medical
Notice we can not add the information that Umbrella also belongs to Seasonal as a new row. (Prod_ID will be primary key and we need one and only one entry in the dimension table. If we add multiple entries, the facts will be double counted)
So we are left with a option to create another column. Say, we call it Secondary_Prod_Group. So our table looks like this.
PRODUCT
Prod_ID Prod_Name Prod_Group Secondary_Prod_Group
1 Umbrella Fashion Seasonal
2 Paracetamol Medical Seasonal
3 Babyspoon Medical Other
Now, if business wants to report on Sales figure for FASHION, it is possible to generate it. Also, Sales figure for SEASONAL will be correct.
If you look at this model functionally, it is forcing the business to look at sales from one of the two perspectives. So, either classify the sales as Fashion, Medical, etc or classify them as Seasonal and Other.
This can be then modelled as 'alternate hierarchies' in the Product dimension.
Please note that it does not give a clear trend. Every time there is jump in Umbrella sales, they will be jump in Fashion sales and Seasonal sales. This is 'fundamental flaw' I was talking about. We will talk about it in my next post.
Also, in above approach, there is a limitation on number of Product Groups a Product can belong to. You would design the schema with two or three or whatever number of columns, and that becomes the upper limit. We will see how this problem can be solved with a different approach called 'Bridging'!!
Comments