Skip to content.

Sections
Home » Resource Center » Real-World Decision Support (RWDS) Journal » July 2002 - Volume 1, Issue 16 » So What Normal Form is the Dimensional Data Model Anyway?

So What Normal Form is the Dimensional Data Model Anyway?

by Madhur Limdi - Netra Technologies, Inc.

We have all read or heard a dimensional data model being described as de-normalized. That statement seems pretty vague especially if one cares to know what normal form is the dimensional data model? Dimensional data models come in different schema's star, snow flake or variants of those. On top of that we have two specific types of entities Fact and Dimension. Given all this information how does one go about determining what normal form is a dimensional data model anyway?

The approach applied below is to walk through the first Three Normal Forms, related rules and verify which rules if any are upheld and identify ones violated in a sample dimensional data model.

First Normal Form Rules:

  • Assign a unique identifier for every record in each entity i.e. primary key
  • Ensure all attributes in every entity are in their basic form
  • Remove attributes which can be derived (i.e. computed from other attributes)
  • We eliminate repeating groups, i.e. move related repeating attributes into a separate entity.

Figure 1

In a dimensional data model as depicted above:

  1. We definitely assign a primary key, as the unique identifier. In a fact table the primary key is usually comprised of the foreign keys, as above the primary key for Promotional_Analysis_Fact is the combination of Time_Id, Cust_Id, Product_Id, Store_Id and Promotion_Id. While for the dimension tables we tend to choose surrogate keys that are artificial keys generated and managed within the data warehouse.
  2. Consider Gross Revenue attribute, this attribute can be derived by computing the product of Quantity Sold and Product Price attributes at real-time. But in order to reduce the query response time we store the derived data i.e. Gross Revenue in the Promotional_Analysis_Fact.
  3. In our dimension entities often the attributes are not in their basic form. In the above diagram the customer name attribute contains both the First Name and Last Name in the same attribute.
  4. To accommodate certain business needs for keeping track of history, we apply slowly changing dimension modeling technique, where we may include repeating attributes. E.g.: In the business situation above the store changes regions frequently and the client wants to keep track of the last and the current region the store belongs to. Here we may choose to apply Type 3 Slowly Changing Dimension technique. Type 3 technique requires us to create a "Old Region Id" and a "Current Region Id" attribute; thus we are effectively including repeating attributes.

Since three out of the four rules for the first normal form are violated this dimensional data model definitely cannot be referred to as a first normal form.

Second Normal Form Rule:

  • If an attribute does not depend on the entire primary key move it to a separate entity.

Figure 2

In a dimensional data model as above:

We have added attributes that do not depend functionally on the key, consider the timestamp attribute: DW_Load_Timestamp. This attribute is added to support the control and audit steps associated with the extract, transform and load processes and is in no way functionally dependent on the key. So this dimensional data model violates the second normal form rule.

Third Normal Form Rule:

  • Move attributes not contributing to the description of the key to separate entities.

In a dimensional data model as below:

Figure 3

It can be argued that the Promotion Discount attribute in the above diagram as per the rules of third normal form should ideally be in the Promotion Dimension entity. So this dimensional data model does not conform to the third normal form rule. Also no dimensional data modeler will not allow moving the Promotion_Discount attribute to the Promotion Dimension.

So the above three examples show us that the only rule we surely uphold from the first three normal forms is the creation of a unique identifier "Primary Key" for every dimension and fact table. In my experience most dimensional data models share the same characteristics will typically violate most of the above rules as discussed. Thus the typical dimensional data model is not even in the first normal form, so why not just call it De-normalized Form huh?