A Conceptual Data Model is an essential component for describing business data and should be part of every solution model. Conceptual Data Models (CDM) have relationships that should be resolved in the logical data model
Many organizations take a shortcut on the time required for modeling by skipping the vital step of developing a Conceptual Data Model (CDM). Conceptual Data Models describe the business and should be developed before attempting to develop any kind of solution model, whether for a Data Warehouse / business intelligence effort, or any other application.
A conceptual data model is the most abstract-level data model or summary-level data model. Information specific to the platform and other implementation information such as interface definition or procedures are eliminated from this data model. A conceptual data model is useful due to its simplicity. It is often used for communicating ideas and in strategic data projects.
In the Conceptual Data Model, key business entities or objects and their relationships are identified. In the CDM it is common to discover a significant number of many to many (M:M) relationships – how these relationships are addressed in the logical model can make the difference between a successful or failed application.
Many-to-Many Relationships and the CDM
A many-to-many relationship does not just affect the logical data model – it can affect nearly every aspect of an application. Apart from the tremendous impact to data design and data quality, other aspects of the application are much more complex. Implications of neglecting to develop a conceptual data model that identifies the many-to-many relationships include:
- Project scheduling and estimation issues
- Application design and development confusion
- User interface issues
- SQL errors
- Report challenges
- Data exchange interface confusion
- Testing challenges
- Maintenance and support issues
Typically an M:M (or many to one) relationship is resolved by an associative entity (though this is not the only way to solve a many-to-many relationship), therefore one more table and one more relationship is involved which have to be accommodated in the application. The bad news is that the other ways of dealing with an M:M have their own complexities and compromises. The good news is that by identifying these complexities in advance (during requirements gathering) the team can scope the project more accurately, define and understand requirements, and increase the likelihood of coming in on time and under budget with the correct solution.
Assume that in the Conceptual Data Model, 50% of relationships are M:M. This can be a reasonable assumption since one must account for longitudinal or over time perspectives – a CDM is not a state model and so the model needs to reflect the “over time” view. Assume that 65% of M:M relationships in the CDM must be resolved by an associative entity. Now, assume there are 50 conceptual entities and 100 relationships overall – roughly 32 of these relationships would be resolved with an associative entity. This means there are an additional 64 new objects to the logical model (32 associative entities/tables plus 32 extra relationships, increasing complexity by over 40%.
Methods for Resolving Many-to-Many Relationships
There are several methods for resolving an M:M relationship. These relationships can arise from different scenarios – it may be helpful to be cognizant of the origination of the M:M to help determine the best way to resolve each relationship.
- Inherent nature of the business. E.g. A customer may have many accounts, an account may have many customers. This is resolved through an associative entity (Customer / Account)
- Longitudinal perspective. Something that normally would be thought of as a 1:M may over time become M:M. For example, a clerk works for only 1 store in the chain at a time, but over time a clerk can move and work at different stores in the organization. This situation can be resolved through an associative entity (Clerk / Store), or through other means in programming.
- Some conceptual entities may be implemented as an attribute in the logical model – but there may still be M:M’s to resolve. For example, in a de-normalized model it may be acceptable or desirable for a dimension to have repeating groups (e.g. for a Type 3 Slowly Changing Dimension)
Resolving a M:M relationship is not a trivial affair, so identifying these relationships in advance is critical to understanding the complexity of the application. The Conceptual Data Model, besides its benefits for semantic reconciliation, improved business/IT alignment, data design and quality is just the tool to identify these many to many relationships.
Not just an organization’s data is at stake – the success of the entire initiative is at risk when the design does not identify relationships properly. For any project to be successful, identify the correct data relationships through the development of a Conceptual Data Model. A CDM should be done before designing the logical data model or the physical data model / database. Uncovering these M:M’s late in a project or in production results in exponential cost increases.