A well-modeled data dictionary can provide lasting value and ensure consistency of use throughout an organization for all data elements.
The primary reason organizations create and maintain a data dictionary is to standardize data content, context, and definitions to achieve consistency, reusability, and quality of the data that drives or supports the organization’s tactical and strategic business initiatives. A data dictionary can provide:
- Easier integration and communication between systems
- More standardized messaging between applications
- Higher quality business intelligence and analytics
- Better understanding between all subject matter experts
The building blocks of the data dictionary include data elements, data items and their use as entity attributes, entity relationships and the definitions for each. Data dictionaries ensure consistency of use throughout the organization by providing a single version of the truth for all common data elements used across the enterprise.
When developing a data dictionary, a few key considerations need to be taken into account. As the content of the data dictionary matures over time, version control becomes important. Keeping track of which version of the data dictionary a project derived its data implementation from helps others understand why its definitions may differ from those of other implementations, and how to reconcile the differences in integration, federation and business intelligence transformation design projects. Repositories are very helpful in managing versions and configurations of models, and ensure that old versions remain available for reference.
Data dictionaries will transform over time, and a good design will accommodate that. Best practices include not overloading any single concept and keeping each concept easy to reuse and easy to maintain. Also, if a concept becomes obsolete, do not reuse it; this avoids confusion with legacy implementations. It is important to allow any data concept to expand over time, and therefore important to have a clear alignment between data entities and their attributes. The process of ‘normalization’ is used to achieve this alignment.
Requirements for the Data Dictionary
Preparing the data dictionary needs to take certain key criteria into account. The following list identifies the most common requirements for storing and maintaining the data dictionary, and also covers the workflow and processes that support the creation and management of the information over time.
A data dictionary must contain:
- A unique list of entities and data items
- Descriptions of data artifacts
- Entity-Attribute relationships resulting from a data item being assigned to describe an entity or entities
- Entity to entity relationships
- Archived obsolete concepts based on the changing business
- Synonyms or aliases for each data concept
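The contents listed above can be pictured as a small in-memory structure. The following is only a minimal sketch, not the storage format of any particular modeling tool; the class names (`DataItem`, `Entity`, `DataDictionary`) and their fields are hypothetical and chosen purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class DataItem:
    """A data item, defined once and independently of any entity."""
    name: str
    description: str
    aliases: list[str] = field(default_factory=list)
    obsolete: bool = False  # archived concepts stay in the dictionary


@dataclass
class Entity:
    """An entity whose attributes are references to data items by name."""
    name: str
    description: str
    attributes: list[str] = field(default_factory=list)
    aliases: list[str] = field(default_factory=list)
    obsolete: bool = False


@dataclass
class DataDictionary:
    entities: dict[str, Entity] = field(default_factory=dict)
    data_items: dict[str, DataItem] = field(default_factory=dict)
    # Each relationship is (source entity, target entity, cardinality).
    relationships: list[tuple[str, str, str]] = field(default_factory=list)

    def add_entity(self, entity: Entity) -> None:
        # Names must be unique so there is one definition per concept.
        if entity.name in self.entities:
            raise ValueError(f"entity {entity.name!r} is already defined")
        self.entities[entity.name] = entity

    def add_data_item(self, item: DataItem) -> None:
        if item.name in self.data_items:
            raise ValueError(f"data item {item.name!r} is already defined")
        self.data_items[item.name] = item

    def assign(self, entity_name: str, item_name: str) -> None:
        """Record an entity-attribute relationship."""
        if item_name not in self.data_items:
            raise KeyError(item_name)
        self.entities[entity_name].attributes.append(item_name)

    def relate(self, source: str, target: str, cardinality: str) -> None:
        """Record an entity-to-entity relationship."""
        for name in (source, target):
            if name not in self.entities:
                raise KeyError(name)
        self.relationships.append((source, target, cardinality))
```

Keeping data items in their own collection, separate from the entities that use them, mirrors the idea that a data item exists independently of its participation as an attribute.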
Entity-Attribute Relationships: The Conceptual Data Model (CDM) is designed to hold a list of data items independent of their participation as entity attributes. This can be taken advantage of in a number of ways. By disallowing reuse, we can be sure that a data item represents an entity attribute only once. We can also scan the model for data items that are not yet participating as an entity attribute; this exercise assists in developing a normalized model and can indicate where data concepts are not yet complete.
Alternatively, we can allow reuse, so that a data item concept appears as an entity attribute on more than one entity, either as a copy (two data items with the same name but completely divergent definitions) or as a reused data item (one data item participating equally as an attribute of more than one entity, with one common name and one common definition). The choice of method largely depends on how the corporate dictionary itself is defined: is a name simply a name, whether it is a customer name or a company name, or is every data item uniquely identified by its business name, leaving no possibility of redundancy?
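Both checks described above (finding data items that no entity uses yet, and finding data items shared by several entities) are simple set operations once the model is available as data. This is a sketch under the assumption that the model can be exported as a set of data item names plus a mapping from entity name to attribute names; the function names are hypothetical.

```python
def unassigned_items(
    data_items: set[str], entity_attributes: dict[str, list[str]]
) -> set[str]:
    """Return data items that do not yet participate as any entity attribute."""
    used = {item for attrs in entity_attributes.values() for item in attrs}
    return data_items - used


def reused_items(entity_attributes: dict[str, list[str]]) -> dict[str, list[str]]:
    """Return data items used by more than one entity, with their owners."""
    owners: dict[str, list[str]] = {}
    for entity, attrs in entity_attributes.items():
        for item in attrs:
            owners.setdefault(item, []).append(entity)
    return {item: ents for item, ents in owners.items() if len(ents) > 1}


# Illustrative data only.
data_items = {"Name", "Order Date", "Tax Code"}
entity_attributes = {
    "Customer": ["Name"],
    "Company": ["Name"],
    "Order": ["Order Date"],
}
```

Under a no-reuse policy, a non-empty result from `reused_items` would be treated as a violation; under a reuse policy it is simply a report of which concepts are shared.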
Maintain Relationships Between Entities: The CDM allows for easy definition of relationships between entities, supporting one-to-one, many-to-one and many-to-many relationships as well as supertype/subtype relationships, using a number of industry-standard notations including IDEF1X, IE, and Barker’s notation.
Archive Obsolete Concepts: If the modeling tool being used provides a robust enterprise repository that backs the metadata lifecycle, we can delete the concept from the current model or models and let its past existence remain in earlier versions within the repository. This keeps the current model clean, but it requires a comparison against past versions to catch the object and avoid redefining the same concept in a future version of the dictionary. A better way would be to tag or contain the object to keep it present, but “off to the side.”
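The tagging approach can be sketched as a status flag on each concept, so that a retired name stays visible and cannot be silently redefined. This is an illustrative sketch only; `Concept`, `ConceptRegistry`, and the `"active"`/`"obsolete"` status values are assumptions, not features of a specific repository product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Concept:
    name: str
    definition: str
    status: str = "active"            # "active" or "obsolete"
    retired_in: Optional[str] = None  # dictionary version that retired it


class ConceptRegistry:
    def __init__(self) -> None:
        self._concepts: dict[str, Concept] = {}

    def define(self, name: str, definition: str) -> Concept:
        existing = self._concepts.get(name)
        if existing is not None and existing.status == "obsolete":
            # The archived tag is what prevents accidental reuse of the name.
            raise ValueError(
                f"{name!r} was retired in version {existing.retired_in}; "
                "define the new concept under a different name"
            )
        if existing is not None:
            raise ValueError(f"{name!r} is already defined")
        concept = Concept(name, definition)
        self._concepts[name] = concept
        return concept

    def retire(self, name: str, version: str) -> None:
        """Tag the concept as obsolete instead of deleting it."""
        concept = self._concepts[name]
        concept.status = "obsolete"
        concept.retired_in = version

    def active_names(self) -> list[str]:
        return sorted(n for n, c in self._concepts.items() if c.status == "active")

    def archived_names(self) -> list[str]:
        return sorted(n for n, c in self._concepts.items() if c.status == "obsolete")
```

Because the obsolete concept is still in the registry, no comparison against past versions is needed to detect a name collision.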
Allow For Data Concept Expansion (more attributes, more relationships, versioned): The enterprise repository should allow for versioning of all model objects at the object level, and for comparison between any two versions of models and objects within them. As the concept expands, new details can be saved in a new version of the object, keeping the older versions around in case there are version-specific dependencies on the implementation side to keep track of. When a specific set of models needs to be used together at a specific version of the data dictionary model or models, the use of a repository configuration feature ensures proper version matching.
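Object-level versioning and comparison can be illustrated with a concept that keeps every saved snapshot of its attributes. This is a simplified sketch, assuming a concept is represented as a mapping from attribute name to definition; `VersionedConcept` and its methods are hypothetical names, and a real repository would persist far more detail.

```python
class VersionedConcept:
    """Keeps every saved version of a concept's attributes (1-based versions)."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._versions: list[dict[str, str]] = []

    def save(self, attributes: dict[str, str]) -> int:
        """Store a snapshot and return its version number."""
        self._versions.append(dict(attributes))
        return len(self._versions)

    def get(self, version: int) -> dict[str, str]:
        if not 1 <= version <= len(self._versions):
            raise KeyError(f"{self.name!r} has no version {version}")
        return dict(self._versions[version - 1])

    def diff(self, old: int, new: int) -> dict[str, list[str]]:
        """Compare two versions by attribute name and definition."""
        a, b = self.get(old), self.get(new)
        return {
            "added": sorted(b.keys() - a.keys()),
            "removed": sorted(a.keys() - b.keys()),
            "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
        }
```

In this sketch, a repository configuration could be as simple as a mapping from concept name to the version number a project is pinned to, for example `{"Customer": 1, "Order": 2}`, which is then used to call `get` for each concept.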
Represent Data Construct In A Normalized Model: The CDM should allow for easy normalization, independent of logical and physical details. The modeling tool chosen should be designed to view the conceptual data model as a level of abstraction above the storage paradigm and the implementation platform. The conceptual data model is not a relational model (does not have migrated foreign keys, allows for unresolved many-to-many relationships) and can therefore be normalized in a way that is consistent with the pure data in concept, not in implementation.
For example, an “order” concept should not have a “customer identifier” attribute, as this attribute is not really functionally dependent on the “order identifier” itself. It is a construct inserted to implement the relationship (i.e., a place to store the data that defines the implementation of this relationship concept within a relational database) and can confuse the analyst looking at “order” for its pure data definition.
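The order example can be made concrete by keeping the conceptual definition free of foreign keys and deriving them only when a relational design is generated. The sketch below is deliberately simplified and handles only one-to-many relationships; the entity contents and the function `derive_relational_tables` are illustrative assumptions.

```python
# Conceptual model: "Order" carries only its own attributes.
conceptual_entities = {
    "Customer": ["Customer Identifier", "Customer Name"],
    "Order": ["Order Identifier", "Order Date"],
}
identifiers = {"Customer": "Customer Identifier", "Order": "Order Identifier"}
# (parent, child, cardinality): one Customer places many Orders.
relationships = [("Customer", "Order", "one-to-many")]


def derive_relational_tables(
    entities: dict[str, list[str]],
    identifiers: dict[str, str],
    relationships: list[tuple[str, str, str]],
) -> dict[str, list[str]]:
    """Copy the conceptual entities and migrate parent identifiers as foreign keys."""
    tables = {name: list(attrs) for name, attrs in entities.items()}
    for parent, child, cardinality in relationships:
        if cardinality != "one-to-many":
            raise NotImplementedError(f"{cardinality} is not covered by this sketch")
        # The foreign key exists only in the derived relational design.
        tables[child].append(identifiers[parent])
    return tables


relational_tables = derive_relational_tables(
    conceptual_entities, identifiers, relationships
)
```

The conceptual `Order` keeps its pure data definition, while the migrated `Customer Identifier` column appears only in the derived tables.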
Maintain Synonyms Or Aliases For Each Data Concept: Each organization has its own way to define and describe alternate names for the same common concept. An example is “customer”, “prospect”, “client” and “patient” from the sales, marketing, service and medical practitioner views respectively. All four names refer to the same common data concept: a person who receives goods and services from a vendor. The entity attributes will all be essentially the same and the core concept is the same, but the name each group uses for it will differ.
To develop the data dictionary with the least redundancy we should really build one concept, and assign that concept multiple names. A modeling tool should be able to manage the situation if there is a fixed number of aliases (zero, one or more, but a fixed limit) by adding extended attributes. If there is an unknown number of aliases (zero or more, no upper limit), create an extended object to contain the alias as an object, and create an extended collection of aliases on a given data concept.
Aliases can be limited to entities only, or extended down to the data item level. The choice depends on the business vocabulary being mapped to the dictionary, and on whether the idea of name “overloading” can be extended to the attributes of an entity as well (for example, “customer ID” and “patient ID” for the same customer concept’s identification attribute).
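The "one concept, many names" idea with an unbounded number of aliases can be sketched as a lookup from any known name to a single canonical concept. This is a minimal illustration rather than a modeling tool's extension mechanism; `AliasIndex` is a hypothetical name, and case-insensitive matching is an assumption made here for convenience.

```python
from collections.abc import Iterable


class AliasIndex:
    """Maps every known name (canonical or alias) to one canonical concept."""

    def __init__(self) -> None:
        self._canonical_by_name: dict[str, str] = {}

    def register(self, canonical: str, aliases: Iterable[str] = ()) -> None:
        names = (canonical, *aliases)
        # Validate first so that a conflict leaves the index unchanged.
        for name in names:
            owner = self._canonical_by_name.get(name.casefold())
            if owner is not None and owner != canonical:
                raise ValueError(f"{name!r} already refers to {owner!r}")
        for name in names:
            self._canonical_by_name[name.casefold()] = canonical

    def resolve(self, name: str) -> str:
        """Return the canonical concept for a name; raises KeyError if unknown."""
        return self._canonical_by_name[name.casefold()]


index = AliasIndex()
# Entity-level aliases.
index.register("Customer", ["Prospect", "Client", "Patient"])
# Data-item-level aliases, if the business vocabulary calls for them.
index.register("Customer ID", ["Patient ID"])
```

The same index serves both entities and data items, so the decision to extend aliases to the data item level only changes what is registered, not how names are resolved.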
Provide Metadata Governance To Enforce Rules And Practices To Increase Consistency Of The Dictionary As A Standard: Metadata Governance ensures a consistent use of metadata throughout the organization. The right metadata management tools will have the ability to integrate any number of custom checks and balances to enforce standard definitions, while the people and policies shape how the data stewards and other professionals involved in the metadata gathering and validating activities will operate. Metadata governance ensures the enterprise data dictionary provides value to the business, is consistent and reliable in quality and availability, and truly provides the enterprise scope and context. It is too easy to allow the context of a given project or given business unit (or even workgroup within a business unit) to influence the data dictionary and block easy reuse, cross-department and cross-discipline understanding, and easy integration of data systems. Data Governance (and the governance of metadata) forces us to look beyond our context into the enterprise context, giving us the perspective, the guidance, and discipline needed to develop the data dictionary as an enterprise asset and not just a tool for today’s project.
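The "custom checks" that a metadata management tool might run can be pictured as small rule functions applied to every dictionary entry. The rules below are examples invented for illustration, not a standard rule set, and the entry fields (`name`, `definition`, `steward`) are assumptions about what an entry records.

```python
from typing import Callable, Optional

Entry = dict[str, str]
# A rule returns a problem description, or None when the entry passes.
Rule = Callable[[Entry], Optional[str]]


def requires_definition(entry: Entry) -> Optional[str]:
    if not entry.get("definition", "").strip():
        return "definition is missing"
    return None


def requires_steward(entry: Entry) -> Optional[str]:
    if not entry.get("steward", "").strip():
        return "no data steward assigned"
    return None


def definition_is_not_circular(entry: Entry) -> Optional[str]:
    name = entry.get("name", "").strip().casefold()
    definition = entry.get("definition", "").strip().casefold().rstrip(".")
    if definition and definition == name:
        return "definition only repeats the name"
    return None


def run_checks(entries: list[Entry], rules: list[Rule]) -> list[tuple[str, str]]:
    """Apply every rule to every entry and collect (entry name, problem) pairs."""
    violations = []
    for entry in entries:
        for rule in rules:
            problem = rule(entry)
            if problem:
                violations.append((entry.get("name", "<unnamed>"), problem))
    return violations
```

Expressing each policy as a separate function lets data stewards add or retire rules without changing the code that runs them.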
Data dictionaries ensure consistency of use throughout the organization by providing a single version of the truth for all common data elements used across the enterprise. Following best practices, using proven technologies, and focusing on the organization’s business need for standard contextual information about its data can result in a sustainable enterprise data dictionary that remains in use for a long time.