Different models represent and communicate different types of data; all are important to effective data management.
Introduction
The level of attention paid to a data model should be determined by its intended use and place in the enterprise. There are different types of models that represent different types of data, all of which should be “touched” by data administration. The actual procedures depend upon the availability of modeling resources, but every model should, at a minimum, be checked for naming standards, common formats and bundled elements. All requests should be evaluated to make sure we are not unnecessarily creating redundant data stores. These steps are important because models that at first appear to be at a purely application level end up containing data that could easily become a corporate data store. Checking for these issues is one of the main roles of a data management function.
Enterprise Source Data
Enterprise source data is data defined at its owning source. This data is used by the business and information systems to make the decisions that directly affect the business. This data is highly shareable therefore there should only be one enterprise source for each distinct piece of business data. It should have a complete logical model, follow all data modeling rules, be in third normal form (or close to it) and contain no bundled elements. Because of its importance to the business all attributes should be completely defined and the definitions should be approved by the business. A full peer review by the data administration team should occur. This is the data at its source, therefore it should be designed in a way to ensure that it will be shareable across the enterprise. An example of enterprise source data would be customer address data or detailed product information.
Reporting Data
Reporting data contains enterprise source data or transaction data that is extracted and summarized in order to provide queries or reports to answer business questions. This is the data being used by the business to make decisions. The decisions made by analyzing this data directly affect the business but the data itself indirectly affects the business. There are two types of reporting data. The first, and more common, type is a small table or set of tables used to answer a specific business questions (sometimes called a data mart). The second would be a large store of analytical data extracted from many enterprise sources (sometimes called a data warehouse). This type of data can be analyzed directly at its source or extracted and summarized into smaller reporting data stores (data marts) .
Reporting data has different requirements and therefore should not be held to the same measures as enterprise source data. While we should create a logical model, the very nature of reporting data assumes de-normalization. If we enforce third normal form on reporting structures we will negatively impact the usefulness of the end product. Reporting data is typically derived from previously modeled data; therefore, we are not creating any new corporate data sources. It is more important that the data has been designed to reflect the access paths required (e.g., what questions are we trying to answer).
Any new attributes created or derived should be given complete definitions that would include any calculation or summary rules. For large reporting models a full DA peer review should occur, for smaller models a review by one knowledgeable DA should be appropriate.
Transaction data
Transaction data is a temporary hold area used to store transactions entered by a customer or department until it can be posted against shared data stores. Transaction data should be stored in a format that closely resembles the source. We should still enforce naming and format and avoid bundled elements but we should not be overly concerned with fully normalizing transaction data. In some cases, normalizing transaction data could be detrimental to the business. If transactions are fed from an outside source it is probably important to be able to completely recreate the transaction as it was received. The attributes on a transaction are logically related to each other because of their relation to the transaction and therefore may exists on a single entity. Dividing and normalizing transaction data only causes confusion and application nightmares with no real business gain, especially when there is conflicting metadata.
Small or one-time use (non-enterprise) data
This data is requested by a department for a special project outside of the normal company business. An example would be the temporary hold area for online United Way donations or employee ARTS fund auction bidding. It contains data that is not, or never could become, enterprise source data. It will be used for a limited time for one specific function. Information Architecture should touch this data solely to ensure it is not enterprise source data or does not become enterprise source data. Small requests should be turned around in a relatively short period (less than 10 simple tables could be turned around in less than one business day).
Department Level Data fed from an un-modeled source
This data is requested from an application team supporting un-modeled data. The source data may live in VSAM, IMS, flat files, or any other storage method. The team may want to take advantage of some of the benefits of relational technology but is not ready to re-engineer the source data. In this case, if the source data is replaced the data being modeled will also need replaced. The data is small in scope and meant to fulfill one particular business need, the data is all derived data from the un-modeled source but not a replacement to the un-modeled source.
The main benefit of modeling this data is that it can later be used as documentation to assist in redesigning the source data. Therefore, it is important to un-bundle wherever practical and insist on following any accepted naming standards.
The difficult part of the above guidelines is determining where the requests fit. It is not always obvious which type of model we are working on. Sometimes the request may contain entities that fit in different categories. Pay the most attention to the pieces of the model that are most likely to be shared. When in doubt, model at the higher standard.
Model Type | Create a Logical Model? | Normalization | Definitions of Attributes | Level of Data Analyst (DA) Review |
Enterprise Source Data | Yes | Third Normal Form | Complete and Accurate- Approved by the Business | Full Review by Data Analysis team |
Reporting Data (Small) | Yes | Modeled to Requirements | Definitions of any new attributes – including calculation or summarization rules | Distribute documentation to the team – review with one other DA |
Reporting Data (Large) | Yes | Modeled to Requirements | Definitions of any new attributes – including calculations or summarization rules | Full Review by DA team |
Transaction Data | Yes | Modeled to Requirements | Definitions of any new attributes | Distribute documentation to the team – review with one other DA |
Small or one-time use (non-enterprise) | Not needed | Modeled to Requirements | Define enough to ensure that the intended use is evident | Review with one other DA |
Department Level Data – fed from an un-modeled source | Yes | Goal of Third Normal Form (may have practical exceptions) | Definitions of any new attributes – including calculation or summarization rules | Review with one other DA |
Conclusion
In summary, all data management teams should to “touch” every request for new tables. Each request needs to be evaluated for completeness and redundancy. So my answer to the question is yes; everything should, at some level, be modeled.