Should We Model Everything?

Article Summary: The article emphasizes that the level of attention given to data models should align with their intended use within the enterprise, highlighting the importance of data administration in evaluating requests to avoid redundant data stores. It categorizes different types of data—enterprise source data, reporting data, transaction data, and small or one-time use data—each with specific modeling requirements, normalization practices, and review processes to ensure data integrity and usability across the organization.

Last Updated: March 19, 2026

Introduction

The level of attention paid to a data model should be determined by its intended use and place in the enterprise. There are different types of models that represent different types of data, all of which should be “touched” by data administration. The actual procedures depend upon the availability of modeling resources, but every model should, at a minimum, be checked for naming standards, common formats and bundled elements. All requests should be evaluated to make sure we are not unnecessarily creating redundant data stores. These steps are important because models that at first appear to be at a purely application level end up containing data that could easily become a corporate data store. Checking for these issues is one of the main roles of a data management function.

Enterprise Source Data

Enterprise source data is data defined at its owning source. This data is used by the business and information systems to make the decisions that directly affect the business. This data is highly shareable therefore there should only be one enterprise source for each distinct piece of business data. It should have a complete logical model, follow all data modeling rules, be in third normal form (or close to it) and contain no bundled elements. Because of its importance to the business all attributes should be completely defined and the definitions should be approved by the business. A full peer review by the data administration team should occur. This is the data at its source, therefore it should be designed in a way to ensure that it will be shareable across the enterprise. An example of enterprise source data would be customer address data or detailed product information.

Reporting Data

Reporting data contains enterprise source data or transaction data that is extracted and summarized in order to provide queries or reports to answer business questions. This is the data being used by the business to make decisions. The decisions made by analyzing this data directly affect the business but the data itself indirectly affects the business. There are two types of reporting data. The first, and more common, type is a small table or set of tables used to answer a specific business questions (sometimes called a data mart). The second would be a large store of analytical data extracted from many enterprise sources (sometimes called a data warehouse). This type of data can be analyzed directly at its source or extracted and summarized into smaller reporting data stores (data marts) .

Reporting data has different requirements and therefore should not be held to the same measures as enterprise source data. While we should create a logical model, the very nature of reporting data assumes de-normalization. If we enforce third normal form on reporting structures we will negatively impact the usefulness of the end product. Reporting data is typically derived from previously modeled data; therefore, we are not creating any new corporate data sources. It is more important that the data has been designed to reflect the access paths required (e.g., what questions are we trying to answer).

Any new attributes created or derived should be given complete definitions that would include any calculation or summary rules. For large reporting models a full DA peer review should occur, for smaller models a review by one knowledgeable DA should be appropriate.

Transaction data

Transaction data is a temporary hold area used to store transactions entered by a customer or department until it can be posted against shared data stores. Transaction data should be stored in a format that closely resembles the source. We should still enforce naming and format and avoid bundled elements but we should not be overly concerned with fully normalizing transaction data. In some cases, normalizing transaction data could be detrimental to the business. If transactions are fed from an outside source it is probably important to be able to completely recreate the transaction as it was received. The attributes on a transaction are logically related to each other because of their relation to the transaction and therefore may exists on a single entity. Dividing and normalizing transaction data only causes confusion and application nightmares with no real business gain, especially when there is conflicting metadata.

Small or one-time use (non-enterprise) data

This data is requested by a department for a special project outside of the normal company business. An example would be the temporary hold area for online United Way donations or employee ARTS fund auction bidding. It contains data that is not, or never could become, enterprise source data. It will be used for a limited time for one specific function. Information Architecture should touch this data solely to ensure it is not enterprise source data or does not become enterprise source data. Small requests should be turned around in a relatively short period (less than 10 simple tables could be turned around in less than one business day).

Department Level Data fed from an un-modeled source

This data is requested from an application team supporting un-modeled data. The source data may live in VSAM, IMS, flat files, or any other storage method. The team may want to take advantage of some of the benefits of relational technology but is not ready to re-engineer the source data. In this case, if the source data is replaced the data being modeled will also need replaced. The data is small in scope and meant to fulfill one particular business need, the data is all derived data from the un-modeled source but not a replacement to the un-modeled source.

The main benefit of modeling this data is that it can later be used as documentation to assist in redesigning the source data. Therefore, it is important to un-bundle wherever practical and insist on following any accepted naming standards.

The difficult part of the above guidelines is determining where the requests fit. It is not always obvious which type of model we are working on. Sometimes the request may contain entities that fit in different categories. Pay the most attention to the pieces of the model that are most likely to be shared. When in doubt, model at the higher standard.

Model Type	Create a Logical Model?	Normalization	Definitions of Attributes	Level of Data Analyst (DA) Review
Enterprise Source Data	Yes	Third Normal Form	Complete and Accurate- Approved by the Business	Full Review by Data Analysis team
Reporting Data (Small)	Yes	Modeled to Requirements	Definitions of any new attributes – including calculation or summarization rules	Distribute documentation to the team – review with one other DA
Reporting Data (Large)	Yes	Modeled to Requirements	Definitions of any new attributes – including calculations or summarization rules	Full Review by DA team
Transaction Data	Yes	Modeled to Requirements	Definitions of any new attributes	Distribute documentation to the team – review with one other DA
Small or one-time use (non-enterprise)	Not needed	Modeled to Requirements	Define enough to ensure that the intended use is evident	Review with one other DA
Department Level Data – fed from an un-modeled source	Yes	Goal of Third Normal Form (may have practical exceptions)	Definitions of any new attributes – including calculation or summarization rules	Review with one other DA

Conclusion

In summary, all data management teams should to “touch” every request for new tables. Each request needs to be evaluated for completeness and redundancy. So my answer to the question is yes; everything should, at some level, be modeled.

Daniel Roth

Daniel Roth is an Information Architect with experience in health insurance, warehouse/distribution, banking and manufacturing. He has many years of data processing experience as an application programmer, technical support specialist, DBA, and most recently as a data architect. He has been the lead architect on large OLTP and OLAP projects.