Skip to content.

Sections

Should We Model Everything?

by Dan Roth

The level of attention paid to a model should be determined by its intended use and place in the enterprise. There are different types of models that represent different types of data, all of which should be "touched" by data administration. The actual procedures depend upon the availability of modeling resources, but every model should, at a minimum, be checked for naming standards, common formats and bundled elements. All requests should be evaluated to make sure we are not unnecessarily creating redundant data stores. These steps are important because often times models that at first appear to be at a purely application level end up containing data that could easily become a corporate data store. Checking for these issues is one of the main roles of data administration.

Enterprise Source Data

Enterprise source data is data defined at its owning source. This data is used by the business and information systems to make the decisions that directly affect the business. This data is highly shareable therefore there should only be one Enterprise Source for each distinct piece of business data. It should have a complete logical model, follow all data modeling rules, be in third normal form (or close to it) and contain no bundled elements. Because of it’s importance to the business all attributes should be completely defined and the definitions should be approved by the business. A full peer review by the data administration team should occur. This is the data at it’s source therefore it should be designed in a way to ensure that it will be shareable across the enterprise. An example of enterprise source data would be customer address data or detailed product information.

Reporting Data

Reporting data contains enterprise source data or transaction data that is extracted and summarized in order to provide queries or reports to answer business questions. This is the data being used by the business to make decisions. The decisions made by analyzing this data directly affect the business but the data itself indirectly affects the business. There are two types of reporting data. The first, and more common, type is a small table or set of tables used to answer a specific business questions (sometimes called a data mart). The second would be a large store of analytical data extracted from many enterprise sources (sometimes called a data warehouse). This type of data can be analyzed directly at its source or extracted and summarized into smaller reporting data stores (data marts).

Reporting data has different requirements and therefore should not be held to the same measures as enterprise source data. While we should create a logical model, the very nature of reporting data assumes denormalization. If we enforce third normal form on reporting structures we will negatively impact the usefulness of the end product. Reporting data is typically derived from previously modeled data therefore we are not creating any new corporate data sources. It is more important that the data is designed to reflect the access paths required (what questions are we trying to answer).

Any new attributes created or derived should be given complete definitions which would include any calculation or summarization rules. For large reporting models a full DA peer review should occur, for smaller models a review by one knowledgeable DA should be appropriate.

Transaction data

Transaction data is a temporary hold area used to store transactions entered by a customer or department until it can be posted against shared data stores. Transaction data should be stored in a format that closely resembles the source. We should still enforce naming and format and avoid bundled elements but we should not be overly concerned with fully normalizing transaction data. In some cases normalizing transaction data could be detrimental to the business. If we are being fed transactions from an outside source it is probably important to be able to completely recreate the transaction as it was received. The attributes on a transaction are logically related to each other because of their relation to the transaction and therefore may exists on a single entity. Dividing up and normalizing transaction data only causes confusion and application nightmares with no real business gain.

Small or one-time use (non-enterprise) data

This data is requested by a department for a special project outside of the normal company business. An example would be the temporary hold area for online United Way donations or employee ARTS fund auction bidding. It contains data that is not, or never could become, Enterprise Source Data. It will be used for a limited timeframe for one specific function. Information Architecture should touch this data solely to ensure it is not enterprise source data or does not become enterprise source data. Small requests should be turned around in a relatively short time frame (less than 10 simple tables could be turned around in less than one business day).

Department Level Data fed from an unmodeled source

This data is requested from an application team supporting unmodeled data. The source data may live in VSAM, IMS, flat files, or any other storage method. The team may want to take advantage of some of the benefits of relational technology but is not ready to re-engineer the source data. In this case, if the source data is replaced the data being modeled will also need replaced. The data is small in scope and meant to fulfill one particular business need, the data is all derived data from the unmodeled source but not a replacement to the unmodeled source.

The main benefit of modeling this data is that it can later be used as documentation to assist in redesigning the source data. Therefore, it is important that we unbundle wherever practical and insist on following any accepted naming standards.

The difficult part of the above guidelines is determining where the requests fit. It is not always obvious which type of model we are working on. Some times the request may contain entities that fit in different categories. Pay the must attention to the pieces of the model that are most likely to be shared. When in doubt, model at the higher standard.

In summary, I believe that the data administration group needs to "touch" every request for new tables. Each request needs to be evaluated for completeness and redundancy. So my answer to the question is yes. I believe everything should, at some level, be modeled.

Model Type

Create a Logical Model?

Normalization

Definitions of Attributes

Level of Data Analyst (DA) Review

Enterprise Source Data

Yes

Third Normal Form

Complete and Accurate- Approved by the Business

Full Review by Data Analysis team

Reporting Data (Small)

Yes

Modeled to Requirements

Definitions of any new attributes – including calculation or summarization rules

Distribute documentation to the team – review with one other DA

Reporting Data (Large)

Yes

Modeled to Requirements

Definitions of any new attributes – including calculations or summarization rules

Full Review by DA team

Transaction Data

Yes

Modeled to Requirements

Definitions of any new attributes

Distribute documentation to the team – review with one other DA

Small or one-time use (non-enterprise)

Not needed

Modeled to Requirements

Define enough to ensure that the intended use is evident

Review with one other DA

Department Level Data – fed from an unmodeled source

Yes

Goal of Third Normal Form (may have practical exceptions)

Definitions of any new attributes – including calculation or summarization rules

Review with one other DA