Logical and physical data modeling are essential components of every organization’s enterprise data architecture and should form the foundation of every database design. Standard techniques for logical and physical data modeling enable consistent development and usability.
Even though the concept of data modeling has been around for a long time, in many organizations it is interpreted differently. A data model is a set of data specifications and related diagrams that reflect data requirements and designs. Generally, conceptual data modeling and logical data modeling are requirements analysis activities, while physical data modeling is a design activity. All are parts of an enterprise data model.
The Logical Data Model refers to a higher level of data – the business data. Once the business data requirements are known and modeled in a logical data model, the physical data model and database design can proceed. However, if there is no clear understanding of the business requirements all the design and implementation work in the world will not result in a good quality application.
Data is the most important part of any application or system. A good, strong, accurate data structure allows application developers to design any processing, user interface, reporting, or statistical analysis ever needed. The only things more important in developing application systems than a quality data structure are the business rules and requirements. The most elegant, highly technical system in the world will be abandoned if it does not meet business requirements. Therefore, logical data modeling combines the two most important components of application development: business requirements and quality data structure.
What is a logical data model?
A logical data model is a graphical representation of the information requirements of a business area; it is not a database.
The word logical is critical because it modifies the phrase data modeling to a more specific activity. A logical data model is independent of a physical, data storage device (database management system, file system, etc). This is the key concept of the logical data model. A logical data model must be independent of technology since technology changes so rapidly.
It is dangerous for application developers to continue to gather requirements with a specific technology in mind. The problem with legacy applications continues to grow and multiply in complexity. Applications should be as independent of technology as possible. Build a system that is independent of technology; isn’t this an oxymoron? Yes and No. There are components of an application that are linked intimately to the technology: the programs, the database management systems, and the screen components. There are also components of the system that can be technology independent: the logical data model, the logical process model, the business rules and other relevant business metadata. These components are connected to the business, not the technology.
Most business areas do not change as rapidly as technology. Think of industries that have been in business for 100 or 200 years. An insurance company provides coverage, receives payments, and pays claims; and has done so for its entire history. They performed this business without computers and now perform it on mainframes, minicomputers, PCs, networks, and the Internet. What the business does has not changed, but how the work is accomplished has changed significantly. This difference between WHAT the business requirements are and HOW they are accomplished describes the difference between a logical data model and physical database.
The logical data model is a picture of all the pieces of information necessary to run the business. The logical data model is built using an Entity Relationship diagram (ERD) – a standard modeling technique used by data modelers around the world. Entity relationship diagramming is a structured technique used as a communication tool. It includes the complete business requirements without reference to any technical components.
The components of a logical data model include Entities, Relationships, and Attributes. Each Entity represents a set of persons, things, or concepts about which the business needs information. Each Relationship represents an association between two entities. Each Attribute is a characteristic or piece of information that further describes an entity. A name and a textual definition describe each of these components. These name and definitions provide ongoing documentation of the business rules and information requirements of the business area. They describe what the business requirements are, not how they are implemented, stored, or processed.
Who uses the logical data model?
In most organizations, the logical data models are the responsibility of a Data Architecture team, with input from each business unit. Each business unit has subject matter experts who describe their data requirements to the data modeler and review the models created. They use the models for impact analysis of changes to business requirements. The Data Modeler conducts facilitated sessions with subject matter experts to gather the data requirements and build the logical data model. The data modeler also works with the process analyst to link data with processes. The data modeler is responsible for getting approval of the logical data model from the subject matter experts and then works with the DBA to transition the logical model to the physical model. In many cases, the logical data modeler and the process analyst are the same person, but different tools and techniques are used in data and process modeling. An expert logical data modeler is trained in gathering rules and data and synthesizing the essential elements from the non-essential to display the business needs in an Entity Relationship diagram and supporting documentation. An expert data modeler’s skills are independent of any specific database management system’s techniques.
The Database Administrator (DBA) builds the physical data model from the logical data model. To create a good quality database design, the database administrator reviews the logical model to select technology-appropriate keys, create indexes, detail data types, and build referential integrity to protect the data values. The database administrator may de-normalize the database for efficiency. DBA’s also are responsible for writing stored procedures, triggers, maintaining referential integrity, and monitoring database performance.
What is the difference between a logical data model and a physical database design?
Reviewing the information in the table below, one could gloss over the importance of the differences between the models. Many people carelessly interchange the word entity with the word table. However, the distinction is significant. An entity represents a set of persons, things, or concepts that are important to the business. A table is a data structure that contains data values and is physically stored on a medium using a specific DBMS. The difference is the link to technology. An entity has no link to technology; a table is a technology.
Also, consider the difference between a unique identifier in the logical model and a primary key in the physical model. A unique identifier represents the data element(s) used by the business to discriminate between one occurrence of an entity and another. How do you tell the difference between one CUSTOMER and another? A subject matter expert may answer this question: ‘Customer Name’. This is an important business requirement and must be captured as such.
However, most database designers would not select Customer Name as a primary key. In the physical model, a primary key, such as a computer-generated Customer ID, is selected because it uniquely identifies each row in the table and can serve as a foreign key in related tables.
|THE LOGICAL MODEL||THE PHYSICAL DATABASE DESIGN|
|Includes all entities, relationships, and attributes (and their information types) whether supported by a technology or not.||Includes tables, columns, keys, data types, validation rules, DB triggers, stored procedures, domains, and access constraints (security).|
|Uses business names, regardless of length.||Names may be limited by the DBMS.|
|Captures and records information necessary for the business.||Includes technology-specific data elements such as flags, switches, and timestamps. Includes unique identifiers.|
|Includes primary keys, foreign keys, and indices for fast data access. Is normalized to at least 3rd normal form.||May be de-normalized to meet performance requirements.|
|Does not include any redundant data.||Does not include any redundant data. May include redundant data elements.|
|Does not include any derived data.||May include results of complex or difficult to recreate calculations.|
|Business experts drive the model||DBAs drive the database|
Why build a logical data model?
The most important reason to build a logical data model is to confirm the users’ and analysts’ understanding of the business requirements to assure that the system developed satisfies the business need. Logical data modeling provides the analyst with a tool and technique to conduct analysis. Most subject matter experts can articulate problems, and often, solutions. Unfortunately, their problems and solutions are often based on current system constraints, not true business needs. Assuming perfect technology forces users and analysts to look beyond the current system limitations. Asking the business people to detail every piece of data (attributes), requires them to understand and articulate every aspect of their business. This process allows the business to drive the application / solution, not the other way around. It also stimulates discussion and thoughts. By identifying and detailing data in a model, further requirements and problem areas arise simply because of discussing and reviewing the model.
A logical data model is a foundation for designing a database that supports the business requirements. Database designers start their design with a complete picture of the business requirements and can then determine the best implementation approach. Doing so allows designers to use their expertise in data access paths, data distribution and placement, and access efficiency to create a database that will satisfy business requirements for years to come.
A logical data model also facilitates data re-use and sharing. Data is stable over time; therefore, the model remains stable over time. As additional project teams define their needs, they can re-use model components that are shared by various parts of the business. This leads to physical data sharing and less storage of redundant data. It also helps the organization recognize that information is an organization-wide resource, not the property of one department or another. Data sharing makes the organization more cohesive and increases the quality of service to outside customers and suppliers.
Building and maintaining a logical data model decreases system development and maintenance time and cost. Identifying all business requirements and rules at the beginning of a project makes the design, coding, testing, and implementation phases go much smoother and faster. A model is easier and cheaper to modify during the development life cycle. Mistakes, missed data, and misinterpretations are less costly when corrected in a model than in an implemented application. It also decreases user requests for changes.
When changes are necessary, the logical model can be used for impact analysis. A logical data model, documented in an Entity Relationship diagram, is a picture of the business area. Every entity, attribute, and relationship created and defined in the model is a piece of the resulting system documentation. The objects in the model contain textual definitions that describe their characteristics in business language. Because of data modeling, system documentation can become easier to produce.
A logical data model confirms a logical process model and provides documentation of the information requirements of the business area for ongoing impact analysis. Each business process and rule is tied to the logical data model to assure that all data and process model components have been discovered.
Issues When a Logical Data Model is not Developed
When users are not asked to focus on data as a critical part of a new system design, they talk about processes and activities. They may forget to tell designers about all data requirements. Traditionally designers create tables and files based on screen and report layouts, searching out data elements as they go. These data elements are not well organized or structured properly. Designing a model based on physical workflow could result in a model that does not fully represent the business requirements. This can occur when the technology of the workflow is forced into the physical model. Often the result is a database that is missing critical data and must be changed immediately after implementation.
The database design task is a much longer activity and the database may be poorly structured without a logical data model as a guide. If not all business data requirements have been thoroughly identified and defined, database designs will be unstable throughout the development process. During coding, testing and even implementation, developers can find additional data elements, requiring the DBAs to be re-active instead of pro-active. Resulting non-modeled databases may be poorly structured, looking like a patchwork quilt, rather than being well planned and easy to maintain. Errors in the database design will cause the entire system to be unusable.
System documentation may not be created, or it may be sparse, or it is textual vs. text with graphics and difficult to maintain. User requirements are lengthy, textual documents that are time consuming to review. Without a logical model to refer to, data definition language must be referenced when planning system enhancements. DBMS systems often limit the length of column names making them difficult to decipher.
An ERD is a structured, graphical document that describes the business supported by the system. It can be used to identify business rules, which must be fully documented before application development can begin.
Logical data modeling forces analysts to think about the current business requirements, independent of technology, thereby highlighting opportunities for business process improvement rather than simply automating an existing procedure or recreating a legacy system on a new platform. Users who understand their current systems can prevent analysts from identifying incorrect business requirements. The main reason for missed target dates is a poor understanding of the business requirements. Building a complete, essential, logical data model (and linking it to an essential process model) forces the analysts and the business users to completely describe all information requirements of the business area. Without this rigorous analysis, data elements will be missed or defined improperly, causing poor database design and ineffective application development efforts.
A logical data model is a crucial and essential part of every application development project and should proceed every database design. A logical data model should form the basis for a physical data model and serve as the foundation for recording business requirements and metadata. Using data modeling standard notation and applying proven techniques for data modeling will provide organizations that perform logical data modeling with many tangible and intangible benefits.