Physical Modeling
by Dan Roth
Physical modeling involves transforming the logical model from a purely business design to a design optimized to run in a particular environment. Things that must be considered when doing physical modeling include the specific RDBMS, the hardware environment, the data access frequency and the data access paths. Physical modeling involves adding properties, such as space, free space and indexes. It may also involve altering the logical structures. I am going to focus on the physical changes you may make to the logical structures. The three main reasons to deviate from the logical model are to improve performance improve usability and save DASD. There are many drawbacks to deviating from the logical model, so any changes should not be taken lightly. If significant improvements can be gained by the business we should consider the options.
A Data Base Administrator (DBA) usually creates the actual physical model. There also needs to be involvement from the Data Administrator(DA), the business user, and the applications developer. I am going to step through a real-life example (the names were changed to protect the guilty) of a database design. This example resulted in physical changes to improve performance, improve usability and save DASD. If the tables were physically created in a logically pure format it would have meant many headaches for the company. First I am going to attempt to step you through, at a high level, the logical design then explain the physical design decisions we made.
Joe's Cheesy Collectibles enrolls people in a program where they select a line of collectibles and are sent a different cheesy collectible each month (sort of like those record and tape clubs). The customer may choose from many cheesy collectible lines. The most popular lines are Elvis velvet paintings, Three Stooges picture plates, and Soap Opera statuettes. One customer may enroll in one or many cheesy collectible lines. The customer may also enroll in the same line more than once (can you ever have enough Elvis velvet paintings). Each time the customer enrolls in a new line he will get a new account. Our first three entities would be customer, product line and account. (see diagram 1)
Next we need to know where to ship the product and send the bill each month. In this case a customer can have more than one product line so we need to store the address at the customer level not the account level. This will enable us to share the address across multiple customer accounts. To complicate things, we also allow customers to use more than one address. Maybe Tom Smith wants the Elvis Velvet Paintings sent to his house but wants to hide his obsession with soap opera statuettes from his wife so he has those sent to his office. We will then need to store the address at the customer level but point the account table to the correct address. We could do this by adding an address id to the account table. (see diagram 2)
Let's complicate things a little further (remember this is based on a real life example). We also need to allow a customer to send the shipments to one of his addresses and the bill's to another. This will require a junction table between account and address. This will enable: 1) a customer to have many addresses 2) a customer to have many accounts 3) an account to use many of the customers' addresses.
In reality the account may use two of the customer's addresses. In the junction table we will add an address use code. The valid values would be 1=billing, 2=shipping. (see diagram 3)
One more complication: Customer Tom wants to send the Three Stooges picture plates to his Grandmother but have the bills sent to him. This would require us to set up Grandma as a customer and point the shipments to her. The problem is we can't use the junction between account and address because that would point us to the wrong customer id. We therefore needed to create an optional alternate customer table. We would only need to insert to this table if the shipments are going to a different customer then the bills. This table would need to contain the account number, Grandma's customer id and Grandma's address pointer. (see diagram 4)
Pretty complicated to ship Elvis velvet painting isn't it? That is also what we thought. Even if performance was not an issue the complications involved in maintaining and accessing these tables was not acceptable. We agreed that diagram 4 represented our logical needs but we needed to physically create something more useable. The most obvious change was getting rid of the junction between account and address. We only have two usages of address (billing and shipping). Therefore we decided to add two columns to the account table; bill address number and ship address number. If the customer was sending the bills and the shipments to the same address they could both contain the same number. (see diagram 5)
The biggest drawback of this design is that it limits the number of alternate customers to one and the number of addresses for an account to two. We discussed this with the business and agreed that it would be impractical to have more than one customer at a time designated as the recipient of the shipments.
Now compare diagram 6 with diagram 4.We managed to improve performance, improve usability, and reduce DASD. While we did violate some pure modeling rules we provided a faster, cheaper, and more useable database to the business. It was important to go through this entire process. If you jump right to the physical model without truly understanding the logical model it becomes easy to make mistakes and incorrect assumptions. Keep in mind the words of Albert Einstein. Everything should be made as simple as possible, but not simpler.
About the Author
The author, Daniel Roth, is an Information Architect with experience in health insurance, warehouse/distribution, banking and manufacturing. He has many years of data processing experience as an application programmer, technical support specialist, DBA, and for the last 6 years as a data analyst. He has been the lead architect on large OLTP and OLAP projects. Dan can be reached at rothdaniel@aol.com