Abstraction is a powerful design tactic for creating flexible, robust and scalable data warehouse data models
I remember meandering through the large galleries of a modern art museum, and stopping in my tracks in front of a very large canvas painted completely red with a single white dot in the center. The title was “City Skyline”, and as hard as a squinted or twisted my head, I just couldn’t see anything but a large red canvas with a white dot in the center. This was an abstract painting of a detailed city skyline yet was so generic and vague it could have been a painting of a tiger, a summer beach scene, or anything else for that matter. Abstraction is a tool that lets the artist efficiently capture and represent complex topics. As data modelers we are all artists to an extent, and abstraction is an efficient tool at our disposal as well.
The data modeler is responsible for not only correctly representing the requirements of an application, but also having the foresight to design flexible structures in areas where requirements might change in the near future. There are certain designs we create such as for the Data warehouse where flexibility can make the difference between success and failure of the entire project. The data warehouse is a living design, one that is constantly being expected to easily accommodate new requirements, yet at the same time minimize design changes that impact existing downstream data marts. A very good design tactic I have used over the years to accommodate this need for flexibility is abstraction.
Abstraction is a very important tool for data modelers to be aware of and use appropriately.
For more on abstraction and other modeling techniques, please refer to my book, The Data Modeler’s Workbench, Tools and Techniques for Analysis & Design. Also, if you are up for a data modeling challenge, sign up for my Design Challenges on my web site (www.stevehoberman.com). I send out periodic complex data modeling scenarios followed by several possible solutions.
What is Abstraction?
Abstraction, which we will discuss in detail in the sections that follow, is a method to bring flexibility to your logical and physical data models by redefining and combining some of the data elements, entities, and relationships within the model into more generic terms. Abstraction is the removal of details in such a way as to make something applicable to a wide class of situations, while preserving the important properties and essential nature from concepts or subjects. By removing these details, we remove differences and therefore change the way we view these concepts or subjects, including seeing similarities that were not apparent or even existent before. For example, instead of having separate Customer and Employee entities, there might be situations where it would be valuable to have the generic Person entity, which groups both customers and employees together.
Let’s go through a more detailed example. Say you are creating a data model for the owner of a tuxedo shop. You have already completed a fully normalized design for this project. One of the main entities on your design is Customer. The Customer entity on your data model includes a number of data elements including the following:
- Customer waist size
- Customer leg length
- Customer neck size
- Customer arm length
- Customer shoulder width
The owner of the tuxedo shop is very happy with your design and his application is developed based on this model. Two months later you receive a phone call from the owner of the shop, asking where he can store the customer’s shoe size. You explain to him there is no place to put the shoe size, as that was not part of the original requirements. There is a long silence. The owner of the shop explains that he is expanding his business and many of his customers who rent tuxedos will require shoes. You explain how much money and time it would take to make this change to his application. Again there is a long silence. Then you hear a few clicks as the owner hangs up the phone in disgust.
To paint a brighter picture, let’s say you realize that all of the data elements around leg, waist, neck, arm and shoulder measurements are really just a person’s apparel measurements. Instead of listing each measurement as a separate data element, you abstract this concept in your data model to just two data elements:
- Customer Measurement Number
- Customer Measurement Type Name
The Customer Measurement Number data element includes any size value such as:
- 17 1/2
And each of these values has a type, captured in Customer Measurement Type Name. Such as:
- 42 = Arm length
- 17 1/2 = Neck size
- 34 = Leg length
A Customer can have one or more of these measurements. Then when we get the phone call from the tuxedo shop owner requesting where shoe size should be stored, we simply add a new row in the Measurement Type table called “Shoe size” and the owner can now enter his shoe size values with this shoe size type. This flexibility in our design has led to effortlessly supporting this new requirement.
Benefits Of Abstraction
Abstraction offers the following benefits:
Provides greater flexibility. Flexibility is the primary benefit of abstraction. We are able to design our structures to better handle the unknown. New situations can arise that we did not originally anticipate, and if we use abstraction these situations may require little if any changes to our data model and therefore keep the database and code intact. In a recent design, I decided to use abstraction in the part of the model that captured sales organization information. I abstracted all of the relationships between salespeople and their positions. Even though we did not immediately need all possible combinations of relationships, my abstracted model would handle them if they arose. Sure enough, shortly after the sales organization subject area went into production, we had the need to add a new type of relationship. Our sales organization structure was able to handle this new requirement without any changes to the data model and hence keep the database structure and code intact.
Reduces analysis and design time. Once we are familiar with abstraction, we can apply it to areas of our model before completing a thorough analysis of the requirements. In the example mentioned previously, instead of taking the time to capture all of the additional types of measurements for a tuxedo fitting, we can simply add the measurement number and type data elements and have all possible measurements automatically covered within your model. Within two weeks, I was able to create a model showing the entire manufacturing industry by using a substantial amount of abstraction. Instead of spending significant amounts of time analyzing each data element, I abstracted to make sure any possible requirement was handled within the model. Realize that you might save time here but you can also forfeit an understanding of the business requirements.
Costs Of Abstraction
Abstraction might sound like the cure for everything from the common cold to world peace (slight exaggeration). However, its benefits come with a price:
Hides business concepts on the model. When we just have Measurement Number and Measurement Type Name in our data model, we can no longer see the individual tuxedo fitting measurements. Customer Arm Length is no longer a data element we can clearly identify and explain, but a value within the Measurement Number data element. When we abstract we lose the actual business data element name. The data model however, still contains the same information. These measurements become row values instead of individual data elements. Six months after we create the tuxedo data model with abstraction, someone might ask us, “Hey, what does Measurement Type mean?” What sounded like a great idea when we were doing the data model, six months later might no longer make sense. So there is a task we need to do even more rigorously than before. We need to create very good definitions.
This way down the road when you or someone else needs to understand what Measurement Type Name encompasses, they can view all of the values of this code within the definition. You can also use metadata helper entities to capture this important name and definition meta data, which will be discussed in a future article.
Leads to indulgence. Too much abstraction can give us a design totally meaningless in understanding and impossible to develop. Let’s say we continue to abstract the tuxedo data model. Eventually, we will wind up with a single entity called “Thing” with a recursive relationship, as shown in Figure 1. Imagine the power of this entity! We can represent anything with this entity. In fact, we can represent an entire Customer Relationship Management (CRM) application using this single entity! However imagine explaining this model to someone in our business area. Imagine trying to write the code that uses this entity. Now the “Thing” concept is a bit of an exaggeration, but I have seen an item data model where abstraction was taken to the point where it was difficult to find anything business related on the model. It was also very difficult to walk a business user through this model.
Lengthens development time. Although we can decrease analysis and design time using abstraction, we increase development time. It takes longer to write code and logic against row values than against columns. For example, it would be more complex and take longer to write the code to populate or select information out of the Measurement Number and Measurement Type Name data elements than it would be if we had the specific measurement data elements at our disposal. We would need to have more complicated query statements. Without abstraction, our queries would be much simpler and easier to create. For example, without abstraction, returning the name and leg length of all of the customers with a waist size of 42 would look like this:
SELECT CUSTOMER_FIRST_NAME, CUSTOMER_LAST_NAME, CUSTOMER_LEG_LENGTH
CUSTOMER_WAIST_SIZE = ’42’
It is very easy and straightforward to create this query. On the other hand, using abstraction to get the same results might look like this:
FROM CUSTOMER, MEASUREMENT_TYPE
CUSTOMER.CUSTOMER_IDENTIFIER = MEASUREMENT_TYPE.CUSTOMER_IDENTIFIER
AND MEASUREMENT_TYPE.MEASUREMENT_TYPE_NAME = ‘Customer waist size’
AND MEASUREMENT_TYPE.MEASUREMENT_NUMBER = ’42’
Note that we need to do an additional table join on the customer’s identifier. In addition, abstract queries can be much more complicated than this example, especially if we have a separate code table or lookup table for the different measurement type names (e.g. 01 = Customer Waist Size).
Increases reporting difficulty and complexity. There are some reporting tools on the market today that have a very difficult if not impossible time extracting data from abstract structures. There have been several cases where I introduced abstraction into the logical and physical data models, and then had to reverse this decision because the reporting tool, and the query language in general, had difficulty retrieving information from these structures.
This article provided an overview to abstraction along with its pros and cons. Abstraction is such a powerful technique, that the modeler can easily abuse it. I have some guidelines to make sure we only use it where most appropriate. We want to avoid the “Thing” model that is really nothing more than a large red canvas with a white dot in the center.