Real-World Decision Support (RWDS) Journal, July 2002, Volume 1, Issue 16

The Power of Abstraction - Part 2 of a series on abstraction

by Steve Hoberman

In my last article I described a very abstract painting consisting of a white dot on a red canvas that represented a city skyline. I concluded that abstraction is a tool that lets an artist efficiently capture and represent complex topics in a generic way. As data modelers, we are all artists to an extent, and abstraction is an efficient tool at our disposal as well. That article explained abstraction, including its benefits and risks. Maximizing the benefits and minimizing the risks of abstraction requires applying certain guidelines, which are the subject of this article.

This article is part of a series on abstraction. Here are the future topics that will be covered:

  • Use of abstraction in our data warehouse architecture. I'll discuss those areas within our architecture where using abstraction is valuable and those areas where using abstraction is not valuable.
  • Use of meta data helper entities. I'll show an approach to capturing definitions and business rules on an abstracted model and therefore in the corresponding database.
  • Reusable abstract entities. I will share the abstract entities I use most often.
  • Reusable abstract relationships. I will share the abstract relationships I use most often.
  • Reusable abstract data elements. I will share the abstract data elements I use most often.

The guidelines on when to abstract take the form of the following three questions. We can apply these three questions to the entities, relationships, and data elements on our model.

  1. Does this data element, entity, or relationship have something in common with one or more other data elements, entities, or relationships?
  2. If yes, are there concrete situations that would warrant an abstract structure?
  3. If yes, is the extra development effort now substantially less than the development effort down the road?

The first question addresses commonality, the second question addresses purpose, and the third question addresses effort. Let's go into more detail on each of these.
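The three questions can be read as a simple decision procedure. The sketch below is one possible way to express it, not a method taken from the article: the `Candidate` structure, the `should_abstract` function, and the `max_ratio` threshold (one guess at what "substantially less" could mean) are all assumptions made for illustration, and the effort figures in the usage lines are invented.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Candidate:
    """A group of model concepts considered for abstraction (hypothetical structure)."""
    concepts: List[str]
    common_trait: Optional[str]   # Question 1: what the concepts share, if anything
    has_concrete_need: bool       # Question 2: is there a concrete situation that warrants it?
    effort_now_weeks: float       # Question 3: extra effort to abstract now
    effort_later_weeks: float     # Question 3: estimated effort if we wait


def should_abstract(c: Candidate, max_ratio: float = 0.5) -> bool:
    """Apply the three questions in order; stop at the first 'no'.

    max_ratio is an assumed threshold for "substantially less":
    effort now must be at most this fraction of the effort later.
    """
    # Question 1: commonality needs at least two concepts sharing a trait.
    if len(c.concepts) < 2 or not c.common_trait:
        return False
    # Question 2: purpose.
    if not c.has_concrete_need:
        return False
    # Question 3: effort.
    return c.effort_now_weeks <= max_ratio * c.effort_later_weeks


# Illustrative values only; the "later" estimates are invented for this sketch.
dates = Candidate(
    ["Order Date", "Ship Date", "Actual Delivery Date"],
    "Dates around the order lifecycle", True, 1, 4,
)
companies = Candidate(
    ["Carrier", "Supplier", "Vendor"],
    "External companies", False, 2, 3,
)

print(should_abstract(dates))      # True
print(should_abstract(companies))  # False
```

The order of the checks mirrors the "If yes" wording of the questions: a candidate that fails on commonality or purpose never reaches the effort comparison.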

Commonality

The first of these three questions asks, "Do we have a match anywhere on our model?" Did we find two or more entities, relationships, or data elements that appear to share a common trait? This is the detective work when we scan our model in search of anything that might appear to match something else. Remember playing the card game Concentration when you were a child? You would put a deck of picture cards face down on a table and then turn over two to see if they match. It is the same concept on the data model. For example, we can take the concepts on a data model and list them in much the same way as the Concentration cards on the table. See Table 1.

Table 1 - Can You Find A Match?

Order Date   | Supplier      | Actual Delivery Date
Carrier      | Ship Date     | Vendor
Product Size | Product Brand | Product Dimension

Which of these concepts match? I noticed a few date data elements that might make a good abstraction candidate. Order Date, Ship Date, and Actual Delivery Date have in common that they are dates around the order lifecycle process. Also Carrier, Supplier, and Vendor are all entities that share the common trait that they might be external companies. Product Size, Product Brand, and Product Dimension all appear to be different ways of classifying or categorizing a product. First we list these in a table format to more easily see their similarities. This format is a useful way to view and validate that we have grouped similar concepts together. See Table 2.

Table 2 - Commonality

| Concepts that are similar | Reasons they are similar |
|---|---|
| Order Date, Ship Date, Actual Delivery Date | Dates around the order lifecycle |
| Carrier, Supplier, Vendor | External companies |
| Product Size, Product Brand, Product Dimension | Ways of classifying products |
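The matching step can also be sketched in code. The example below assumes the modeler has already tagged each concept from Table 1 with the trait given for it in Table 2; the `concept_traits` mapping and the `group_by_trait` function are hypothetical names used only for this illustration. Deciding which trait a concept has remains the detective work described above; the code only collects the matches.

```python
from collections import defaultdict
from typing import Dict, List

# Concepts in Table 1 order, each tagged with its trait from Table 2.
concept_traits: Dict[str, str] = {
    "Order Date": "Dates around the order lifecycle",
    "Supplier": "External companies",
    "Actual Delivery Date": "Dates around the order lifecycle",
    "Carrier": "External companies",
    "Ship Date": "Dates around the order lifecycle",
    "Vendor": "External companies",
    "Product Size": "Ways of classifying products",
    "Product Brand": "Ways of classifying products",
    "Product Dimension": "Ways of classifying products",
}


def group_by_trait(traits: Dict[str, str]) -> Dict[str, List[str]]:
    """Group concepts by shared trait, keeping only groups of two or more."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for concept, trait in traits.items():
        groups[trait].append(concept)
    # A trait held by a single concept is not a match.
    return {trait: members for trait, members in groups.items() if len(members) >= 2}


matches = group_by_trait(concept_traits)
for trait, members in matches.items():
    print(f"{trait}: {', '.join(members)}")
```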

Purpose

Now that we have found several potential candidates for abstraction, we need to ask for each of these, "Are there concrete situations that would warrant an abstract structure?" In other words, is there value in abstracting? Many times when we abuse abstraction and over-abstract, we avoid asking this question. We usually find a match and then abstract without regard for the usefulness of the abstraction. "Somebody will use this flexibility someday" might be how we rationalize the abuse of abstraction. But because abstraction costs understanding and clarity in the model, as well as extra development effort, we cannot be so generous in abstracting where there is no immediate or short-term benefit. Let's add a purpose column for each group of similar concepts. See Table 3.

Table 3 - Purpose

| Concepts that are similar | Reasons they are similar | Value in abstracting |
|---|---|---|
| Order Date, Ship Date, Actual Delivery Date | Dates around the order lifecycle | Would you want to store additional types of order lifecycle dates in the future? The business expert replies: Yes, we might need to handle several additional order lifecycle dates in the near future. |
| Carrier, Supplier, Vendor | External companies | Do you need to represent additional types of external companies? The business expert replies: We already have the complete list of external companies defined. I do not think we will have other external companies for quite a few years. You are thinking to yourself, "But how can they be sure?" |
| Product Size, Product Brand, Product Dimension | Ways of classifying products | Do you need to represent additional types of product groupings or classifications? The business expert replies: Yes, I believe there will be several additional product groupings in the near future. |

Note the question-and-answer approach I chose to take in determining whether each of these groups of similar concepts is worth abstracting. When we abstract, we gain the ability to have additional "types" of something. So if we were to abstract the different order lifecycle dates from the first row in Table 3, we would be able to represent additional types of order lifecycle dates.
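To make "additional types" concrete, here is a minimal sketch contrasting a concrete structure with an abstracted one for the order lifecycle dates. The class names, the `set_date` method, and the "Requested Delivery" date type are assumptions made for illustration; the article does not prescribe a particular structure.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Dict, Optional


@dataclass
class ConcreteOrder:
    """One attribute per date: a new kind of date means changing the structure."""
    order_date: Optional[date] = None
    ship_date: Optional[date] = None
    actual_delivery_date: Optional[date] = None


@dataclass
class AbstractOrder:
    """Dates keyed by a date type: a new kind of date is just a new key."""
    lifecycle_dates: Dict[str, date] = field(default_factory=dict)

    def set_date(self, date_type: str, value: date) -> None:
        self.lifecycle_dates[date_type] = value


order = AbstractOrder()
order.set_date("Order", date(2002, 7, 1))
order.set_date("Ship", date(2002, 7, 3))
# Hypothetical new date type: no change to the structure is needed.
order.set_date("Requested Delivery", date(2002, 7, 8))
```

The flexibility has a cost: `ConcreteOrder` names each date in its definition, while `AbstractOrder` says only that an order has lifecycle dates. The specific kinds of dates now live in the data rather than in the structure, which is the loss of understanding and clarity that the purpose question is meant to weigh.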

After filling in the purpose column, we can review the reasons and make intelligent decisions as to what has value in abstracting. Both the dates and the product groupings appear to be very good candidates for abstraction. However, the reason for abstracting external companies is not as solid. It appears the modeler might be looking for an area to abuse abstraction. In real life, we will never see "But how can they be sure?" documented officially, but how often do you think people think it? So we should not pursue abstracting external companies in this case. It will not be valuable in the near future, and therefore we should not sacrifice understanding on the model by abstracting these entities.

Effort

The final question concerns the effort involved in implementing the abstract structure. After determining that there is value in having such an abstract structure, we now need to answer the question, "Is the extra development effort now substantially less than the development effort down the road?" This is a very tricky question, because it depends on whom you ask. If you ask the data modeler (probably yourself) or anyone from the data administration team, the answer will probably be a resounding "Yes, it is worth the effort now." That is because we are looking ahead to the next requirement that can take advantage of these abstract structures. We are looking beyond just the current application. But the one who really needs to answer the question is the individual or department that pays the bills for the current application. Unfortunately, in many cases the bill payer is concerned only about their particular application at a point in time and may not see the value of such an abstract structure. See Table 4.

Table 4 - Effort

| Concepts that are similar | Reasons they are similar | Value in abstracting | Effort |
|---|---|---|---|
| Order Date, Ship Date, Actual Delivery Date | Dates around the order lifecycle | Would you want to store additional types of order lifecycle dates in the future? The business expert replies: Yes, we might need to handle several additional order lifecycle dates in the near future. | 1 week |
| Product Size, Product Brand, Product Dimension | Ways of classifying products | Do you need to represent additional types of product groupings or classifications? The business expert replies: Yes, I believe there will be several additional product groupings in the near future. | 2 weeks |

I usually list the effort in weeks, but you can use whatever unit of time makes the most sense for your application. Applying the three questions in this example shows the benefits that this guide provides:

  • Identifies all areas to abstract. We want to make sure we don't miss any abstraction opportunities. Our first question on commonality helps ensure we find all areas to abstract.
  • Prohibits over-abstracting. Over-abstracting causes unnecessary loss of business information on the model, and extra effort and complexity in development. The purpose question makes sure the only abstractions used are those that provide some amount of benefit and value.
  • Provides consistency. By applying the same three questions to every situation where abstraction can be used, we build a level of consistency with regard to abstraction in our models. This consistency leads to making quicker abstraction decisions over time, and helps people reading the models more rapidly come up to speed.

This article provided a set of guidelines on when to abstract. These guidelines address commonality, purpose, and effort. Our next article in this series will cover the best places to abstract within our data warehouse architecture.

If you have questions on abstraction or if there are other areas within flexible design strategies you would like me to address, please let me know. I can be reached at me@stevehoberman.com. For more on abstraction and other modeling techniques, please refer to my book, The Data Modeler's Workbench, Tools and Techniques for Analysis & Design. Also, if you are up for a data modeling challenge, sign up for my Design Challenges on my web site (www.stevehoberman.com). I send out periodic complex data modeling scenarios followed by several possible solutions.

About the Author

Steve Hoberman is an expert in the fields of data modeling and data warehousing, and teaches several data modeling courses throughout the year including a brand new Data Modeling Master Class. He is currently a global reference data expert for Mars, Inc. He has been data modeling since 1990 across industries as diverse as telecommunications, finance, and manufacturing. Steve speaks regularly for the Data Warehousing Institute. He is the author of The Data Modeler's Workbench, Tools and Techniques for Analysis & Design. Steve specializes in data modeling training, design strategy, and in creating techniques to improve the data modeling process and deliverables. He enjoys reviewing data models and is the founder of Design Challenges, a discussion group which tackles complex data modeling scenarios. To learn more about his data model reviews and to add your email address to the Design Challenge distribution list, please visit his web site at www.stevehoberman.com. He can be reached at me@stevehoberman.com.