Use of meta data entities - Part 3 of a series on abstraction
by Steve Hoberman
In every article in this abstraction series, I like to remind the reader of that very abstract painting I observed in a museum, which consisted of a white dot on a red canvas that represented a city skyline. In my first article in this series, I concluded that abstraction is a tool that lets an artist efficiently capture and represent complex topics in a generic way. As data modelers, we are all artists to an extent, and abstraction is an efficient tool at our disposal as well. The first article explained what abstraction is, the second article explained when to use abstraction, and this article will discuss where to use abstraction. This article starts off discussing the two conditions that need to be present to make abstraction effective. Then we will discuss four applications that can meet these two conditions and therefore make excellent candidates for abstraction.
This article is part of a series on abstraction. Here are the future topics that will be covered:
- Use of meta data helper entities. I'll show an approach to capturing definitions and business rules on an abstracted model and therefore in the corresponding database.
- Reusable abstract entities. I will share the abstract entities I use most often.
- Reusable abstract relationships. I will share the abstract relationships I use most often.
- Reusable abstract data elements. I will share the abstract data elements I use most often.
If you have questions on abstraction or if there are other areas within flexible design strategies you would like me to address, please let me know. I can be reached at me@stevehoberman.com. For more on abstraction and other modeling techniques, please refer to my book, The Data Modeler's Workbench, Tools and Techniques for Analysis & Design. If you are up for a data modeling challenge, sign up for my Design Challenges on my web site (www.stevehoberman.com). I send out periodic complex data modeling scenarios followed by several possible solutions. Also, I've recently added two new data modeling courses.
Conditions to maximize the effectiveness of abstraction
I have noticed two main conditions that must be present for abstraction to be effective:
Design needs to last "forever". Abstraction is very useful when the data model you are working on supports an application that is a critical component of a systems architecture, or that is the source for very important and strategic information. These types of applications are expected to last a very long time, and therefore it is important to design with flexibility in mind. I reviewed the model for an integration hub not too long ago. This integration hub was a critical component for the marketing department of a large company. Many source applications fed this integration hub, and many other applications depended on its data. The hub had a very normalized physical design. Its business rules were clearly understood and there was minimal redundancy in the database. Despite this clear understanding and minimal redundancy, enhancements to the hub required months of effort and cost lots of money. The investment was so large because each enhancement meant changing the data model, and therefore the database and usually a substantial amount of programming code as well. One of the main reasons for this huge effort was the lack of abstraction within the hub's design. A hub, like any other integrated application, is very important to the business and is usually the heart of a systems architecture. Therefore, it needs to be designed with flexibility using abstraction, or else every little change will become a huge development effort and eventually the integration hub will be slowly replaced by something more flexible, but only after spending lots of money and time.
Requirements can change. There are applications whose requirements can slowly change with the passage of time. This is not necessarily the fault of incomplete business or functional requirements. Rather, the applications are anticipated to grow and meet the needs of more individuals and departments, and therefore will be required to handle additional types of information. You can imagine, in the integration hub example mentioned in the previous paragraph, how new requirements would naturally emerge. With new source systems, and new target systems, there will always be another data element or business rule that is requested to become part of the integration hub's design. There is no way to predict all future requirements in the initial application design. Therefore, there is a need for flexibility and abstraction.
Applications that benefit from abstraction
Applications that can benefit from abstraction, because they need to last "forever" and their requirements can change quite often, include data warehouses, meta data repositories, packaged software, and reference databases. Let's discuss each of these.
Data warehouses. Data warehouses represent the infrastructure behind our data marts and we would consider the data warehouses to be a critical component of our reporting architecture. These types of applications take lots of data in from a variety of sources and send lots of data out to many data marts. The data warehouse is the heart of our reporting architecture, and is not expected to disappear any time soon. Therefore, we need to design the data warehouse with flexibility in mind. An example of using abstraction within a data warehouse was a customer structure I modeled that used abstraction to represent different ways of classifying customers. Regardless of whether we wanted to classify customers by size, industry, or location, this classification structure would flexibly handle any customer grouping scenario. This abstract structure would receive customer information from several different source applications, and pass this customer information to several different data marts. Keeping customer classification within an abstract structure means that when new types of classifications are passed from source applications, we will not need to make any changes to the data warehouse design. We would just add a new row to a classification type entity and we are done.
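The customer classification structure described above can be sketched roughly as follows. This is only an illustration using Python's built-in sqlite3 module, not the actual warehouse design; the table names, column names, and sample customer are all assumptions made up for the example. The point it tries to show is that a new kind of classification arrives as rows, not as a design change.

```python
import sqlite3

# Hypothetical table and column names, chosen only for this illustration.
SCHEMA = """
CREATE TABLE customer (
    customer_id   INTEGER PRIMARY KEY,
    customer_name TEXT NOT NULL
);
CREATE TABLE classification_type (
    classification_type_id INTEGER PRIMARY KEY,
    type_name              TEXT NOT NULL UNIQUE
);
CREATE TABLE classification (
    classification_id      INTEGER PRIMARY KEY,
    classification_type_id INTEGER NOT NULL
        REFERENCES classification_type (classification_type_id),
    classification_name    TEXT NOT NULL,
    UNIQUE (classification_type_id, classification_name)
);
CREATE TABLE customer_classification (
    customer_id       INTEGER NOT NULL REFERENCES customer (customer_id),
    classification_id INTEGER NOT NULL
        REFERENCES classification (classification_id),
    PRIMARY KEY (customer_id, classification_id)
);
"""


def build_warehouse():
    """Create an in-memory database holding the abstract structure."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def table_count(conn):
    """Number of tables in the design; used to show the design does not grow."""
    return conn.execute(
        "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table'"
    ).fetchone()[0]


def classify(conn, customer_id, type_name, classification_name):
    """Assign a customer to a classification, adding type/value rows if new."""
    conn.execute(
        "INSERT OR IGNORE INTO classification_type (type_name) VALUES (?)",
        (type_name,),
    )
    (type_id,) = conn.execute(
        "SELECT classification_type_id FROM classification_type "
        "WHERE type_name = ?",
        (type_name,),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO classification "
        "(classification_type_id, classification_name) VALUES (?, ?)",
        (type_id, classification_name),
    )
    (class_id,) = conn.execute(
        "SELECT classification_id FROM classification "
        "WHERE classification_type_id = ? AND classification_name = ?",
        (type_id, classification_name),
    ).fetchone()
    conn.execute(
        "INSERT OR IGNORE INTO customer_classification VALUES (?, ?)",
        (customer_id, class_id),
    )


def classifications_for(conn, customer_id):
    """Return (type, classification) pairs for a customer, sorted by type."""
    return conn.execute(
        "SELECT t.type_name, c.classification_name "
        "FROM customer_classification cc "
        "JOIN classification c ON c.classification_id = cc.classification_id "
        "JOIN classification_type t "
        "  ON t.classification_type_id = c.classification_type_id "
        "WHERE cc.customer_id = ? ORDER BY t.type_name",
        (customer_id,),
    ).fetchall()


conn = build_warehouse()
conn.execute("INSERT INTO customer VALUES (1, 'Acme Foods')")  # made-up customer
classify(conn, 1, "Size", "Large")
classify(conn, 1, "Industry", "Grocery")
tables_before = table_count(conn)

# A source application starts sending a new kind of classification:
# only rows are added, the table design stays the same.
classify(conn, 1, "Credit Rating", "AA")
tables_after = table_count(conn)
result = classifications_for(conn, 1)
```

The trade-off in a sketch like this is that the database no longer knows what "Size" or "Industry" means; that knowledge lives in the rows of the classification type table rather than in column names.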
Meta data repositories. With the current state of the meta data repository market, many of us are currently not sure what to do about a meta data repository. Most of our companies seem to fall into one of three categories:
- Using a meta data repository that does not meet all of your requirements
- Considering purchasing a meta data repository
- Considering building your own meta data repository
A small percentage of companies have meta data repositories that are used often and provide all required functionality. Most of us who have repositories probably belong to the first category, and not all of our requirements are being satisfied. This is most likely due to an inflexible design. If you belong to the second or third category, you need to be aware that the data model behind the repository must be flexible enough to accommodate new types of meta data if the repository is to last a very long time. For example, let's say you suddenly want to store a new type of meta data, subject area stewardship information, within your repository. Can you do it? Or do you need to change the data model, and therefore the repository's structure and code, to accommodate this new requirement? If the repository needs changing every time new types of meta data need to be added, eventually someone who is paying the bills for these enhancements will realize that it is no longer worth the extra effort and the repository will die a slow death. Repositories become more valuable the more information we have to store and retrieve. Therefore, we want more and more people to use repositories to retrieve and store valuable business and technical meta data. Because we can't predict all possible types of meta data that might be requested, we need a meta data repository data model that uses abstraction to be flexible enough to handle new requirements; repositories require an abstract structure to have a long and meaningful life. I have seen the data models of a number of packaged meta data repositories, and even though some of the models have hundreds of entities, somewhere in their design should be a set of four abstract tables similar to Figure 1. Please note that I changed the names of these entities to avoid specifically referring to any actual packaged software entity names.
Figure 1 Meta data repository abstract model
The Object entity includes any piece of meta data in our organizations. Examples of some of the values of Object include:
- Customer Last Name
- Product Classification
- Vendor
- Customer to Sales Representative relationship
- Order Data Model
Objects can relate to other objects, hence the recursive relationship. For example the Customer Last Name data element can belong to the Customer entity. The Object Type entity categorizes the meta data, examples being:
- Entity
- Data element
- Relationship
- Data model
Object Types can also have relationships to each other, shown by the entity's recursive relationship. For example, Entities can contain Data Elements. The Characteristic entity contains all of the descriptive information about the meta data, such as definition, format, nullability, and version information. For example:
- The Customer Last Name is the surname of the customer
- Customer Last Name is 30 characters in length
- Order Data Model last changed on March 2nd, 2001, 5 PM by Bob Jones
Characteristics can also have relationships to each other, shown by the entity's recursive relationship. Characteristic Types contain the descriptive information about the characteristics, such as:
- Definition
- Format
- Version
Characteristic Types can also have relationships to each other via the entity's recursive relationship; for example, Definitions always require a Version. Take a moment or two and let your imagination run rampant, and try to identify types of meta data that this abstract model will not support. I can almost guarantee that any type of meta data you can think of can be handled by this structure. Whether we have business, functional, or technical meta data, this design will support it. Be aware that this meta data repository abstract data model is also useful when evaluating packaged meta data repositories in determining whether they will meet your needs.
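To make the four entities more concrete, here is a rough sketch of how they might look as tables, again using Python's built-in sqlite3 module. Since the figure itself is not reproduced here, the relationships between the four tables (an Object has an Object Type, a Characteristic describes an Object and has a Characteristic Type) are my reading of the description above, and every table name, column name, and helper function is an assumption made for illustration. The stewardship value at the end is made-up sample data.

```python
import sqlite3

# Hypothetical names throughout. Each table carries a self-referencing
# "parent" column to stand in for the recursive relationships.
SCHEMA = """
CREATE TABLE object_type (
    object_type_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL UNIQUE,
    parent_id      INTEGER REFERENCES object_type (object_type_id)
);
CREATE TABLE meta_object (
    object_id      INTEGER PRIMARY KEY,
    object_type_id INTEGER NOT NULL REFERENCES object_type (object_type_id),
    name           TEXT NOT NULL,
    parent_id      INTEGER REFERENCES meta_object (object_id)
);
CREATE TABLE characteristic_type (
    characteristic_type_id INTEGER PRIMARY KEY,
    name                   TEXT NOT NULL UNIQUE,
    parent_id INTEGER REFERENCES characteristic_type (characteristic_type_id)
);
CREATE TABLE characteristic (
    characteristic_id      INTEGER PRIMARY KEY,
    object_id              INTEGER NOT NULL REFERENCES meta_object (object_id),
    characteristic_type_id INTEGER NOT NULL
        REFERENCES characteristic_type (characteristic_type_id),
    value                  TEXT NOT NULL,
    parent_id INTEGER REFERENCES characteristic (characteristic_id)
);
"""


def add_object_type(conn, name, parent_id=None):
    cur = conn.execute(
        "INSERT INTO object_type (name, parent_id) VALUES (?, ?)",
        (name, parent_id),
    )
    return cur.lastrowid


def add_object(conn, object_type_id, name, parent_id=None):
    cur = conn.execute(
        "INSERT INTO meta_object (object_type_id, name, parent_id) "
        "VALUES (?, ?, ?)",
        (object_type_id, name, parent_id),
    )
    return cur.lastrowid


def add_characteristic_type(conn, name, parent_id=None):
    cur = conn.execute(
        "INSERT INTO characteristic_type (name, parent_id) VALUES (?, ?)",
        (name, parent_id),
    )
    return cur.lastrowid


def describe(conn, object_id, characteristic_type_id, value):
    """Attach one piece of descriptive information to an object."""
    conn.execute(
        "INSERT INTO characteristic (object_id, characteristic_type_id, value) "
        "VALUES (?, ?, ?)",
        (object_id, characteristic_type_id, value),
    )


def characteristics_of(conn, object_id):
    """Return (characteristic type, value) pairs for one object."""
    return conn.execute(
        "SELECT t.name, c.value FROM characteristic c "
        "JOIN characteristic_type t "
        "  ON t.characteristic_type_id = c.characteristic_type_id "
        "WHERE c.object_id = ? ORDER BY t.name",
        (object_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(SCHEMA)

# Object Types: Entities can contain Data Elements.
entity = add_object_type(conn, "Entity")
data_element = add_object_type(conn, "Data element", parent_id=entity)

# Objects: Customer Last Name belongs to the Customer entity.
customer = add_object(conn, entity, "Customer")
last_name = add_object(conn, data_element, "Customer Last Name",
                       parent_id=customer)

definition = add_characteristic_type(conn, "Definition")
fmt = add_characteristic_type(conn, "Format")
describe(conn, last_name, definition, "The surname of the customer")
describe(conn, last_name, fmt, "30 characters in length")

# New requirement: subject area stewardship. Only rows are added.
subject_area = add_object_type(conn, "Subject area")
steward = add_characteristic_type(conn, "Steward")
customer_area = add_object(conn, subject_area, "Customer subject area")
describe(conn, customer_area, steward, "Data Governance Team")  # made-up value

last_name_info = characteristics_of(conn, last_name)
steward_info = characteristics_of(conn, customer_area)
```

In this sketch the stewardship requirement is met with two new type rows and two new data rows, which is the kind of flexibility to look for when comparing a packaged repository's model against the abstract one.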
Packaged software. In determining whether to purchase a packaged piece of software, it is a good idea to compare your needs against the packaged software's data model to see if you will be satisfied with its functionality. Although it can sometimes be very difficult getting the data model from the software company (I have heard "That's proprietary!" way too many times), if it is critical to purchasing the product, most software vendors will provide it to you. Don't just compare your existing normalized logical data model with their packaged software data model. This will only help you answer the question "Does this piece of software meet my current needs?" You want to know if it will meet some of your future needs as well. Therefore, apply abstraction according to the guide mentioned in our last article. Then match your abstract model against the packaged software model to see how flexible the packaged software model really is. You will quickly find out whether this product will meet your current and future needs or whether it will limit your functionality and not be worth the investment. So for example, if you are evaluating a packaged meta data repository, compare its data model with the one in Figure 1 to see if it can handle all possible types of meta data within your organization.
Recently, I analyzed several data elements within a consumer contact database that was purchased as a packaged piece of software. I was shocked when I noticed email addresses within the Company Name data element. When I brought this up to the functional analyst as a potential data quality issue, he responded that they always store email addresses there. He must have noticed a surprised look on my face, so he added that there is no other place to put an email address in the packaged software and therefore they decided to use one of their least populated data elements for this information, which was the Company Name. This could be a very good example of an inflexible design, due to a data model with too little abstraction to handle an obvious future requirement.
Reference databases. Many of our companies have separate applications for specific reference subject areas. The most popular examples are Item and Customer databases. These applications need to handle a variety of changing requirements and pass this information on to many downstream applications. There might always be new item brands or different ways of classifying items. Using abstraction allows us to handle these types of changes without making changes to the design. Making changes to a design for a reference database not only impacts the reference database but also potentially impacts all of the applications downstream that receive information from the reference database. So you want to minimize making changes to reference databases, hence the usefulness of abstraction. I have seen both item and customer applications with abstraction applied to increase flexibility. I have seen an item model where an abstract classification structure was used to handle any way of classifying products, such as by brand or by size. I have seen a customer model where an abstract association structure was used to handle any possible relationships between customers.
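The abstract association structure for customers mentioned above might look something like the following sketch, once more with Python's built-in sqlite3 module. It is not the model I reviewed; the table names, column names, association types, and customer names are all invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
# Hypothetical names: one row per association type, one row per
# directed relationship between two customers.
conn.executescript("""
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL UNIQUE
);
CREATE TABLE association_type (
    association_type_id INTEGER PRIMARY KEY,
    name                TEXT NOT NULL UNIQUE
);
CREATE TABLE customer_association (
    from_customer_id    INTEGER NOT NULL REFERENCES customer (customer_id),
    to_customer_id      INTEGER NOT NULL REFERENCES customer (customer_id),
    association_type_id INTEGER NOT NULL
        REFERENCES association_type (association_type_id),
    PRIMARY KEY (from_customer_id, to_customer_id, association_type_id)
);
""")


def get_or_add(conn, table, id_column, name):
    """Return the id for a name, inserting the row first if it is new.

    table and id_column are fixed strings supplied by this module,
    never user input, so building the SQL text from them is safe here.
    """
    conn.execute(f"INSERT OR IGNORE INTO {table} (name) VALUES (?)", (name,))
    return conn.execute(
        f"SELECT {id_column} FROM {table} WHERE name = ?", (name,)
    ).fetchone()[0]


def associate(conn, from_name, type_name, to_name):
    """Record that one customer relates to another in the given way."""
    from_id = get_or_add(conn, "customer", "customer_id", from_name)
    to_id = get_or_add(conn, "customer", "customer_id", to_name)
    type_id = get_or_add(
        conn, "association_type", "association_type_id", type_name
    )
    conn.execute(
        "INSERT OR IGNORE INTO customer_association VALUES (?, ?, ?)",
        (from_id, to_id, type_id),
    )


def related(conn, from_name):
    """Return (association type, other customer) pairs, sorted by type."""
    return conn.execute(
        "SELECT t.name, c2.name FROM customer_association a "
        "JOIN customer c1 ON c1.customer_id = a.from_customer_id "
        "JOIN customer c2 ON c2.customer_id = a.to_customer_id "
        "JOIN association_type t "
        "  ON t.association_type_id = a.association_type_id "
        "WHERE c1.name = ? ORDER BY t.name",
        (from_name,),
    ).fetchall()


# Made-up customers and relationships.
associate(conn, "Acme Holdings", "Parent company of", "Acme Foods")
associate(conn, "Acme Foods", "Ships to", "Corner Market")
# A relationship nobody planned for arrives later as a new type row.
associate(conn, "Acme Foods", "Bills to", "Acme Holdings")

acme_foods_links = related(conn, "Acme Foods")
```

Because downstream applications read the same three tables no matter how many association types exist, a new relationship does not ripple through them as a design change.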
This article focused on where to use abstraction. We first discussed the two conditions that need to be present for abstraction to be most effective. Then we discussed four applications that generally meet these two conditions and therefore make excellent candidates for abstraction. Our next article in this series will discuss a very good friend of abstraction, the meta data helper entity.
About the Author
Steve Hoberman is an expert in the fields of data modeling and data warehousing, and teaches several data modeling courses throughout the year including a brand new Data Modeling Master Class. He is currently a global reference data expert for Mars, Inc. He has been data modeling since 1990 across industries as diverse as telecommunications, finance, and manufacturing. Steve speaks regularly for the Data Warehousing Institute. He is the author of The Data Modeler's Workbench, Tools and Techniques for Analysis & Design. Steve specializes in data modeling training, design strategy, and in creating techniques to improve the data modeling process and deliverables. He enjoys reviewing data models and is the founder of Design Challenges, a discussion group which tackles complex data modeling scenarios. To learn more about his data model reviews and to add your email address to the Design Challenge distribution list, please visit his web site at www.stevehoberman.com. He can be reached at me@stevehoberman.com