Metadata supports the conditions and uses of abstraction in data modeling, ensuring lasting and scalable data models
When I write about abstraction, I like to remind the reader of a very abstract painting I saw in a museum, which consisted of a white dot on a red canvas that represented a city skyline. In my first article in this series, I concluded that abstraction is a tool that lets an artist efficiently capture and represent complex topics in a generic way. As data modelers, we are all artists to an extent, and abstraction is an efficient tool at our disposal as well. This article discusses the two conditions that need to be present to make abstraction effective. Then we will explore four types of applications that meet these two conditions and therefore make excellent candidates for abstraction.
Conditions to maximize the effectiveness of abstraction
I have noticed two main conditions that must be present for abstraction to be effective:
Design needs to last “forever.” Abstraction is very useful when the data model you are working on belongs to an application that is a critical component of a systems architecture or the source of very important, strategic information. These types of applications are expected to last a very long time, and therefore it is important to design with flexibility in mind. Not too long ago, I reviewed the model for an integration hub that was a critical component for the marketing department of a large company. Many source applications fed this integration hub, and many other applications depended on its data. The hub had an extremely normalized physical design. Its business rules were clearly understood and there was minimal redundancy in the database.
Despite this clear understanding and minimal redundancy, enhancements to the integration hub required months of effort and cost a lot of money. The investment was so large because each enhancement meant changing the data model, and therefore the database and usually a substantial amount of programming code as well. One of the main reasons for this huge effort was the lack of abstraction within the integration hub’s design. A hub, like any other integrated application, is very important to the business and is usually the heart of the systems architecture. Therefore, it needs to be designed with flexibility using abstraction, or else every little change will become a huge development effort and eventually the integration hub will be replaced by something more flexible, but only after a lot of money and time have been spent.
Requirements can change. There are applications whose requirements can slowly change with the passage of time. This is not necessarily the fault of incomplete business or functional requirements. Rather, the applications are anticipated to grow and meet the needs of more individuals and departments, and therefore will be required to handle additional types of information. You can imagine, in the integration hub example mentioned in the previous paragraph, how new requirements would naturally emerge. With new source systems, and new target systems, there will always be another data element or business rule that is requested to become part of the integration hub’s design. There is no way to predict all future requirements in the initial application design. Therefore, there is a need for flexibility and abstraction.
Applications that benefit from abstraction
Applications that can benefit from abstraction, because they need to last “forever” and their requirements can change quite often, include data warehouses, metadata repositories, packaged software, and reference databases. Let us discuss each of these.
Data warehouses. Data warehouses represent the infrastructure behind our data marts and we would consider the data warehouses to be a critical component of our reporting architecture. These types of applications take lots of data in from a variety of sources and send lots of data out to many data marts. The data warehouse is the heart of our reporting architecture, and is not expected to disappear any time soon. Therefore, we need to design the data warehouse with flexibility in mind. An example of using abstraction within a data warehouse was a customer structure I modeled that used abstraction to represent different ways of classifying customers. Regardless of whether we wanted to classify customers by size, industry, or location, this classification structure would flexibly handle any customer grouping scenario.
This abstract structure would receive customer information from several different source applications, and pass this customer information to several different data marts. Keeping customer classification within an abstract structure means that when new types of classifications are passed from source applications, we will not need to make any changes to the data warehouse design. We would just add a new row to a classification type entity and we are done.
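The idea of absorbing a new kind of classification as a row rather than as a design change can be sketched as follows. This is a minimal illustration, not the actual warehouse design: the table names, column names, and the helper functions `classify` and `classifications_of` are all hypothetical, and the sample customer and classification values are invented for the demonstration.

```python
import sqlite3

# Hypothetical names throughout; the article does not give the real design.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE classification_type (          -- e.g. Size, Industry, Location
    classification_type_id INTEGER PRIMARY KEY,
    name                   TEXT NOT NULL UNIQUE
);
CREATE TABLE classification (               -- e.g. Large, Retail, Northeast
    classification_id      INTEGER PRIMARY KEY,
    classification_type_id INTEGER NOT NULL
        REFERENCES classification_type(classification_type_id),
    name                   TEXT NOT NULL
);
CREATE TABLE customer_classification (
    customer_id       INTEGER NOT NULL REFERENCES customer(customer_id),
    classification_id INTEGER NOT NULL REFERENCES classification(classification_id),
    PRIMARY KEY (customer_id, classification_id)
);
"""


def classify(conn, customer_id, type_name, value_name):
    """Classify a customer, adding the type and value rows if they are new."""
    conn.execute(
        "INSERT OR IGNORE INTO classification_type (name) VALUES (?)",
        (type_name,),
    )
    type_id = conn.execute(
        "SELECT classification_type_id FROM classification_type WHERE name = ?",
        (type_name,),
    ).fetchone()[0]
    row = conn.execute(
        "SELECT classification_id FROM classification "
        "WHERE classification_type_id = ? AND name = ?",
        (type_id, value_name),
    ).fetchone()
    if row is None:
        cur = conn.execute(
            "INSERT INTO classification (classification_type_id, name) VALUES (?, ?)",
            (type_id, value_name),
        )
        class_id = cur.lastrowid
    else:
        class_id = row[0]
    conn.execute(
        "INSERT OR IGNORE INTO customer_classification VALUES (?, ?)",
        (customer_id, class_id),
    )


def classifications_of(conn, customer_id):
    """Return (classification type, classification) pairs for one customer."""
    return conn.execute(
        """
        SELECT t.name, c.name
        FROM customer_classification cc
        JOIN classification c ON c.classification_id = cc.classification_id
        JOIN classification_type t
          ON t.classification_type_id = c.classification_type_id
        WHERE cc.customer_id = ?
        ORDER BY t.name, c.name
        """,
        (customer_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO customer (customer_id, name) VALUES (1, 'Acme Corp')")

classify(conn, 1, "Size", "Large")
classify(conn, 1, "Industry", "Retail")
# A source application starts sending a classification we never planned for.
# Only rows are added; the four tables stay exactly as they were.
classify(conn, 1, "Credit Rating", "AA")

print(classifications_of(conn, 1))
```

The trade-off in a sketch like this is that the meaning of each classification now lives in data rather than in column names, so queries have to filter on the classification type.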
Metadata repositories. With the current state of the metadata repository market, many of us are currently not sure what to do about a metadata repository. Most of our companies seem to fall into one of three categories:
- Using a metadata repository that does not meet all of your requirements
- Considering purchasing a metadata repository
- Considering building your own metadata repository
A small percentage of companies have metadata repositories that are used often and provide all required functionality. Most of us who have repositories probably belong to the first category, where not all of our requirements are satisfied. This is most likely due to an inflexible design. If you belong to the second or third category, you need to be aware that the data model behind the repository must be flexible enough to accommodate new types of metadata, and therefore able to last a very long time. For example, let’s say you suddenly want to store a new type of metadata, subject area stewardship information, within your repository. Can you do it? Or do you need to change the data model, and therefore the repository’s structure and code, to accommodate this new requirement?
If the repository needs changing every time a new type of metadata needs to be added, eventually someone who is paying the bills for these enhancements will realize that it is no longer worth the extra effort and the repository will die a slow death. Repositories become more valuable the more information we store in and retrieve from them. Therefore, we want more and more people to use repositories to store and retrieve valuable business and technical metadata. Because we can’t predict all possible types of metadata that might be requested, repositories require an abstract structure, flexible enough to handle new requirements, to have a long and meaningful life. I have seen the data models of a number of packaged metadata repositories, and even though some of the models have hundreds of entities, somewhere in their design should be a set of four abstract tables similar to Figure 1. Please note that I changed the names of these entities to avoid specifically referring to any actual packaged software entity names.
Figure 1. Metadata repository abstract model
The Object entity includes any piece of metadata in our organizations. Examples of some of the values of Object include:
- Customer Last Name
- Product Classification
- Customer to Sales Representative relationship
- Order Data Model
Objects can relate to other objects, hence the recursive relationship. For example, the Customer Last Name data element can belong to the Customer entity. The Object Type entity categorizes the metadata, examples being:
- Data element
- Data model
Object Types can also have relationships to each other, shown with their recursive relationship. For example, Entities can contain Data Elements. The Characteristic entity contains all of the descriptive information about the metadata, such as definition, format, nullability, and version information. For example:
- The Customer Last Name is the surname of the customer
- Customer Last Name is 30 characters in length
- Order Data Model last changed on March 2nd, 2001, 5 PM by Bob Jones
Characteristics can also have relationships to each other, shown through their recursive relationship. The Characteristic Type entity categorizes the characteristics, with values such as Definition and Version.
Characteristic Types can also have relationships to each other via their recursive relationship, such as Definitions always requiring a Version. Take a moment or two to let your imagination run rampant, and try to identify types of metadata that this abstract model will not support. I can almost guarantee that any type of metadata you can think of can be handled by this structure. Whether we have business, functional, or technical metadata, this design will support it. Be aware that this metadata repository abstract data model is also useful when evaluating packaged metadata repositories to determine whether they will meet your needs.
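To make the four-entity structure concrete, here is a minimal sketch of how it might be laid out as tables. Figure 1 itself is not reproduced here, so the foreign keys between the four tables are an assumption: each object has an object type, and each characteristic belongs to one object and has a characteristic type. The table and column names (including `meta_object` for the Object entity), the helpers `add_row` and `describe`, and the steward value are hypothetical. The subject area stewardship requirement mentioned earlier is handled by adding rows only.

```python
import sqlite3

# Assumed layout; every table carries a parent_id for its recursive relationship.
SCHEMA = """
CREATE TABLE object_type (
    object_type_id INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    parent_id      INTEGER REFERENCES object_type(object_type_id)
);
CREATE TABLE meta_object (
    object_id      INTEGER PRIMARY KEY,
    object_type_id INTEGER NOT NULL REFERENCES object_type(object_type_id),
    name           TEXT NOT NULL,
    parent_id      INTEGER REFERENCES meta_object(object_id)
);
CREATE TABLE characteristic_type (
    characteristic_type_id INTEGER PRIMARY KEY,
    name                   TEXT NOT NULL,
    parent_id              INTEGER REFERENCES characteristic_type(characteristic_type_id)
);
CREATE TABLE characteristic (
    characteristic_id      INTEGER PRIMARY KEY,
    object_id              INTEGER NOT NULL REFERENCES meta_object(object_id),
    characteristic_type_id INTEGER NOT NULL
        REFERENCES characteristic_type(characteristic_type_id),
    value                  TEXT NOT NULL,
    parent_id              INTEGER REFERENCES characteristic(characteristic_id)
);
"""


def add_row(conn, table, **cols):
    """Insert one row and return its id (table and column names are trusted)."""
    names = ", ".join(cols)
    marks = ", ".join("?" for _ in cols)
    cur = conn.execute(
        f"INSERT INTO {table} ({names}) VALUES ({marks})", tuple(cols.values())
    )
    return cur.lastrowid


def describe(conn, object_name):
    """Return (characteristic type, value) pairs for the named object."""
    return conn.execute(
        """
        SELECT ct.name, c.value
        FROM characteristic c
        JOIN meta_object o ON o.object_id = c.object_id
        JOIN characteristic_type ct
          ON ct.characteristic_type_id = c.characteristic_type_id
        WHERE o.name = ?
        ORDER BY ct.name
        """,
        (object_name,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# Object types: Entities can contain Data Elements.
entity_t = add_row(conn, "object_type", name="Entity")
element_t = add_row(conn, "object_type", name="Data element", parent_id=entity_t)

# Objects: the Customer Last Name data element belongs to the Customer entity.
customer = add_row(conn, "meta_object", object_type_id=entity_t, name="Customer")
last_name = add_row(conn, "meta_object", object_type_id=element_t,
                    name="Customer Last Name", parent_id=customer)

definition_t = add_row(conn, "characteristic_type", name="Definition")
length_t = add_row(conn, "characteristic_type", name="Length")
add_row(conn, "characteristic", object_id=last_name,
        characteristic_type_id=definition_t, value="The surname of the customer")
add_row(conn, "characteristic", object_id=last_name,
        characteristic_type_id=length_t, value="30 characters")

# New requirement: subject area stewardship. Only rows are added.
subject_area_t = add_row(conn, "object_type", name="Subject area")
steward_t = add_row(conn, "characteristic_type", name="Steward")
subject_area = add_row(conn, "meta_object", object_type_id=subject_area_t,
                       name="Customer subject area")
add_row(conn, "characteristic", object_id=subject_area,
        characteristic_type_id=steward_t, value="Data Governance team")  # made-up value

print(describe(conn, "Customer Last Name"))
print(describe(conn, "Customer subject area"))
```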
Packaged software. In determining whether to purchase a packaged piece of software, it is a good idea to compare your needs against the packaged software’s data model to see if you will be satisfied with its functionality. Although it can sometimes be very difficult to get the data model from the software company (I have heard “That’s proprietary!” way too many times), if it is critical to purchasing the product, most software vendors will provide it to you. Do not just compare your existing normalized logical data model with their packaged software data model. This will only help you answer the question “Does this piece of software meet my current needs?” You want to know if it will meet some of your future needs as well. Therefore, apply abstraction according to the guide mentioned in our last article. Then match your abstract model against the packaged software model to see how flexible the packaged software model really is. You will quickly find out whether this product will meet your current and future needs or whether it will limit your functionality and not be worth the investment. For example, if you are evaluating a packaged metadata repository, compare its data model with the one in Figure 1 to see if it can handle all possible types of metadata within your organization.
Recently, I analyzed several data elements within a consumer contact database that was purchased as a packaged piece of software. I was shocked when I noticed email addresses within the Company Name data element. When I brought this up to the functional analyst as a potential data quality issue, he responded that they always store their email address there. He must have noticed a surprised look on my face, so he added that there are no other places to put email address in the packaged software and therefore they decided to use one of their least populated data elements for this information, which was the Company Name. This could be a very good example of an inflexible design, due to a data model with not enough abstraction to handle an obvious future requirement.
Reference databases. Many of our companies have separate applications for specific reference subject areas. The most popular examples are Item and Customer databases. These applications need to handle a variety of changing requirements and pass this information on to many downstream applications. There might always be new item brands or different ways of classifying items. Using abstraction allows us to handle these types of changes without making changes to the design. Making changes to a design for a reference database not only impacts the reference database but also potentially impacts all of the applications downstream that receive information from the reference database. So you want to minimize making changes to reference databases, hence the usefulness of abstraction. I have seen both item and customer applications with abstraction applied to increase flexibility. I have seen an item model where an abstract classification structure was used to handle any way of classifying products, such as by brand or by size. I have seen a customer model where an abstract association structure was used to handle any possible relationships between customers.
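An abstract association structure of the kind described for the customer model might look like the sketch below. The models I reviewed are not reproduced here, so the table names, column names, the helpers `associate` and `associations_from`, and the sample customers and relationship names are all hypothetical. The point it illustrates is that a new kind of relationship between customers becomes a new row in an association type table, so downstream applications see no structural change.

```python
import sqlite3

# Hypothetical names; one association table covers every kind of relationship.
SCHEMA = """
CREATE TABLE customer (
    customer_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
CREATE TABLE association_type (
    association_type_id INTEGER PRIMARY KEY,
    name                TEXT NOT NULL UNIQUE
);
CREATE TABLE customer_association (
    from_customer_id    INTEGER NOT NULL REFERENCES customer(customer_id),
    to_customer_id      INTEGER NOT NULL REFERENCES customer(customer_id),
    association_type_id INTEGER NOT NULL
        REFERENCES association_type(association_type_id),
    PRIMARY KEY (from_customer_id, to_customer_id, association_type_id)
);
"""


def associate(conn, from_id, type_name, to_id):
    """Relate two customers, adding the association type row if it is new."""
    conn.execute(
        "INSERT OR IGNORE INTO association_type (name) VALUES (?)", (type_name,)
    )
    type_id = conn.execute(
        "SELECT association_type_id FROM association_type WHERE name = ?",
        (type_name,),
    ).fetchone()[0]
    conn.execute(
        "INSERT OR IGNORE INTO customer_association VALUES (?, ?, ?)",
        (from_id, to_id, type_id),
    )


def associations_from(conn, from_id):
    """Return (association type, related customer name) pairs."""
    return conn.execute(
        """
        SELECT t.name, c.name
        FROM customer_association a
        JOIN association_type t ON t.association_type_id = a.association_type_id
        JOIN customer c ON c.customer_id = a.to_customer_id
        WHERE a.from_customer_id = ?
        ORDER BY t.name, c.name
        """,
        (from_id,),
    ).fetchall()


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.executemany(
    "INSERT INTO customer (customer_id, name) VALUES (?, ?)",
    [(1, "Acme Holdings"), (2, "Acme Retail"), (3, "Acme Logistics")],
)

associate(conn, 1, "Parent of", 2)
associate(conn, 1, "Parent of", 3)
# A relationship nobody anticipated arrives later: it is just another row.
associate(conn, 2, "Ships through", 3)

print(associations_from(conn, 1))
print(associations_from(conn, 2))
```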
This article focused on where to use abstraction. We first discussed the two conditions that need to be present for abstraction to be most effective. Then we discussed four applications that generally meet these two conditions and therefore make excellent candidates for abstraction. Other areas for exploration in the abstraction topic would include use of metadata helper entities and reusable abstract objects (entities, relationships, attributes).