A managed metadata environment (MME) is the optimal approach to metadata architecture for meeting business and technical requirements
Almost every corporation and government agency has already built, is in the process of building, or is looking to build a Managed Metadata Environment (MME), either as part of a metadata solution or within an enterprise data management initiative. Many organizations, however, are making fundamental mistakes. An enterprise may build many metadata repositories, or “islands of metadata,” that are not linked together and, as a result, provide far less value (see “Where’s my metadata architecture?” sidebar).
As defined in the DAMA-DMBOK©, metadata is information about the physical data, technical and business processes, data rules and constraints, and logical and physical structures of the data, as used by an organization. These descriptive tags describe data, concepts, and the relationships between the data and concepts.
Let’s take a quick metadata management quiz. What is the most common form of metadata architecture? It is likely that most of you will answer “centralized,” but the real answer is “bad architecture.” Most metadata repository architectures are built the same way data warehouse architectures were built: badly. The data warehouse architecture issue resulted in many Global 2000 companies rebuilding their data warehousing applications, sometimes from the ground up. Many of the metadata repositories under development or already in use need to be completely rebuilt.
|Where’s my metadata architecture?|
At EWSolutions one of our clients is a large pharmaceutical company. Since knowledge is the lifeblood of any pharmaceutical company, these firms tend to have very large metadata requirements and staffs. This company had decided to hold a “Metadata Day” and asked me to come on-site and give a keynote address to kick off the day. Between 60 and 80 people attended “Metadata Day”.
After the keynote address came a series of workshops. We counted four separate metadata repositories in production and three other separate metadata repository initiatives starting up – a classic “islands of metadata” problem. This is not an approach that leads to long-term positive results. None of these islands were linked to each other, yet much of the most valuable metadata functionality comes from the relationships that the metadata has with itself. For example, it is highly valuable to view a physical column name (technical metadata) and then drill across to the business definition (business metadata) of that column.
The managed metadata environment represents the architectural components, people and processes that are required to properly and systematically gather, retain, and disseminate metadata throughout the enterprise. The MME encapsulates the concepts of metadata repositories, catalogs, data dictionaries and any other term that people have thrown out to refer to the systematic management of metadata. Some people mistakenly describe an MME as a data warehouse for metadata. In actuality, an MME is an operational system and as such is architected in a vastly different manner than a data warehouse.
Companies that are looking to truly and efficiently manage metadata from an enterprise perspective need a fully functional MME. It is important to note that a company should not try to store all of its metadata in an MME, just as it would not try to store all of its data in a data warehouse. Without the MME’s components, it is very difficult to manage metadata effectively in a large organization.
The six components of the MME, shown in Figure 1, are:
- Metadata sourcing layer
- Metadata integration layer
- Metadata repository
- Metadata management layer
- Metadata marts
- Metadata delivery layer
Figure 1: Managed Metadata Environment
An MME can be implemented using a centralized, decentralized, or distributed architecture. (1) Centralized architecture offers a single, uniform, and consistent meta model that mandates the schema for defining and organizing the various metadata stored in a global metadata repository. This allows for a consolidated approach to administering and sharing metadata across the enterprise. Decentralized architecture creates a uniform and consistent meta model that mandates the schema for a global subset of metadata stored in a global metadata repository, together with designated shared metadata elements that appear in local metadata repositories. All metadata that is shared and reused among the various local repositories must first go through the global repository, but sharing of and access to the local metadata are independent of the global repository.
Distributed architecture includes several disjoint and autonomous metadata repositories, each with its own meta model dictating its internal metadata content and organization, and each solely responsible for the sharing and administration of its metadata. The global metadata repository does not hold metadata that appears in the local repositories; instead, it holds pointers to the metadata in the local repositories and metadata on how to access it. At EWSolutions we have built MMEs that use each of these three architectural approaches, and some implementations combine these techniques in one MME.
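As a rough sketch of the distributed approach, the global repository can be modeled as a registry of pointers: for each metadata element it records which local repository owns it and how to reach it, fetching the metadata only when a user requests it. All class, repository, and element names below are hypothetical illustrations, not part of any real MME product.

```python
# Minimal sketch of a distributed MME: the global repository stores only
# pointers (owning repository + access key), never the metadata itself.
# All names here are hypothetical.

class LocalRepository:
    """A local metadata repository with its own meta model and content."""
    def __init__(self, name):
        self.name = name
        self._store = {}

    def put(self, key, metadata):
        self._store[key] = metadata

    def get(self, key):
        return self._store[key]


class GlobalRepository:
    """Holds pointers to metadata in local repositories."""
    def __init__(self):
        self._pointers = {}  # element name -> (local repo, key)

    def register(self, element, repo, key):
        self._pointers[element] = (repo, key)

    def resolve(self, element):
        repo, key = self._pointers[element]
        return repo.get(key)  # fetched from the local repo at request time


# Usage: a hypothetical CRM team's repository owns the CUSTOMER_ID metadata.
crm_repo = LocalRepository("crm_metadata")
crm_repo.put("col:CUSTOMER_ID", {"physical_name": "CUSTOMER_ID",
                                 "definition": "Unique customer identifier"})

global_repo = GlobalRepository()
global_repo.register("CUSTOMER_ID", crm_repo, "col:CUSTOMER_ID")
resolved = global_repo.resolve("CUSTOMER_ID")
```

Note that the global repository never copies the definition; if the CRM team updates its local entry, the next `resolve` call sees the new value.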
Metadata Sourcing Layer
The Metadata Sourcing Layer is the first component of the MME architecture. The purpose of the Metadata Sourcing Layer is to extract metadata from its source and to send it into the Metadata Integration Layer or directly into the metadata repository (see Figure 2). Some metadata will be accessed by the MME through the use of pointers (distributed) that will present the metadata to the end user at the time that it is requested. The pointers are managed by the Metadata Sourcing Layer and stored in the Metadata Repository.
Figure 2: Metadata Sourcing Layer
It is best to send the extracted metadata to the same hardware location as the Metadata Repository. Often metadata architects incorrectly build metadata integration processes on the platform that the metadata is sourced from (other than record selection, which is acceptable). This merging of the metadata sourcing layer with the metadata integration layer is a common mistake that causes a whole host of issues.
As sources of metadata are changed and added (and they will be), the metadata integration process is negatively impacted. When the metadata sourcing layer is separated from the metadata integration layer, only the sourcing layer is affected by this type of change. By keeping all of the metadata together on the target platform, the metadata architect can adapt the integration processes much more easily.
Keeping the sourcing layer separate from the integration layer also provides a tidy backup and restart point. Metadata loading errors typically occur in the metadata integration layer. Without a separately staged extract, if an error occurred the architect would have to go back to the source of the metadata and re-read it. This can cause a number of problems. If the metadata source has been updated, it may fall out of sync with some of the other metadata sources it integrates with. In addition, the metadata source may be in use at the time, and the re-read could affect its performance. The golden rule of metadata extraction is:
Never have multiple processes extracting the same metadata from the same metadata source.
In these situations, the timeliness and consequently the accuracy of the metadata can be compromised. For example, suppose that you have built one metadata extraction process (Process #1) that reads physical attribute names from a modeling tool’s tables to load a target entity in the meta model table that contains physical attribute names. You also built a second process (Process #2) to read and load attribute domain values. It is possible that the attribute table in the modeling tool has been changed between the running of Process #1 and Process #2. This situation would cause the metadata to be out-of-sync.
This situation can also cause unnecessary delays in loading metadata from sources with limited availability or batch windows. For example, if you were reading database logs from your enterprise resource planning (ERP) system, you would not want to run multiple extraction processes against those logs, since they most likely have a limited batch window available. While this situation does not happen often, there is no reason to build unnecessary flaws into your metadata architecture.
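The golden rule can be sketched as follows: a single extraction process takes one frozen snapshot of the source, and every downstream load reads that snapshot, so no two loads can ever see different versions of the source table. The function and field names are hypothetical.

```python
# Sketch of the golden rule: extract the modeling tool's attribute table
# ONCE into a staging snapshot, then let every load process read that
# snapshot. Structures are hypothetical.

def extract_attributes(modeling_tool_rows):
    """Single extraction process: one consistent, frozen copy of the source."""
    return [dict(row) for row in modeling_tool_rows]

def load_physical_names(snapshot):
    """Load process #1: physical attribute names."""
    return [row["attr_name"] for row in snapshot]

def load_domain_values(snapshot):
    """Load process #2: attribute domain values."""
    return {row["attr_name"]: row["domain"] for row in snapshot}

# One read of the source...
source = [{"attr_name": "CUST_ID", "domain": "INTEGER"},
          {"attr_name": "CUST_NM", "domain": "VARCHAR(60)"}]
snapshot = extract_attributes(source)

# ...feeds both loads, so they cannot drift out of sync even if the
# modeling tool's table changes between the two load runs.
names = load_physical_names(snapshot)
domains = load_domain_values(snapshot)
```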
The number and variety of metadata sources will vary greatly based on the business requirements of your MME. Though there are sources of metadata that many companies commonly use, I have never seen two metadata repositories that have exactly the same metadata sources. Have you ever seen two data warehouses with exactly the same source information? Following are the most common metadata sources:
- Software tools
- End users
- Documents and spreadsheets
- Messaging and transactions
- Web sites and E-commerce
- Third parties
Metadata Integration Layer
The metadata integration layer (Figure 3) takes the various sources of metadata, integrates them, and loads the result into the metadata repository. This approach differs slightly from the techniques commonly used to load a data warehouse, where the transformation (what we call integration) process is clearly separated from the load process. In an MME these steps are combined because the volume of metadata is not nearly as large as the volume of data in a data warehouse.
As a general rule an MME holds between 5 and 20 gigabytes of metadata; however, as MMEs begin to target data-audit-related metadata, storage can grow into the 20-75 gigabyte range, and over the next few years some MMEs will reach the terabyte range.
Figure 3: Metadata Integration Layer
The specific steps in this process depend on whether you are building a custom process or if you are using a metadata integration tool to assist your effort. If you decide to use a metadata integration tool, the specific tool selection can also greatly affect this process.
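As an illustration of one custom-build pattern, the sketch below combines standardization and load into a single pass, which is workable precisely because metadata volumes are modest. The schema, column names, and source records are hypothetical.

```python
import sqlite3

# Sketch of a combined integration-and-load step: each source record is
# standardized (trimmed, upper-cased) and inserted in the same pass,
# rather than staged through a separate load phase as a data warehouse
# would do. The schema is a hypothetical illustration.

def integrate_and_load(conn, source_records):
    for rec in source_records:
        conn.execute(
            "INSERT INTO metadata_attribute (attr_phys_name, source_system) "
            "VALUES (?, ?)",
            (rec["name"].strip().upper(), rec["source"]),  # standardize inline
        )
    conn.commit()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE metadata_attribute "
             "(attr_phys_name TEXT, source_system TEXT)")
integrate_and_load(conn, [{"name": " cust_id ", "source": "modeling_tool"},
                          {"name": "ord_dt", "source": "db_catalog"}])
```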
Metadata Repository
A metadata repository is a fancy name for a database designed to gather, retain, and disseminate metadata. The metadata repository (Figure 4) is responsible for the cataloging and persistent physical storage of the metadata.
Figure 4: Metadata Repository
The Metadata Repository should be generic, integrated, current, and historical. Generic means that the physical meta model stores metadata by subject area rather than in an application-specific manner.
For example, a generic meta model (a model of the metadata concepts) will have an attribute named “DATABASE_PHYS_NAME” that will hold the physical database names within the company. A meta model that is application-specific would name this same attribute “ORACLE_PHYS_NAME”. The problem with application-specific meta models is that metadata subject areas change. To return to our example, today Oracle may be our company’s database standard. Tomorrow we may switch the standard to SQL Server for cost or compatibility reasons. This situation would cause needless changes to the physical meta model. (2)
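The Oracle-to-SQL Server example can be sketched with a generic meta model table: the switch becomes a data change rather than a schema change. The table and column names below are illustrative, not taken from any real meta model.

```python
import sqlite3

# Sketch contrasting a generic meta model attribute with an
# application-specific one. Because the column is the generic
# DATABASE_PHYS_NAME (plus a DBMS type), a change of database standard
# touches only rows, never the schema. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE database_md (
        database_phys_name TEXT NOT NULL,  -- generic: works for any DBMS
        dbms_type          TEXT NOT NULL   -- 'ORACLE', 'SQL SERVER', ...
    )
""")
conn.execute("INSERT INTO database_md VALUES ('SALES_DB', 'ORACLE')")

# Company standard changes from Oracle to SQL Server: a data update,
# not an ALTER TABLE.
conn.execute("UPDATE database_md SET dbms_type = 'SQL SERVER'")
conn.commit()
```

An application-specific model with an `ORACLE_PHYS_NAME` column would instead force a schema change (and rework of every load process) at this point.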
Second, a Metadata Repository provides an integrated view of the enterprise’s major metadata subject areas. The repository should allow the user to view all entities within the company, not just the entities loaded in Oracle or those in the customer relationship management (CRM) applications.
Third, the Metadata Repository contains current metadata, meaning that the metadata is periodically updated to reflect the current technical and business environment. Keep in mind that a Metadata Repository must be constantly updated in order to be truly valuable.
Lastly, metadata repositories are historical. A good repository holds historical views of the metadata, even as it changes over time. This allows a corporation to understand how its business has changed over time, which is especially critical when the MME supports an application that contains historical data, such as a data warehouse or a CRM application. For example, suppose the business metadata definition for “customer” is “anyone that has purchased a product from our company within one of our stores or through our catalog”. A year later, a new distribution channel is added to the strategy: the company constructs a Web site to allow customers to order our products. At that point, the business metadata definition for customer would be modified to “anyone that has purchased a product from our company within one of our stores, through our mail order catalog or through the Web”.
A good Metadata Repository stores both of these definitions because they both have validity, depending on what data you are analyzing (and the age of that data).
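Storing both definitions can be sketched with effective dating, a common technique for historical metadata. The schema and dates below are a hypothetical illustration, not the meta model from the book.

```python
import sqlite3

# Sketch of historical business metadata: each definition of "customer"
# carries effective dates, so analysis of older data can retrieve the
# definition that was in force at the time. Schema is hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE business_term_def (
        term       TEXT,
        definition TEXT,
        eff_from   TEXT,
        eff_to     TEXT   -- NULL means this is the current definition
    )
""")
conn.execute("INSERT INTO business_term_def VALUES "
             "('customer', 'purchased in a store or through the catalog', "
             "'2002-01-01', '2003-01-01')")
conn.execute("INSERT INTO business_term_def VALUES "
             "('customer', 'purchased in a store, through the catalog, "
             "or on the web', '2003-01-01', NULL)")

def definition_as_of(conn, term, date):
    """Return the definition that was in force on the given date."""
    row = conn.execute(
        "SELECT definition FROM business_term_def "
        "WHERE term = ? AND eff_from <= ? "
        "AND (eff_to IS NULL OR eff_to > ?)",
        (term, date, date)).fetchone()
    return row[0] if row else None
```

A report over 2002 sales data would call `definition_as_of(conn, "customer", "2002-06-15")` and get the older, store-and-catalog definition, while current reporting gets the definition that includes the Web channel.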
Finally, it is strongly recommended that you implement your Metadata Repository component on an open, relational database platform, as opposed to a custom-designed, proprietary database engine.
Metadata Management Layer
The Metadata Management Layer provides systematic management of the Metadata Repository and the other MME components (see Figure 5). As with the other layers, the approach to this component differs greatly depending on whether a metadata integration tool is used or the entire MME is custom built. If an enterprise metadata integration tool is used to construct the MME, a metadata management interface may be built into the product; in practice this is almost never the case, and if it is not, you will need to custom build one. The Metadata Management Layer performs the following functions:
- Database modifications
- Database tuning
- Environment management
- Job scheduling
- Load statistics
- Query statistics
- Query and report generation
- Security processes
- Source mapping and movement
- User interface management
Figure 5: Metadata Management Layer
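One of the functions listed above, load statistics, can be sketched as a simple job-run log that every MME load process writes to, giving administrators a place to spot failures and slowdowns. The table, column, and job names are hypothetical.

```python
import sqlite3
import time

# Sketch of a management-layer load-statistics log: each MME load job
# records its run time, row count, and outcome. Names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE load_stats (
        job_name    TEXT,
        started_at  REAL,
        ended_at    REAL,
        rows_loaded INTEGER,
        status      TEXT
    )
""")

def record_load(conn, job_name, load_fn):
    """Run a load job and record its statistics, even on failure."""
    started = time.time()
    try:
        rows = load_fn()
        status = "OK"
    except Exception:
        rows, status = 0, "FAILED"
    conn.execute("INSERT INTO load_stats VALUES (?, ?, ?, ?, ?)",
                 (job_name, started, time.time(), rows, status))
    conn.commit()

# Usage: a hypothetical attribute load that reports 1250 rows loaded.
record_load(conn, "modeling_tool_attribute_load", lambda: 1250)
```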
Metadata Marts
A Metadata Mart is a database structure, usually sourced from a Metadata Repository, designed for a homogeneous metadata user group (see Figure 6). “Homogeneous metadata user group” is a fancy term for a group of users with similar needs.
Figure 6: Metadata Marts
There are two reasons why an MME may need to have metadata marts. First, a particular metadata user community may require metadata organized in a manner other than what is in the Metadata Repository component.
Second, an MME with a larger user base often experiences performance problems because of the number of table joins required for the metadata reports. In these situations it is best to create metadata mart(s) targeted specifically to meet those users’ needs. The Metadata Marts will not experience this performance degradation because they will be modeled multi-dimensionally.
In addition, a separate Metadata Mart provides a buffer layer between the end users and the Metadata Repository. This allows routine maintenance, upgrades, and backup and recovery to be performed on the repository without affecting the availability of the Metadata Mart.
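The mart-building idea can be sketched as a periodic refresh that pre-joins the normalized repository tables into one flat reporting table, so user reports need no multi-table joins. All table and column names are hypothetical.

```python
import sqlite3

# Sketch of a metadata mart refresh: join the normalized repository
# tables once, into a denormalized reporting table, rather than joining
# on every user query. Names are hypothetical.
conn = sqlite3.connect(":memory:")

# Simplified normalized repository tables:
conn.executescript("""
    CREATE TABLE md_column (col_id INTEGER, phys_name TEXT, table_id INTEGER);
    CREATE TABLE md_table  (table_id INTEGER, table_name TEXT);
    CREATE TABLE md_def    (col_id INTEGER, business_def TEXT);
    INSERT INTO md_column VALUES (1, 'CUST_ID', 10);
    INSERT INTO md_table  VALUES (10, 'CUSTOMER');
    INSERT INTO md_def    VALUES (1, 'Unique customer identifier');
""")

# Mart refresh: pre-join once, so reports query a single flat table.
conn.execute("""
    CREATE TABLE mart_column_report AS
    SELECT t.table_name, c.phys_name, d.business_def
    FROM md_column c
    JOIN md_table t ON t.table_id = c.table_id
    JOIN md_def   d ON d.col_id   = c.col_id
""")

# A user report is now a trivial single-table lookup:
row = conn.execute("SELECT business_def FROM mart_column_report "
                   "WHERE phys_name = 'CUST_ID'").fetchone()
```

Because the mart is rebuilt from the repository on a schedule, the repository itself can be taken offline for maintenance without interrupting mart users.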
Metadata Delivery Layer
The Metadata Delivery Layer is the sixth and final component of the MME architecture. It delivers metadata from the Metadata Repository to the end users and to any applications or tools that require metadata feeds (Figure 7). (3)
Figure 7: Metadata Delivery Layer
The most common targets that require metadata from the MME are:
- Data warehouses and data marts
- End users (business and technical)
- Messaging and transactions
- Metadata marts
- Software tools
- Third parties
- Web sites and e-commerce
Professionals who have built an enterprise metadata repository realize that it is much more than just a database that holds metadata and pointers to metadata; it is an entire environment. The purpose of the MME architecture is to illustrate the major components of that managed metadata environment.
This article is adapted from the book “Universal Metadata Models” by David Marco & Michael Jennings, © John Wiley & Sons (2004)
(1) See Chapter 7 of “Building and Managing the Metadata Repository” (David Marco, Wiley 2000) for a more detailed walkthrough of these approaches.
(2) See Chapters 4 – 8 of “Universal Metadata Models” (David Marco & Michael Jennings, Wiley 2004) to see various physical meta models.
(3) See Chapter 10 of “Building and Managing the Metadata Repository” (David Marco, Wiley 2000) for a detailed discussion on metadata consumers and metadata delivery.