Metadata management is one of the foundational components of an enterprise data management initiative. Many organizations struggle with its incorporation into their data management processes.
Metadata is “data about data.” It is the context for the raw data content. Metadata can be thought of with the metaphor of a library’s card catalog. The card catalog contains the information about each item stored in the library. All the information on a card is metadata concerning the actual item (book, video, monograph, etc.).
As defined by David Marco, president of EWSolutions and author of two seminal books on metadata management:
All physical data (contained in software and other media) and knowledge (contained in employees and various media) from within and outside an organization, containing information about your company’s physical data, industry, technical processes, and business processes.
In short, metadata provides knowledge of the raw data.
Every component of enterprise data management has connections to metadata and its management.
1. Business analytics: Data definitions, reports, users, usage, performance.
2. Business architecture: Roles and organizations, goals and objectives.
3. Business definitions: The business terms and explanations for a particular concept, fact, or other item found in an organization.
4. Business rules: Standard calculations and derivation methods.
5. Data governance: Policies, standards, procedures, programs, roles, organizations, stewardship assignments.
6. Data integration: Sources, targets, transformations, lineage, ETL workflows, EAI, EII, migration / conversion.
7. Data quality: Definitions, defects, metrics, ratings.
8. Document content management: Unstructured data, documents, taxonomies, ontologies, name sets, legal discovery, search engine indexes.
9. Information technology infrastructure: Platforms, networks, configurations, licenses.
10. Logical data models: Entities, attributes, relationships and rules, business names and definitions.
11. Physical data models: Files, tables, columns, views, business definitions, indexes, usage, performance, change management.
12. Process models: Functions, activities, roles, inputs / outputs, workflow, business rules, timing, stores.
13. Systems portfolio and IT governance: Databases, applications, projects and programs, integration roadmap, change management.
14. Service-oriented architecture (SOA) information: Components, services, messages, master data.
15. System design and development: Requirements, designs and test plans, impact analysis.
16. Systems management: Data security, licenses, configuration, reliability, service levels.
Key Terms in Metadata Management
|Term|Definition|
|Metadata|Metadata is all the information, data, and knowledge possessed by an organization that shows business and technical users where to find information in data repositories.|
|Technical Metadata|Technical metadata is metadata describing technical aspects of IT systems, which designers and developers use to build and maintain them. Examples of technical metadata include descriptions of database tables, columns, sizes, data types, database key attributes and indices, and technical data transformation rules.|
|Business Metadata|Business metadata is metadata about the business terms, business processes, and business rules. Business metadata provides the semantic layer between a company’s systems (operational and business intelligence) and their business users. It provides users a roadmap for navigating all the data in the enterprise by documenting what information is available and, when accessed, provides a context for interpreting the data. It is invaluable for making sound business decisions.|
|Managed Metadata Environment (MME)|The managed metadata environment represents the architectural components, people, and processes that are required to properly and systematically gather, retain, and disseminate metadata throughout the enterprise. (Definition from “Building and Managing the Metadata Repository”, David Marco, J. Wiley, 2000)|
|Data Heritage|Data heritage represents information about the original source of the data. For example, a salesperson types the customer name into the Sales system, or CUST_ID is a sequential number assigned by the Sales system.|
|Data Lineage|Data lineage represents information about everything that has “happened” to the data: whether it was moved from one system to another, transformed, aggregated, etc. ETL (extraction, transformation, and load) tools can capture this metadata electronically.|
Table 1: Types of Metadata
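To make the distinctions in Table 1 concrete, the following minimal Python sketch shows one way a single data element could carry technical metadata, business metadata, and data heritage side by side. The class names, field names, and example values are illustrative assumptions, not part of any standard or product; only CUST_ID and the Sales system are taken from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TechnicalMetadata:
    """How the element is physically stored (hypothetical fields)."""
    table: str
    column: str
    data_type: str
    is_key: bool


@dataclass(frozen=True)
class BusinessMetadata:
    """What the element means to business users (hypothetical fields)."""
    business_term: str
    definition: str
    steward: str


@dataclass(frozen=True)
class DataElementMetadata:
    technical: TechnicalMetadata
    business: BusinessMetadata
    heritage: str  # original source of the data


def describe(element: DataElementMetadata) -> str:
    """Render a one-line summary combining the three kinds of metadata."""
    t, b = element.technical, element.business
    return (f"{b.business_term} ({t.table}.{t.column}, {t.data_type}): "
            f"{b.definition} Source: {element.heritage}")


# Example values are assumptions made for illustration.
cust_id = DataElementMetadata(
    technical=TechnicalMetadata("CUSTOMER", "CUST_ID", "INTEGER", True),
    business=BusinessMetadata(
        business_term="Customer Identifier",
        definition="Uniquely identifies a customer.",
        steward="Sales data steward",  # assumed role name
    ),
    heritage="Sequential number assigned by the Sales system.",
)

print(describe(cust_id))
```

Keeping the technical and business descriptions in separate structures mirrors the split in Table 1: developers can maintain the former while data stewards maintain the latter.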
Sources of metadata exist in every information technology activity, and metadata is produced by every information technology process and application. In many cases, there is much more metadata to examine than there is actual (raw) data. This proliferation of metadata makes its management a complex endeavor.
To manage metadata successfully, an organization should start by developing and implementing a metadata strategy. A metadata strategy provides vision for the organization’s use of its metadata and a high-level plan (roadmap) for implementing that vision. The strategy focuses on all types of metadata (see Table 1), and it should include an enterprise focus rather than concentrate on specific functional areas in the organization.
These are the major steps in the development of an organization’s metadata strategy:
1. Metadata Strategy Initiation and Planning: Prepare the metadata strategy team and other participants for the upcoming effort, to facilitate the process and improve results. Outline the charter and organization of the metadata strategy, including alignment with the data governance efforts, and communicate these objectives to all parties. Determine or confirm the scope of the metadata strategy and communicate its potential business value and objectives.
2. Conduct Key Stakeholder Interviews: The stakeholder interviews provide a foundation of knowledge for the metadata strategy. Stakeholders would usually include both business and technical stakeholders. Identify types of metadata required by each stakeholder; identify stakeholders’ needs for metadata (present and future).
3. Assess Existing Metadata Sources and Information Architecture: Determine the current sources of metadata and identify the systems issues noted in the interviews and documentation review. During this stage, conduct detailed interviews of key IT staff and review documentation of the system architectures, data models, etc.
4. Develop Future Metadata Architecture: Refine and confirm the organization’s vision for managing metadata, and develop the long-term target architecture for the managed metadata environment (MME) in this stage. This phase includes all of the strategy components, such as organization structure, including data governance and stewardship alignment recommendations; managed metadata architecture; metadata delivery architecture; technical architecture; and security architecture.
5. Develop Phased MME Implementation Strategy and Plan: Review, validate, integrate, prioritize, and agree to the findings from the interviews and data analyses. Plan to implement the metadata strategy, incorporating a phased implementation approach that takes the organization from the current environment to the future managed metadata environment (MME).
Metadata and Data Governance
Metadata management and data governance are two enterprise data management components that each depend on the other for success. Data governance establishes policies and standards for metadata as well as data, and business data stewards manage the execution of those policies and standards. A metadata strategy and implementation plan must include active enterprise data governance and formal data stewardship, following best practices.
There are two types of metadata standards: industry (consensus) standards and international standards. In most cases, the international standards serve as the basis for the industry (consensus) standards. The most active standards organizations in metadata include ANSI (American National Standards Institute) and ISO (International Organization for Standardization). The core metadata registry standard, ISO/IEC 11179 (and its various parts), defines a schema for recording both the meaning and the technical structure of data for unambiguous usage by humans and computers (registries and repositories).
The Dublin Core metadata terms are a set of vocabulary terms which can be used to describe resources for the purposes of discovery. The original set of 15 classic metadata terms, known as the Dublin Core Metadata Element Set, is endorsed in several standards documents:
- IETF RFC 5013
- ISO Standard 15836:2009
- NISO Standard Z39.85
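As a small illustration of how the 15-element set can be applied, the Python sketch below describes a resource with Dublin Core element names and flags any keys that fall outside the set. The `validate_record` helper and the sample record are assumptions made for this example (the book details are those cited in Table 1); this is not an official Dublin Core API or serialization.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DUBLIN_CORE_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def validate_record(record: dict[str, str]) -> list[str]:
    """Return the keys in `record` that are not Dublin Core elements."""
    return sorted(key for key in record if key not in DUBLIN_CORE_ELEMENTS)


# Hypothetical description of the book cited in Table 1.
record = {
    "title": "Building and Managing the Metadata Repository",
    "creator": "David Marco",
    "publisher": "J. Wiley",
    "date": "2000",
    "type": "Text",
    "language": "en",
}

print(validate_record(record))  # prints []
# "shelf_location" is a made-up key used only to show a rejected element.
print(validate_record({**record, "shelf_location": "A-12"}))  # prints ['shelf_location']
```

Every element is optional and may be omitted, which is why the check only rejects unknown keys rather than requiring all 15.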
Most industry (consensus) standards are based on these internationally accepted standards. Some industry standards for metadata and its management include:
- OMG specifications: OMG is a nonprofit consortium of computer industry leaders dedicated to the definition, promotion, and maintenance of industry standards for interoperable enterprise applications. OMG is the creator of the CORBA middleware standard and has also defined several metadata-related standards, including CWM, IMM, and ODM.
- Common Warehouse Metamodel (CWM): Specifies the interchange of metadata among data warehousing, BI, KM, and portal technologies.
- Information Management Metamodel (IMM): The next iteration of CWM, now under OMG direction and development. It attempts to bridge the gap between object-oriented technology, data, and XML while incorporating CWM.
- MDC Open Information Model (OIM): A vendor-neutral and technology-independent specification of core metadata types found in operational, data warehousing, and knowledge management environments.
- The Extensible Markup Language (XML): The standard format for the interchange of metadata using the MDC OIM.
- Ontology Definition Metamodel (ODM): A specification for formal representation, management, interoperability, and application of business semantics in support of the OMG vision of Model Driven Architecture (MDA).
Metadata and Data Warehousing / Business Intelligence
Without proper metadata (business and technical), any form of data integration, including a data warehouse, cannot succeed fully. Metadata gives context to the data in the source systems, provides understanding of what data should be extracted from the source according to the warehouse’s goals, indicates the format of the source data to enable transformation and cleansing, and allows the end-user community to comprehend and analyze the data in the target database.
As stated by William H. Inmon, father of data warehousing, in a white paper on the subject of metadata management and data warehousing / business intelligence:
Simply from the standpoint of who needs help the most in terms of finding one’s way around data and systems, it is assumed the DSS analysis community requires a much more formal and intensive level of support than the information technology community. For this reason alone, the formal establishment of ongoing support of metadata becomes important in the data warehouse environment.
But there is a secondary, yet important, reason why metadata plays an important role in the data warehouse environment. In the data warehouse environment, the first thing the DSS analyst needs to know to do his/her job is what data is available and where it is in the data warehouse. In other words, when the DSS analyst receives an assignment, the first thing the DSS analyst needs to know is what data exists that might be useful in fulfilling the assignment. To this end, the metadata for the warehouse is vital to the preparatory work done by the DSS analyst.
Types of metadata that are found in, used by, or aligned with a data warehouse / business intelligence environment include but are not limited to:
- Source-to-target mapping of data into the data warehouse
- Extract/transformation history
- Data element alias information
- DW status information (database and ETL processes)
- DW volume statistics
- DW data aging / purging criteria
- Summary/calculation data across levels of the warehouse
- Data relationship artifact information (including history)
- Data ownership/stewardship information for data elements, entities, etc.
- DW access and related data security information
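The first two items in this list, source-to-target mapping and extract/transformation history, can be sketched as a small lineage lookup. The Python example below is a minimal sketch under stated assumptions: the `Mapping` fields, the `trace_lineage` helper, and every system, table, and column name are hypothetical and do not reflect the format of any particular ETL tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One source-to-target mapping row (field names are assumptions)."""
    source: str          # written here as "system.table.column"
    target: str
    transformation: str  # plain-text description of the rule


# Hypothetical mappings from a Sales system through staging to a warehouse.
MAPPINGS = [
    Mapping("sales.customer.cust_id", "stage.customer.cust_id", "copy"),
    Mapping("stage.customer.cust_id", "dw.dim_customer.customer_key",
            "assign surrogate key"),
    Mapping("sales.orders.amount", "dw.fact_sales.sales_amount",
            "convert currency"),
]


def trace_lineage(target: str, mappings: list[Mapping]) -> list[Mapping]:
    """Walk backwards from a target element to its original source.

    Returns the mappings in source-to-target order. Assumes each target
    has at most one source and that the mappings contain no cycles.
    """
    by_target = {m.target: m for m in mappings}
    path = []
    current = target
    while current in by_target:
        step = by_target[current]
        path.append(step)
        current = step.source
    return list(reversed(path))


for step in trace_lineage("dw.dim_customer.customer_key", MAPPINGS):
    print(f"{step.source} -> {step.target} [{step.transformation}]")
```

An analyst who asks where a warehouse column comes from gets the chain of hops and the rule applied at each one. Running the same walk in the opposite direction would support impact analysis when a source system changes.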
Metadata Environment Development
The managed metadata environment (MME) represents the architectural components, people, and processes that are required to properly and systematically gather, retain, and disseminate metadata throughout the enterprise. The MME encapsulates the concepts of metadata repositories, catalogs, data dictionaries, and any other terms that people have used to refer to the systematic management of metadata.
Figure 1 – Managed Metadata Environment (MME)
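The three verbs in the MME definition (gather, retain, disseminate) can be illustrated with a toy in-memory repository. This is only a sketch under stated assumptions: the `MetadataRepository` class, its method names, and the sample extractor are hypothetical, and a real MME involves far more (integration layers, governance processes, delivery channels) than a dictionary in memory.

```python
from typing import Callable, Iterable

# A metadata record is a plain dict here; real repositories use richer models.
Record = dict[str, str]


class MetadataRepository:
    """Toy in-memory stand-in for the repository at the center of an MME."""

    def __init__(self) -> None:
        self._records: dict[str, Record] = {}

    def gather(self, extractor: Callable[[], Iterable[Record]]) -> int:
        """Pull records from one metadata source and retain them by name."""
        count = 0
        for record in extractor():
            self._records[record["name"]] = record
            count += 1
        return count

    def search(self, text: str) -> list[Record]:
        """Disseminate: return records whose name or definition mentions text."""
        needle = text.lower()
        return [
            r for r in self._records.values()
            if needle in r["name"].lower()
            or needle in r.get("definition", "").lower()
        ]


def data_dictionary_source() -> list[Record]:
    """Hypothetical extractor standing in for a data dictionary export."""
    return [
        {"name": "CUST_ID", "definition": "Sequential customer number."},
        {"name": "ORDER_DT", "definition": "Date the order was placed."},
    ]


repo = MetadataRepository()
print(repo.gather(data_dictionary_source))           # prints 2
print([r["name"] for r in repo.search("customer")])  # prints ['CUST_ID']
```

Passing extractors in as functions keeps the repository independent of any one source, so new metadata sources can be added without changing how metadata is stored or searched.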
There are a variety of products on the market that can be combined to create a robust, enterprise managed metadata environment. Selection of the right products for an organization depends on the result of the metadata strategy, which is part of the enterprise’s data strategy.
Metadata is one of the foundational components of any successful enterprise data management initiative and is the companion to every effective data governance program. Metadata is essential to an organization’s advancement in all its operations, and business areas as well as information technology efforts will benefit from managing metadata according to best practices and industry standards.