Metadata, defined as “data about data,” provides context and information about raw data, similar to how a library’s card catalog describes the items within.

It plays a crucial role in enterprise data management by connecting various components such as business analytics, data governance, and data quality, and is essential for effective decision-making and understanding data lineage.

Metadata Strategy Definition

Metadata is “data about data.”  It is the context for the raw data content.  Metadata can be thought of with the metaphor of a library’s card catalog.  The card catalog contains the information about each item stored in the library.  All the information on a card is metadata concerning the actual item (book, video, monograph, etc.). 

As defined by David Marco, president of EWSolutions and author of two seminal books on metadata management:

All physical data (contained in software and other media) and knowledge (contained in employees and various media) from within and outside an organization, containing information about your company’s physical data, industry, technical processes, and business processes.

In short, metadata provides knowledge of the raw data.

Key Components of Enterprise Data Management and Their Role in a Metadata Management Strategy

Every component of enterprise data management has connections to metadata and its management.

  1. Business analytics: Data definitions, reports, users, usage, performance.
  2. Business architecture: Roles and organizations, goals and objectives.
  3. Business definitions: The business terms and explanations for a particular concept, fact, or other item found in an organization.
  4. Business rules: Standard calculations and derivation methods.
  5. Data governance: Policies, standards, procedures, programs, roles, organizations, stewardship assignments.
  6. Data integration: Sources, targets, transformations, lineage, ETL workflows, EAI, EII, migration / conversion.
  7. Data quality: Definitions, defects, metrics, ratings.
  8. Document content management: Unstructured data, documents, taxonomies, ontologies, name sets, legal discovery, search engine indexes.
  9. Information technology infrastructure: Platforms, networks, configurations, licenses.
  10. Logical data models: Entities, attributes, relationships and rules, business names and definitions.
  11. Physical data models: Files, tables, columns, views, business definitions, indexes, usage, performance, change management.
  12. Process models: Functions, activities, roles, inputs / outputs, workflow, business rules, timing, stores.
  13. Systems portfolio and IT governance: Databases, applications, projects and programs, integration roadmap, change management.
  14. Service-oriented architecture (SOA) information: Components, services, messages, master data.
  15. System design and development: Requirements, designs and test plans, impact analysis.
  16. Systems management: Data security, licenses, configuration, reliability, service levels.

Key Terms in Metadata Management

Term | Definition
Metadata | Metadata is all the information data and knowledge possessed by an organization that shows business and technical users where to find information in data repositories.
Technical Metadata | Technical Metadata is metadata describing technical aspects of IT systems, which designers and developers use to build and maintain them. Examples of technical metadata include descriptions of database tables, columns, sizes, data types, database key attributes and indices and technical data transformation rules.
Business Metadata | Business Metadata is metadata about the business terms, business processes and business rules. Business metadata provides the semantic layer between a company’s systems (operational and business intelligence) and their business users. It provides users a roadmap for navigating all the data in the enterprise by documenting what information is available and, when accessed, provides a context for interpreting the data. It is invaluable for making sound business decisions.
Managed Metadata Environment (MME) | The managed metadata environment represents the architectural components, people and processes that are required to properly and systematically gather, retain and disseminate metadata throughout the enterprise. (definition from “Building and Managing the Metadata Repository”, David Marco, J. Wiley, 2000)
Data Heritage | Data heritage represents information about the original source of the data. For example, a sales person types in the customer name in the Sales system or CUST_ID is a sequential number assigned by the Sales system.
Data Lineage | Data lineage represents information about everything that has “happened” to the data. Whether it was moved from one system to another, transformed, aggregated, etc., ETL (extraction, transformation, and load) tools can capture this metadata electronically.
Table 1: Types of Metadata
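
To make the distinction between technical and business metadata concrete, here is a minimal Python sketch of a catalog entry for a single column. The class names, field names, and sample values are hypothetical, chosen only for illustration; they are not taken from any particular metadata product.

```python
from dataclasses import dataclass


@dataclass
class TechnicalMetadata:
    """Technical description of a physical column (hypothetical fields)."""
    table: str
    column: str
    data_type: str
    is_key: bool = False


@dataclass
class BusinessMetadata:
    """Business-facing description of the same element (hypothetical fields)."""
    term: str
    definition: str
    steward: str


@dataclass
class CatalogEntry:
    """Pairs the two views and records data heritage (the original source)."""
    technical: TechnicalMetadata
    business: BusinessMetadata
    heritage: str

    def describe(self) -> str:
        t, b = self.technical, self.business
        return f"{b.term} ({t.table}.{t.column}, {t.data_type}): {b.definition}"


entry = CatalogEntry(
    technical=TechnicalMetadata("CUSTOMER", "CUST_ID", "INTEGER", is_key=True),
    business=BusinessMetadata(
        term="Customer Identifier",
        definition="Unique number identifying a customer.",
        steward="Sales data steward",
    ),
    heritage="Sequential number assigned by the Sales system",
)
print(entry.describe())
```

Keeping the two views in separate structures mirrors the table: developers maintain the technical side, while business users and data stewards own the term and its definition.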

Sources of metadata exist in every information technology activity, and metadata is produced by every information technology process and application. 

Building an Effective Metadata Management Strategy

To manage metadata successfully, organizations must develop a comprehensive metadata management strategy. This strategy outlines the vision for managing data assets and a roadmap for implementing it across the organization. A well-designed metadata strategy ensures that all types of metadata—descriptive metadata, administrative metadata, and structural metadata—are accounted for, with a focus on enterprise metadata management rather than isolated functional areas.

Key Steps in Developing a Metadata Management Strategy

  1. Initiation and Planning: Begin by preparing the metadata strategy team and relevant data professionals for the upcoming effort. This phase establishes the metadata management frameworks aligned with data governance and regulatory compliance (e.g., General Data Protection Regulation). It sets the scope and communicates the business value and goals.
  2. Conduct Key Stakeholder Interviews: Engage with data owners, data users, and technical stakeholders to gather insights. Identify current and future metadata needs, including the metadata management tools required for effective use and the types of data catalogs, data discovery processes, and data access protocols needed to support effective metadata management.
  3. Assess Existing Metadata Sources: Review the organization’s existing metadata management solutions, data catalogs, and information architecture. This assessment identifies issues within current data warehouses, data lakes, and modern data stacks. Detailed interviews with IT staff and an evaluation of metadata capture processes provide insights for further improvements.
  4. Develop Future Metadata Architecture: Establish a future-oriented metadata model that aligns with the organization’s goals. This phase includes recommendations for metadata management software, the metadata delivery architecture, and master data management structures. Ensure metadata security, especially with customer data and data privacy regulations, and develop metadata management best practices to maintain data integrity.
  5. Implement a Phased Strategy: Plan a phased approach to the implementation of the Managed Metadata Environment (MME), transitioning from the current state to the desired future state. This includes using active metadata to support data-driven decisions, machine learning for automating metadata processes, and good metadata management practices to improve data efficiency and regulatory compliance.

Metadata and Data Governance

Metadata management and data governance are companion disciplines within enterprise data management: the success of each depends on the other.

Data governance establishes policies and standards for metadata as well as data, and business data stewards manage the execution of those policies and standards.

A metadata strategy and implementation plan must include active enterprise data governance and formal data stewardship, following best practices.

Metadata Standards

There are two types of metadata standards: industry (consensus) standards and international standards. In most cases, the international standards serve as the basis for the industry (consensus) standards.

Active Organizations in Metadata Standards

Key organizations involved in setting metadata standards include:

  • ANSI (American National Standards Institute)
  • ISO (International Organization for Standardization)

The core metadata registry standard, ISO/IEC 11179 (and its various parts), specifies a schema for recording both the meaning and the technical structure of data so that it can be used unambiguously by humans and computers (registries and repositories).
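
As a rough illustration of recording both meaning and technical structure, the sketch below follows the ISO/IEC 11179 idea that a data element pairs a data element concept (an object class and a property) with a value domain. It is a heavily simplified view, not the standard's full metamodel; the Python class names, field names, and the sample country codes are assumptions made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElementConcept:
    """Meaning: an object class paired with one of its properties."""
    object_class: str
    property_name: str


@dataclass(frozen=True)
class ValueDomain:
    """Technical structure: the data type and, optionally, permitted values."""
    datatype: str
    permitted_values: tuple[str, ...] = ()


@dataclass(frozen=True)
class DataElement:
    """A registered data element: a concept represented by a value domain."""
    concept: DataElementConcept
    domain: ValueDomain

    def accepts(self, value: str) -> bool:
        """True if the value is permitted; any value passes a non-enumerated domain."""
        allowed = self.domain.permitted_values
        return not allowed or value in allowed


# Hypothetical registry entry for a customer's country of residence.
country = DataElement(
    DataElementConcept("Customer", "country of residence"),
    ValueDomain("string", ("US", "CA", "MX")),
)
print(country.accepts("US"), country.accepts("FR"))
```

Separating the concept from the value domain lets the same meaning be represented in more than one way (for example, by country code or by country name) without redefining the concept.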

Dublin Core Metadata Terms

The Dublin Core metadata terms are a set of vocabulary terms that can be used to describe resources for the purposes of discovery. The original set of 15 terms, known as the Dublin Core Metadata Element Set, is endorsed in several standards, including:

  • IETF RFC 5013
  • ISO Standard 15836:2009
  • NISO Standard Z39.85
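
For illustration, a resource can be described with the 15 elements of the Dublin Core Metadata Element Set as simple key-value pairs. The sketch below is one possible representation, not a prescribed encoding; the helper function `unknown_elements` and the sample record are hypothetical.

```python
# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = frozenset({
    "contributor", "coverage", "creator", "date", "description",
    "format", "identifier", "language", "publisher", "relation",
    "rights", "source", "subject", "title", "type",
})


def unknown_elements(record: dict[str, str]) -> set[str]:
    """Return the keys of a record that are not Dublin Core elements."""
    return set(record) - DC_ELEMENTS


# Hypothetical record describing a book, much like a library catalog card.
record = {
    "title": "Building and Managing the Metadata Repository",
    "creator": "David Marco",
    "publisher": "J. Wiley",
    "date": "2000",
    "language": "en",
}

print(unknown_elements(record))                       # no unknown keys
print(unknown_elements({"title": "x", "isbn": "y"}))  # "isbn" is not an element
```

All elements are optional and repeatable in Dublin Core, so the check above only flags keys outside the vocabulary rather than requiring any particular element.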

Industry Standards Based on International Standards

Many industry-specific standards are built upon internationally accepted standards. Key industry standards include:

  • OMG specifications: The Object Management Group (OMG) is a nonprofit consortium of computer industry leaders dedicated to the definition, promotion, and maintenance of industry standards for interoperable enterprise applications. OMG is the creator of the CORBA middleware standard and has defined the following metadata-related standards:
    • Common Warehouse Metamodel (CWM): Specifies the interchange of metadata among data warehousing, BI, KM, and portal technologies.
    • Information Management Metamodel (IMM): The next iteration of CWM, now under OMG’s direction and development. It attempts to bridge the gap between object-oriented technology, data, and XML while incorporating CWM.
    • MDC Open Information Model (OIM): A vendor-neutral and technology-independent specification of core metadata types found in operational, data warehousing, and knowledge management environments.
    • The Extensible Markup Language (XML): The standard format for the interchange of metadata using the MDC OIM.
    • Ontology Definition Metamodel (ODM): A specification for the formal representation, management, interoperability, and application of business semantics in support of OMG’s vision of model-driven architecture (MDA).

Metadata in Data Warehousing and Business Intelligence

Effective metadata management is crucial for the success of any data integration effort, including data warehouses. Without proper business and technical metadata, a data warehouse cannot fully support data teams in managing and analyzing an organization’s data assets. Metadata provides the rich context needed to understand data sources, identify what data should be extracted, determine how to transform and cleanse data, and allow stakeholders to analyze the information in the target system.

As William H. Inmon, the father of data warehousing, stated:

Simply from the standpoint of who needs help the most in terms of finding one’s way around data and systems, it is assumed the DSS analysis community requires a much more formal and intensive level of support than the information technology community.  For this reason alone, the formal establishment of ongoing support of metadata becomes important in the data warehouse environment.

In a modern data stack, metadata collection is essential not only for understanding what data is available but also for knowing where it exists in the data catalog. When an analyst begins an assignment, the first task is to identify relevant data within the data warehouse, making metadata vital for the preparatory work of the DSS analyst.

Types of Metadata in Data Warehousing/Business Intelligence

The following are key types of metadata used in data warehousing and business intelligence environments:

  • Source-to-target mapping of data in the warehouse
  • Extract and transformation history (tracking the flow of data)
  • Alias information for data elements, supporting easier classification and identification
  • Data warehouse status (database and ETL process tracking)
  • Volume statistics for the warehouse, useful for performance optimization
  • Aging and purging criteria for historical data management
  • Summary and calculation metadata across warehouse levels
  • Data relationship artifacts, including the historical lineage of data assets
  • Data ownership and stewardship details for accountability and compliance
  • Data access and security information to ensure correct data handling and protection
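
As a small illustration of the first two items in the list above, source-to-target mappings can be stored as simple records and walked backwards to reconstruct the history of a warehouse column. This is a minimal sketch that assumes each target column has exactly one source; the column names, transformation rules, and the `trace_lineage` helper are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """One source-to-target mapping with its transformation rule."""
    source: str
    target: str
    rule: str


# Hypothetical mappings: Sales system -> staging area -> warehouse.
MAPPINGS = [
    Mapping("sales.customer.cust_nm", "stage.customer.name", "trim whitespace"),
    Mapping("stage.customer.name", "dw.dim_customer.customer_name", "uppercase"),
]


def trace_lineage(target: str, mappings: list[Mapping]) -> list[Mapping]:
    """Walk mappings backwards from a target column to its original source."""
    by_target = {m.target: m for m in mappings}
    path: list[Mapping] = []
    seen: set[str] = set()
    # The 'seen' set guards against accidental cycles in the mapping data.
    while target in by_target and target not in seen:
        seen.add(target)
        step = by_target[target]
        path.append(step)
        target = step.source
    return list(reversed(path))  # oldest step first


for step in trace_lineage("dw.dim_customer.customer_name", MAPPINGS):
    print(f"{step.source} -> {step.target} [{step.rule}]")
```

In practice, ETL tools typically capture these mappings electronically; the point of the sketch is only that lineage questions become simple lookups once the mappings are recorded as metadata.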

By adhering to metadata standards and employing the right metadata management tools, organizations can classify data, ensure data integrity, and manage their digital resources more efficiently. This ultimately enhances the ability of data teams to make data-driven decisions, while complying with regulatory requirements and protecting sensitive data assets.

Metadata Environment Development

The managed metadata environment (MME) represents the architectural components, people, and processes that are required to properly and systematically gather, retain, and disseminate metadata throughout the enterprise.  The MME encapsulates the concepts of metadata repositories, catalogs, data dictionaries, and any other term that people have thrown out to refer to the systematic management of metadata.


Figure 1 – Managed Metadata Environment (MME)

There are a variety of products on the market that can be combined to create a robust, enterprise-managed metadata environment. The selection of the right products for an organization depends on the result of the metadata strategy, which is part of the enterprise’s data strategy.

The Critical Role of Metadata in Enterprise Data Management and Governance

Metadata is one of the foundational components of any successful enterprise data management initiative and is the companion to every effective data governance program. Managing metadata according to best practices and industry standards is essential to an organization’s advancement, and both business areas and information technology efforts benefit from it.