Metadata management should start with the identification of business subject areas to enable prioritization of all enterprise data management initiatives and related projects.
Metadata is commonly defined as “data about data.” In fact, a more accurate definition might be “information about information assets.” These information assets may be data, processes, applications, or information technology. Metadata may be “business metadata,” defining business terms. It may also be “technical metadata,” providing detailed technical specifications. Metadata may inventory your production information assets, or it may describe the design of information assets in planning or development.
Metadata is the key to managing information assets, just as information about money is essential to managing money. Metadata provides the context that transforms data into information, guiding more effective reporting and analysis. Metadata is the glue that enables enterprise integration and the foundation of data governance.
Different purposes benefit from different types of metadata. Business users use metadata to understand available data and the possible conclusions the data may support. Technical users need metadata to develop systems that deliver high-quality data. Metadata may support different kinds of initiatives, including data warehousing and business intelligence projects, systems integration projects, infrastructure support efforts, and enterprise architecture initiatives.
Metadata management seeks to provide business and technical users with easier access to integrated, high-quality metadata. However, because there are so many different types of metadata, trying to manage all of them at once will only frustrate the organization, exhausting its efforts without satisfying its needs. Organizations need to prioritize the different types of metadata, focusing their strategy and efforts on successfully managing the types of metadata needed to support critical business needs. Many organizations choose to develop a metadata model, starting with a metadata subject area model.
The subject area model shown below defines 16 different subject areas. The colors depict the phased implementation sequence one organization chose for inclusion in its enterprise metadata repository.
Figure 1: Metadata Subject Areas (Prioritized for One Organization)
Several of these subject areas are “core” subject areas that describe data and its use, and are more commonly found within an integrated metadata repository:
- Data Structures Metadata – data about physical files, relational database tables and columns, indexes and other database control objects. Data Structures Metadata are the specifications within physical designs for data structures in development, test or production. Data Structures Metadata also includes data about usage, performance, and change management. Data Structures Metadata may describe transactional databases, data warehouses or data marts. For relational databases, the definitive source for Data Structures Metadata is the relational database management system catalog. Until implemented, the best alternative sources are the Data Definition Language (DDL) specifications or the physical design model created using a data modeling tool. For flat files, the definitive source for Data Structures Metadata may be a COBOL copybook.
While most metadata in this subject area is technical metadata (describing technical names, lengths and physical data types), the subject area also includes the business name equivalents for these technical table and column names and their business definitions. In terms of the Zachman Framework, Data Structures Metadata define the cells in Column One/Row 4 and Column One/Row 5. Design, test and production metadata represent separate mini-subject areas within the overall Data Structures Metadata subject area. This is the most widely used metadata subject area. The meta models for virtually all metadata repository tools support this central, primary subject area.
- Data Modeling Metadata – data about business entities, their attributes, subject areas, the relationships between entities and the business rules governing these relationships. Data Modeling Metadata includes the business names and business definitions for these entities and attributes. Data Modeling Metadata define the cells in Rows 1-3 of Column One in the Zachman Framework. The definitive sources for Data Modeling Metadata are the conceptual and logical data models created using data modeling tools. The Data Modeling Metadata subject area also includes metadata that relates equivalent data entities and attributes across different data models, and relates these models to metadata in the Data Structures Metadata subject area describing the physical implementation of conceptual and logical data models. This “linkage metadata” may be recorded in extensions to the data models or entered independently into the metadata repository once the metadata from each data model has been extracted and loaded into the metadata repository.
- Data Stewardship Metadata – data about data stewards, data governance organizations, data steward responsibility assignments for data modeling metadata about subject areas, business entities and/or attributes, and data steward responsibility assignments for the quality of data in databases, relational tables and columns. This metadata may be entered directly into a metadata repository or loaded from other sources.
- Data Integration Metadata – data about the mappings and transformations used in Extract-Transform-Load (ETL) programs to migrate data from sources to targets. Data Integration Metadata defines the lineage and heritage of data as it moves from one data store to another. Data Integration Metadata also defines how data is converted, how Enterprise Application Integration (EAI) tools transform data as it is obtained from and delivered to different organizations, and how Enterprise Information Integration (EII) tools provide transparent integration of data stored in different and disparate databases. This metadata is sourced from ETL tools, EAI tools and EII tools. It is especially important in developing and maintaining data warehouses, data marts and business intelligence environments.
- Business Intelligence Metadata – data about business intelligence views and interfaces, queries, reports and usage. Metadata in this subject area is sourced from business intelligence tools and related to Data Structures Metadata.
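As a rough illustration of the point that the database catalog is the definitive source for Data Structures Metadata, the sketch below reads table and column specifications out of a catalog. It is a minimal example under stated assumptions: it uses SQLite (via Python's standard `sqlite3` module) only because it runs anywhere, and the `customer` table, the function names, and the record layout are all hypothetical. A real repository would read whichever catalog its own database management system exposes.

```python
import sqlite3


def extract_data_structures_metadata(conn):
    """Read table and column metadata from the SQLite catalog."""
    records = []
    # sqlite_master is SQLite's catalog table; it lists every table.
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    for (table_name,) in tables:
        # PRAGMA table_info yields: cid, name, type, notnull, dflt_value, pk
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        for _cid, column_name, data_type, notnull, _default, pk in columns:
            records.append(
                {
                    "table": table_name,
                    "column": column_name,
                    "data_type": data_type,
                    "nullable": not notnull,
                    "primary_key": pk > 0,
                }
            )
    return records


def build_demo_database():
    """Create a hypothetical in-memory database to extract metadata from."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE customer ("
        "customer_id INTEGER PRIMARY KEY, "
        "customer_name TEXT NOT NULL)"
    )
    return conn


if __name__ == "__main__":
    for record in extract_data_structures_metadata(build_demo_database()):
        print(record)
```

The extracted records hold only technical metadata. The business names and definitions that this subject area also covers would have to be joined in from another source, such as a data model.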
Some subject areas provide other useful data about related aspects of data assets:
- Reference Values Metadata – data about controlled vocabularies and defined domain values, including valid internal and external codes, names/labels and business meanings. Reference data values appear in transactional data, and the business definitions for these values provide context for transactions, transforming this data into meaningful information. Some of this business metadata is sourced from external standards organizations, while other reference data values may be controlled through master data management databases.
- Data Quality Metadata – statistical data about the quality of information, including the number of defects found at given points in time, summarized to create overall data quality metrics. This subject area is closely linked to the Data Structures Metadata subject area, but the definitive source for this metadata is typically a data quality profiling and analysis tool.
- Data Security Metadata – data about how data access is controlled through security classifications, privileges, users and groups, filters, authentication, security audits and privacy rule compliance.
- Application Data Access Metadata – data about how views, queries and application programs access data stored in data structures, and how this data is presented in reports, screens, web pages, XML documents, spreadsheets and flat files representing inputs to and output deliverables from business processes. Metadata in this subject area is sourced primarily from application program specifications.
- Content Management Metadata – metadata about unstructured data found in documents, including taxonomies, ontologies, XML name sets, search engine key words, indexes, and parameters for legal electronic discovery. The metadata that controls enterprise content management (ECM) may be integrated into a comprehensive enterprise metadata repository.
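To make the Data Quality Metadata description above more concrete, the following sketch summarizes defect counts taken at given points in time into overall data quality metrics. It is illustrative only: the `QualityMeasurement` record, its field names, and the two summary functions are assumptions made for this example, not the output format of any particular profiling tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class QualityMeasurement:
    """One profiling result for a column at a point in time (hypothetical layout)."""
    table: str
    column: str
    measured_on: date
    rows_checked: int
    defects_found: int


def defect_rate(measurements):
    """Overall share of defective rows across all measurements."""
    rows = sum(m.rows_checked for m in measurements)
    defects = sum(m.defects_found for m in measurements)
    return defects / rows if rows else 0.0


def defect_rate_by_column(measurements):
    """Defect rate per (table, column), linking back to Data Structures Metadata."""
    totals = {}
    for m in measurements:
        key = (m.table, m.column)
        rows, defects = totals.get(key, (0, 0))
        totals[key] = (rows + m.rows_checked, defects + m.defects_found)
    return {key: (d / r if r else 0.0) for key, (r, d) in totals.items()}
```

Keying the summary by table and column reflects the close link between this subject area and Data Structures Metadata: the quality figures only become useful once they can be attached to the structures they describe.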
Other metadata subject areas describe closely related information technology assets (such as application systems, software components and technology infrastructure) and business knowledge assets (such as business processes, organizations, goals and projects) which may also be found within an integrated metadata repository, including:
- Legacy Systems Metadata – data about legacy application program software code modules, the logic defined within the code and the relationships between modules, supporting impact analysis, restructuring, reuse and compartmentalization. Metadata in this subject area is sourced primarily from reverse engineering tools.
- Application Development Metadata – data about application business and technical requirements, application design (including UML specifications), service oriented architecture (SOA), contemporary application programming objects (Java code, EJB, legacy system wrappers, web services), test plans and test data. Much of this metadata is found within integrated application development environments, design tools, testing tools, and software configuration management (SCM) libraries. Some metadata repositories integrate this metadata with metadata from other subject areas and sources for more comprehensive impact analysis, change control, and reuse. Much of this metadata may be found in unstructured formats, such as text documents; a structured metadata repository database may be supplemented with a document library for unstructured metadata.
- Process Modeling Metadata – data about processes at any level (including functions, activities, tasks and steps), workflow dependency relationships between processes, business rules, events, roles and responsibilities, and the input/output relationships between processes and deliverables defined in Application Data Access Metadata. Metadata in this subject area is sourced primarily from process modeling tools. Process Modeling Metadata is essential to any comprehensive understanding of enterprise architecture.
- Portfolio Management Metadata – data about goals, objectives, strategies, projects, programs, costs and benefits, organizations, resources, the relationships between applications and the business processes they support, and the alignment of projects with business strategies. Portfolio Management Metadata is also an important aspect of enterprise architecture.
- Technology Architecture Metadata – data about technology standards, including preferred hardware and software products, protocols, and configurations.
- Technology Inventory Metadata – data about implemented infrastructure, including hardware, software, networks and their configurations. The metadata in this subject area is often maintained and sourced from systems management tools.
The metadata that may be of most value to the enterprise is the metadata connecting these subject areas, defining the relationships between data, process, applications and technology. This data may not be stored in any traditional metadata source tool and instead may need to be captured separately, perhaps through direct entry by subject matter experts into the integrated metadata repository.
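One way to see why this connecting metadata is so valuable is impact analysis: given links between assets in different subject areas, a repository can answer "what is affected if this column changes?" The sketch below is a minimal illustration, assuming the links are stored as simple (source, target) pairs meaning "source is used by target." The asset identifiers in `LINKS` and the function name `impacted_assets` are hypothetical.

```python
from collections import deque

# Hypothetical cross-subject-area links: (source asset, asset that uses it).
LINKS = [
    ("column:customer.customer_name", "etl:load_customer_dim"),
    ("etl:load_customer_dim", "column:customer_dim.name"),
    ("column:customer_dim.name", "report:customer_churn"),
    ("report:customer_churn", "process:retention_review"),
]


def impacted_assets(links, start):
    """Breadth-first walk over the links, returning every downstream asset."""
    downstream = {}
    for source, target in links:
        downstream.setdefault(source, []).append(target)

    seen = set()
    order = []
    queue = deque([start])
    while queue:
        asset = queue.popleft()
        for dependent in downstream.get(asset, []):
            # Skip assets already visited so cycles cannot loop forever.
            if dependent not in seen and dependent != start:
                seen.add(dependent)
                order.append(dependent)
                queue.append(dependent)
    return order


if __name__ == "__main__":
    for asset in impacted_assets(LINKS, "column:customer.customer_name"):
        print(asset)
```

In this example a single physical column is traced through an ETL job, a warehouse column and a report to the business process that relies on it. None of the individual source tools would record that whole chain, which is why such links often need to be captured directly in the repository.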
The matrix below identifies the metadata subject areas of most interest to different kinds of projects.
Figure 2: Metadata Subject Areas – Usage in Different Types of Initiatives
When an organization defines its metadata management strategy, planners should identify the most critical business needs for metadata and focus the organization’s efforts on managing the metadata within their highest priority metadata subject areas. Don’t boil the ocean – focus on managing metadata with the most business value.