Metadata management is one of the foundational components of an enterprise data management initiative.  Many organizations struggle with its incorporation into their data management processes.

Metadata Management’s Relationship to Data Management

Data Management is one of the hottest topics in our industry as Global 2000 companies and large government agencies are beginning to understand that without accurate, timely, and well-understood data they cannot realize the benefits of advanced analytics, big data, mobile analytics, data lakes, and the vast reservoir of data opportunities from the internet of things (IoT).

The practice of metadata management is foundational to every aspect of data management. Imagine trying to build a sustainable data governance practice without metadata management. Data stewards spend most of their time working with metadata and a smaller amount of time on data. Without proper metadata management, these stewards would be limited to SharePoint, Excel spreadsheets, Word documents, and an assortment of manual processes to accomplish their vital tasks.

Master data management needs metadata management as much as data governance does. In a master data management program, we need to have automated and accurate metadata on the systems, data movement, data stores, and golden records in our information technology (IT) environment. Every component of enterprise data management has deep connections to metadata and its management including:[1]

  1. Business analytics: Data definitions, reports, users, usage, performance.
  2. Business architecture: Roles and organizations, goals, and objectives.
  3. Business definitions: The business terms and explanations for a particular concept, fact, or other item found in an organization.
  4. Business rules: Standard calculations and derivation methods.
  5. Data governance: Policies, standards, procedures, programs, roles, organizations, stewardship assignments.
  6. Data integration: Sources, targets, transformations, lineage, ETL workflows, EAI (enterprise application integration), EII (enterprise information integration), migration / conversion.
  7. Data quality: Definitions, defects, metrics, ratings.
  8. Document content management: Unstructured data, documents, taxonomies, ontologies, name sets, legal discovery, search engine indexes.
  9. Information technology infrastructure: Platforms, networks, configurations, licenses.
  10. Logical data models: Entities, attributes, relationships and rules, business names and definitions.
  11. Physical data models: Files, tables, columns, views, business definitions, indexes, usage, performance, change management.
  12. Process models: Functions, activities, roles, inputs / outputs, workflow, business rules, timing, stores.
  13. Systems portfolio and IT governance: Databases, applications, projects and programs, integration roadmap, change management.
  14. Service-oriented architecture (SOA) information: Components, services, messages, master data.
  15. System design and development: Requirements, designs and test plans, impact analysis.
  16. Systems management: Data security, licenses, configuration, reliability, service levels.

Understanding Metadata Management

The field of metadata management can be intimidating. There are many best practices and terms that need to be understood to function effectively in this line of work. The goal of this article is to present the key concepts and the basic best practices of metadata management so that you have a solid foundation in this valuable topic.

Metadata Defined

The classic definition of metadata is “data about data.” Unfortunately, this definition is limiting as metadata is about much more than “data about data”.

Metadata is a type of data that digitally describes the who, what, when, where, why, and how of an organization’s data, processes, applications, assets, business concepts, and/or other things of interest.

More simply, metadata provides the context to the content of our digital assets.

From this definition, we can see that metadata is a type of data. Like data, metadata is a set of digitized values, facts or information that provides knowledge. This knowledge looks to answer the who, what, when, where, why, and how. The 5 Ws & 1 H (who, what, when, where, why, and how) table provides definitions for each type of knowledge.

Knowledge Type | Definition
Who | Refers to person(s).
What | Used as a request for specific information; to inquire about the character, occupation, etc. of a person or thing; to inquire as to the origin, identity, etc. of something; to inquire as to the worth, usefulness, force, or importance of something; or to refer to how much something is.
When | Refers to time, a time period, how long ago, how soon, under what circumstances, or upon what occasion.
Where | Refers to a thing or person that is in or at a place, part, point, etc.
Why | Used to ask for what reason, cause, or purpose.
How | Used to refer to what way or manner; by what means; to what extent, degree, etc.; in what state or condition; for what reason; to what effect; with what meaning; a question concerning the way or manner in which something is done or achieved; a way or manner of doing something.

Table 1: 5 W’s & 1 H Description

It’s important to understand that metadata management has two distinct but related uses. It is valuable to the information technology (IT) department of a company and to the business side of the organization as well.

Knowledge Type | Technical Example | Business Example
Who | Who is the programmer responsible for a specific data movement process? | Who is the chief data steward of the CUSTOMER subject area?
What | What is the data lineage between our customer system and our enterprise data warehouse? | What field on our analytics shows the profitability of our products?
When | When do the extraction, transformation, and load (ETL) jobs run, and what are each job’s dependencies? | When was the data that I am analyzing last refreshed?
Where | Where in our IT environment are there servers operating at less than 40% of capacity? | Where do we have a report that shows our social media analytics by marketing campaign?
How | How do we set up security privileges for a new analyst? | How do we calculate the fields on our key report?
Why | Why are we experiencing more errors in the quality of our data? | Why are we missing some customers in our analytical reports?

Table 2: 5 W’s & 1 H Business and Technical Examples

Managed Metadata Environment

The managed metadata environment (MME) represents the architectural components and processes that are required to properly and systematically gather, retain, and disseminate metadata throughout the enterprise. The MME encapsulates the concepts of metadata repositories, catalogs, data dictionaries, and any other terms that people have used to refer to the systematic management of metadata.



Figure 1: Managed Metadata Environment (MME)

Meta Model & Metadata Model

The terms meta model and metadata model are synonymous. They both refer to the physical model that is designed to store the metadata. A meta model looks much like the data models that most of us are familiar with. It has elements (metadata elements instead of data elements), tables, and relationships. In fact, almost all the best modeling practices for data are equally applicable to metadata modeling. Even with these similarities, the modeling of metadata poses some challenges that very few data modelers are familiar with.

The 4 Characteristics of a Meta Model

A great meta model has 4 key characteristics. It is generic, integrated, current, and historical.

Generic

Generic means that the physical meta model stores metadata by metadata subject area rather than by application. For example, a generic meta model will have an attribute named “DATABASE_PHYS_NAME” that holds the physical database names (a metadata subject area) within the company. An application-specific meta model would name this same attribute “ORACLE_ACCTREC_PHYS_NAME”. The problem with application-specific meta models is that metadata subject areas expand their scope and can even change over time. To return to our example, today Oracle may be our company’s database standard. Tomorrow we may switch the standard to SQL Server for cost or compatibility advantages. This switch would force needless changes to the physical meta model. Further, we should not have application-specific names such as ACCTREC (i.e., Accounts Receivable) in the meta model. An accounts receivable application has inputs (data coming in), processes, and outputs (data coming out) just like any other system. Therefore, there is no reason for our meta model to have application-specific names for its attributes or tables; doing so is limiting and a poor meta modeling practice.
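The difference between a generic and an application-specific attribute can be illustrated with a minimal sketch. Only the attribute name DATABASE_PHYS_NAME comes from the text above; the table name, the other columns, and the sample database name are assumptions made for illustration, not a recommended meta model.

```python
import sqlite3

# Hypothetical generic meta model table: the DBMS vendor and the application
# are stored as data values, not baked into table or column names.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE database_catalog (
        database_phys_name TEXT PRIMARY KEY,  -- generic attribute from the text
        dbms_vendor        TEXT NOT NULL,     -- assumed column
        application_name   TEXT NOT NULL      -- assumed column
    )
    """
)
conn.execute(
    "INSERT INTO database_catalog VALUES (?, ?, ?)",
    ("FINDB01", "Oracle", "Accounts Receivable"),
)

# Switching the company standard becomes a data update, not a schema change.
conn.execute(
    "UPDATE database_catalog SET dbms_vendor = ? WHERE database_phys_name = ?",
    ("SQL Server", "FINDB01"),
)

# Column names are unchanged after the vendor switch.
columns = [row[1] for row in conn.execute("PRAGMA table_info(database_catalog)")]
vendor = conn.execute(
    "SELECT dbms_vendor FROM database_catalog WHERE database_phys_name = ?",
    ("FINDB01",),
).fetchone()[0]
print(columns, vendor)
```

With an attribute named ORACLE_ACCTREC_PHYS_NAME, the same vendor switch would have required renaming the column and every query that referenced it.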

Integrated

A meta model provides an integrated view of the enterprise’s major metadata subject areas. Suppose we need a meta model that holds business definitions for our data elements and captures technical data lineage. Meta modelers often make the mistake of putting the business metadata (definitions) in one set of tables and the technical metadata in a different set of tables, without any relationships between them.

As a result, if the business is considering adding a new “customer type”, the metadata team can’t query the data lineage related metadata in the model to see what data elements would be impacted by this business decision. This severely limits the power that metadata management can provide.

The best practice of having an integrated meta model is missed by the vast majority of organizations, as they choose to implement many smaller metadata management solutions rather than an enterprise-wide metadata management effort.
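To make the “customer type” impact question concrete, here is a small sketch of an integrated model in which business terms, data elements, and lineage are related. All table names, column names, and sample rows are hypothetical; the point is only that one query can walk from a business definition to every downstream data element.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- Business metadata (assumed structure)
    CREATE TABLE business_term (
        term_id INTEGER PRIMARY KEY, term_name TEXT, definition TEXT);
    -- Technical metadata, linked to the business term it implements
    CREATE TABLE data_element (
        element_id INTEGER PRIMARY KEY, element_phys_name TEXT,
        term_id INTEGER REFERENCES business_term(term_id));
    -- Technical data lineage between data elements
    CREATE TABLE lineage (
        source_element_id INTEGER REFERENCES data_element(element_id),
        target_element_id INTEGER REFERENCES data_element(element_id));

    INSERT INTO business_term VALUES (1, 'Customer Type', 'Classification of a customer');
    INSERT INTO data_element VALUES (10, 'CRM.CUST.CUST_TYPE_CD', 1);
    INSERT INTO data_element VALUES (20, 'EDW.DIM_CUSTOMER.CUSTOMER_TYPE', NULL);
    INSERT INTO data_element VALUES (30, 'MART.SALES_RPT.CUST_TYPE', NULL);
    INSERT INTO lineage VALUES (10, 20);
    INSERT INTO lineage VALUES (20, 30);
    """
)

# Start from the elements tied to the business term, then follow lineage downstream.
IMPACT_QUERY = """
WITH RECURSIVE impacted(element_id) AS (
    SELECT de.element_id
    FROM data_element de
    JOIN business_term bt ON bt.term_id = de.term_id
    WHERE bt.term_name = ?
    UNION
    SELECT l.target_element_id
    FROM lineage l
    JOIN impacted i ON i.element_id = l.source_element_id
)
SELECT de.element_phys_name
FROM impacted i
JOIN data_element de ON de.element_id = i.element_id
ORDER BY de.element_id
"""

impacted_elements = [row[0] for row in conn.execute(IMPACT_QUERY, ("Customer Type",))]
print(impacted_elements)
```

If the business and technical tables had no relationship (no term_id on data_element), the starting point of this query could not be found from the business term at all.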

Current

A fundamentally sound meta model contains metadata that relates to both the current environment and the future/planned environment. Metadata management is very valuable in understanding and managing our current business and technical landscape; however, it can also play a central role in our organization’s future plans. For example, let’s assume that our company is considering a migration to a new ERP (enterprise resource planning) vendor. It would be very valuable to query the meta model to see how many of our current data elements are already available in the new ERP vendor’s solution.
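The ERP question above amounts to comparing the data elements recorded for the current environment with those recorded for the planned one. The sketch below assumes each data element row is tagged with an environment and that elements can be matched by name; both are simplifications, and the element names are invented for illustration.

```python
# Hypothetical meta model rows: (data element name, environment it belongs to).
ELEMENT_ENVIRONMENTS = [
    ("CUSTOMER_NAME", "current"),
    ("CUSTOMER_TYPE", "current"),
    ("ORDER_DATE", "current"),
    ("LOYALTY_TIER", "current"),
    ("CUSTOMER_NAME", "planned_erp"),
    ("CUSTOMER_TYPE", "planned_erp"),
    ("ORDER_DATE", "planned_erp"),
    ("TAX_REGION", "planned_erp"),
]


def elements_in(environment):
    """Return the set of data element names recorded for one environment."""
    return {name for name, env in ELEMENT_ENVIRONMENTS if env == environment}


current = elements_in("current")
planned = elements_in("planned_erp")

already_available = current & planned   # current elements the new ERP covers
missing_from_erp = current - planned    # current elements with no counterpart
coverage_pct = 100 * len(already_available) / len(current)

print(sorted(already_available), sorted(missing_from_erp), coverage_pct)
```

In practice, matching would more likely go through shared business definitions than through identical names, which is another reason an integrated meta model matters.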

Historical

Lastly, a good meta model is historical: it includes historical views of the metadata, even as it changes over time. This allows a corporation to understand how its business has evolved over the years. This is especially critical if the MME is supporting an application that contains historical data, like a data warehouse or an advanced analytics application. For example, suppose the business metadata definition for “customer” is “anyone that has purchased a product from our company within one of our stores, on our website, or through our catalog”. A year later, a new distribution channel is added to the strategy: the company now allows customers to order products through an app on their phone. At that point in time, the business metadata definition for customer would be modified to “anyone that has purchased a product from our company within one of our stores, on our website, through our mail order catalog, or through our company app”. A fundamentally sound meta model stores both definitions because each is valid, depending on what data you are analyzing (and the age of that data).
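One common way to keep both definitions is to give each version an effective date range and look up the version that applied on a given date. The sketch below is one possible shape, not a prescribed design: the class, the field names, the dates, and the shortened definition texts are all assumptions.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass(frozen=True)
class DefinitionVersion:
    term: str
    definition: str
    effective_from: date
    effective_to: Optional[date]  # None means this version is still current


# Hypothetical history for the "customer" example; dates are invented.
HISTORY = [
    DefinitionVersion(
        "customer",
        "purchased via store, website, or catalog",
        date(2023, 1, 1),
        date(2024, 1, 1),
    ),
    DefinitionVersion(
        "customer",
        "purchased via store, website, catalog, or company app",
        date(2024, 1, 1),
        None,
    ),
]


def definition_as_of(term: str, as_of: date) -> Optional[str]:
    """Return the definition that was valid for `term` on the date `as_of`."""
    for version in HISTORY:
        starts_in_time = version.effective_from <= as_of
        not_yet_ended = version.effective_to is None or as_of < version.effective_to
        if version.term == term and starts_in_time and not_yet_ended:
            return version.definition
    return None


print(definition_as_of("customer", date(2023, 6, 1)))
print(definition_as_of("customer", date(2024, 6, 1)))
```

An analyst looking at sales data from before the app launched would use the first lookup; one looking at recent data would use the second.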

Metadata Repository

Metadata repository was the industry’s first widespread term for the metadata management system. The term refers to the meta model and, typically, a management software package that may have been purchased. It is one of the six components of the MME.

Types of Metadata

There are two types of metadata that the MME will contain: technical and business.

Technical Metadata

Technical metadata provides developers, DBAs (database administrators), technical users, and other IT staff members with the metadata they need to maintain, grow, and effectively manage an organization’s IT environment.

Technical metadata is absolutely critical for the ongoing maintenance and growth of a data warehouse. Without technical metadata, the task of analyzing and implementing changes to a decision support system is significantly more difficult and time-consuming.

Examples of Technical Metadata
Physical, logical, and conceptual data element names, domain values, data quality rules, and formats
Physical, logical, and conceptual table/file names, keys, and indexes
Physical, logical, and conceptual table/entity relationships
The physical flow of data within an IT environment
Physical, logical, and conceptual data models
Data tags
NoSQL structures
Audit controls and balancing information
The structure of data
Application names and boundaries
Mappings, extractions, transformations, and loads of data between applications
Encoding/reference table conversions
The relationship between models
History of data extractions and replications
User access patterns, frequency, and execution time of reports/queries
Table/data element access patterns
Technical business rules
Subject areas
Application archiving (e.g. data warehouse, data lake)
Job dependencies
Program and job names and descriptions
Version maintenance
Security criteria, constraints, and measures
Purge criteria
Backup

Table 3: Examples of Technical Metadata

The table above gives us solid examples of technical metadata. The following table gives more examples of technical metadata, this time with an IT portfolio management focus.

Examples of Technical Metadata for IT Portfolio Management
Hardware assets (mainframes, Unix, servers, PCs, etc.)
Hardware configurations (disk space, memory, processor types, etc.)
Hardware locations
Hardware costs (purchase price, leasing fees, maintenance fees, etc.)
Software licenses
Software license expiration dates
Software installations
Software costs (purchase price, leasing fees, maintenance fees, etc.)
Installed software patches
System listings
System technology rankings (E = evolving, S = stable, A = aging, O = obsolete/unsupported)
System purposes
System inputs/outputs
Project listings
Project costs (estimates and actuals) (internal, consulting, hardware, and software)
Project success rates
Project staffing (internal, vendors, temporary, etc.)
Project estimated date of completion, actual date of completion, etc.
Project business justification and scope
Project status
Network links
Contact person

Table 4: Examples of Technical Metadata for IT Portfolio Management

Operational Metadata

It is important to note that a reference guide lists a third type of metadata called operational metadata. Operational metadata refers to metadata that a data warehouse team may add during the ETL process to support that process. Examples of operational metadata include ETL Load Date, Update Date, Load Cycle Identifier, Current Flag Indicator, Operational System(s) Identifier, Active in Operational System Flag, and Confidence Level Indicator. Operational metadata should not be classified as a third type of metadata, because it is a type of technical metadata.
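As a rough illustration of how such fields get attached during a load, the sketch below stamps a source row with a few of the operational metadata items named above. The function name, the key names, and the sample values are hypothetical; a real ETL tool would typically populate these columns itself.

```python
from datetime import datetime, timezone


def stamp_operational_metadata(row, source_system, load_cycle_id, load_time):
    """Return a copy of `row` with operational metadata columns added."""
    stamped = dict(row)  # copy so the source row is left untouched
    stamped.update(
        {
            "etl_load_date": load_time,              # ETL Load Date
            "update_date": load_time,                # Update Date
            "load_cycle_id": load_cycle_id,          # Load Cycle Identifier
            "current_flag": True,                    # Current Flag Indicator
            "operational_system_id": source_system,  # Operational System Identifier
        }
    )
    return stamped


source_row = {"customer_id": 42, "customer_name": "Acme Corp"}
load_time = datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc)  # invented timestamp

loaded_row = stamp_operational_metadata(source_row, "CRM", 1007, load_time)
print(loaded_row)
```

The added columns describe the load rather than the customer, which is why they fit under technical metadata.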

Business Metadata

Business metadata is the link between IT applications and business users as it provides the semantic layer that helps business professionals locate, understand, and effectively utilize the organization’s data.  Business metadata provides these users with a roadmap for access to the data in the data warehouse, analytical engines, sales systems, ERP applications, data lakes, big data stores, websites, and all the other applications in the IT environment.

Examples of Business Metadata
Common access routines for the data in the warehouse/mart, mobile apps, data lake, analytical engine, ERP, sales system, websites, social media sites, etc.
Subject areas
Table names and definitions in business terms
Data element names, domain values, and definitions in business terms
Application field mappings, transformations, and summarization
Rules for drill down, drill up, drill across, and drill through
Data stewards
Data steward policies, standards, procedures, workflows, and decision trees
Data location
Business rules
Business policies
Organization hierarchies and contact trees
Other hierarchies (e.g. product)
Various security policies and risk management procedures
Data warehouse refresh dates

Table 5: Examples of Business Metadata

Other Key Metadata Management Terms and Concepts

Business Glossary

A business glossary provides a listing, and sometimes a hierarchy, of the key business concepts of an organization in a common vocabulary. Often it will contain definitions, rules, and policies.


Table 6: Simple Business Glossary Example

Data Dictionary/Data Glossary

A data dictionary (aka data glossary) can be said to be a business glossary designed for an organization’s IT staff. It would show a listing of the key business concepts and their associated technical instantiations in a common vocabulary.


Table 7: Simple Data Dictionary Example

Data Heritage

Data heritage represents the metadata about the original source of the data.  For example, the data heritage of a business element called “Customer Name” could be “a salesperson types in the customer’s name in the Salesforce system”.

Data Lineage

Data lineage represents information about everything that has “happened” to the data within an organization’s environment.  Whether the data was moved from one system to another, transformed, aggregated, etc., ETL (extraction, transformation, and load) tools can capture this metadata electronically.
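Heritage and lineage can be pictured together as a chain: lineage records each hop the data takes, and heritage describes where the chain begins. The sketch below is a simplified illustration in which every element has at most one upstream source; the element names and transformation notes are invented, and only the heritage text echoes the “Customer Name” example above.

```python
# Hypothetical lineage metadata: target element -> (source element, what happened).
LINEAGE_STEPS = {
    "MART.SALES_RPT.CUSTOMER_NAME": ("EDW.DIM_CUSTOMER.CUSTOMER_NAME", "copied"),
    "EDW.DIM_CUSTOMER.CUSTOMER_NAME": ("SALESFORCE.ACCOUNT.NAME", "trimmed and upper-cased"),
}

# Hypothetical heritage metadata: original source element -> how the data is created.
HERITAGE = {
    "SALESFORCE.ACCOUNT.NAME": "a salesperson types in the customer's name in the Salesforce system",
}


def trace_to_source(element):
    """Follow lineage upstream and return the path from `element` to its origin."""
    path = [element]
    seen = {element}
    while path[-1] in LINEAGE_STEPS:
        source, _transformation = LINEAGE_STEPS[path[-1]]
        if source in seen:
            raise ValueError(f"lineage cycle detected at {source}")
        path.append(source)
        seen.add(source)
    return path


path = trace_to_source("MART.SALES_RPT.CUSTOMER_NAME")
print(" <- ".join(path))
print("Heritage:", HERITAGE.get(path[-1], "unknown"))
```

Real lineage is usually a graph with many sources per target, so a production model would store steps as rows and traverse them with a graph or recursive query.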

Conclusion

Learning the foundational concepts used in the enterprise-wide management of metadata is an essential part of every data management professional’s development. Follow these concepts and keep them in mind to build an effective data management program.


Dr. David P. Marco, LinkedIn Top BI Voice, IDMMA Data Mgt. Professional of the Year, Fellow IIM, CBIP, CDP

Dr. David P. Marco, PhD, Fellow IIM, CBIP, CDP is best known as the world’s foremost authority on data governance and metadata management. He is an internationally recognized expert in the fields of CDO, data management, data literacy, and advanced analytics. He has earned many industry honors, including being named to Crain’s Chicago Business “Top 40 Under 40” and being named by DePaul University as one of their “Top 14 Alumni Under 40”, and he is a Professional Fellow in the Institute of Information Management. In 2022, CDO Magazine named Dr. Marco one of the Top Data Consultants in North America and IDMMA named him their Data Management Professional of the Year. In 2023 he earned LinkedIn’s Top BI Voice. Dr. Marco won the prestigious BIG Innovation award in 2024. David Marco is the author of the two widely acclaimed, top-selling books in metadata management history, “Universal Meta Data Models” and “Building and Managing the Meta Data Repository” (available in multiple languages). In addition, he is a co-author of numerous books and has published hundreds of articles, some of which have been translated into Mandarin, Russian, Portuguese, and other languages. He has taught at the University of Chicago and DePaul University. DMarco@EWSolutions.com

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved
