Metadata management is one of the foundational components of an enterprise data management initiative. Many organizations struggle with its incorporation into their data management processes.
Metadata Management’s Relationship to Data Management
Data Management is one of the hottest topics in our industry as Global 2000 companies and large government agencies are beginning to understand that without accurate, timely, and well-understood data they cannot realize the benefits of advanced analytics, big data, mobile analytics, data lakes, and the vast reservoir of data opportunities from the internet of things (IoT).
The practice of metadata management is foundational to every aspect of data management. Imagine trying to build a sustainable data governance practice without metadata management. Data stewards spend most of their time working with metadata and a smaller amount of time on data. Without proper metadata management, these stewards would be limited to working with only SharePoint, Excel spreadsheets, Word documents, and assorted manual processes to accomplish their vital tasks.
Master data management needs metadata management as much as data governance does. In a master data management program, we need to have automated and accurate metadata on the systems, data movement, data stores, and golden records in our information technology (IT) environment. Every component of enterprise data management has deep connections to metadata and its management, including:
- Business analytics: Data definitions, reports, users, usage, performance.
- Business architecture: Roles and organizations, goals, and objectives.
- Business definitions: The business terms and explanations for a particular concept, fact, or other item found in an organization.
- Business rules: Standard calculations and derivation methods.
- Data governance: Policies, standards, procedures, programs, roles, organizations, stewardship assignments.
- Data integration: Sources, targets, transformations, lineage, ETL workflows, EAI (enterprise application integration), EII (enterprise information integration), migration / conversion.
- Data quality: Definitions, defects, metrics, ratings.
- Document content management: Unstructured data, documents, taxonomies, ontologies, name sets, legal discovery, search engine indexes.
- Information technology infrastructure: Platforms, networks, configurations, licenses.
- Logical data models: Entities, attributes, relationships and rules, business names and definitions.
- Physical data models: Files, tables, columns, views, business definitions, indexes, usage, performance, change management.
- Process models: Functions, activities, roles, inputs / outputs, workflow, business rules, timing, stores.
- Systems portfolio and IT governance: Databases, applications, projects and programs, integration roadmap, change management.
- Service-oriented architecture (SOA) information: Components, services, messages, master data.
- System design and development: Requirements, designs and test plans, impact analysis.
- Systems management: Data security, licenses, configuration, reliability, service levels.
Understanding Metadata Management
The field of metadata management can be intimidating. There are many best practices and terms that need to be understood to function effectively in this line of work. The goal of this article is to present the key concepts and the basic best practices of metadata management so that you have a solid foundation on this valuable topic.
The classic definition of metadata is “data about data.” Unfortunately, this definition is limiting, as metadata covers much more than that.
Metadata is a type of data that digitally describes the who, what, when, where, why, and how of an organization’s data, processes, applications, assets, business concepts, and/or other things of interest.
More simply, metadata provides the context to the content of our digital assets.
From this definition, we can see that metadata is a type of data. Like data, metadata is a set of digitized values, facts or information that provides knowledge. This knowledge looks to answer the who, what, when, where, why, and how. The 5 Ws & 1 H (who, what, when, where, why, and how) table provides definitions for each type of knowledge.
|Who||Refers to person(s)|
|What||Used as a request for specific information; to inquire about the character, occupation, etc. of a person or thing; to inquire as to the origin, identity, etc. of something; to inquire as to the worth, usefulness, force, or importance of something; or to refer to how much something is.|
|When||Refers to time, a time period, how long ago, how soon, under what circumstances or upon what occasion.|
|Where||Refers to a thing or person that is in or at a place, part, point, etc.|
|Why||Used to ask for what, for what reason, cause, or purpose.|
|How||Used to refer to what way or manner; by what means; to what extent, degree, etc.; in what state or condition; for what reason; to what effect; with what meaning; a question concerning the way or manner in which something is done, achieved, etc.; a way or manner of doing something.|
Table 1: 5 W’s & 1 H Description
It’s important to understand that metadata management has two distinct but related uses. It is valuable to the information technology (IT) department of a company and to the business side of the organization as well.
|Knowledge Type||Technical Example||Business Example|
|Who||Who is the programmer responsible for a specific data movement process?||Who is the chief data steward of the CUSTOMER subject area?|
|What||What is the data lineage between our customer system and our enterprise data warehouse?||What field on our analytics shows the profitability of our products?|
|When||When do the extraction, transformation and load (ETL) jobs run and what are each job’s dependencies?||When was the data that I am analyzing last refreshed?|
|Where||Where in our IT environment are there servers operating at less than 40% of capacity?||Where do we have a report that shows our social media analytics by marketing campaign?|
|How||How do we set up security privileges for a new analyst?||How do we calculate the fields on our key report?|
|Why||Why are we experiencing more errors in the quality of our data?||Why are we missing some customers in our analytical reports?|
Table 2: 5 W’s & 1 H Business and Technical Examples
Managed Metadata Environment
The managed metadata environment (MME) represents the architectural components and processes that are required to properly and systematically gather, retain, and disseminate metadata throughout the enterprise. The MME encapsulates the concepts of metadata repositories, catalogs, data dictionaries, and any other term that people have used to refer to the systematic management of metadata.
Figure 1: Managed Metadata Environment (MME)
Meta Model & Metadata Model
The terms meta model and metadata model are synonymous. They both refer to the physical model that is designed to store the metadata. A meta model looks much like the data models that most of us are familiar with. It has elements (metadata elements instead of data elements), tables, and relationships. In fact, almost all the best modeling practices for data are equally applicable to metadata modeling. Even with the similarities, the modeling of metadata does pose some challenges that very few data modelers are familiar with.
The 4 Characteristics of a Meta Model
A great meta model has 4 key characteristics. It is generic, integrated, current, and historical.
Generic means that the physical meta model stores metadata by metadata subject area rather than by application. For example, a generic meta model will have an attribute named “DATABASE_PHYS_NAME” that holds the physical database names (a metadata subject area) within the company. An application-specific meta model would name this same attribute “ORACLE_ACCTREC_PHYS_NAME”. The problem with application-specific meta models is that metadata subject areas expand their scope and can even change over time. To return to our example, today Oracle may be our company’s database standard. Tomorrow we may switch the standard to SQL Server for cost or compatibility advantages. That switch would force needless changes to the physical meta model. Further, we should not have application-specific names such as ACCTREC (i.e., Accounts Receivable) in the meta model. An accounts receivable application has inputs (data coming in), processes, and outputs (data coming out) just like any other system. Therefore, there is no reason for our meta model to have application-specific names for its attributes or tables; doing so is limiting and a poor meta modeling practice.
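As a minimal sketch of the generic characteristic, assume a small in-memory registry. The class `DatabaseEntry`, the fields `platform` and `application`, and the sample database names are hypothetical; only the idea of a generic physical database name attribute comes from the text. The vendor and the owning application are stored as values, so a change of database standard updates rows rather than the model:

```python
from dataclasses import dataclass


@dataclass
class DatabaseEntry:
    """One row of a generic 'database' metadata subject area (hypothetical)."""
    database_phys_name: str  # generic attribute, in the spirit of DATABASE_PHYS_NAME
    platform: str            # the vendor is a value, not part of an attribute name
    application: str         # the owning application is also just a value


def switch_platform(entries, old, new):
    """Change the platform standard by updating values; the model is untouched."""
    return [
        DatabaseEntry(
            e.database_phys_name,
            new if e.platform == old else e.platform,
            e.application,
        )
        for e in entries
    ]


# Hypothetical sample metadata.
registry = [
    DatabaseEntry("ACCTREC_PROD", "Oracle", "Accounts Receivable"),
    DatabaseEntry("SALES_DW", "Oracle", "Data Warehouse"),
]

migrated = switch_platform(registry, old="Oracle", new="SQL Server")
for entry in migrated:
    print(entry.database_phys_name, entry.platform)
```

With an application-specific attribute name, the same switch would require renaming the attribute and every query that uses it.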
A great meta model provides an integrated view of the enterprise’s major metadata subject areas. Suppose we need a meta model that holds business definitions for our data elements and captures technical data lineage. Meta modelers often make the mistake of putting the business metadata (definitions) in one set of tables and the technical metadata in a different set of tables, without any relationships between them.
As a result, if the business is considering adding a new “customer type”, the metadata team can’t query the data lineage related metadata in the model to see what data elements would be impacted by this business decision. This severely limits the power that metadata management can provide.
The best practice of having an integrated meta model is missed by the vast majority of organizations, as they choose to implement many smaller metadata management solutions rather than an enterprise-wide metadata management effort.
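The impact analysis described above can be sketched as follows, assuming the business metadata (terms) and the technical metadata (lineage) are related in one model. All element names and both dictionaries are hypothetical, and a real MME would hold these relationships in its repository rather than in Python dictionaries:

```python
from collections import deque

# Business metadata: business term -> physical data elements that implement it.
term_to_elements = {
    "customer type": ["crm.customer.cust_type_cd"],
}

# Technical metadata: lineage edges, source element -> downstream elements.
lineage = {
    "crm.customer.cust_type_cd": ["staging.customer.cust_type_cd"],
    "staging.customer.cust_type_cd": ["edw.dim_customer.customer_type"],
    "edw.dim_customer.customer_type": ["mart.sales_by_type.customer_type"],
}


def impacted_elements(term):
    """Return every element reachable from the term via lineage (breadth-first)."""
    seen = []
    queue = deque(term_to_elements.get(term, []))
    while queue:
        element = queue.popleft()
        if element in seen:
            continue  # already visited; also guards against cycles
        seen.append(element)
        queue.extend(lineage.get(element, []))
    return seen


impact = impacted_elements("customer type")
for element in impact:
    print(element)
```

The query only works because the term and the lineage share a key (the data element). If the two sets of metadata live in unrelated tables, there is nothing to traverse.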
A fundamentally sound meta model contains metadata that relates to both the current environment and the future/planned environment. Metadata management is very valuable in understanding and managing our current business and technical landscape; however, it can also play a central role in our organization’s future plans. For example, let’s assume that our company is considering a migration to a new ERP (enterprise resource planning) vendor. It would be very valuable to query the meta model to see how many of our current data elements are already available in the new ERP vendor’s solution.
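A rough sketch of the ERP coverage question, assuming the meta model tags each data element with a status of "current" or "planned". The `DataElement` class, the `status` field, and the sample elements are hypothetical illustrations:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    business_name: str
    system: str
    status: str  # "current" or "planned" (hypothetical convention)


# Hypothetical sample metadata for the current and the candidate environment.
elements = [
    DataElement("customer name", "legacy ERP", "current"),
    DataElement("customer type", "legacy ERP", "current"),
    DataElement("order date", "legacy ERP", "current"),
    DataElement("loyalty tier", "legacy ERP", "current"),
    DataElement("customer name", "candidate ERP", "planned"),
    DataElement("order date", "candidate ERP", "planned"),
    DataElement("invoice number", "candidate ERP", "planned"),
]


def names(status):
    """Business names of all elements with the given status."""
    return {e.business_name for e in elements if e.status == status}


current, planned = names("current"), names("planned")
covered = current & planned   # already available in the candidate solution
gaps = current - planned      # would need custom work or a workaround
coverage_pct = 100 * len(covered) / len(current)

print(sorted(covered))
print(sorted(gaps))
print(f"{coverage_pct:.0f}% of current elements are covered")
```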
Lastly, a good meta model is historical: it includes historical views of the metadata, even as it changes over time. This allows a corporation to understand how its business has evolved over the years. This is especially critical if the MME is supporting an application that contains historical data, like a data warehouse or an advanced analytics application. For example, suppose the business metadata definition for “customer” is “anyone that has purchased a product from our company within one of our stores, website or through our catalog”. A year later a new distribution channel is added to the strategy: the company now allows customers to order products through an app on their phone. At that point in time, the business metadata definition for customer would be modified to “anyone that has purchased a product from our company within one of our stores, on our website, through our mail order catalog, or through our company app”. A fundamentally sound meta model stores both definitions because each is valid, depending on what data you are analyzing (and the age of that data).
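One common way to keep both definitions is to give each version an effective date range and look definitions up "as of" the date of the data being analyzed. The sketch below assumes that approach. The class name, the dates, and the abbreviated definition texts are hypothetical:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class DefinitionVersion:
    term: str
    definition: str
    effective_from: date
    effective_to: Optional[date]  # None means the version is still in effect


# Hypothetical history; the definitions are abbreviated from the example above.
history = [
    DefinitionVersion(
        "customer",
        "purchased in a store, on the website, or through the catalog",
        date(2020, 1, 1),
        date(2021, 1, 1),
    ),
    DefinitionVersion(
        "customer",
        "purchased in a store, on the website, through the catalog, "
        "or through the company app",
        date(2021, 1, 1),
        None,
    ),
]


def definition_as_of(term, as_of):
    """Return the definition that was in effect on the given date, if any."""
    for version in history:
        started = version.effective_from <= as_of
        not_ended = version.effective_to is None or as_of < version.effective_to
        if version.term == term and started and not_ended:
            return version.definition
    return None


print(definition_as_of("customer", date(2020, 6, 30)))
print(definition_as_of("customer", date(2021, 6, 30)))
```

An analyst working with older data gets the definition that applied when that data was captured, not the one in force today.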
The term metadata repository was the industry’s first widespread name for the metadata management system. It refers to the meta model and, typically, a purchased metadata management software package. The repository is one of the six components of the MME.
Types of Metadata
There are two types of metadata that the MME will contain: technical and business.
Technical metadata provides the developers, DBAs (database administrators), technical users, and other IT staff members the metadata they need to maintain, grow, and effectively manage an organization’s IT environment.
Technical metadata is absolutely critical for the ongoing maintenance and growth of the data warehouse. Without technical metadata, the task of analyzing and implementing changes to a decision support system is significantly more difficult and time-consuming.
|Examples of Technical Metadata|
|Physical, logical, and conceptual data element names, domain values, data quality rules, and formats|
|Physical, logical, and conceptual table/file names, keys, and indexes|
|Physical, logical, and conceptual table/entity relationships|
|The physical flow of data within an IT environment|
|Physical, logical, and conceptual data models|
|Audit controls and balancing information|
|The structure of data|
|Application names and boundaries|
|Mappings, extractions, transformations, and loads of data between applications|
|Encoding/reference table conversions|
|The relationship between models|
|History of data extractions and replications|
|User access patterns, frequency, and execution time of reports/queries|
|Table/data element access patterns|
|Technical business rules|
|Application archiving (e.g. data warehouse, data lake)|
|Program and job names and descriptions|
|Security criteria, constraints, and measures|
Table 3: Examples of Technical Metadata
The table above gives solid examples of technical metadata. The following table gives more examples of technical metadata, this time with an IT portfolio management focus.
|Examples of Technical Metadata for IT Portfolio Management|
|Hardware assets (mainframes, Unix, servers, PCs, etc.)|
|Hardware configurations (disk space, memory, processor types, etc.)|
|Hardware costs (purchase price, leasing fees, maintenance fees, etc.)|
|Software license expiration dates|
|Software installations (purchase price, leasing fees, maintenance fees, etc.)|
|Installed software patches|
|System technology rankings (E = evolving, S = stable, A = aging, O = obsolete/unsupported)|
|Project costs (estimates and actuals) (internal, consulting, hardware, and software)|
|Project success rates|
|Project staffing (internal, vendors, temporary, etc.)|
|Project estimated date of completion, actual date of completion, etc.|
|Project business justification and scope|
Table 4: Examples of Technical Metadata for IT Portfolio Management
It is important to note that a reference guide lists a third type of metadata called operational metadata. Operational metadata refers to metadata that a data warehouse team may add during the ETL process and which is designed to help the ETL process. Examples of operational metadata include ETL Load Date, Update Date, Load Cycle Identifier, Current Flag Indicator, Operational System(s) Identifier, Active in Operational System Flag, and Confidence Level Indicator. Operational metadata should not be classified as a third type of metadata, as it is a type of technical metadata.
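To illustrate, here is a minimal sketch of an ETL step that stamps each extracted row with a few of the operational metadata fields listed above. The function name, the column names, and the sample rows are hypothetical and do not come from any particular ETL tool:

```python
from datetime import datetime, timezone


def stamp_operational_metadata(rows, source_system, load_cycle_id, loaded_at):
    """Return copies of the rows with operational metadata columns added."""
    stamped = []
    for row in rows:
        stamped.append({
            **row,
            "etl_load_date": loaded_at,              # ETL Load Date
            "load_cycle_id": load_cycle_id,          # Load Cycle Identifier
            "operational_system_id": source_system,  # Operational System Identifier
            "current_flag": True,                    # Current Flag Indicator
        })
    return stamped


# Hypothetical extracted rows.
extracted = [
    {"customer_id": 1, "customer_name": "Acme"},
    {"customer_id": 2, "customer_name": "Globex"},
]

loaded_at = datetime(2024, 1, 15, 2, 0, tzinfo=timezone.utc)
loaded = stamp_operational_metadata(extracted, "CRM", 42, loaded_at)

for row in loaded:
    print(row)
```

These columns describe the load rather than the business entity, which is why they fit under technical metadata.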
Business metadata is the link between IT applications and business users as it provides the semantic layer that helps business professionals locate, understand, and effectively utilize the organization’s data. Business metadata provides these users with a roadmap for access to the data in the data warehouse, analytical engines, sales systems, ERP applications, data lakes, big data stores, websites, and all the other applications in the IT environment.
|Examples of Business Metadata|
|Common access routines for the data in the warehouse/mart, mobile apps, data lake, analytical engine, ERP, sales system, websites, social media sites, etc.|
|Table names and definitions in business terms|
|Data element names, domain values, and definitions in business terms|
|Application field mappings, transformations, and summarization|
|Rules for drill down, drill up, drill across, and drill through|
|Data steward policies, standards, procedures, workflows, and decision trees|
|Organization hierarchies and contact trees|
|Other hierarchies (e.g. product)|
|Various security policies and risk management procedures|
|Data warehouse refresh dates|
Table 5: Examples of Business Metadata
Other Key Metadata Management Terms and Concepts
A business glossary provides a listing, and sometimes a hierarchy, of the key business concepts of an organization in a common vocabulary. Often it will contain definitions, rules, and policies.
Table 6: Simple Business Glossary Example
Data Dictionary/Data Glossary
A data dictionary (aka data glossary) can be said to be a business glossary designed for an organization’s IT staff. It would show a listing of the key business concepts and their associated technical instantiations in a common vocabulary.
Table 7: Simple Data Dictionary Example
Data heritage represents the metadata about the original source of the data. For example, the data heritage of a business element called “Customer Name” could be “a salesperson types in the customer’s name in the Salesforce system”.
Data lineage represents information about everything that has “happened” to the data within an organization’s environment. Whether the data was moved from one system to another, transformed, aggregated, etc., ETL (extraction, transformation, and load) tools can capture this metadata electronically.
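The relationship between lineage and heritage can be sketched as a walk upstream through recorded lineage steps: the steps are the lineage, and the first source reached is where the heritage metadata applies. The `LineageStep` class, the element names, and the operations are hypothetical, and the sketch assumes each target has a single source:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineageStep:
    source: str
    target: str
    operation: str


# Hypothetical lineage captured for a "Customer Name" element.
steps = [
    LineageStep("salesforce.account.name", "staging.customer.name", "extract"),
    LineageStep("staging.customer.name", "edw.dim_customer.customer_name",
                "trim and uppercase"),
    LineageStep("edw.dim_customer.customer_name", "mart.report.customer_name",
                "load"),
]


def trace_back(element):
    """Walk lineage upstream from an element; return steps in origin-first order."""
    by_target = {s.target: s for s in steps}  # assumes one source per target
    path, seen = [], set()
    while element in by_target and element not in seen:
        seen.add(element)  # guard against accidental cycles
        step = by_target[element]
        path.append(step)
        element = step.source
    path.reverse()
    return path


path = trace_back("mart.report.customer_name")
origin = path[0].source if path else None

print("origin:", origin)
for step in path:
    print(f"{step.source} -> {step.target} ({step.operation})")
```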
Learning these foundational concepts, which are used in the enterprise-wide management of metadata, is an essential part of every data management professional’s development. Follow these concepts and keep them in mind to build an effective data management program.