Reference and Master Data Management

Reference and master data provide the contextual capabilities for transactional data.  Managing reference and master data enables organizations to understand operational data and analyze disparately collected data effectively.

Reference and Master Data is the collection of generally non-transactional data that gives context to transactions, and provides connection points between and among related data in different records, files, tables and other storage formats.  Usually grouped together when discussed, reference data and master data are distinct subsets of the domain “reference and master data.”  Reference and Master Data Management is a component of enterprise data management.

Reference data is data used to classify or categorize other data.  Examples of reference data include code lists, taxonomies, and hierarchies of data.  Most reference data sets change slowly.  Metadata about reference data sets may document:

  • The meaning and purpose of each reference data value domain
  • The reference tables and databases where the reference data appears
  • The source of the data in each table
  • The version of the reference data that is currently available
  • Reference data last update date
  • Maintenance description for the reference data
  • Business data stewardship information for the reference data
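
As an illustration, the metadata items listed above could be held in one record per reference data set.  The sketch below is only one possible shape: the class name, field names, the `is_stale` helper, and the sample values are all assumptions made for this example, not part of any standard.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class ReferenceDataSetMetadata:
    """Hypothetical metadata record for one reference data set."""
    name: str
    meaning_and_purpose: str
    appears_in: tuple            # tables and databases where the data appears
    source: str
    version: str
    last_updated: date
    maintenance_description: str
    business_data_steward: str


def is_stale(meta: ReferenceDataSetMetadata, today: date, max_age_days: int) -> bool:
    """Return True when the set was last updated more than max_age_days ago."""
    return (today - meta.last_updated).days > max_age_days


# Illustrative values only; the table names and steward are invented.
country_codes = ReferenceDataSetMetadata(
    name="country_code",
    meaning_and_purpose="Two-letter codes identifying countries",
    appears_in=("crm.customer_address", "erp.supplier_site"),
    source="ISO 3166-1 alpha-2",
    version="2024-01",
    last_updated=date(2024, 1, 15),
    maintenance_description="Reviewed quarterly by the reference data team",
    business_data_steward="Example Steward",
)
```

A check such as `is_stale(country_codes, date.today(), 90)` could then flag sets that are overdue for review.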

Master data is data about the business entities that provide context for business transactions.  Unlike reference data, master data values usually are not limited to pre-defined domain values.  Business rules typically dictate the format and permitted ranges of master data values.  Common organizational master data includes data about:

  • Individuals, organizations, and their roles, including customers, citizens, patients, vendors, suppliers, business partners, competitors, employees, and students.
  • Products, internal and external, inventory, and related concepts.
  • Financial structures, including general ledger accounts, cost centers, profit centers, etc.
  • Location concepts, for the organizations and individuals and other entities that concern the enterprise.
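
To make the idea of business rules governing format and permitted ranges concrete, here is a minimal sketch for a customer master record.  The identifier format, the credit limit range, and all names in the code are assumptions chosen for illustration; real rules come from the business.

```python
import re
from dataclasses import dataclass

# Assumed rules for this example only.
CUSTOMER_ID_PATTERN = re.compile(r"C\d{6}")   # "C" followed by six digits
CREDIT_LIMIT_RANGE = (0, 1_000_000)           # inclusive bounds


@dataclass
class CustomerRecord:
    customer_id: str
    name: str
    credit_limit: int


def rule_violations(record: CustomerRecord) -> list:
    """Return a description of every business rule the record breaks."""
    problems = []
    if not CUSTOMER_ID_PATTERN.fullmatch(record.customer_id):
        problems.append("customer_id does not match the expected format")
    if not record.name.strip():
        problems.append("name is empty")
    low, high = CREDIT_LIMIT_RANGE
    if not low <= record.credit_limit <= high:
        problems.append("credit_limit is outside the permitted range")
    return problems
```

Unlike a reference data check, nothing here compares the value against a fixed list: any customer name is acceptable as long as the format and range rules hold.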

Master Data Management is the process of defining and maintaining how master and reference data will be created, integrated, maintained, and used throughout the enterprise.  It is a complex endeavor and requires the inclusion of several data management disciplines (data governance, metadata management, data integration, data quality) and the use of all the areas of enterprise architecture. 

Types of Data

Organizations of every size, industry and focus have a variety of data collections in use and storage.  This variety can be classified into the following categories:

  • Unstructured: Data found in e-mail messages, reference papers, magazine / journal articles, intranet portals, product descriptions, marketing collateral, etc.
  • Transactional: Data related to sales, deliveries, invoices, trouble tickets, claims, and other monetary and non-monetary interactions.
  • Metadata: Data describing data (giving raw data its meaning and context).  It may reside in a formal repository or in various other forms such as XML documents, report definitions, column descriptions in a database, log files, connections, and configuration files.
  • Master: Master data are the critical nouns of a business, and generally fall into four groupings: people (e.g., customer, employee, vendor), things (e.g., product, item, widget), places (e.g., office locations and geographic divisions), and concepts (e.g., contract, claim, account).  The granularity of domains is determined by the magnitude of differences between the attributes of the entities within the domain.

All these types of data are part of any organization’s information systems; master data connects transactions so that records that belong together can be managed across disparate files, tables, and applications.

Reference Data Management

Managing reference data properly is important to any organization since reference data carries the context of data transactions through its semantic content (code value descriptions, location data, and other contextual information).  Reference data can be used to drive business logic that helps execute a business process, designate an application to perform specific actions, or provide meaningful segmentation to analyze transaction data.  Also, mapping reference data often requires human judgment, so the need for intervention by business data stewards in the reference data management process cannot be overlooked.

Reference data management is important for several reasons.  Reference data:

  • Describes the structures used in the organization (internal department codes, internal product codes, employee organization codes, internal location codes, etc.)
  • Describes the common data that the organization shares with the external organizations it connects to (e.g., geographical, currency, country, and diagnosis coding structures)
  • Provides assistance and support to analytics and business intelligence (e.g., classification codes)

Organizations that use data entry heavily, including healthcare, insurance, and government entities, experience significant data quality challenges due to improper coding of reference data values.  These errors can be costly in several ways.  Additionally, many organizations rely on hundreds of individually developed reference files or tables, and each instance requires updating and periodic quality review.  Since most organizations do not have sufficient staff to perform the reference data tasks, these activities may not be performed, leaving the reference data outdated and causing errors in application performance and data integration.
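
Improperly coded values of the kind described above can be surfaced by comparing what was entered against the managed reference set.  The following is a minimal sketch; the claim status codes, the sample records, and the function name are invented for this example.

```python
from collections import Counter

# Hypothetical reference set: code -> description.
VALID_CLAIM_STATUS = {
    "OPEN": "Open",
    "PEND": "Pending review",
    "CLSD": "Closed",
}


def find_invalid_codes(records, field, reference_codes):
    """Count each value of `field` that is absent from the reference set."""
    invalid = Counter()
    for record in records:
        value = record.get(field)
        if value not in reference_codes:
            invalid[value] += 1
    return dict(invalid)


# Sample data entry results, including two common miscodings.
claims = [
    {"claim_id": 1, "status": "OPEN"},
    {"claim_id": 2, "status": "open"},
    {"claim_id": 3, "status": "CLSD"},
    {"claim_id": 4, "status": "CLOSED"},
    {"claim_id": 5, "status": "open"},
]

invalid_status_counts = find_invalid_codes(claims, "status", VALID_CLAIM_STATUS)
```

The counts show which miscodings occur most often.  Deciding how each one maps to a valid code still calls for a business data steward's judgment.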

Reference data should be managed centrally, by a team that is supervised by the Data Governance organization or the Master Data Management (MDM) unit, as part of Enterprise Data Management.  The Reference Data team should have the following responsibilities:

  • Manage internal reference data: Internal reference data applies to business concepts that are specific to the organization.  Managing internal reference data requires a federated approach, since it is created and managed by many different business data stewards.  The central reference data team must ensure that groups accountable for internal reference data use a standardized approach to reference data creation and management.
  • Manage external reference data: External reference data is maintained by authorities outside the enterprise (e.g., ISO and government agencies).  External reference data must be discovered, selected, understood, and captured appropriately.  Standardized approaches for these processes must be developed and implemented to ensure that external reference data is organized properly and updated regularly.

Master Data Management

Master Data Management is the process of defining and maintaining how master data will be created, integrated, maintained, and used throughout the enterprise.  The challenges of MDM are:

1) to determine the most accurate data values from among potentially conflicting data values, and

2) to use the most accurate values instead of other, less accurate data.

Master data management systems attempt to determine the most accurate data values and make that data available wherever needed.
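
One common way to choose among conflicting values is a survivorship rule.  The sketch below assumes a simple rule (prefer the most trusted source, then the most recent update); the trust ranking, source names, and sample values are hypothetical, and real MDM tools typically support richer rules.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Assumed trust ranking: a higher number means a more trusted source.
SOURCE_TRUST = {"crm": 3, "billing": 2, "web_form": 1}


@dataclass
class SourceValue:
    source: str
    value: Optional[str]
    updated: date


def surviving_value(candidates):
    """Pick the value from the most trusted source, breaking ties by recency."""
    populated = [c for c in candidates if c.value]
    if not populated:
        return None
    best = max(populated, key=lambda c: (SOURCE_TRUST.get(c.source, 0), c.updated))
    return best.value


# Three applications disagree about one customer's phone number.
phone_candidates = [
    SourceValue("crm", "555-0100", date(2023, 1, 10)),
    SourceValue("billing", "555-0199", date(2024, 3, 1)),
    SourceValue("web_form", None, date(2024, 5, 1)),
]
```

Here `surviving_value(phone_candidates)` returns the CRM value, because under the assumed ranking source trust outweighs recency.  Reversing the order of the tuple in `key` would favor the newest value instead.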

Master Data Management Process

Organizations approach master data management (MDM) programs with trepidation, since these initiatives involve multiple applications and include several areas of enterprise data management: data governance, data integration, metadata management, and data architecture.  Additionally, MDM requires selection of some technical products to support these processes, and the optimal choices depend on the organization’s platform and skills.

However, the actual MDM process can be explained simply in the following steps:

1) Examine: Understand all possible sources and the current state of data quality (data and metadata) in each source.  Record results in a comprehensive data mapping document.

2) Join: Collect the master data into a central database and link it to all participating applications.

3) Administer: Cleanse and de-duplicate the master data.  Manage it according to business rules.

4) Distribute: Connect the central master data with enterprise business processes and the appropriate applications.  Ensure regular updates to the MDM / applications.

5) Promote: Use the master data in all appropriate business intelligence and analytics reporting across the organization, at all levels.

Figure 1: Basic Master Data Management Process
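
The Join and Administer steps can be sketched in a few lines.  This is a simplified illustration, not a production design: the matching rule (same normalized name and postal code), the source names, the record layout, and the master ID format are all assumptions.

```python
from collections import defaultdict


def match_key(record):
    """Assumed matching rule: same normalized name and postal code."""
    name = " ".join(record["name"].lower().split())
    postal_code = record["postal_code"].replace(" ", "")
    return (name, postal_code)


def build_master(sources):
    """Return (master records, cross-reference from source IDs to master IDs)."""
    # Join: collect records from every source into groups of likely duplicates.
    groups = defaultdict(list)
    for source_name, records in sources.items():
        for record in records:
            groups[match_key(record)].append((source_name, record))

    # Administer: keep one master record per group and link each source record to it.
    master, xref = {}, {}
    for number, (key, members) in enumerate(sorted(groups.items()), start=1):
        master_id = f"M{number:04d}"
        first_record = members[0][1]
        master[master_id] = {"name": first_record["name"].strip(), "postal_code": key[1]}
        for source_name, record in members:
            xref[(source_name, record["id"])] = master_id
    return master, xref


sources = {
    "crm": [
        {"id": "17", "name": "Acme  Corp", "postal_code": "19103"},
        {"id": "18", "name": "Globex", "postal_code": "60601"},
    ],
    "erp": [
        {"id": "A-9", "name": "acme corp", "postal_code": "19103 "},
    ],
}

master, xref = build_master(sources)
```

The cross-reference is what supports the Distribute step: each application keeps its own identifier, while `xref` tells it which master record that identifier belongs to.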

Identifying Master Data

Many enterprises struggle when attempting to identify the categories of master data.  Again, master data is the collection of data that is shared by multiple parts of the organization (excluding reference data) and that combines to give context to transaction data.

In general, most organizations can identify their master data in these four categories:

  • Parties: those who transact business with the enterprise, including customers, prospects, suppliers, and external partners.
  • Things: what the enterprise produces, such as products and services.
  • Places: actual places and how they are segmented, such as geographies, locations, subsidiaries, sites, and zones.
  • Financial and Organizational: reporting and accounting categories, including organization structures, sales territories, chart of accounts, cost centers, and business units.

Identifying the master data in each category can be a lengthy process, and the master data should be modeled according to standards so the master data is represented appropriately.  Since master data is shared across applications and business units, it must be profiled for each application and business unit, with the relevant business and technical metadata captured for each instance of the master data in a comprehensive data map, including business rules.

Master Data Management Architecture

An organization cannot simply purchase an MDM application and consider its master data management system complete.  Rather, it must consider its current information technology (IT) platform and its current and future business needs and technical requirements, and then design an appropriate MDM architecture.  Generally, there are four common MDM architectural styles: local MDM, hub MDM, federated database MDM, and hybrid MDM.

  • Local MDM: use as a starting point for an MDM approach, or for a single business unit
  • Hub MDM: use to connect discrete business operations for MDM architecture
  • Federated Database MDM: use with existing master data systems (e.g., enterprise resource planning and / or customer relationship management systems) and large centralized IT environments
  • Hybrid MDM: use in complex IT environments with multiple existing MDM systems, where a variety of architectural approaches must be combined

Master Data and Data Governance

As defined in several authoritative sources, including “Foundations of Data Governance,” data governance is the planning, oversight, and control over management of data and the use of data and data-related resources, and the development and implementation of policies and decision rights over the use of data.  Data governance addresses the need for managing the complexity of enterprise data management, including master data.

The management of master data allows organizations to correct data inconsistencies across business units and applications and apply uniform business rules to improve cross-application / cross-unit data sharing.  Applying the policies of data governance to master data allows organizations to create and enforce policies and standards for data and information that apply throughout the enterprise.

Master Data and Data Quality

One of the biggest challenges that organizations face when embarking on a master data management initiative is the state of the data’s quality, since the cross-application / cross-unit nature of master data is an invitation to anomalies in accuracy, completeness, consistency, validity, and other quality dimensions.  Almost all MDM programs include a data quality component, and many MDM programs are begun to address data quality issues in one or more of the master data categories.
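
Two of the dimensions named above, completeness and validity, are straightforward to measure during profiling.  The sketch below shows one simple way to do so; the supplier records, the five-digit postal code pattern, and the function names are assumptions for illustration only.

```python
import re


def _populated(record, field):
    """Return the stripped value of `field`, or an empty string when missing."""
    return str(record.get(field) or "").strip()


def completeness(records, field):
    """Share of records in which `field` holds a non-empty value."""
    if not records:
        return 0.0
    filled = sum(1 for r in records if _populated(r, field))
    return filled / len(records)


def validity(records, field, pattern):
    """Share of populated values of `field` that fully match `pattern`."""
    values = [_populated(r, field) for r in records if _populated(r, field)]
    if not values:
        return 0.0
    matching = sum(1 for v in values if re.fullmatch(pattern, v))
    return matching / len(values)


# Hypothetical supplier master records from one application.
suppliers = [
    {"supplier_id": "S1", "postal_code": "19103"},
    {"supplier_id": "S2", "postal_code": ""},
    {"supplier_id": "S3", "postal_code": "1910"},
    {"supplier_id": "S4", "postal_code": None},
]

postal_completeness = completeness(suppliers, "postal_code")
postal_validity = validity(suppliers, "postal_code", r"\d{5}")
```

Running the same measures per application and per business unit gives a baseline against which cleansing progress can be tracked.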

Since data quality, in general, is a challenge for many organizations, any MDM program that includes data quality must acknowledge that the data quality effort will focus on the master data only, and that the MDM data quality processes and remediation efforts will apply to the master data category under analysis until completion.  This concentrated focus is necessary so the master data quality efforts have sufficient resources and attention for success, since distractions can arise, draw resources into non-MDM areas, and reduce the effectiveness of the MDM data quality effort.  Once the master data has been cleansed to an appropriate level, those resources can be assigned to non-MDM data quality projects.


Reference and master data are essential components of enterprise data management.  They provide the contextual capabilities for transactional data.  Managing reference and master data can involve complex processes, but doing so enables organizations to understand operational data and analyze disparately collected data effectively.


Anne Marie Smith, Ph.D.

Anne Marie Smith, Ph.D. is an internationally recognized expert in the fields of enterprise data management, data governance, data strategy, enterprise data architecture and data warehousing. Dr. Smith is a consultant and educator with over 30 years' experience. Author of numerous articles, a Fellow of the Insurance Data Management Association (FIDM), and a Fellow of the Institute for Information Management (IIM), Dr. Smith is also a well-known speaker in her areas of expertise at conferences and symposia.

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved
