An organization’s poor data quality is often the result of poor metadata management.  Improve business and technical metadata and manage it properly to ensure higher-quality data and information.

The state of an enterprise’s information depends on its data quality and metadata.  Poor-quality data coupled with incorrect interpretation and use of information from an enterprise application is a recipe for failure, since it extinguishes the confidence of the organization’s information consumers.  The consequences can include poor customer service, inefficient business processes, shipping or invoicing errors, compliance failures, penalties from regulatory reporting issues, and many others.  Additionally, misinformed decisions by information consumers responding to industry market changes can carry significant costs and damage the organization’s health.

This dilemma often arises when organizations fail to seize the opportunity to improve data quality and metadata in the enterprise.  The missed opportunity increases the time and expense required to reconcile and audit enterprise data before it can be used as accurate, reliable information.  Through the planning, design, and implementation of data quality and managed metadata – as components of an overall enterprise data management framework – organizations can gain competitive advantage through the effective and confident use of their information assets.

Metadata is the data context that explains the definition, control, usage, and treatment of data content within a system, application, or environment.  Metadata provides the characteristics needed to measure data quality in the enterprise, and data quality measures the health of information for an intended use.  Several factors affect realized data quality:

  • The inherent data quality itself has characteristics (metadata) such as the accuracy, completeness, consistency, and freshness of the data.  These qualities can be measured and tracked over time, then improved based on that analysis (a minimal measurement sketch follows this list).
  • The pragmatic definition of data quality is how well the data suits a particular purpose.  Relevant data characteristics include its form, precision, level of aggregation, and availability (all found in metadata).  These characteristics are specific to the process or consumer that will use the information.
  • The level of data integration is another common metadata problem that many organizations experience: multiple customer numbers for the same customer, or multiple product IDs for the same product, affect the organization’s ability to obtain an accurate picture of the business quickly.
  • Finally, inconsistent definitions of the basic people, organizations, locations, assets, and events across different systems and business units often make it difficult to obtain a clear view of the state of the business.  This inconsistency is often the initial reason why organizations kick off enterprise initiatives around data governance, stewardship, and managed metadata.
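
To make the first factor concrete, here is a minimal sketch in Python of measuring inherent quality dimensions.  The records, the state domain, and the freshness window are hypothetical stand-ins for rules that would normally come from the organization’s business metadata:

    from datetime import date

    # Hypothetical customer records; None marks a missing value.
    customers = [
        {"id": 1, "email": "a@example.com", "state": "IL", "updated": date(2024, 5, 1)},
        {"id": 2, "email": None,            "state": "XX", "updated": date(2022, 1, 9)},
        {"id": 3, "email": "c@example.com", "state": "NY", "updated": date(2024, 4, 2)},
    ]

    VALID_STATES = {"IL", "NY", "CA"}   # domain rule from business metadata
    FRESHNESS_LIMIT_DAYS = 365          # freshness rule from business metadata

    def completeness(rows, field):
        """Share of rows where the field is populated."""
        return sum(r[field] is not None for r in rows) / len(rows)

    def consistency(rows, field, domain):
        """Share of rows whose value falls within the approved domain."""
        return sum(r[field] in domain for r in rows) / len(rows)

    def freshness(rows, field, limit_days, today=date(2024, 6, 1)):
        """Share of rows updated within the freshness window."""
        return sum((today - r[field]).days <= limit_days for r in rows) / len(rows)

    print(f"email completeness: {completeness(customers, 'email'):.0%}")                       # 67%
    print(f"state consistency:  {consistency(customers, 'state', VALID_STATES):.0%}")          # 67%
    print(f"record freshness:   {freshness(customers, 'updated', FRESHNESS_LIMIT_DAYS):.0%}")  # 67%

Tracked over time, these same measures show whether quality is improving or degrading after each remediation effort.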

Typically, data quality is measured against the specific use of the information.  However, not all information meets the same quality specification, so the impact of data quality also depends on information consumers making wise choices about the sources they use.  Low-quality data (or the perception of low-quality data) erodes trust in the information and encourages consumers to create alternate – often inconsistent – sources of information.  The result is a reduced ability to collaborate and to present a single version of the organization’s health.

The major processes involved in a data quality program include:

  • Measuring the inherent quality of the data sources
  • Creating consistent views of customers, products, assets, profitability, etc.
  • Aligning definitions of key objects (customer, location, product)
  • Determining the recommended uses for each source
  • Making this information available to all knowledge workers as necessary
  • Improving the inherent quality of the data
  • Improving the existing processes that create the data

To support these efforts, data quality techniques and methods have evolved to include the following major steps:

  • Data Profiling: In this phase, organizations analyze data structures relative to the data content.  Profiling is useful for recommending or affirming data structure design based on the data content, and it can generate metrics on potential errors for use in system or process improvement (a profiling sketch follows this list).  Data profiling can:
    • Identify non-compliant data in terms of data type, length, domain value, etc.  These data characteristics are technical metadata.
    • Recommend key structures, both primary keys and foreign keys, based on content
    • Recognize data entry patterns in order to determine requirements for additional edits or application formatting
    • Identify data errors and anomalies based on metadata business rules
  • Standardization: This is the process of parsing data (such as phone numbers, Social Security numbers, or product data) into actionable components based on patterns, and of standardizing terms, in preparation for data conversions, interfaces, or match/merge.  Standardization is driven by pre-set business rules (found in the metadata), which may be part of a data quality product (e.g., postal standardization) or customized for the organization (see the standardization sketch after this list).
  • Match/Merge: This is the process of linking records or creating a consolidated set of information for customers, products, or places, based on information from multiple data sources.  It provides the key to integrating processes and information across the enterprise (a match/merge sketch also follows this list).
  • Auditing: By applying business rules and profiling to files, feeds, or transactions over time to identify data quality problems, auditing allows you to prevent data errors, provide feedback on the business processes that produce the data, and determine data quality trends.
  • Address Verification and Householding: Specialized processing features such as address verification and householding (or clustering) allow marketers to gain insight into opportunities.  Address verification confirms that addresses are valid and usable by analyzing their components (e.g., the city is in the state, the ZIP code is in the city and state).  Householding groups individuals based on a common feature, typically address (a householding sketch closes the examples below).
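
The profiling step can be made concrete with a minimal sketch in Python.  The rows, patterns, maximum lengths, and domain values below are hypothetical stand-ins for rules drawn from technical metadata; the code flags non-compliant values and nominates primary-key candidates by checking uniqueness:

    import re

    # Hypothetical raw rows, e.g., from a CSV extract.
    rows = [
        {"cust_id": "1001", "state": "IL", "phone": "312-555-0147"},
        {"cust_id": "1002", "state": "Illinois", "phone": "5550147"},
        {"cust_id": "1002", "state": "IL", "phone": "212-555-0199"},
    ]

    # Technical-metadata rules: expected pattern, maximum length, value domain.
    rules = {
        "cust_id": {"pattern": r"^\d{4}$", "max_len": 4},
        "state":   {"pattern": r"^[A-Z]{2}$", "max_len": 2, "domain": {"IL", "NY", "CA"}},
        "phone":   {"pattern": r"^\d{3}-\d{3}-\d{4}$", "max_len": 12},
    }

    def profile(rows, rules):
        """Count per-field rule violations and nominate candidate keys."""
        violations = {field: 0 for field in rules}
        for row in rows:
            for field, rule in rules.items():
                value = row[field]
                bad_len = len(value) > rule["max_len"]
                bad_pattern = not re.match(rule["pattern"], value)
                bad_domain = "domain" in rule and value not in rule["domain"]
                if bad_len or bad_pattern or bad_domain:
                    violations[field] += 1
        # A field is a primary-key candidate if all of its values are unique.
        candidates = [f for f in rules if len({r[f] for r in rows}) == len(rows)]
        return violations, candidates

    print(profile(rows, rules))
    # ({'cust_id': 0, 'state': 1, 'phone': 1}, ['phone'])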
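Standardization can be sketched the same way.  Assuming U.S.-style phone numbers and a small, hypothetical term map drawn from business metadata, the code parses free-form input into a standard pattern and normalizes street terms:

    import re

    # Hypothetical term map from business metadata: raw token -> standard form.
    TERM_MAP = {"st": "Street", "st.": "Street", "ave": "Avenue", "ave.": "Avenue"}

    def standardize_phone(raw: str) -> str | None:
        """Reduce a free-form U.S. phone number to NNN-NNN-NNNN, else None."""
        digits = re.sub(r"\D", "", raw)
        if len(digits) == 11 and digits.startswith("1"):
            digits = digits[1:]          # drop the country code
        if len(digits) != 10:
            return None                  # cannot standardize this value
        return f"{digits[0:3]}-{digits[3:6]}-{digits[6:]}"

    def standardize_street(raw: str) -> str:
        """Normalize street terms using the metadata-driven term map."""
        return " ".join(TERM_MAP.get(w.lower(), w) for w in raw.split())

    print(standardize_phone("(312) 555-0147"))       # 312-555-0147
    print(standardize_phone("+1 312 555 0147"))      # 312-555-0147
    print(standardize_street("123 N Michigan Ave"))  # 123 N Michigan Avenue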
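Match/merge admits a similar sketch.  The two source records and the naive match key (the letters and digits of the standardized name plus street) are illustrative only; commercial products use far more sophisticated matching.  The code links records that share a key, then merges each cluster into a consolidated record, preferring the first populated value:

    # Hypothetical records from two source systems, after standardization.
    source_a = [{"name": "ACME CORP", "street": "123 N Michigan Avenue",
                 "phone": "312-555-0147", "email": None}]
    source_b = [{"name": "Acme Corp.", "street": "123 N. Michigan Avenue",
                 "phone": None, "email": "info@acme.example"}]

    def match_key(rec):
        """Naive match key: letters and digits of name + street, lowercased."""
        raw = rec["name"] + rec["street"]
        return "".join(ch for ch in raw.lower() if ch.isalnum())

    def merge(recs):
        """Consolidate matched records, preferring the first populated value."""
        golden = {}
        for rec in recs:
            for field, value in rec.items():
                if golden.get(field) is None:
                    golden[field] = value
        return golden

    # Link records that share a match key, then merge each cluster.
    clusters = {}
    for rec in source_a + source_b:
        clusters.setdefault(match_key(rec), []).append(rec)

    for recs in clusters.values():
        print(merge(recs))
    # {'name': 'ACME CORP', 'street': '123 N Michigan Avenue',
    #  'phone': '312-555-0147', 'email': 'info@acme.example'}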
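Finally, householding can be sketched as grouping individuals by a normalized address key.  The address-verification rules themselves (e.g., confirming that a ZIP code belongs to its city and state) normally come from a postal reference product and are assumed away here:

    from collections import defaultdict

    # Hypothetical individuals, after address standardization.
    people = [
        {"name": "Ann Lee",   "address": "45 Oak Street, Springfield, IL 62704"},
        {"name": "Bob Lee",   "address": "45 Oak St., Springfield, IL 62704"},
        {"name": "Cara Diaz", "address": "9 Elm Avenue, Peoria, IL 61602"},
    ]

    def address_key(addr: str) -> str:
        """Normalize an address into a grouping key (letters and digits only)."""
        normalized = addr.lower().replace("street", "st")  # naive term normalization
        return "".join(ch for ch in normalized if ch.isalnum())

    households = defaultdict(list)
    for person in people:
        households[address_key(person["address"])].append(person["name"])

    for members in households.values():
        print(members)
    # ['Ann Lee', 'Bob Lee']
    # ['Cara Diaz']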

These functions, coupled with business data and process ownership agreements, form the core components of a recommended data quality program.

Conclusion

Metadata is the contextual enabler for data quality.  Awareness of data quality and its dependency on metadata is a critical component of an enterprise data management program.  Organizations that continue to ignore the state of their data quality and metadata will continue to be exposed to incomplete and inaccurate data when making business decisions.  Today’s organizations, to be successful in this age of information, must make greater efforts to understand, improve and maintain the state of data quality and metadata throughout the enterprise.


Dr. David P. Marco, LinkedIn Top BI Voice, IDMMA Data Mgt. Professional of the Year, Fellow IIM, CBIP, CDP

Dr. David P. Marco, PhD, Fellow IIM, CBIP, CDP is best known as the world’s foremost authority on data governance and metadata management.  He is an internationally recognized expert in the fields of CDO, data management, data literacy, and advanced analytics.  He has earned many industry honors, including Crain’s Chicago Business “Top 40 Under 40”, DePaul University’s “Top 14 Alumni Under 40”, and a Professional Fellowship in the Institute of Information Management.  In 2022, CDO Magazine named Dr. Marco one of the Top Data Consultants in North America and IDMMA named him their Data Management Professional of the Year.  In 2023 he earned LinkedIn’s Top BI Voice, and in 2024 he won the prestigious BIG Innovation award.  David Marco is the author of two widely acclaimed, top-selling books in metadata management history, “Universal Meta Data Models” and “Building and Managing the Meta Data Repository” (available in multiple languages).  In addition, he has co-authored numerous books and published hundreds of articles, some of which have been translated into Mandarin, Russian, Portuguese, and other languages.  He has taught at the University of Chicago and DePaul University.  DMarco@EWSolutions.com

