Poor data quality in an organization is often the result of poor metadata management. Improving business and technical metadata, and managing it properly, helps ensure higher-quality data and information.
The state of an enterprise’s information depends on its data quality and metadata. Poor-quality data, coupled with incorrect interpretation and use of information from an enterprise application, is a recipe for failure because it erodes the confidence of the organization’s information consumers. The consequences can include poor customer service, ineffective business processes, shipping or invoicing errors, compliance failures, and penalties from regulatory reporting issues. Additionally, misinformed decisions by information consumers responding to market changes can carry significant costs and damage the organization’s health.
This situation often arises when organizations fail to take the opportunity and initiative to improve data quality and metadata in the enterprise. The missed opportunity increases the time and expense required to reconcile and audit data before it can be used accurately and reliably as information. Through planning, design and implementation of data quality and managed metadata, as components of an overall enterprise data management framework, organizations can gain competitive advantage through effective and confident use of their information assets.
Metadata is the data context that explains the definition, control, usage, and treatment of data content within a system, application or environment. Metadata provides the characteristics to measure data quality in the enterprise. Data quality measures the health of information for an intended use. Several factors affect realized data quality:
- The inherent data quality itself has characteristics (metadata) such as the accuracy, completeness, consistency, and freshness of the data. These qualities can be measured and tracked over time, then improved after this analysis.
- The pragmatic definition of data quality is how well the data suits a particular purpose. Relevant data characteristics include its form, precision, level of aggregation, and availability (all found in metadata). These characteristics are specific to the process or consumer that will use the information.
- A low level of integration, such as multiple customer numbers for the same customer or multiple product IDs for the same product, is another common metadata problem that many organizations experience. It limits the organization’s ability to obtain an accurate picture of the business quickly.
- Finally, inconsistent definitions of the basic people, organizations, locations, assets and events across different systems and business units often make it difficult to obtain a clear view of the state of the business. This inconsistency is often the initial reason why organizations kick off enterprise initiatives around data governance, stewardship and managed metadata.
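As a rough illustration of how inherent characteristics such as completeness and freshness might be measured and tracked, here is a minimal Python sketch. The record layout, the field names (`email`, `updated`), and the 90-day freshness window are assumptions made for the example, not part of any particular product or standard.

```python
from datetime import date

# Hypothetical customer records; None or "" marks a missing value.
records = [
    {"id": 1, "email": "a@example.com", "updated": date(2024, 1, 10)},
    {"id": 2, "email": None, "updated": date(2023, 6, 1)},
    {"id": 3, "email": "c@example.com", "updated": date(2024, 1, 2)},
    {"id": 4, "email": "", "updated": date(2022, 12, 15)},
]

def completeness(rows, field):
    """Fraction of rows where the field is present and non-empty."""
    filled = sum(1 for r in rows if r.get(field) not in (None, ""))
    return filled / len(rows) if rows else 0.0

def freshness(rows, field, as_of, max_age_days):
    """Fraction of rows whose date field is within max_age_days of as_of."""
    fresh = sum(1 for r in rows if (as_of - r[field]).days <= max_age_days)
    return fresh / len(rows) if rows else 0.0

email_completeness = completeness(records, "email")
recent_share = freshness(records, "updated", date(2024, 1, 31), 90)
print(email_completeness, recent_share)
```

Recording these ratios on a schedule would give the trend over time that the first factor describes; the thresholds that count as acceptable depend on the intended use.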
Typically, data quality is measured against the specific use of the information, and not all information needs to meet the same quality specification. The impact of data quality also depends on the consumers of information making wise choices about the sources they use. Low-quality data (or the perception of low-quality data) erodes trust in the information and encourages consumers to create alternate, often inconsistent, sources of information. The result is a reduced ability to collaborate and to present a single version of the organization’s health.
The major processes involved in a data quality program include:
- Measuring the inherent quality of the data sources
- Creating consistent views of customers, products, assets, profitability, etc.
- Aligning definitions of key objects (customer, location, product)
- Determining the recommended uses for each source
- Making this information available to all knowledge workers as necessary
- Improving the inherent data quality procedures
- Improving the existing processes which create the data
To support these efforts, data quality techniques and methods have evolved to include the following major steps:
- Data Profiling: In this phase, organizations can analyze the data structures relative to the data content. It is useful in recommending or affirming data structure design based on the data content. In addition, it can generate metrics on potential errors for use in system or process improvement. Data profiling can:
  - Identify noncompliant data in terms of data type, length, domain value, etc. These data characteristics are technical metadata.
  - Recommend key structures, both primary key and foreign keys, based on content.
  - Recognize data entry patterns in order to determine requirements for additional edits or application formatting.
  - Identify data errors and anomalies based on metadata business rules.
- Standardization: This is a process of formatting data (such as phone numbers, social security numbers or product data) based on patterns into actionable components, and standardizing terms, in preparation for data conversions, interfaces, or match/merge. Standardization is based on pre-set business rules (found in the metadata), which may be part of a data quality product (e.g., postal standardization), or it may be customized for the organization.
- Match/Merge: This is a process of linking or creating a consolidated set of information for customers, products or places, based on information from multiple data sources. It provides the key to integrating processes and information across the enterprise.
- Auditing: By applying business rules and profiling to files, feeds or transactions over time to identify data quality problems, auditing allows you to prevent data errors, provide data production feedback for business processes, or determine data quality trends.
- Address Verification and Householding: Specialized processing features, such as address verification and householding (or clustering), allow marketers to gain insight into opportunities. Address verification confirms that the addresses are valid and useful by analyzing the data components (e.g., city is in state, zip is in city and state, etc.). Householding groups individuals based on a common feature, typically address.
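To make the data profiling step above more concrete, the following Python sketch checks a column against technical metadata (type, length, domain) and summarizes data entry patterns. The rule dictionary `STATE_RULES`, the sample values, and the function names are hypothetical; a commercial profiling tool would work from a metadata repository and cover far more cases.

```python
import re
from collections import Counter

# Hypothetical technical metadata for one column: max length and allowed domain.
STATE_RULES = {"max_length": 2, "domain": {"NY", "CA", "TX"}}

def profile_column(values, rules):
    """Return (value, reason) pairs for values that break the metadata rules."""
    violations = []
    for v in values:
        if not isinstance(v, str):
            violations.append((v, "type"))
        elif len(v) > rules["max_length"]:
            violations.append((v, "length"))
        elif v not in rules["domain"]:
            violations.append((v, "domain"))
    return violations

def entry_pattern(value):
    """Map every digit to 9 and every letter to A, keeping punctuation."""
    return re.sub(r"[A-Za-z]", "A", re.sub(r"\d", "9", value))

states = ["NY", "CA", "Texas", "ZZ", 42]
phones = ["555-123-4567", "(555) 123-4567", "555-987-6543", "5551234567"]

state_violations = profile_column(states, STATE_RULES)
phone_patterns = Counter(entry_pattern(p) for p in phones)

print(state_violations)
print(phone_patterns.most_common())
```

The pattern counts show how many distinct formats are being keyed in, which is the kind of evidence that can justify an additional edit check or an input mask in the application.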
These basic functions, coupled with business data and process ownership agreements, are the core components of a recommended data quality program.
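Two of the functions listed above, standardization and householding, can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions: the phone rule assumes 10-digit US-style numbers, the address normalization handles only two abbreviations, and all names and sample data are invented for the example. Real address verification relies on postal reference data that is not modeled here.

```python
import re
from collections import defaultdict

def standardize_phone(raw):
    """Reduce a US-style phone number to 10 digits, or None if it does not fit."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop a leading country code
    return digits if len(digits) == 10 else None

def address_key(address):
    """Crude normalization: uppercase, strip punctuation, abbreviate two terms."""
    text = re.sub(r"[^\w\s]", "", address.upper())
    text = re.sub(r"\s+", " ", text).strip()
    for full, abbr in (("STREET", "ST"), ("AVENUE", "AVE")):
        text = re.sub(rf"\b{full}\b", abbr, text)
    return text

def household(people):
    """Group names by normalized address (the common feature)."""
    groups = defaultdict(list)
    for name, address in people:
        groups[address_key(address)].append(name)
    return dict(groups)

people = [
    ("Ann Lee", "12 Main Street"),
    ("Bob Lee", "12 main st."),
    ("Cy Diaz", "9 Oak Avenue"),
]
households = household(people)
print(standardize_phone("+1 (555) 123-4567"))
print(households)
```

Standardizing first matters because grouping is only as good as the key: the two differently keyed entries for 12 Main Street fall into one household only after normalization.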
Metadata is the contextual enabler for data quality. Awareness of data quality and its dependency on metadata is a critical component of an enterprise data management program. Organizations that continue to ignore the state of their data quality and metadata will continue to be exposed to incomplete and inaccurate data when making business decisions. Today’s organizations, to be successful in this age of information, must make greater efforts to understand, improve and maintain the state of data quality and metadata throughout the enterprise.