Real-World Decision Support (RWDS) Journal, January 2002 - Volume 1, Issue 14

Meta Data Management in the Data Warehouse Environment: A Logical Approach to Meaningful Data Analysis

By Bob Zurek

Information Asset Management is intrinsically valuable - it is the lifeblood of any complex business. Companies depend on all types of information - financial reports, text documents, audio and video files, graphics, maps, messages and transactions - to make business decisions.

Today's dynamic and fast-changing business environment creates volumes of data that quickly become useless without effective management. To make sound decisions that affect the bottom line, users need information that helps them interpret the data. They need clear, precise, accessible and effectively managed meta data.

Meta data - loosely defined as "data about data" - is generated whenever data is added to, deleted from, modified in, or moved to or from a data warehouse. Through the management of meta data, companies can achieve standardized data definitions, clarified data relationships and increased data availability when and where it is needed.

Meta data management is realized by tool collaboration and sharing, made possible through the selection and seamless integration of best-of-breed tools. Today, vendors, systems integrators and businesses generally employ one of two methods for achieving meta data integration. The first involves constructing bridges from one data warehouse tool to another. This approach becomes unworkable whenever one of the participating tools is changed by its vendor. Alternatively, the common meta data model method offers a "least common denominator" approach in which only an agreed-upon set of meta data is contained, or one vendor establishes a "standard" that others must follow.

This article examines the advantages and challenges of meta data management in the data warehouse environment and offers an ideal approach to meaningful data analysis.

The Promise ... and the Problems

The Role of Meta Data in the Data Warehouse

It is frequently said of high technology that "the only constant is change." Enterprise data warehouses, by their very nature, are highly dynamic - perpetually changing to keep pace with evolving business (especially e-business) needs. Over time, data stored in a warehouse inevitably becomes outdated or changed in various ways. As it does, reporting and analysis of warehouse data become less and less accurate. Users find themselves groping for meaningful answers to questions such as: What does the data mean? What is its structure and format? Where did it come from? How was it calculated? When was it loaded? Who owns it? Where is it used?

When properly managed, meta data enables users to arrive at the answers to these questions. Meta data can be found throughout the enterprise in a patchwork of repositories and proprietary meta data stores. Two broad categories of meta data are found in the warehouse environment:

  • Technical meta data provides a detailed blueprint that IT can use to build and maintain the warehouse. Technical meta data typically includes database implementation names, table and column sizes, data types, and structural information such as database key attributes and indices. Providing data warehouse designers and developers broad access to technical meta data allows more rapid implementation of changes and rollout of future projects.
  • Business meta data includes those descriptions of data that are not related to software implementations, such as the business name, business rules in relation to other data, and the owner of the definition. Business meta data gives end users a roadmap for navigating all of the data in the enterprise by documenting what information is available in the warehouse and, when accessed, provides a context for interpreting the data. Business descriptions of a data element, together with information on when the data was loaded, calculated or transformed, prove invaluable to users who need to understand and trust the data and use it to make sound business decisions.
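
As a minimal sketch of how the two categories might sit side by side for a single warehouse column, the Python record below keeps technical and business meta data separate but linked. Every name in it (TechnicalMeta, BusinessMeta, ColumnMeta, the SALES_FACT table and its values) is a hypothetical illustration and is not drawn from any particular tool.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TechnicalMeta:
    """Implementation details IT uses to build and maintain the warehouse."""
    table: str
    column: str
    data_type: str
    size: int
    is_key: bool


@dataclass(frozen=True)
class BusinessMeta:
    """Descriptions that give end users context for interpreting the data."""
    business_name: str
    definition: str
    owner: str
    last_loaded: date


@dataclass(frozen=True)
class ColumnMeta:
    technical: TechnicalMeta
    business: BusinessMeta

    def describe(self) -> str:
        """Answer: what does it mean, who owns it, when was it loaded?"""
        b, t = self.business, self.technical
        return (
            f"{b.business_name} ({t.table}.{t.column}, {t.data_type}({t.size})): "
            f"{b.definition} Owner: {b.owner}. Loaded: {b.last_loaded.isoformat()}."
        )


# Hypothetical example values.
net_revenue = ColumnMeta(
    TechnicalMeta("SALES_FACT", "NET_REV_AMT", "DECIMAL", 18, False),
    BusinessMeta("Net revenue", "Gross sales minus returns and discounts.",
                 "Finance", date(2002, 1, 15)),
)
print(net_revenue.describe())
```

Keeping the two halves in one record is only one possible design; the point is that an end user can reach the business description from the same object a developer uses for the physical definition.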

It's difficult to overstate the advantages of managing meta data. Three of the most important are:

  • Standardized data definitions - One department refers to "revenues," another to "sales." Are they talking about the same activity? One subsidiary unit talks about "customers," another about "users" or "clients." Are these different classifications or different terms for the same classification? Effective meta data management can ensure a consistent data "language" applies throughout the organization.
  • Clarified data relationships - Meta data management illuminates the associations and interactions among all components of the warehouse environment: business rules, tables, columns, transformations, and user views of the data, to name a few. By clarifying relationships throughout the data warehouse environment, managed meta data enables warehouse managers and knowledge workers to see the bigger picture - to fully understand the meanings of the data assets, and to accurately predict and manage the impact of changes to the environment.
  • Increased data availability - Meta data exists "behind the scenes," revealing the origin of data, who defined it, when it was modified, and much more. Traditionally hidden, meta data must now be made visible to company knowledge workers on demand. New standards and technologies like XML and the Web create a perfect vehicle for delivery.
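
The impact prediction described under "clarified data relationships" can be pictured as a walk over recorded lineage. The sketch below assumes the relationships are already captured as a simple mapping from each object to the objects derived from it; the object names and the impacted_by function are hypothetical.

```python
from collections import deque

# Hypothetical lineage: each object maps to the objects derived from it.
FEEDS = {
    "orders.amount": ["stg_orders.amount"],
    "stg_orders.amount": ["sales_fact.net_revenue"],
    "sales_fact.net_revenue": ["revenue_report", "forecast_cube"],
    "revenue_report": [],
    "forecast_cube": [],
}


def impacted_by(changed: str, feeds: dict[str, list[str]]) -> list[str]:
    """Return every downstream object reachable from `changed`, breadth-first."""
    seen: set[str] = set()
    order: list[str] = []
    queue = deque(feeds.get(changed, []))
    while queue:
        node = queue.popleft()
        if node in seen:
            continue  # already reported via another path
        seen.add(node)
        order.append(node)
        queue.extend(feeds.get(node, []))
    return order


print(impacted_by("orders.amount", FEEDS))
```

With the relationships recorded this way, a proposed change to a source column can be checked against the reports and cubes it ultimately feeds before the change is made.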

Why Has Using Meta Data Been So Difficult?

When building or extending a data warehouse, developers employ a variety of tools for system modeling, database design, data quality assessment, data movement, scheduling, analysis and reporting. The need for close collaboration among these tools is critical as data is defined, transformed, moved, and accessed for business intelligence (BI) purposes. The key to achieving tool collaboration and sharing lies in the selection and seamless integration of best-of-breed tools. However, in the large majority of enterprises, this easy-sounding remedy has been more hoped-for than achieved.

The consequences of a company's inability to manage meta data are many - and severe:

  • Changes in the data warehouse are difficult to manage, and thus frequently can't be made in time to match the pace of change in the business.
  • Data can't be compared and analyzed across departments and processes. Stove-piped data marts have to be reengineered into an enterprise warehouse to support sound, consistent business decisions that transcend divisional boundaries.
  • Redundant, ad hoc systems - only temporarily useful - are extremely costly to maintain, integrate and retire.
  • BI and modeling tools can't be integrated into the warehouse environment, hence multiple versions of "truth" are scattered throughout private stores of business data across the enterprise.
  • Meta data can't be shared among products without rekeying, which is time-consuming and introduces errors into the environment.
  • Documentation is out of date or incomplete, affecting the ability to manage changes in the environment and ultimately undermining the knowledge workers' confidence in the data they are using.

Not There, Not Accessible, or Not Intelligible

Most businesses struggling to manage their meta data are finding that it is either nonexistent or inaccessible, or accessible but unintelligible. This situation can be due to any of several reasons:

  • The business lacks the tools or organizational structure necessary for developers and end users to create valid meta data;
  • Information exists in unconnected, frequently redundant or temporary gathering places that function as "islands," "silos," or "stove-pipes," making it impossible to implement changes that will ripple through the entire enterprise; or
  • The meta data can't be accessed because the tool that created it is no longer available; other tools can't be substituted because they don't speak the same language as the originating tool. Example: An extraction and transformation tool is typically unable to use the information generated by a warehouse-modeling tool.

Whatever the cause, the negative effect is that comparing and analyzing data across departments or processes becomes virtually impossible. Meta data is the logical way to create a uniform corporate definition of data, yet no industry-wide meta data standard has been adopted. (Although the Common Warehouse Metamodel promises to become the industry meta data standard, its effectiveness depends on wholesale, widespread and consistent adoption by all relevant tool vendors - which historically, at least, has not happened with other proposed "standards.")

Conventional Approaches to Meta Data Integration

Tool Bridges

Today, vendors, systems integrators and businesses employ one of two methods for achieving meta data integration. The first involves constructing bridges from one data warehouse tool to another. In this approach, the developer first becomes familiar with the underlying schemas of the source and target tools, then extracts the meta data from the source tool and changes it into the format of the target tool. The same process in reverse is used for bi-directional exchange.
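
To make the bridge idea concrete, here is a minimal sketch of a one-way bridge. It assumes a modeling tool that exports one record per column and an extraction tool that expects table definitions keyed by table name; both layouts, and the function bridge_model_to_etl, are invented for illustration and do not describe any real product's format.

```python
# Hypothetical export from a modeling tool: one dict per column.
modeling_tool_export = [
    {"entity": "Customer", "attribute": "Id", "datatype": "INTEGER", "length": 10},
    {"entity": "Customer", "attribute": "Name", "datatype": "VARCHAR", "length": 80},
]


def bridge_model_to_etl(columns: list[dict]) -> dict[str, list[dict]]:
    """Translate the source tool's layout into the target tool's layout.

    The bridge hard-codes knowledge of both schemas, so a rename on either
    side (say, 'datatype' becoming 'type') breaks it.
    """
    tables: dict[str, list[dict]] = {}
    for col in columns:
        tables.setdefault(col["entity"].upper(), []).append(
            {
                "name": col["attribute"].upper(),
                "sql_type": f'{col["datatype"]}({col["length"]})',
            }
        )
    return tables


etl_definition = bridge_model_to_etl(modeling_tool_export)
print(etl_definition)
```

A bi-directional exchange would need a second function that performs the reverse mapping, with its own embedded knowledge of both layouts.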

To a limited extent, tool bridges furnish a quick and easy solution to meta data exchange. Over time, however, the quick fix begins to break down. Problems emerge when vendors deliver revised or updated versions of their tool, and existing bridges will not work with the new versions. Consequently, additional precious developer time must be spent studying new schemas and building new bridges.

Furthermore, the bridge approach will not scale as the warehouse grows and the number of integrated tools increases. Assuming that two bridges are required to integrate two tools, six bridges will be needed to integrate three tools, twelve to integrate four tools and so on. Like their real-world counterparts, these bridges will eventually "fall down" if not properly maintained. When one of those four integrated tools changes its schema, then six bridges must be changed. Clearly, the endless writing and maintenance of bridges is likely to devour an unacceptable amount of scarce, expensive development resources.
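
The counts above follow from treating each bridge as one-way: every tool needs a bridge to each of the others, giving n(n - 1) bridges for n tools, of which 2(n - 1) touch any single tool. The short sketch below simply reproduces that arithmetic; the function names are illustrative.

```python
def bridges_needed(tools: int) -> int:
    """One-way bridges for full pairwise exchange: n * (n - 1)."""
    return tools * (tools - 1)


def bridges_touched_by_change(tools: int) -> int:
    """Bridges into and out of a single tool: 2 * (n - 1)."""
    return 2 * (tools - 1)


for n in (2, 3, 4, 10):
    print(n, bridges_needed(n), bridges_touched_by_change(n))
```

For four tools this gives twelve bridges in total and six affected by a schema change in one tool, matching the figures in the text; at ten tools the total is already ninety.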

For these reasons, tool bridges tend to be most appropriate, and function most effectively, as a tactical, unidirectional meta data exchange solution. They are an inadequate answer to the comprehensive management of warehouse meta data.

Common Meta Data Model

An alternative integration method is the common meta data model, which typically takes one of two forms:

  • Least Common Denominator: The first type of model contains only the set of meta data that a group of vendors can agree to use for data exchange; exchange is then limited to the items described in the model. In this "least common denominator" method, the more vendors involved in the agreement, the smaller and more restrictive the set of meta data tends to become.
  • The Standard: The second model type is put forth by a vendor as "the standard"; other vendors must support this model to achieve integration. This approach is not only somewhat arbitrary, it fails to embrace all of the meta data available for sharing within tool suites.
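
The shrinking effect of the least common denominator model can be illustrated with set intersection. The attribute sets below are hypothetical stand-ins for what three tools might each be able to express; they are not taken from any vendor's model.

```python
# Hypothetical attribute sets that three tools can each express.
TOOL_ATTRIBUTES = {
    "modeling_tool": {"name", "data_type", "length", "business_definition", "owner"},
    "etl_tool": {"name", "data_type", "length", "transformation_rule", "load_time"},
    "reporting_tool": {"name", "data_type", "business_definition", "display_format"},
}


def common_model(tools: list[str]) -> set[str]:
    """Attributes every listed tool supports (the agreed-upon exchange set)."""
    return set.intersection(*(TOOL_ATTRIBUTES[t] for t in tools))


two = common_model(["modeling_tool", "etl_tool"])
three = common_model(["modeling_tool", "etl_tool", "reporting_tool"])
print(sorted(two))
print(sorted(three))
```

In this made-up case, adding the third tool drops "length" from the shared set, and attributes such as the business definition or transformation rule never make it into the exchange at all.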

Looking for the Right Tool

As data warehouses and data marts continue to proliferate, businesses need meta data management more than ever to function effectively. These organizations can point to a substantial wish-list of capabilities they would like to see in an ideal meta data management tool. Of primary importance, developers and end users alike need a single authoritative source for captured meta data. They also need to be able to maintain meta data throughout warehouse development and deployment, and to make it accessible to any number of tools used to create and maintain the business intelligence infrastructure. To accomplish these objectives, the infrastructure needs a translation service that would permit data to be shared across the warehouse tool suite. This service would provide fine-grained semantic integration, enabling maximum sharing without gaps or information loss; and it would also provide the ability to add new tools to the warehouse environment without creating additional maintenance problems.

The "ultimate" meta data management tool would enhance data availability by offering the capability to publish meta data out to the Web. It would also be tightly integrated with the BI reporting and analytical tools. These enhanced capabilities would yield the standardized data definitions, clarified data relationships and increased data availability that are unattainable through current approaches.

An Ideal Approach to Managing Meta Data

It follows that a truly workable solution must avoid the shortcomings of tool bridges and common models. In the approaches from leading vendors, including Ascential Software, meta data is integrated by means of a comprehensive model that represents the union of shared meta data from all data warehousing tools.

Unlike the other approaches, this integration model is already built into the solution and reuse is transparent to users. Most importantly, this model enables identification of the rich relationships among the various tools.

We've spoken of the three major goals of meta data management: standardized data definitions, clarified data relationships and increased data availability. An ideal approach to meta data management will achieve these objectives by:

  • Allowing each tool to manipulate meta data via its own meta data model;
  • Enabling most tools' meta data objects to be shared;
  • Incorporating schema evolution strategies; and
  • Eliminating construction and proliferation of custom bridge code.
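
One way to picture these four properties together is a shared store that holds the union of what every tool contributes, with a small per-tool mapping in place of pairwise bridges. The sketch below is an illustration under that assumption only; the Hub class, its methods and the tool and field names are hypothetical and do not represent how any vendor's product is built.

```python
class Hub:
    """Shared store holding the union of attributes contributed by all tools."""

    def __init__(self) -> None:
        self._objects: dict[str, dict[str, object]] = {}
        self._adapters: dict[str, dict[str, str]] = {}

    def register(self, tool: str, field_map: dict[str, str]) -> None:
        """field_map: the tool's own field name -> shared attribute name."""
        self._adapters[tool] = field_map

    def publish(self, tool: str, key: str, record: dict[str, object]) -> None:
        """Merge a tool's record into the shared object, keeping others' attributes."""
        shared = self._objects.setdefault(key, {})
        for field, value in record.items():
            shared[self._adapters[tool][field]] = value

    def view(self, tool: str, key: str) -> dict[str, object]:
        """Return the shared object expressed in the tool's own vocabulary."""
        shared = self._objects[key]
        return {
            field: shared[attr]
            for field, attr in self._adapters[tool].items()
            if attr in shared
        }


hub = Hub()
hub.register("modeling_tool", {"attribute": "name", "datatype": "data_type", "desc": "definition"})
hub.register("etl_tool", {"col": "name", "sql_type": "data_type", "rule": "transformation"})

hub.publish("modeling_tool", "customer.id",
            {"attribute": "ID", "datatype": "INTEGER", "desc": "Unique customer number"})
hub.publish("etl_tool", "customer.id", {"rule": "trim leading zeros"})

print(hub.view("etl_tool", "customer.id"))
print(hub.view("modeling_tool", "customer.id"))
```

In this arrangement, adding a tool means registering one mapping instead of writing a bridge to every existing tool, and a tool that renames a field only needs its own mapping updated.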

Meta data is nothing less than the foundation on which business knowledge and decision-making are built. By providing essential content to the enterprise applications, it serves as a critical piece of the enterprise information infrastructure.

The ultimate goal of an ideal meta data management strategy is to fuel the integration of knowledge throughout the enterprise by offering robust, flexible architectures that embrace a variety of model standards.

About the Author

As vice president of advanced technology, Bob Zurek is responsible for setting Ascential's future technology direction, including new product development and technology acquisition. Ascential's DataStage product family allows companies to fully integrate data from all information sources - including CRM, B2B, e-business, mainframe and data warehouse environments - in a unified framework. As the cornerstone of Ascential's Information Asset Management offering, DataStage leads the market in providing comprehensive data integration, data quality assurance, meta data management and enterprise application connectivity.

With more than 20 years of proven professional software development, technology research and management experience, Zurek is instrumental in creating and driving Ascential's Information Asset Management (IAM) strategy. Prior to joining Ascential, Zurek was a senior technology analyst with Forrester Research, where he was responsible for research and strategy advisory on emerging technologies, with an emphasis on customer relationship management, knowledge management and wireless/speech. Previously, Zurek co-founded LumaPath, Inc. He was also vice president and CTO of Incentive Systems and vice president of research and technology and a Fellow at Sybase and PowerSoft. Zurek holds a BS from the University of New Hampshire and has been a guest lecturer at various MBA programs, as well as an invited speaker at the Harvard University Symposium. bob.zurek@ascentialsoftware.com