Data Rationalization and Semantic Resolution

Data rationalization through managed metadata and data modeling can support semantic resolution, enabling improved analysis and knowledge transfer.

Introduction

Across the Internet and within internal systems, ontologies are used to improve search capabilities and to make inferences for improved human or computer reasoning. Because an ontology relates terms to one another, the user doesn’t need to know the exact term actually stored in the document. Data rationalization is an application, enabled by a Managed Metadata Environment (MME), that creates or extends an ontology for a domain into the structured data world, based on model objects stored in various models (of varying levels of detail, across model files and modeling tools) and other metadata. Ontology is “the study of the categories of things that exist or may exist in some domain”. Another definition of ontology, applicable to domains, is “a collection of taxonomies and thesauri”, from Seth Earley. Data models, often unknowingly, express many aspects of ontology, even though they are not stored in OWL or RDF. These concepts form parts of several components of enterprise data management.

The primary reason for data modeling is to create physical data structures – though a critical best practice for data modeling is to follow a phased modeling approach – typically developing conceptual, logical, and finally physical data models. Conceptual data models are sometimes considered to be semantic models since they are expressed in business terminology and demonstrate how key business objects relate to each other, independent of technology or application.

There are other types of data models (e.g. enterprise models) and other metadata that should be linked together to provide a more holistic view of a domain. Unfortunately, most modeling tools are incapable of handling all of the different levels of models effectively; it is common for more than one modeling tool to be used in an enterprise, and multiple model files are usually a necessity. For example, a data modeling tool might be used for logical and physical models, while a UML class diagram might be used for the conceptual model.

Connecting these model objects and visualizing the objects and their relationships is called Data Rationalization, and it is typically enabled as part of a Managed Metadata Environment (MME) by leveraging a Metadata Repository (MDR) tool. Data Rationalization is a form of “vertical data lineage”, as opposed to the horizontal data lineage used to trace data movement (i.e. the Information Supply Chain).

In Data Rationalization, we’re not trying to find where an actual piece of data came from (i.e. source to target). Instead, we want to find the higher order model objects from which the data was conceived (or other objects that can help to explain it), or the downstream objects that implement those higher order model objects (see Figure 1 below).

Figure 1: Example – Data Rationalization versus Information Supply Chain

Benefits of Data Rationalization

What is the benefit of Data Rationalization? To be able to effectively exploit, manage, reuse, and govern enterprise data assets (including the models which describe them), it is necessary to be able to find them. In addition, there is (or should be) a wealth of semantics (e.g. business names, definitions, relationships) embedded within an organization’s models that can be exposed for improved analysis and knowledge transfer. By linking model objects (across or within models) it is possible to discover the higher order conceptual objects for any given object. Conversely, it is possible to identify what implementation artifacts implement a higher order model object. For example, using data rationalization, one can traverse from a conceptual model entity to a logical model entity to a physical model table to a database table, etc. Similarly, Data Rationalization enables understanding of a database table by traversing up through the different model levels.
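The traversal described above can be sketched as a walk over a small graph of meta-relationships. This is only an illustration under stated assumptions: the object names, the `level:name` labeling, and the `META_LINKS` list are hypothetical and do not reflect any particular MDR tool.

```python
from collections import defaultdict

# Hypothetical meta-relationships: (higher order object, lower order object).
# Names are illustrative only, not taken from a real repository.
META_LINKS = [
    ("conceptual:Customer", "logical:Customer"),
    ("logical:Customer", "physical:CUST"),
    ("physical:CUST", "db:CRM.CUST"),
    ("logical:Customer", "physical:CUSTOMER_DIM"),
]

def build_indexes(links):
    """Index the links in both directions so we can walk down or up."""
    down, up = defaultdict(list), defaultdict(list)
    for higher, lower in links:
        down[higher].append(lower)
        up[lower].append(higher)
    return down, up

def traverse(start, index):
    """Return every object reachable from start, in breadth-first order."""
    seen, queue, result = {start}, [start], []
    while queue:
        node = queue.pop(0)
        for nxt in index.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                result.append(nxt)
                queue.append(nxt)
    return result

DOWN, UP = build_indexes(META_LINKS)

# Which artifacts implement the conceptual Customer entity?
implementations = traverse("conceptual:Customer", DOWN)

# Which higher order objects explain the database table?
origins = traverse("db:CRM.CUST", UP)
```

Keeping one index per direction means the same `traverse` function serves both questions: what implements a concept, and what concept a table came from.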

In 21st century distributed systems, there are often dozens or even hundreds of models, and tens of thousands of data elements, spread across many heterogeneous systems. It is usually quite difficult to find all the model objects or implementation artifacts (e.g. database tables) in the enterprise that express a concept, e.g. “Customer.” Even with name matching, the same term may mean different things in different systems (i.e., homonyms), or same-named objects may have differing natural keys and therefore probably do NOT represent exactly the same thing. Of course, different applications may also use different terms and different abbreviations, e.g., prospect, account, customers, etc.
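One way to see why name matching alone is not enough is to group same-named objects and compare their natural keys. The sketch below assumes a hypothetical `CATALOG` of (system, object name, natural key) entries; the systems and key columns are made up for illustration.

```python
from collections import defaultdict

# Hypothetical catalog entries: (system, object name, natural key columns).
CATALOG = [
    ("CRM", "Customer", ("customer_number",)),
    ("Billing", "CUSTOMER", ("account_id",)),
    ("Warehouse", "customer", ("customer_number",)),
]

def group_by_name(catalog):
    """Group objects by case-insensitive name (simple name matching)."""
    groups = defaultdict(list)
    for system, name, key in catalog:
        groups[name.lower()].append((system, key))
    return groups

def possible_homonyms(catalog):
    """Return names whose same-named objects do not all share one natural key."""
    flagged = {}
    for name, members in group_by_name(catalog).items():
        keys = {key for _, key in members}
        if len(keys) > 1:
            flagged[name] = sorted(members)
    return flagged

flagged = possible_homonyms(CATALOG)
```

A flagged name is only a candidate for review: a differing natural key suggests, but does not prove, that the objects represent different things.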

Data Rationalization is an enabler of effective Data Governance. It is not possible to govern information assets without knowing the location of the data and/or the variety of meanings given to each object. Similarly, Data Rationalization can aid in the development of Master Data Management (MDM) solutions. By identifying common data entities and how these relate to other pieces of data (again, across many systems), MDM solutions can better meet business user needs across all the systems that require the master/reference data.

How does Data Rationalization work?

To be able to rationalize data, meta-relationships between model objects (across model levels) must be established. Of course, doing so does not replace normal types of relationships between model objects in the same model. Meta-relationships can be established in multiple ways:

  1. Use automated modeling tool functionality
  2. Use manual modeling tool functionality
  3. Use modeling tool metadata fields
  4. Use Metadata Repository (MDR) tool to manually establish links using a GUI or other interface
  5. Use a spreadsheet

Once the meta-relationships are established, they must be imported into the Metadata Repository (if not established using the MDR tool). From there, analysts can search, retrieve, and visualize the metadata to perform Data Rationalization analysis. Analysts do not need a modeling tool license to explore the models (assuming the higher order models can be found), nor do they need to rely on the data modeler to obtain access or export the model metadata.
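As a rough sketch of the spreadsheet option followed by a search, the example below parses a CSV export of meta-relationships and looks up objects by name. The column names, the `LEVELS` ordering, and the in-memory list standing in for the repository are all assumptions made for illustration; a real MDR tool would have its own import format and search interface.

```python
import csv
import io

# Hypothetical spreadsheet export; the column names are assumptions.
SHEET = """higher_object,higher_level,lower_object,lower_level
Customer,conceptual,Customer,logical
Customer,logical,CUST,physical
Order,logical,ORD_HDR,physical
"""

# Assumed ordering of model levels, from highest to lowest.
LEVELS = ["conceptual", "logical", "physical", "database"]

def load_links(text):
    """Parse rows, rejecting links that do not point from a higher to a lower level."""
    links = []
    for row in csv.DictReader(io.StringIO(text)):
        hi = LEVELS.index(row["higher_level"])
        lo = LEVELS.index(row["lower_level"])
        if hi >= lo:
            raise ValueError(f"link does not go downward: {row}")
        links.append(((row["higher_level"], row["higher_object"]),
                      (row["lower_level"], row["lower_object"])))
    return links

def search(links, term):
    """Case-insensitive name search across both ends of every link."""
    term = term.lower()
    hits = set()
    for higher, lower in links:
        for level, name in (higher, lower):
            if term in name.lower():
                hits.add((level, name))
    return sorted(hits)

links = load_links(SHEET)
matches = search(links, "cust")
```

Validating the direction of each link at import time keeps upward and downward traversals well defined later on.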

Conclusion

A very simple example of a Data Rationalization analysis might be an analyst wishing to understand the relationship between two tables. Assume the analyst does not have a modeling tool license, does not know what model to look for, or does not have access to the network directory where the models are stored. Also, assume that foreign keys have been disabled in the physical model (valid in some cases, e.g. data warehousing…). Using the MDR, the analyst could search on the table names and, when these are displayed, rationalize upwards to see the logical entities (in a separate model file) from which these tables originated. This enables the analyst to see the relationship, to understand its cardinality, optionality, and identification, and to review the relationship verb phrase. Data Rationalization provides semantic understanding of the object in question along with its higher and lower order companions, for increased knowledge of the organization’s data and information.
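The two-table scenario can be sketched in a few lines. Everything here is hypothetical: the table names `CUST` and `ORD_HDR`, the `TABLE_TO_ENTITY` mapping, and the `Relationship` record are stand-ins for what a repository might hold, not the structure of any specific tool.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Relationship:
    parent: str           # logical entity on the "one" side
    child: str            # logical entity on the "many" side
    verb_phrase: str      # e.g. Customer "places" Order
    cardinality: str
    child_optional: bool  # True if a parent may exist without children
    identifying: bool     # True if the parent key is part of the child key

# Hypothetical repository contents: table-to-entity links and logical relationships.
TABLE_TO_ENTITY = {"CUST": "Customer", "ORD_HDR": "Order"}
LOGICAL_RELATIONSHIPS = [
    Relationship("Customer", "Order", "places", "one-to-many", True, False),
]

def relationship_between(table_a, table_b):
    """Rationalize both tables upward, then look up the logical relationship."""
    entities = {TABLE_TO_ENTITY[table_a], TABLE_TO_ENTITY[table_b]}
    for rel in LOGICAL_RELATIONSHIPS:
        if {rel.parent, rel.child} == entities:
            return rel
    return None

# No foreign key exists between the physical tables, but the logical model knows the link.
rel = relationship_between("ORD_HDR", "CUST")
```

The lookup works even though the physical tables carry no foreign key, because the relationship is read from the logical level after rationalizing upward.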

Peter Stiglich, CBIP

Pete Stiglich, CBIP, is a Principal Consultant with Data-Principles, LLC and has written and presented extensively on data architecture, data management, and Big Data. He is an AWS Technical Professional and a Hortonworks Architecture Professional.  Pete also is an experienced trainer in data architecture and data modeling, and has a background in data governance and metadata management.

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved
