Making Metadata Active and Smart

Truly valuable metadata is both active and smart.  Organizations can implement specific methods to make their metadata both.

In information technology, there is the concept of “Mooers’ Law”: “An information retrieval system will tend not to be used whenever it is more painful and troublesome for a customer to have information than for him not to have it” (1959).

Information technology solutions are designed to provide value in spite of the limitations of the technology through which they are implemented.  However, that is not always how such solutions work.  Sometimes systems that are designed to deal with complexity actually add layers of their own complexity, reducing their usefulness.

Unfortunately, databases (and other information stores) often hit this wall.  In examining the causes of this situation, some clear suggestions for solutions emerge.  Interestingly, both the problem and the solution start with people – the creators of information stores, and the users of the information they store!

Figure 1: Generic Technology Stack: Information technology is often made more complex by its own layering.  Human beings (the end users) sit atop many layers of technology that separate them from the information they need.

The Inadequacy of Information Silos

Databases – and other information silos – were conceived to be useful tools to help people sort out large volumes of information by quickly and easily relating that information based on certain criteria.  It is a fine idea – let the machine do the heavy lifting of sorting and finding information.  All the end user has to do is create the criteria for that data pull or sort.  However, this step itself poses problems.  The root of the problem is data movement.

Of course, in a perfect world there would be no data movement at all.  All pertinent information would simply reside in an immaculate, infinite, infinitely fast and immortal memory where it would be perpetually available, precise and perfectly correlated.  Real computer memory is both limited and fallible, giving life to the vast and prosperous data storage industry and the wide array of database technology.  Along with the need to store information comes the need to organize it and manage it.  In the real world, information must be protected and backed up regularly to avoid loss.  There are also limits on the size and flexibility of such data stores based on the database design.  Add the human component again with queries, and still more problems arise.

Who decides where and when data or information should be moved to be the most useful?  Who decides what is even possible to move?  Databases are designed to reflect a particular designer’s view of the enduring relationships between pieces of information.  Nevertheless, this is just one set of perspectives on how chunks of information should be related.  It is inevitably limited and limiting because it represents only those relationships made possible by the initial design.

Query languages (the ways in which the end user is allowed to ask for certain sorted or selected data) can be limiting, and they depend more on arbitrary database structure than on the way people really ask questions.  Limits and challenges exist at every turn, threatening to take a useful and powerful tool, the database, and render it an unwieldy mountain of irrelevant information and content.

It seems to be impossible to design a database (or any other information store; this is not an exercise in database bashing) in a way that reflects all perspectives equally well and that endures through time.  This situation has prompted ingenious software architects to develop mechanisms to support alternate perspectives.  An example of such a mechanism is the database “view,” a method used by many commercial products.  The limitations of views are, once again, their inherent inflexibility and complementary complexity.  As views become more flexible, they grow in complexity; as they become less complex, they become less flexible.  Both scenarios decrease the usefulness of such an approach, as end users demand both flexibility and the ability to handle ever-increasing complexity.
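
As a rough illustration of the trade-off just described, the following sketch uses Python's built-in sqlite3 module to define one view over a small table.  The table, the view, the store names and the figures are all hypothetical, invented only for this example; they do not describe any particular commercial product.

```python
import sqlite3

# Hypothetical schema and data, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE store_sales (store TEXT, region TEXT, sales REAL, cost REAL);
    INSERT INTO store_sales VALUES
        ('Main St', 'North', 1200.0, 800.0),
        ('Oak Ave', 'North', 900.0, 950.0),
        ('Harbor', 'South', 1500.0, 700.0);

    -- The view fixes exactly one perspective: profit per region.
    CREATE VIEW regional_profit AS
        SELECT region, SUM(sales - cost) AS profit
        FROM store_sales
        GROUP BY region;
""")

# Users of the view get the designer's perspective and nothing else.
regional_profit = conn.execute(
    "SELECT region, profit FROM regional_profit ORDER BY region"
).fetchall()
print(regional_profit)
```

A question the view's designer did not anticipate, such as profit per store, cannot be answered from this view.  It requires either another view or direct access to the base table, which is one small instance of flexibility being bought with added complexity.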

In some ways, the history of information management has been the history of evolving design theories.  In the early years of computing, information was organized in ways that mimicked pre-computing technologies: the Hollerith punched card and traditional paper files.  It was a short step to application-specific file layouts.  Next came the era of the relational database, wherein the powerful ability of computers to cross-reference and correlate information was implemented in a way that was not directly analogous to real-world paper file systems.  This era began in 1969-70 with the publication of two seminal works by British computer scientist E.F. Codd (“Derivability, Redundancy and Consistency of Relations Stored in Large Data Banks” and “A Relational Model of Data for Large Shared Data Banks”).  Since then, the evolution appears to have stalled, despite forays into other ways of organizing data.  Object orientation and associative technologies have yet to make a real mark.

What has emerged since the early 1970s is the wonderful world of metadata!  The metadata repository seeks to recognize that traditional data structures fail to provide the context and perspective that are critical to effective information usage.  However, although metadata repositories have made a genuine contribution by making explicit much of the contextual framework for information assets, they are, at best, a partial solution to meta-chaos and, with their peculiar jargon and specialized user interfaces, can add their own layers of complexity.

In many cases, metadata solutions are the best of a set of insufficient solutions for meeting the business need for ever more relevant information.  Ideally, in the metadata model, information objects carry their own context, and perspective is provided by the users who wish to access information selected according to their unique criteria.

Information Silos

Most information management technology in the current IT environment works to create small, functioning islands of information that address discrete, isolated areas of business.  Certain relevant information is related appropriately via relationship mapping or metadata organization.  However, this information is limited and found only in clumps or islands of information as defined by the metadata or database architect.

A typical example of a limitation inherent to this situation is the attempt to relate information locked in one island to information locked in another.  Marketing studies provide many examples.  Imagine that you owned a large dry cleaning company with several discrete databases.  One holds employee information.  Another holds all locations and information about each location’s sales and profitability.  Somewhere, there is information about all of the dry cleaning chemicals purchased by the company and where they were dispatched.  Now imagine that someone wanted to examine, regularly, which of the most successful stores use the most of the most expensive dry cleaning chemicals and how many hours the employees worked in those stores.  Given the constraints of many IT situations, getting the information you want might be an exercise in manual search rather than automation.  The islands of information can only be navigated one at a time.  It is the way many databases work, but it is not the way people think or behave in the real world.  Beyond this, the query languages themselves further limit the ability to cross-reference the information, often providing imprecise results.
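
To make the dry cleaning question concrete, here is a minimal Python sketch of what answering it involves once the three islands can be read together.  Every store name, employee name, chemical and figure is invented for illustration, and the in-memory structures merely stand in for the separate databases.

```python
# Three hypothetical "islands"; all names and figures are invented.
store_profit = {"Main St": 52000, "Oak Ave": 18000, "Harbor": 61000}

chemical_shipments = [  # (store, chemical, unit_price, litres)
    ("Main St", "premium solvent", 40.0, 30),
    ("Harbor", "premium solvent", 40.0, 55),
    ("Oak Ave", "premium solvent", 40.0, 10),
    ("Harbor", "basic detergent", 5.0, 200),
]

timesheets = [  # (employee, store, hours)
    ("Ana", "Harbor", 38),
    ("Ben", "Harbor", 40),
    ("Cy", "Main St", 35),
    ("Di", "Oak Ave", 20),
]


def cross_island_report(top_n=2):
    """For the top_n most profitable stores, report litres of the
    priciest chemical used and total employee hours worked."""
    # Island 3: find the most expensive chemical by unit price.
    priciest = max(chemical_shipments, key=lambda s: s[2])[1]
    # Island 2: pick the most profitable stores.
    top_stores = sorted(store_profit, key=store_profit.get, reverse=True)[:top_n]

    report = []
    for store in top_stores:
        litres = sum(
            qty for s, chem, _, qty in chemical_shipments
            if s == store and chem == priciest
        )
        # Island 1: total hours worked at this store.
        hours = sum(h for _, s, h in timesheets if s == store)
        report.append((store, litres, hours))

    # Rank the successful stores by their use of the priciest chemical.
    return sorted(report, key=lambda row: row[1], reverse=True)


report = cross_island_report()
print(report)
```

The logic itself is short.  The difficulty the example points to is that, with isolated islands, each of the three lookups has to be performed separately and stitched together by hand.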

Within these islands of information, IT can only provide information as it was originally imagined to be needed, and then only in relatively simple contexts; there is no room for flexible cross-referencing and re-imagining business analysis.  For IT to develop along with current trends in business management and analysis, we must change the way we organize and find information.  Several solutions have been suggested.

BSM Architectures as a Role Model?

The use of metadata repositories and directories has emerged as the most successful approach to eliminating these islands of information, but even this has its imperfections.  Without a theoretical basis, there remains a danger of re-fragmentation, including the risk of separating context from object and thereby introducing duplication and inaccuracy.

The point at which business management and IT converge appears to hold the key.  As frameworks such as ITIL® evolve, new approaches to information organization are appearing.  The ITIL concept of the configuration management database (CMDB), for example, offers some solutions.  By unifying information for IT configuration items within a consistent framework, duplication and inaccuracy can be avoided.  By combining this management concept with solid metadata architecture, meta-chaos is avoided, at least in the Business Service Management (BSM) world.  The CMDB provides contextual information for IT assets dynamically (or actively), and perspective is provided by the BSM use cases.

CMDBs create and contain standards for the entire IT enterprise.  These standards can help to bring together information once isolated on separate islands.  This IT model and practice points to a broader business model that may offer some solutions to the problems of islands of information and meta-chaos.  So far, the discussion has focused on infrastructure, applications and data stores, but my interest is in the architecture this implies.  This architecture may provide a helpful model for future information management, particularly if the CMDB is based on metadata management technology and thus provides a bridge from the BSM world to the entire breadth of IT activities from development to governance.

Metadata Structures Using a Business Service Management Model

Two kinds of metadata structures can coordinate information and make it uniform, building bridges between many islands of information.  A pure metadata repository contains metadata (or at least copies of metadata), while a pure metadata directory locates items of interest.  A repository stores information (as in a database), while a directory is a grouping of information based on some predetermined criteria (as in a phone book).
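
The distinction can be sketched in a few lines of Python.  This is only an illustration under simplifying assumptions: the class names, fields and sample values are hypothetical and do not correspond to any particular product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MetadataRecord:
    """A hypothetical, deliberately tiny metadata record."""
    name: str
    owner: str
    description: str


class MetadataRepository:
    """Holds the metadata itself (or a copy of it), like a database."""

    def __init__(self):
        self._records = {}

    def store(self, record):
        self._records[record.name] = record

    def get(self, name):
        return self._records[name]


class MetadataDirectory:
    """Holds only pointers to where metadata lives, like a phone book."""

    def __init__(self):
        self._locations = {}

    def register(self, name, source_system):
        self._locations[name] = source_system

    def locate(self, name):
        return self._locations[name]


repository = MetadataRepository()
repository.store(MetadataRecord("customer", "sales", "One row per customer"))

directory = MetadataDirectory()
directory.register("customer", "crm-database")

print(repository.get("customer").owner)  # the repository answers directly
print(directory.locate("customer"))      # the directory says where to look
```

The repository can answer a question about an item on its own, at the cost of keeping its copies current.  The directory stays small and never goes stale in the same way, but every answer requires a second trip to the source system.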

There are also passive and active metadata management strategies, each with specific strengths and weaknesses.  A passive repository requires a managed process to populate it with the appropriate metadata, while active repositories self-populate automatically.  While the automatic option sounds like the best one initially, there are subtle drawbacks to this model: consider, for instance, what happens if a consistent “point in time” view of information is required.  With active population, the repository is always changing.  This might be considered “active but uninformed”!  It provides context, but context from a very technical point of view.
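
The “point in time” drawback can be shown with a small Python sketch.  Both classes are hypothetical simplifications, and the snapshot method is only one possible mitigation, added here as an assumption to show what a consistent view would have to preserve.

```python
import copy


class PassiveRepository:
    """Populated only when a managed load process runs."""

    def __init__(self):
        self.metadata = {}

    def load(self, batch):
        # The content stays fixed until the next managed load.
        self.metadata = dict(batch)


class ActiveRepository:
    """Self-populating: every source change is applied immediately."""

    def __init__(self):
        self.metadata = {}
        self._snapshots = {}

    def on_source_change(self, name, attributes):
        # Would be called by a (hypothetical) change feed from the sources.
        self.metadata[name] = attributes

    def snapshot(self, label):
        # Assumed mitigation: freeze a copy for point-in-time questions.
        self._snapshots[label] = copy.deepcopy(self.metadata)

    def as_of(self, label):
        return self._snapshots[label]


active = ActiveRepository()
active.on_source_change("orders", {"columns": 5})
active.snapshot("month_end")
active.on_source_change("orders", {"columns": 6})  # the live view moves on

print(active.metadata["orders"])            # current state
print(active.as_of("month_end")["orders"])  # state when the snapshot was taken
```

Without something like the snapshot, two reports run a few minutes apart against the active repository could describe two different structures, which is the consistency problem raised above.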

A better approach may be found in a model of an intelligently active repository, architected after the model of business service management (BSM).  In a BSM model, components of the business service are grouped and managed as services – not as discrete elements of an IT infrastructure on their own.  It is top-down architecture rather than bottom-up.

A repository based on this model would have its auto-population driven by changes in perspective from the presentation layer.  (See Figure 2 below.)  Instead of constantly repopulating with every change, the architecture leverages the needs of the end-user to determine and build its metadata structure, pulling perspectives on information from above rather than pushing them from below.  This architecture would create a more logical (at least more like human logic in its structure) and “smarter” repository.

In contrast, when information is pushed from below up into the adapter layer and beyond based on a schedule or other arbitrary factor, the result could be considered a “dumb” repository.  In this situation, the information is managed operationally, but with no regard to end-user context or perspective.  In practice, the information may be somewhat context-driven (perhaps by a query), but it remains fragmented, as in the example of the islands of information.  Some of this information may be related, but cross-referencing on a large scale with this model is not feasible.

A better model would involve a “smart active” approach, using the BSM paradigm, whereby the visualization layer itself drives the pull of information, determining the particular “service” to be brought together.  Using this BSM-like architecture, the end user actually drives the context for information, making logical and creative use of information possible without increasing complexity to the point of uselessness.  It is a great example of how a management model can truly be translated to an IT architecture model.
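
One way to picture the “smart active” pull is the following Python sketch.  It is a loose illustration under stated assumptions, not a description of any real repository: the class, the adapter functions, the source names and the service names are all hypothetical, and a real implementation would also need to refresh what it has pulled.

```python
class SmartActiveRepository:
    """Populates metadata on demand, driven by what the
    presentation layer asks to see (pull, not push)."""

    def __init__(self, adapters):
        # adapters: source name -> callable returning that source's metadata
        self._adapters = adapters
        self._cache = {}
        self.pull_log = []  # records which sources were actually pulled

    def view_for(self, service, sources):
        """Assemble the metadata one service perspective needs,
        pulling only the sources not yet held."""
        for source in sources:
            if source not in self._cache:
                self._cache[source] = self._adapters[source]()
                self.pull_log.append(source)
        return {
            "service": service,
            "metadata": {s: self._cache[s] for s in sources},
        }


# Hypothetical adapters standing in for the adapter layer.
adapters = {
    "hr": lambda: {"tables": ["employees", "timesheets"]},
    "sales": lambda: {"tables": ["stores", "revenue"]},
    "purchasing": lambda: {"tables": ["chemicals", "shipments"]},
}

repo = SmartActiveRepository(adapters)
store_view = repo.view_for("store performance", ["sales", "purchasing"])
staffing_view = repo.view_for("staffing", ["hr", "sales"])

print(repo.pull_log)
```

Nothing is loaded until a perspective asks for it, and a source already pulled for one service is reused for the next.  In this sketch the user's request, rather than a schedule, decides what the repository contains.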

Figure 2: The Enterprise Metadata Repository: This model addresses the challenge to design a repository for flexibility, multiple perspectives and changes without rendering it so complex that it is no longer useful.  It is driven from the end-users’ visualization requirements, building a metadata structure based on actual use of information.

This new implied model makes it clear that the lines are blurring between IT management and business management, and between operations and business goals.  As with other technology, information technology is an integral part of any business, whether it plays a strategic or a tactical role and whether it provides a service or key business information.  Good management practices require good technology solutions, especially those that support enterprise data management.  Likewise, good technology enables good business practices.  It can also be argued that information technology should be driven, at its most basic level, by the needs of the user rather than ruled by the constraints of our models.

Conclusion

Metadata is a critical component of Information Technology activities and the ability of the user community to access the correct data.  Therefore, it is worthwhile for organizations to design and implement metadata solutions that are active and smart.


Ian Rowlands

Ian Rowlands is a senior leader in development, marketing and channel management in global software and services companies. Most recently, Ian has led the development of a major metadata repository technology. He has played a leading role in establishing product management, managing worldwide indirect channels and conceiving and executing product marketing programs for a variety of software organizations. Ian earned a Bachelor of Technology degree from the University of Bradford (U.K.), is a standing member of the British Computer Society, and a Chartered I.T. Professional.

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved
