The Semantic Web and semantic technologies can link to an organization’s Information Architecture for a more holistic view of the enterprise, and for richer understanding and analysis of data resources.
Semantic models, or ontologies, can be a key change agent for enterprise integration, because heterogeneous data resources can all be described using the RDF/RDFS/OWL standards. They also improve knowledge discovery, since inferencing can uncover additional facts about our data or architecture.
Data Architects need to understand the Semantic Web (“a web of data – not just a web of documents”), what semantic initiatives are underway in the enterprise and what ontologies are being used or developed. Indeed, Enterprise Data Architects should be intimately involved in ontology development or adoption.
Overview of Information Architecture
There are several key components of Information Architecture:
- Enterprise Data Model
- Information Value Chain Analysis (i.e., CRUD matrices)
- Information Supply Chain Analysis (horizontal data lineage)
- Reference data sets – e.g. standard code values
- Metadata architecture
- Data Integration Architecture, including Master Data Management (MDM) architecture
- DW / BI Architecture
- Semantic models aka ontologies (e.g., ontologies in RDF/OWL)
Many experienced data architects would add Data Rationalization analysis to this list. Data Rationalization is vertical data lineage – the lineage between model objects across model levels (e.g., conceptual, logical, physical), and it complements Information Supply Chain Analysis.
Semantic Models and the Enterprise Data Model
While semantic models/ontologies can be associated with many other components of Information Architecture, they are most closely related to the Enterprise Data Model (EDM).
An Enterprise Data Model typically includes three models: the Subject Area Model (SAM), the Enterprise Conceptual Data Model (ECDM), and the Enterprise Logical Data Model (ELDM). The ECDM in particular should be closely aligned with enterprise semantic models and vice versa.
The ECDM identifies and defines the key business objects/data entities and the relationships between these (which can express many business rules). It is only natural that the ECDM inform the semantic models/ontologies, given the knowledge captured in the ECDM about the enterprise. The ECDM could itself be considered a semantic model, given the semantic clarity which must be achieved through the enterprise collaboration required to arrive at common names and definitions for conceptual data entities. The ECDM is a business model, and is neutral with respect to technology, application, business process, and business unit. It allows the enterprise to understand itself in a new way and should be understandable to humans, since it is a key means to achieve alignment between business and IT.
Semantic models or ontologies expressed in languages such as RDF/OWL are meant to enable computers to understand data and metadata in a new way, to facilitate knowledge discovery (through inferencing) and increased machine capability. There can and should be much overlap and integration between the ECDM and ontologies, as applicable for the domains and depth covered by the ECDM. An ECDM will encompass many domains (or subject areas) over time. Ontologies are “taxonomies and thesauri about a domain”, so a product ontology would probably have taxonomies for product types, categories, and other hierarchical classifications. For example, a car would be categorized by make and model, and probably by other taxonomies. These taxonomy entities would be represented in the ECDM, though the ECDM would not hold the actual taxonomy values. An important distinction between a data model and an ontology is that an ontology can contain both classes and instances, whereas a data model does not display or contain the actual data instances.
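The class/instance distinction can be made concrete with a small sketch. The snippet below is not a real ontology: it represents a hypothetical product ontology as plain (subject, predicate, object) tuples in Python, and every ex: name is invented for illustration. The class-level statements correspond roughly to what an ECDM would capture; the instance-level statements are the taxonomy values that an ECDM would not hold.

```python
# A hypothetical product ontology as (subject, predicate, object) triples.
# All "ex:" names are illustrative assumptions, not taken from any real model.
RDF_TYPE = "rdf:type"
SUBCLASS_OF = "rdfs:subClassOf"
OWL_CLASS = "owl:Class"

triples = {
    # Class-level statements: roughly what an ECDM would also describe.
    ("ex:Product", RDF_TYPE, OWL_CLASS),
    ("ex:Car", RDF_TYPE, OWL_CLASS),
    ("ex:Car", SUBCLASS_OF, "ex:Product"),
    ("ex:Make", RDF_TYPE, OWL_CLASS),
    ("ex:Model", RDF_TYPE, OWL_CLASS),
    # Instance-level statements: actual values, absent from a data model.
    ("ex:AcmeMotors", RDF_TYPE, "ex:Make"),
    ("ex:Roadster", RDF_TYPE, "ex:Model"),
    ("ex:car42", RDF_TYPE, "ex:Car"),
    ("ex:car42", "ex:hasMake", "ex:AcmeMotors"),
    ("ex:car42", "ex:hasModel", "ex:Roadster"),
}


def classes(graph):
    """Return every subject declared to be an owl:Class."""
    return {s for s, p, o in graph if p == RDF_TYPE and o == OWL_CLASS}


def instances(graph):
    """Return every subject typed as a member of one of the declared classes."""
    declared = classes(graph)
    return {s for s, p, o in graph if p == RDF_TYPE and o in declared}


print(sorted(classes(triples)))
print(sorted(instances(triples)))
```

Both kinds of statement live in the same graph here, which is the point of the comparison: a data model would stop after the first group.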
The development of ontologies in RDF/OWL is not for the faint of heart. It is a relatively new paradigm, but has similarities with more traditional E/R data modeling – though with very real differences. For example, cardinality is a concept common to both data modeling and OWL for describing the maximum or minimum number of occurrences in a relationship. However, in OWL, cardinality may also be used to make inferences in the data. An important distinction to remember is that conceptual data modeling is all about knowledge description, whereas semantic models are about knowledge discovery – where a predetermined data structure might not be enforced. Given the AAA slogan applicable to the Semantic Web (Anyone can say Anything about Any topic), ontologies can be used to piece together disparate pieces of information from many sources across the internet or intranet and allow new knowledge to be inferred from these bits and pieces.
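To illustrate how cardinality can drive inference rather than validation, here is a minimal sketch in plain Python. It assumes a hypothetical property ex:placedBy with a maximum cardinality of 1. In a database, two different values for that property on the same order would be a constraint violation. Under OWL semantics, which make no unique name assumption, a reasoner would instead conclude that the two values denote the same individual. The function below mimics only that one rule; a real OWL reasoner does considerably more, including detecting contradictions.

```python
from collections import defaultdict
from itertools import combinations


def infer_same_as(triples, max_one_props):
    """For properties limited to one value per subject, infer owl:sameAs
    between any distinct values asserted for the same subject."""
    values = defaultdict(set)
    for s, p, o in triples:
        if p in max_one_props:
            values[(s, p)].add(o)

    inferred = set()
    for objs in values.values():
        # Every pair of distinct values must refer to the same individual.
        for a, b in combinations(sorted(objs), 2):
            inferred.add((a, "owl:sameAs", b))
    return inferred


# Hypothetical data: two source systems identify the same customer differently.
order_triples = {
    ("ex:order1", "ex:placedBy", "ex:cust_crm_17"),
    ("ex:order1", "ex:placedBy", "ex:cust_erp_903"),
    ("ex:order2", "ex:placedBy", "ex:cust_crm_17"),
}

same_as = infer_same_as(order_triples, {"ex:placedBy"})
print(same_as)
```

In this example the only inferred fact is that ex:cust_crm_17 and ex:cust_erp_903 are the same customer, which is the kind of knowledge discovery a purely descriptive data model would not produce.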
Aligning the ECDM and Enterprise Ontologies
Given the classes (entities) and relationships which already exist in the ECDM, can the ECDM be converted into a semantic model? The short answer is yes. The Ontology Definition Metamodel (ODM) is a standard from the Object Management Group (OMG) for “model driven ontology development” that uses UML to express and develop ontologies visually. With ODM, ontologies can be expressed as UML class diagrams, which can make the content easier to understand. Of course, not all Data Stewards are going to be able to interpret a UML model, but the diagrams at least make the ontology easier to visualize. Adoption and usage of enterprise ontologies (whether internally or externally developed) should be given consideration by a Data Governance Board / Council, or at least by a Data Stewardship Coordinating Committee.
ODM includes RDF, OWL, ER and other metamodels, plus mappings which enable an ER model to be translated into OWL and vice-versa. Figure 1 below is the ODM ER metamodel and Figure 2 below is the ODM ER to OWL mapping:
Figure 1 – ODM ER Metamodel
Figure 2 – ODM ER Model to OWL Model mapping
Unfortunately, only a limited number of tools currently support ODM, but it can still serve as a valuable resource for translating an ECDM into an RDF/OWL ontology. For example, the OWL Class metamodel below shows that an OWL Class is a subtype of RDFS Class.
Figure 3 – ODM OWL Class Metamodel
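The general idea of translating an ER model into OWL can be sketched in a few lines of Python. This is not the ODM mapping itself, which is defined normatively in the OMG specification. It assumes a simplified and commonly used correspondence: each entity becomes an owl:Class, each attribute an owl:DatatypeProperty, and each relationship an owl:ObjectProperty with a domain and range. The Entity and Relationship types and the ex: prefix are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    name: str
    attributes: list = field(default_factory=list)


@dataclass
class Relationship:
    name: str
    source: str  # name of the entity the relationship starts from
    target: str  # name of the entity the relationship points to


def er_to_triples(entities, relationships, prefix="ex:"):
    """Translate a tiny ER model into OWL-style triples (simplified mapping)."""
    triples = []
    for entity in entities:
        cls = prefix + entity.name
        triples.append((cls, "rdf:type", "owl:Class"))
        for attr in entity.attributes:
            prop = prefix + attr
            triples.append((prop, "rdf:type", "owl:DatatypeProperty"))
            triples.append((prop, "rdfs:domain", cls))
    for rel in relationships:
        prop = prefix + rel.name
        triples.append((prop, "rdf:type", "owl:ObjectProperty"))
        triples.append((prop, "rdfs:domain", prefix + rel.source))
        triples.append((prop, "rdfs:range", prefix + rel.target))
    return triples


# Hypothetical fragment of a conceptual model.
er_entities = [Entity("Customer", ["customerName"]), Entity("Order", ["orderDate"])]
er_relationships = [Relationship("places", "Customer", "Order")]

owl_triples = er_to_triples(er_entities, er_relationships)
for triple in owl_triples:
    print(triple)
```

The sketch deliberately ignores cardinalities, keys, and subtypes, all of which a full ER to OWL mapping would need to address.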
Other Aspects of Information Architecture and Ontologies
While the ECDM has the most logical connection with enterprise ontologies, other aspects of Information Architecture can be incorporated into enterprise ontologies as well. For example, incorporating an Information Value Chain (IVC) Analysis into an enterprise ontology will provide a much richer ontology. The IVC can serve as a bridge between the information domain and business domains such as functional area, business process, or organizational unit.
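As one possible way to picture this, the sketch below turns a small CRUD matrix into triples that link business processes to data entities. The matrix contents, the ex: names, and the predicates ex:creates, ex:reads, ex:updates, and ex:deletes are all assumptions made for illustration; an actual enterprise ontology would define its own vocabulary.

```python
# Hypothetical predicates for the four CRUD operations.
CRUD_PREDICATES = {
    "C": "ex:creates",
    "R": "ex:reads",
    "U": "ex:updates",
    "D": "ex:deletes",
}


def crud_matrix_to_triples(matrix):
    """Convert {(process, entity): "CRUD letters"} into a set of triples."""
    triples = set()
    for (process, entity), operations in matrix.items():
        for op in operations:
            triples.add((process, CRUD_PREDICATES[op], entity))
    return triples


# Hypothetical CRUD matrix: business processes against data entities.
crud_matrix = {
    ("ex:OrderEntry", "ex:Customer"): "R",
    ("ex:OrderEntry", "ex:Order"): "CRU",
    ("ex:CustomerOnboarding", "ex:Customer"): "CRU",
    ("ex:Billing", "ex:Order"): "R",
}

ivc_triples = crud_matrix_to_triples(crud_matrix)

# Example question: which business processes create Customer data?
customer_creators = {
    s for s, p, o in ivc_triples if p == "ex:creates" and o == "ex:Customer"
}
print(customer_creators)
```

Once the matrix is expressed this way, the process-to-entity links can be combined with the class-level statements derived from the ECDM in the same graph.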
Incorporating various information architectures (models) such as DW/BI, MDM, and Metadata, and tying objects from these to implementation artifacts, can enable rich knowledge discovery about the performance of the Enterprise Information Management (EIM) program.
Of course, translating Information Architecture components into enterprise ontologies must be supported by clear business drivers. Unless the organization is very experienced with RDF/OWL, experts advise developing enterprise ontologies in relatively small iterations. Even a modest number of asserted triples in an ontology can result in a large number of inferred triples. Spend adequate time in testing to ensure that the inferred triples created by the inferencing engine make sense.
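The growth from asserted to inferred triples can be seen even in a toy example. The sketch below is not a real inferencing engine. It applies only two standard RDFS rules, transitivity of rdfs:subClassOf and propagation of rdf:type up the class hierarchy, to a hypothetical five-level class chain with three instances. All ex: names are invented.

```python
SUBCLASS_OF = "rdfs:subClassOf"
RDF_TYPE = "rdf:type"


def rdfs_closure(asserted):
    """Apply subclass transitivity and type propagation until nothing new
    can be derived, and return the full set of triples."""
    graph = set(asserted)
    while True:
        new = set()
        for s, p, o in graph:
            if p in (SUBCLASS_OF, RDF_TYPE):
                # If o is a subclass of o2, then s relates to o2 the same way.
                for s2, p2, o2 in graph:
                    if p2 == SUBCLASS_OF and s2 == o:
                        new.add((s, p, o2))
        if new <= graph:
            return graph
        graph |= new


# Hypothetical hierarchy: SportsCar < Car < Vehicle < Product < Asset.
asserted = {
    ("ex:SportsCar", SUBCLASS_OF, "ex:Car"),
    ("ex:Car", SUBCLASS_OF, "ex:Vehicle"),
    ("ex:Vehicle", SUBCLASS_OF, "ex:Product"),
    ("ex:Product", SUBCLASS_OF, "ex:Asset"),
    ("ex:car1", RDF_TYPE, "ex:SportsCar"),
    ("ex:car2", RDF_TYPE, "ex:SportsCar"),
    ("ex:car3", RDF_TYPE, "ex:SportsCar"),
}

closure = rdfs_closure(asserted)
inferred = closure - asserted
print(len(asserted), "asserted,", len(inferred), "inferred")
```

Here 7 asserted triples yield 18 inferred ones using just two rules. Reviewing the contents of a set like inferred after each iteration is one simple way to check that the derived facts make business sense.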
Semantic models or ontologies typically fall under Information Architecture in EIM programs – these should not just be something that web developers put together. After all, the Semantic Web is a “web of data.” The ECDM is most closely related to ontologies as it identifies key business concepts and the relationships between them. Incorporating other aspects of Information Architecture into enterprise ontologies can enrich these and enable greater knowledge discovery about the business and the EIM programs.