Ask ten knowledgeable people for their definition of data integration and you may get ten different answers. Most of the answers will be an interpretation of Webster’s dictionary definition for integration, which is “to make whole or complete by adding or bringing together parts.”
These definitions could mean anything from two systems passing data back and forth (loosely coupled) to a shared data environment where all data elements are unique and non-redundant and are being reused by multiple applications (tightly coupled).
For example, creating an integrated view of a customer may start by bringing together customer-related data elements from multiple sources, such as:
- What the customer filled out in a loan application
- Transactions this customer performs with our organization
- Demographic data on the customer available from external vendors
- Data from sales force automation or order management applications
- Information received when the customer calls the customer service department or asks for information on the Web site
Unfortunately, much customer data is duplicated in several source systems, and the redundant copies are often inconsistent with one another. Thus, just bringing together parts (the inconsistent data) to make whole the image of the business entity “customer” is clearly not the only purpose of data integration.
A refined definition, based on Webster’s definition, could create a more complete description of data integration: “To make a business entity or subject area whole or complete by adding or bringing together their unique data elements and storing them non-redundantly (or at least consistently) so that they can be reused by multiple applications.” This adds a new level of complexity to data integration, which technology solutions alone cannot adequately address.
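The refined definition can be illustrated with a minimal Python sketch. It brings together customer records from several sources, stores each data element once when the sources agree, and sets aside any element whose sources disagree instead of silently picking a value. The source names, field names, and sample values are hypothetical, and matching on a shared `customer_id` is an assumption; real source systems seldom share a clean key.

```python
from collections import defaultdict

# Hypothetical extracts from three source systems (all names and values are
# made up for illustration), keyed by an assumed shared customer_id.
SOURCES = {
    "loan_application": [{"customer_id": 101, "name": "Pat Lee", "annual_income": 72000}],
    "order_management": [{"customer_id": 101, "name": "Pat Lee", "annual_income": 68000}],
    "vendor_demographics": [{"customer_id": 101, "postal_code": "94105"}],
}


def integrate(sources):
    """Return (integrated, conflicts) built from all source records."""
    # seen[customer_id][element][source_name] = value
    seen = defaultdict(lambda: defaultdict(dict))
    for source_name, records in sources.items():
        for record in records:
            cid = record["customer_id"]
            for element, value in record.items():
                if element != "customer_id":
                    seen[cid][element][source_name] = value

    integrated, conflicts = {}, {}
    for cid, elements in seen.items():
        integrated[cid] = {}
        for element, by_source in elements.items():
            distinct = set(by_source.values())
            if len(distinct) == 1:
                # All sources agree: store the element once.
                integrated[cid][element] = distinct.pop()
            else:
                # Sources disagree: leave the decision to a data owner.
                conflicts[(cid, element)] = dict(by_source)
    return integrated, conflicts


integrated, conflicts = integrate(SOURCES)
print(integrated)
print(conflicts)
```

In this sketch the name and postal code end up in the integrated record, while the two income figures are reported as a conflict. The code can detect the disagreement, but deciding which figure is right remains a business decision.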
Ineffective “Silver Bullet” Technology Solutions
Realizing that data integration is the key to thriving in the fast-paced information-oriented economy, companies are looking for the silver bullet solution to their data integration problems. However, they discover quickly that the real issues underlying the current “dis-integrated” data chaos cannot be solved by technology solutions without fundamentally changing the way they manage their data redundancy.
Enterprise Resource Planning (ERP)
ERP solutions are a collection of functional modules used to integrate operational data to support seamless operational business processes for the enterprise. ERP products were meant to solve the redundant and inconsistent operational data mess by consolidating operational data into ERP modules.
Good idea – if properly implemented. Properly implemented meant that this was a chance to perform extensive research on the data to be converted into the ERP modules. Extensive research would have included analyzing all data elements for their definitions, contents, semantics, and business rules. This extensive business analysis activity would have included finding and fixing gaps in business knowledge and lost data relationships. Correcting existing data impairments should have been a mandate for every ERP conversion. However, it rarely, if ever, was.
Instead, most ERP conversion projects were performed the old-fashioned way, using traditional source-to-target mapping, which does not include extensive data domain (content) analysis, such as:
- Finding data elements which have multiple meanings and are used for multiple purposes
- Finding missing relationships between business entities
- Finding and resolving duplicate keys
- Validating data contents among dependent data elements across files
- Deciphering, separating, and translating one-byte or two-byte codes
- Finding and extracting processing logic embedded in the data values
Because most conversions were performed in this traditional way, the promises of the ERP technology solution to solve data integration and data quality problems did not materialize to the extent expected.
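A few of the data domain checks listed above can be sketched in Python. This is only an illustration under stated assumptions: the file layout, the column names (`cust_no`, `status_cd`, `ref_field`), and the sample rows are hypothetical, and real profiling would cover far more rules and would need business analysts to interpret the findings.

```python
from collections import Counter

# Hypothetical rows from a legacy customer file (names and values are made up).
ROWS = [
    {"cust_no": "A1", "status_cd": "A", "ref_field": "2004-03-01"},
    {"cust_no": "A2", "status_cd": "X", "ref_field": "BRANCH-17"},
    {"cust_no": "A1", "status_cd": "A", "ref_field": "2004-05-12"},
    {"cust_no": "A3", "status_cd": "", "ref_field": "BRANCH-02"},
]


def duplicate_keys(rows, key):
    """Return the key values that occur on more than one row."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)


def value_domain(rows, column):
    """Return how often each distinct value occurs (useful for short codes)."""
    return Counter(row[column] for row in rows)


def value_shapes(rows, column):
    """Summarize values as patterns: 9 for a digit, A for a letter."""
    def shape(value):
        return "".join(
            "9" if c.isdigit() else "A" if c.isalpha() else c for c in value
        )
    return Counter(shape(row[column]) for row in rows)


print(duplicate_keys(ROWS, "cust_no"))   # duplicate key candidates
print(value_domain(ROWS, "status_cd"))   # codes that need deciphering
print(value_shapes(ROWS, "ref_field"))   # more than one shape hints at multiple uses
```

Here `ref_field` holds both dates and branch identifiers, which is the kind of multi-purpose data element that plain source-to-target mapping would carry into the new system unchanged.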
Data Warehousing (DW)
Another serious attempt at integrating data has been the drive towards data warehousing. The definition of data warehousing speaks to this attempt: “A DW delivers a collection of integrated data used to support the decision making process for the enterprise.” A solution at last, and a sound plan – if properly implemented. Properly implemented once again meant extensive analysis of the operational data to find the data redundancy and data inconsistency problems and to correct them.
This detailed analysis had to be performed on all data elements within the scope of the data warehouse to deliver a pool of non-redundant, clean, consistent, and integrated data. This type of analysis cannot be magically performed by a tool but requires business analysts who have the knowledge to define the organization in business terms and with a cross-functional scope. These business analysts should be business people who are data stewards or data consumers, guided by a skilled IT data analyst who documents the results of the analysis.
In reality, very few DW projects have the necessary user involvement. Users are still trapped in the habit of expecting IT to build a customized silo system just for them. IT is still trapped in obliging. Both are trapped in not understanding the real issues, or choosing to ignore them. As a result, DW initiatives are more often than not little more than silo data mart projects on a new platform, and many of the data integration problems remain to the dismay of disappointed users.
Customer Relationship Management (CRM)
CRM attempts to integrate customer information with product information through related business functions, such as sales, marketing, and order fulfillment. CRM has metamorphosed into a sophisticated set of tools and applications. Unfortunately, as with ERP systems, too many CRM conversions follow the traditional habits of source-to-target mapping with little data analysis and even less data cleansing. Data is usually moved “as-is” into the new CRM modules with the unreasonable expectation that clean, consistent, and integrated customer data will magically appear in the CRM system as a result. Once again, simply using new technology is not the whole solution.
Enterprise Application Integration (EAI) and Enterprise Information Integration (EII)
The need for integration will increase as more competitive demands are placed on organizations. Data integration must happen to meet business demands, and it must happen quickly. EAI and EII vendors understand this pressure. They also realize that organizations have a colossal investment in their existing systems, which should be leveraged if possible.
EAI and EII middleware technology provides that leverage by allowing the unrestricted sharing of data among any connected applications and data sources in the organization. The obvious advantage of these technologies is that existing data can be shared without having to make any changes to existing systems and without having to build new ones. This could be a cost-effective solution to data access and data integration problems across heterogeneous systems and platforms if – and only if – the disparate data were non-redundant or at least consistent, and if reasonable performance criteria allowed sharing and collaboration. It also assumes that all the associated metadata can be collected and managed to support data integration and data access.
The reality is that EAI and EII tools only eliminate the need to write customized bridges between existing systems. The tools have no effect on the quality of the data in the existing systems, nor do they help the business people interpret the data. For example, the data element “Annual Income Amount” could be stored redundantly in 45 customer files, and in 30 of those 45 files the “duplicate” amount fields could hold different values. EAI and EII tools are of no help in determining which of these amount fields is the correct one. That determination must be made by data owners or negotiated with data consumers, who need to know, or have rapid access to, the definition and format of each instance of that data element.
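The “Annual Income Amount” example can be made concrete with a small Python sketch. It assumes a hypothetical metadata registry that records the field name and storage unit for each instance of the data element; the file names, field names, units, and values are all made up. With that metadata, differences that are only a matter of format can be separated from true conflicts, but the code still cannot say which amount is correct.

```python
from decimal import Decimal

# Hypothetical metadata: one entry per file that stores "Annual Income Amount".
METADATA = {
    "loan_file": {"field": "ANN_INC", "unit": "dollars"},
    "crm_file": {"field": "income_amt", "unit": "cents"},
    "marketing_file": {"field": "INCOME", "unit": "thousands_of_dollars"},
}

# Assumed conversion factors from each storage unit to dollars.
UNIT_TO_DOLLARS = {
    "dollars": Decimal("1"),
    "cents": Decimal("0.01"),
    "thousands_of_dollars": Decimal("1000"),
}

# Hypothetical stored values for a single customer.
RECORDS = {
    "loan_file": {"ANN_INC": "72000"},
    "crm_file": {"income_amt": "7200000"},
    "marketing_file": {"INCOME": "68"},
}


def normalized_values(records, metadata):
    """Convert each stored instance to dollars using its metadata."""
    values = {}
    for file_name, meta in metadata.items():
        raw = Decimal(records[file_name][meta["field"]])
        values[file_name] = raw * UNIT_TO_DOLLARS[meta["unit"]]
    return values


def group_by_amount(values):
    """Group files by normalized amount; more than one group is a true conflict."""
    groups = {}
    for file_name, amount in values.items():
        groups.setdefault(amount, []).append(file_name)
    return groups


groups = group_by_amount(normalized_values(RECORDS, METADATA))
for amount, files in groups.items():
    print(amount, files)
```

In this sketch the loan and CRM files turn out to agree once their formats are reconciled, while the marketing file holds a genuinely different amount. Resolving that remaining difference is the job of the data owners, not of the middleware.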
Conclusion
Data integration is a topic that is well publicized, much talked about, and heavily hyped by many tool vendors. It is a complicated topic that requires considerable effort if implemented fully. Some less effective alternatives to data integration are data consolidation and data federation. EAI and EII middleware technology can also be used when appropriate and cost-effective, but true data integration cannot be achieved by technology alone.