Data Integration Is Essential for Successful Information Management and Business Intelligence
Without data integration, enterprise integration is extremely costly and overly complex. Many businesses are convinced that Business Intelligence (BI), the Internet, and Customer Relationship Management (CRM) are critical, strategic areas of investment. However, most organizations are struggling to deploy the enabling technologies. What are the constraints that prevent businesses from achieving these important goals? There are many obstacles to overcome, from proper organizational change management to maturing underlying technologies. The teams implementing these solutions are frequently able to harness the new technologies and capable of gaining organizational support. The critical constraint, and the one most often overlooked, is the overwhelming task of integrating a set of otherwise disparate systems using proven data management techniques.
Challenges to Data Integration
The struggle is sharing and using data from one application to another, or from one business or business unit to another, in ways that were not foreseen when the original applications were deployed. The data, frequently assumed to be of appropriate quality for the new application, turns out not to be fit for analytical purposes. The result? Many labor hours are spent on unplanned, unbudgeted tasks trying to find, understand, and reconcile the data to meet the new need. The same issue appears when legacy or newly developed applications must be changed in response to changing business needs. Often the cost of making “very small changes” to an application is unbelievably large. It is very difficult to explain to business sponsors and executives why all these data changes are needed, especially when the existing applications work properly.
Reasons for Data Integration Challenges
What causes this problem? The answer is in the data architecture of the installed base. An enterprise has tens or even hundreds of applications with hundreds or thousands of point-to-point connections (interfaces) between them. Furthermore, each team deploying a new solution creates new interfaces, or corrects or adjusts existing ones, to get its implementation working. Little about these “on the fly” interfaces and modifications is documented. This continuous development and maintenance of point-to-point connections makes an already large and complex problem larger and more complex for the next application or modification. Why is this done? This siloed, piecemeal approach is how systems personnel are taught to develop interfaces, and point-to-point data sharing is pervasive in the industry.
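To make the growth concrete: if any pair of applications may end up with its own interface, n applications can require up to n(n - 1)/2 point-to-point links. The sketch below is a back-of-the-envelope calculation, assuming one undirected interface per pair as the worst case. The hub comparison (each application connecting once to a shared integration layer, as an EAI broker would provide) is an illustrative assumption, not a design this article prescribes.

```python
def point_to_point_links(n_apps: int) -> int:
    """Worst case: one undirected interface between every pair of applications."""
    return n_apps * (n_apps - 1) // 2


def hub_links(n_apps: int) -> int:
    """Assumed alternative: each application connects once to a shared hub."""
    return n_apps


# Compare how the two counts grow as the application portfolio grows.
for n in (10, 50, 100):
    print(f"{n:>4} applications: up to {point_to_point_links(n):>5} point-to-point links, "
          f"{hub_links(n):>4} hub links")
```

At 100 applications the worst case is 4,950 interfaces, which is in line with the “hundreds or thousands” of connections described above. Real portfolios sit below the worst case, but every undocumented link added on the fly moves them closer to it.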
Results of Data Integration Mistakes
Building interfaces, or executing conversions, point to point creates the IT applications “Fur-Ball.” The term “Fur-Ball” was chosen to describe the architecture’s behavior. If an application or interface is modified, the impact is nearly impossible to predict because there are so many dependencies that, most of the time, some are not identified, much less analyzed (like trying to follow a single thread in a Fur-Ball). This is the data architecture of the installed base. It is an architecture no one selected as the state they wanted to achieve (no one said “we want a Fur-Ball of data interfaces and conversions between our applications”). Yet it constrains how applications are designed, integrated with one another, deployed, and operated; it provides a framework that makes all applications very similar in how they acquire, use, and share data.
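Predicting the impact of a change is, in graph terms, finding every application downstream of the changed one. The minimal sketch below walks a hypothetical interface inventory breadth-first; the application names and the `FEEDS` mapping are illustrative assumptions. Its answer is only as good as the inventory: in a Fur-Ball, the undocumented interfaces are exactly the edges missing from this map, so the computed impact understates the real one.

```python
from collections import deque

# Hypothetical interface inventory: each application maps to the applications it feeds.
FEEDS: dict[str, list[str]] = {
    "policy_admin": ["billing", "claims", "data_mart"],
    "claims": ["reserving", "data_mart"],
    "billing": ["general_ledger"],
    "reserving": ["general_ledger"],
    "general_ledger": [],
    "data_mart": ["exec_reports"],
    "exec_reports": [],
}


def impacted_by(changed: str, feeds: dict[str, list[str]]) -> set[str]:
    """Return every application reachable downstream of `changed` (breadth-first)."""
    seen: set[str] = set()
    queue = deque(feeds.get(changed, []))
    while queue:
        app = queue.popleft()
        if app in seen:
            continue  # already visited via another interface path
        seen.add(app)
        queue.extend(feeds.get(app, []))
    return seen


print(sorted(impacted_by("policy_admin", FEEDS)))
```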
However, the Fur-Ball creates two undesirable conditions in the enterprise data systems. First, it creates a mechanical problem. In this architecture, the applications are tightly coupled. An application is tightly coupled to another application if the first is exposed to the internals of the latter beyond the data that is shared. In this architecture, a change to one application frequently impacts many other applications, which in turn affect others, creating a domino effect. You may say that most applications are only data coupled; that is, they are exposed only to the data they need from the other application. However, the data contained in the “source” application is not factual (it is not a reflection of the facts represented); it is overloaded with application processing information or rules, or convoluted with sequencing or processing rules. Therefore, the “target” application must become aware of these processing rules in order to untangle them and use the data properly. When the source changes its rules, even though the target does not need to change per se (the data did not change, only its representation in the source did), the target must now change to align with the new rules, and it must do so in lock-step!
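The difference between consuming a fact and consuming a source’s processing rules can be shown with a small, hypothetical example. Assume the source stores a collision limit as a two-digit processing-sequence prefix followed by the amount (e.g., `02-25000`). A target that slices that string by position has absorbed the source’s internal rule; when the source widens the prefix to three digits, the target silently reads the wrong value unless it changes in lock-step. Publishing the fact as a named field keeps the sequence number private to the source. The encoding, the function names, and the `CoverageLimit` record are all assumptions for illustration.

```python
from dataclasses import dataclass


def target_reads_positional(raw: str) -> int:
    """Tightly coupled: assumes the source's 2-digit sequence prefix plus '-'."""
    return int(raw[3:])


@dataclass(frozen=True)
class CoverageLimit:
    """Hypothetical factual record: no source sequencing information leaks out."""
    coverage: str
    limit: int


def target_reads_fact(record: CoverageLimit) -> int:
    """Data coupled: depends only on the fact the target actually needs."""
    return record.limit


# Source rule v1: "02-25000"; the positional read returns 25000.
print(target_reads_positional("02-25000"))
# Source rule v2 widens the sequence to 3 digits: the slice is "-25000", so this returns -25000.
print(target_reads_positional("002-25000"))
# The factual record is unaffected by either internal encoding.
print(target_reads_fact(CoverageLimit(coverage="collision", limit=25000)))
```

Note that the positional reader does not fail loudly; it produces a plausible-looking but wrong number, which is how a rule change in the source turns into a data quality problem in the target.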
Second, it creates a semantic problem. In this architecture, the applications contain the same facts, but those facts are represented in different and often inconsistent ways. A typical symptom is when the same report is produced with the same information from two different applications or business areas, and the results are inconsistent (e.g., one report indicates that 12,354 claims were opened last month, the other that 13,405 claims were opened last month). In many instances, the same fact, say the “collision limit,” is represented by different applications according to how each processes the data (how it sorts, sequences, establishes hierarchy between coverage limits, etc.) rather than according to the fact it is intended to represent. Other examples of processing or sequencing convolution include combining facts to meet one application’s processing constraints (if it is not in the contiguous 48 states, roll it into “foreign”) or splitting them apart using various, unreconciled business rules.
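The “roll it into foreign” rule shows how two applications can report different numbers from identical facts. In the hypothetical sketch below, both applications receive the same claim records (country and state codes). Application A treats anything outside the contiguous 48 states as foreign (simplified here to excluding Alaska and Hawaii, ignoring territories), while application B treats any U.S. location as domestic. The records and both rules are illustrative assumptions, not data from any real analysis.

```python
from collections import Counter
from collections.abc import Callable

# Hypothetical claim facts: (country, state/province) where each claim was opened.
CLAIMS = [("US", "TX"), ("US", "AK"), ("US", "HI"), ("US", "CA"), ("CA", "ON")]


def region_app_a(country: str, state: str) -> str:
    """App A: outside the contiguous 48 states rolls into 'foreign' (simplified)."""
    return "domestic" if country == "US" and state not in {"AK", "HI"} else "foreign"


def region_app_b(country: str, state: str) -> str:
    """App B: any U.S. location is 'domestic' (state is ignored)."""
    return "domestic" if country == "US" else "foreign"


def region_counts(claims: list[tuple[str, str]],
                  rule: Callable[[str, str], str]) -> Counter:
    """Count claims per region category under one application's rule."""
    return Counter(rule(country, state) for country, state in claims)


print(region_counts(CLAIMS, region_app_a))
print(region_counts(CLAIMS, region_app_b))
```

Each report is “correct” under its own rule, yet they disagree: 2 versus 4 domestic claims from the same five facts. Keeping the underlying fact (country and state) in the shared data, and applying each category rule explicitly at report time, makes such a discrepancy explainable instead of mysterious.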
In summary, this environment is characterized by incoherent information, hundreds of conflicting scheduled and ad-hoc reports, “departmentalized” data files, “data specialists,” ad-hoc Tiger Teams launched on “data quests,” etc. What is even scarier for businesses is that the rules used to make these transformations are often known only by the code in the interfacing application: the programmers and the business subject matter experts involved are no longer with the business area or the company.
When a business in this condition tries to deploy a new application, whether CRM, BI, EAI, IAI, an Internet solution, or a simple traditional one, it faces this dilemma: the “data problem.” How do you know if your enterprise suffers from the “Fur-Ball” syndrome or the “data problem”? It is simple. If changing an application must be coordinated in lock-step with other applications unrelated to the change, you have mechanical problems. If the business areas say they get “too much data, but not enough information,” inconsistent or conflicting reports, or erroneous data that “they know is good in the source system,” you have semantic problems. When the cost of not managing data, including its replication, exceeds your company’s “pain threshold,” you know you have a severe Fur-Ball problem!
How did the enterprise get to this state? First, most organizations do not recognize that they must manage their data as an asset, so they do not follow accepted data management best practices, including those for data integration. Second, they seldom recognize that data is being replicated; even less often do they see the need to manage that replication, and, again, they do not follow best practices for replication.
It is essential for all IT professionals to understand the data integration problem, define the approach to solve it, identify and apply the correct policies, processes, procedures, technologies, etc., one step at a time.
As an example, one analysis discovered that the lack of an agreed-upon, company-wide definition and specification for each piece of data was diminishing that data’s value. Another critical finding indicated that using EAI and Data Warehousing technologies in isolation was less effective than using them in concert. Another critical finding indicated that organizational metadata practices were lacking, forcing business and IT areas into a “throw everything into a shoe box” approach to metadata management. However, the most important finding was that the failure of top-level executives to understand the significance of the data management problem, and the lack of their support in implementing solutions, was the largest impediment to successfully resolving the “data problem,” or “untangling the Fur-Ball.”
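One way to picture an agreed-upon, company-wide definition of each piece of data is a registry that holds exactly one definition per data element and rejects conflicting ones. The sketch below is a minimal, hypothetical illustration; the `DataElement` fields, the `DataElementRegistry` class, and the sample `collision_limit` entry are assumptions, not a prescribed metadata standard or a specific product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataElement:
    """Hypothetical company-wide definition of one piece of data."""
    name: str
    definition: str
    data_type: str
    steward: str            # business owner accountable for the definition
    system_of_record: str   # application whose value is authoritative


class DataElementRegistry:
    """Holds one agreed definition per element name."""

    def __init__(self) -> None:
        self._elements: dict[str, DataElement] = {}

    def register(self, element: DataElement) -> None:
        existing = self._elements.get(element.name)
        if existing is not None and existing != element:
            # A second, different definition is exactly the inconsistency to prevent.
            raise ValueError(f"conflicting definition for {element.name!r}")
        self._elements[element.name] = element

    def lookup(self, name: str) -> DataElement:
        return self._elements[name]


registry = DataElementRegistry()
registry.register(DataElement(
    name="collision_limit",
    definition="Maximum amount payable under the policy's collision coverage.",
    data_type="integer (US dollars)",
    steward="Personal Auto Underwriting",
    system_of_record="policy_admin",
))
```

In a Fur-Ball, nothing plays the role of that `ValueError`: each interface quietly carries its own definition, and the conflict surfaces only later as inconsistent reports.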