Affiliated with:

data migrate

Independent data marts cause many problems in data warehouse architecture; combining to result in numerous issues for business intelligence and analytics solutions.

1

A severe disease has spread to epidemic proportions throughout our society.  This disease is particularly dangerous as it effects are not readily identifiable at the time of infection.  However if this condition goes untreated it can be debilitating and even terminal.  This disease is not hepatitis, but rather “independent” data marts .  While this imagery may seem a bit dramatic, unfortunately it reflects the reality in many of today’s companies.

This article will address how to migrate from independent data marts into an architected  data warehouse solution.  It will address the characteristics of independent data marts, the flaws in their architecture, and the reasons why they exist.

Characteristics of Independent Data Marts

Independent data marts are characterized by several traits.  First, each data mart is sourced directly from the operational systems without the structure of a data warehouse to supply the architecture necessary to sustain and grow the data marts.  Second, these data marts are typically built independently from one another by autonomous teams.  Typically, these teams usually will apply varying tools, software, hardware, and processes.  Possibly the most visually descriptive trait of a company that has constructed independent data marts is that once they map out a process flow of their data warehousing environments (DWE), the flow will resemble that of a “spaghetti” chart (See Figure 1) .  What is most disturbing is the number of companies that have stated that this chart resembles their current DWE architecture.

2

Figure 1: Independent Data Mart Architecture

As we see this architecture is not an architecture at all.  Instead, it is a series of “stovepipe” DWE systems.  This architecture greatly differs from that of an architected data warehouse (See Figure 2).

Enterprise DW Architecture Sahama Croll 2007

Figure 2: Architected Decision Support / Business Intelligence System

The purpose of this article is to discuss independent data marts and the process for migrating from them to an architected solution; however, we will touch briefly on the topic of DWE architecture.  We will not go into a detailed discussion of top-down vs. bottom-up approaches (we will save that topic for another article) , except to say that the “classic” top-down approach is a more scalable, and logical approach for constructing a DWE system.

It is surprising how often the top-down methodology is mistaken for a “galactic” approach.  This is a misunderstanding since the top-down approach is best used iteratively and incrementally to build the DWE system.  When used in this fashion the cost for building a data warehouse that feeds “dependent” data marts becomes highly comparable to the cost of building independent data marts.

Problems with Independent Data Marts

Redundant Data

As the number of independent data marts grows, the amount of redundant data begins to grow uncontrollably across the enterprise. This redundancy occurs because each of the independent data marts requires its own, typically duplicated copy of the detailed corporate data. Often a great deal of this detailed data is not required in the data marts, which typically provide summarized views.

It would be enlightening if a study were conducted to calculate the costs of maintaining non-necessary redundant data for Fortune 1000 companies.  The end total would be in the billions of dollars in expenses and lost opportunity.

Redundant Processing

A data warehouse provides the architecture to centralize integration and cleansing activities common to all of the data marts of a company.  Without the data warehouse, all of these integration and cleansing processes need to be duplicated for all of the independent data marts. This greatly increases the number of support staff required to maintain the DWE system, creating a particularly disastrous situation for most companies in light of today’s IT staffing shortage.

Separate teams typically will build each of the independent data marts in isolation of one another.  As a result, these teams do not leverage the other’s standards, processes, knowledge, and lessons learned.  This results in a great deal of rework and re-analysis.

These autonomous teams will commonly select differing tools, software, and hardware.  This forces the enterprise to retain skilled employees to support each of these technologies. In addition, a great deal of financial savings is lost, as standardization on these tools does not occur.  Often a software, hardware, or tool contract can be negotiated to provide considerable discounts for enterprise licenses, which can have multiple implementation phases to reduce immediate costs.  These economies of scale can provide tremendous cost savings to the organization.

Scalability

Independent data marts directly read operational system files and/or tables, which greatly limits the DWE system’s ability to scale.  For example, if a company has five independent data marts it is likely that each data mart would require customer information.  Therefore, there would be five separate extracts being pulled off of the same customer tables in the operational system of record.  Most operational systems have limited batch windows and cannot support this number extracts. With a data warehouse, only one extract is required in the operational system of record.

Non-Integrated

As previously discussed, each independent data mart is built by autonomous teams, typically working for separate departments.  As a result, these data marts are not integrated and none of them contains an enterprise view of the corporation.  Therefore, if the CEO asks the IT department to provide him with a “listing of our most profitable customers” each data mart will offer a different answer.  Having worked with companies that have experienced this exact situation, I can attest that the CIO is rarely pleased to have to explain why his department cannot answer this seemingly simple question.

One of the chief phenomena facing corporations today is the current merger and acquisition craze.  Interestingly, one of the key factors fueling this movement is the desire by many companies to reduce their IT spending. In light of the situation the costs associated with independent data marts becomes even more magnified as companies continue to focus on controlling their ever-growing IT costs.

It is important to note that many companies that have built independent data marts are currently in the process of migrating from them.  The cost, in dollars and time for the migration is not trivial.

Why Do Independent Data Marts Exist?

With all of these architectural flaws it would seem surprising that so many companies have built their DWE systems around this architecture.  There are several reasons why this aberration has occurred.

Data Warehouse Environments (DWE) Are Complex

When the data warehousing craze spread, most companies were looking to build one of their own.  Unfortunately, the task of building a well architected and scalable business intelligence system is complicated and requires sophisticated software, expensive hardware, and a highly skilled and experienced team.  Finding data warehouse architects and project leaders that truly understand data warehouse architecture is a daunting challenge, both in the corporate and consulting ranks.

To construct a data warehouse a corporation must come to terms with their data and the business procedures that the data represent. While this task is challenging it is a necessary step and one in which the true value of the DWE process is derived from.

Independent Data Mart Shortcut

Building independent data marts are initially less expensive than architected data warehousing environment.  In addition, independent data marts can be constructed quickly and do not require a company to really understand their data beyond that of individual departments as a data warehouse requires.  These points have been used effectively to sell the concept of constructing independent data marts.  Unfortunately, it is this lack of thorough analysis and long-term planning that limits the independent data marts from being an effective business intelligence system.

Inappropriate Vendor Messages

Many vendors have developed tools that are effective at building small, departmental independent data marts.  These companies in their rush to market with these tools have worked very hard at selling the independent data mart concept (of course it is never worded like this).  The reasons are obvious.  These companies can significantly reduce their sales cycles because only one department is involved in the software purchasing decision.  In addition, their software requires much less sophistication because they merely need to build a standalone data store.

The current vendor buzzword in today’s market is “turnkey”, or “integrated solutions.”  Everyone seems to offer a “turnkey” or “integrated” DWE solution.  Unfortunately, merely purchasing a “turnkey” solution does not alleviate the task of learning and understanding a corporation’s data and their business processes.  Integration of data from disparate systems requires a careful analysis and an understanding of business processes and the data that represents them.  There is no “magic bullet” or “turnkey” solution that alleviates this task.  An “integrated” solution requires that the organization understands all the sources that will contribute to the final result.

Approaches to Migration

There are two general approaches for migration; “Big Bang” and “Iterative”.  Table 1 summarizes the advantages and disadvantages of each approach.

3

Table 1: “Big Bang” vs. Iterative Approaches to Data Warehousing / BI

Big Bang Approach

As the name implies all of the independent data marts will be reengineered simultaneously into a structured DWE architecture.  There are some advantages to this approach.  First, it can provide the fastest path for migration.  Often, companies will need to change their DWE architecture as quickly as possible because of a need to implement additional DWE projects that promise to lend a high return on investment (ROI) or because there are funds available for the effort currently that might not be available later.

Second, this approach allows for immediate economies of scale rather than slowly attaining them in Iterative method.  The disadvantages to this approach are that it is labor intensive and requires tremendous coordination.  In addition, the “Big Bang” approach is the more complex of the two to implement and thus provides the highest exposure to risk.

This approach is best suited when the independent data mart problem is relatively small and not highly complex.  However, when the problem is large the complexity of the migration grows at a tremendous rate.

Iterative Approach

This approach looks to reengineer the independent data marts (one or two data marts at a time) in manageable phases.  The advantages to this approach are several.  First, it allows a company to manage and reduce the risk involved in a migration effort.  This occurs because the migration can be accomplished in a phased manner, thereby increasing the probability of the project’s success.

Second, as each project phase is executed lessons are learned and leveraged for subsequent phases.  This is very valuable since usually, once the first phase is completed, the follow up phases run much more smoothly.

The major disadvantage to this approach is that it takes longer to fully complete the migration.  This approach is most successful when the independent data mart problem is large and too complex to tackle in a “Big Bang” manner.

Initial Planning

Many companies fail in their migration efforts long before they start.  The chief reason for this is the lack of initial planning and sponsorship.  Obtaining executive sponsorship is one of the most important tasks at the onset of the project.  This is critical since typically autonomous teams in different corporate departments have constructed each of the independent data marts.  Therefore, having a project champion that has cross-departmental authority is critical for dealing with the political challenges, which are commonplace in these migration efforts.

During the initial planning phases it is important to plan on implementing a metadata repository that can support future DWE development efforts and that will provide a semantic layer between the business users and the DWE system.  The data mart migration provides an outstanding opportunity to implement the metadata repository. Before the data mart migration begins it is best to standardize the data naming nomenclature for the DWE system.  Implementing standard data naming nomenclature will aid in the DWE system’s maintenance and provide cleaner and more understandable metadata.

A great deal of research must be conducted on the independent data marts before a migration is possible (Table 2 summarizes these tasks).  The most important research activity is to understand the business needs that each independent data mart is meeting.  Typically, multiple independent data marts will exist to meet the same or similar business needs.  These situations are common and do suggest a path for migration.  The results of this research will illustrate the independent data marts that will be the most difficult to migrate.

During independent data mart migration, it is an excellent time to standardize on hardware, and software for the DWE project.  For each differing software or hardware platform, a company needs to have trained personnel to support it.  Therefore, by limiting the redundant software/hardware the corporation reduces the support strain on their IT staff.  In addition, standardizing allows for software and hardware purchasing economies of scale to be achieved.

4

Table 2: Independent Data Mart Migration Research Tasks

Golden Rule

The central covenant of any independent data mart migration effort is to “Never delivery less functionality to the business users than what they have today”.  Generally business users do not react well to spending money on infrastructure because they don’t initially see its value.  The key business users need to be educated that a bad system architecture leads to a non-scalable and non-flexible system that will eventually need to be rewritten at a very high cost.  Therefore, during migration the users must be assured that they will not receive less functionality (information, ease of use, and response time) than what they are currently receiving today.

Identifying a Migration Path

There are several activities that are necessary to conduct before a migration path will be evident.

Create Your Own Spaghetti Chart

First, diagram out the current DWE architecture.  This is critical for identifying which legacy systems are feeding which independent data marts.

5

Figure 3: Diagram Current DWE Architecture

Identify Redundant Data

Often, independent data marts will be sourced from the same legacy systems.  By targeting independent data marts with the same source data often multiple independent data marts can be removed with minimal extra effort.  Identifying redundant data often suggests a migration path.

Figure 4 illustrates existing independent data marts for a company.  In the schematic both the Finance and Marketing data marts are sourced from the same legacy systems.  This suggests that it might be wise to target both of these data marts for initial migration (assuming the Iterative approach is used).

6

Figure 4: Identifying Redundant Data Sources

Identify Paths of Least Resistance

Data

It is important to target those independent data marts whose data will most likely be used in future DWE efforts.  By targeting these data marts first, it will ease the task of keeping all new DWE development activity in the newly architected environment.

The next step is to identify those data marts whose transformation rules are known and documented.  Understand that even the best-documented transformation rules will have gaps.  Moreover, even those marts that have been built using ETL (Extraction/Transformation/Load) tools have metadata (documentation) gaps.  For example, ETL tools can provide the functionality to call user exits that are hand-coded programs. The processes performed by these user exits will not be captured in the ETL tool’s metadata stores.  If documentation does not exist for a mart then programmers will need to manually analyze each of the ETL program’s code to extract the transformation rules.  Manually analyzing code to extract transformation rules is a very time consuming and expensive activity.

Political

It will be critical to obtain support from the current independent data mart IT teams and business users.  Identify those data mart teams most likely to work cooperatively with the centralized DWE team.  Recognize the strengths and weaknesses of those teams that can and will provide the most aid.  If a particular data mart team/business users are not willing to assist with the migration effort it is best to work around these teams by delaying the migration of their particular data mart.  If this is not an option, then utilize your executive sponsorship to “motivate” this group to provide their support.

Understand your team’s strengths and weaknesses

Keep in mind that any team will have its stronger and weaker areas of knowledge and experience.  As much as possible keep your team’s areas of weakness off the critical path.  Any mission critical team weaknesses must be reinforced with internal members from the other data mart teams or from outside vendors. Consider training all the team in the areas of data management to ensure a strong team.

Case Study: Putting the Concepts into Motion

The following case study looks to put the concepts we have discussed into action.  This case study illustrates the iterative approach to independent data mart migration since most companies that have independent data marts typically have a pervasive and complex situation.

Background

The XYZ company is a Fortune 500 consumer electronics firm.  XYZ recently acquired a smaller company (Acme Electronics) that has a single Marketing data mart; little is known about this data mart. In addition, XYZ is standardizing on a new order entry system in five (5) years and existing batch windows for the legacy systems have reached its limit.  XYZ’s management team is stable, well organized, and fully supports the migration effort.  Table 3 lists the DWE specific details and Figure 5 shows the current DWE architecture.

7

Table 3: Case Study DWE Background

7

Figure 5: Independent Data Mart Architecture

Phase One Migration

By viewing the data, it is evident that the Marketing and Finance data marts share two common data sources (old and new order entry systems).  In addition, the Marketing data mart has a strong end-user community that will be highly supportive of the migration effort.  In addition, both the Marketing and Finance data mart’s business users have agreed to freeze their additional functionality requests for Phase One of the migration.  Identifying the business and technical metadata for each data mart was an essential component of this phase.

During this phase, we avoided migrating the Quality Control and the Acme Marketing data marts.  This occurred because of the lack of support in the Quality Control mart and all the unknown aspects of the Acme Marketing mart.  Figure 6 illustrates the Phase One DWE architecture.

9

Figure 6: Phase One DWE Architecture

Phase Two Migration

During this phase, the operational logistical system’s data will be brought into the data warehouse and the Quality Control data mart is now being sourced directly from the enterprise data warehouse.  In addition, during this phase the Marketing and Finance teams change requests that were frozen during Phase One implementation are now being developed.  Lastly, a new dependent Accounting data mart is now sourced from the data warehouse.

10

Figure 7: Phase Two DWE Architecture

Phase Three Migration

In this phase we are merging the functionality in the former Acme Electronics Marketing data mart into the existing dependent Marketing data mart.  Also, additional data marts are continuing to appear (CEO data mart).

11

Figure 8: Phase Three DWE Architecture

Conclusion

It is important to understand that the process for migrating from this architecture is a costly proposition that will only get more expensive and difficult as time goes on.  Remember, as with any disease the earlier it is detected and treatment begins the sooner the patient will become healthy.  However, if treatment is delayed the patient’s condition will worsen and eventually become terminal.

LinkedIn
Facebook
Twitter

Dr. David P. Marco, LinkedIn Top BI Voice, IDMMA Data Mgt. Professional of the Year, Fellow IIM, CBIP, CDP

Dr. David P. Marco, PhD, Fellow IIM, CBIP, CDP is best known as the world’s foremost authority on data governance and metadata management, he is an internationally recognized expert in the fields of CDO, data management, data literacy, and advanced analytics. He has earned many industry honors, including Crain’s Chicago Business “Top 40 Under 40”, named by DePaul University as one of their “Top 14 Alumni Under 40”, and he is a Professional Fellow in the Institute of Information Management. In 2022, CDO Magazine named Dr. Marco one of the Top Data Consultants in North America and IDMMA named him their Data Management Professional of the Year. In 2023 he earned LinkedIn’s Top BI Voice. Dr. Marco won the prestigious BIG Innovation award in 2024. David Marco is the author of the widely acclaimed two top-selling books in metadata management history, “Universal Meta Data Models” and “Building and Managing the Meta Data Repository” (available in multiple languages). In addition, he is a co- author of numerous books and published hundreds of articles, some of which are translated into Mandarin, Russian, Portuguese, and others. He has taught at the University of Chicago and DePaul University. DMarco@EWSolutions.com

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved

Subscribe To DMU

Be the first to hear about articles, tips, and opportunities for improving your data management career.