Skip to content.

Sections

Why ETL and Data Migration Projects Fail

by Joseph R. Hudicka

Introduction

Nearly every IT initiative today involves some level of data migration. Companies seeking to leverage newer, faster technologies must transfer data from legacy systems to new environments. Others are building business intelligence solutions by integrating data from various sources and using this data for strategic planning. Still, others are building more efficient business alliances by exchanging information between companies electronically.

Of course, there is a significant assumption underlying each company's decision to pursue initiatives like these-accurate, well organized data sources. Unfortunately, most companies do not even consider data quality until initiatives like these are underway.

This is not entirely surprising. Data has been historically collected for operational purposes - satisfying today's customer orders, repairing today's list of equipment, processing today's list of payments, etc. Data quality issues are not always obvious when looking at a single record of data. To the contrary, data quality issues become blatantly obvious when data from various sources is merged together, or when data is aggregated over a period of time.

Data migration and/or ETL projects are rarely thought about independently, which is largely because they occur as a result of a large initiative, like those described at the beginning of this article.

Bottom line: ETL and data migration projects continually run over budget and run out of time. It is time that we change our overall outlook and approach to correct this recurrent problem.

Just the Facts…

Those of us involved in this endeavor have known for years about the problems that can arise during data migration and the affects they have on time and expense. Anecdotal evidence is great, but it is not enough. An objective survey was needed to sample a broad base of people familiar with ETL projects and to demonstrate the real problems that occur everyday, over and over again.

Information Architecture Team constructed a survey that allowed users to share their experiences on how these tools impacted the timelines and costs of their projects. The results of this ETL survey have been very enlightening. There were over 300 quality responses, and each respondent has conducted an average of four ETL/data migration projects. That translates into over 1200 migration projects!

ETL Survey Results

The cost of doing a data migration project is reflected in Figure 1.

Figure 1

Figure 1. The Cost of Data Migration

As you can see, the median project costs over 10 times the amount planned! And this is from experienced folks that have done at least four migration projects, yet still continue to be plagued by dismal results. It is important to note that the survey did not consider projects that were cancelled largely due to data migration difficulties.

So where is the extra cost coming from? A look at where the time is being spent helps to answer this question. The following charts display how the mapping and testing phases exceeded their scheduled time.

Figure 2

Figure 2. Testing and Mapping times exceeding schedule

Look closely at Figure 2. The survey yielded a two-month overture for mapping-not because of the amount of time, but because it was reported at all. Overall, more than 75% of all migration projects exceeded the planned timeline!

In most migration projects, mapping is reported as being completed on time. This is because there are no mechanisms for testing the completeness of the mapping without moving into the testing phase. Mapping is generally accepted as complete when each "source" field is assigned to a corresponding "target" field.

But do the data types match? Do the source fields contain NULL values when the target fields are mandatory? Are the source values too long for the target field? These questions and several more are typically not considered until testing takes place, and the problems are encountered, rather than discovered.

Conclusion

The key to controlling the time and monetary expenditures of ETL and data migration projects is planning. Those embarking on a data migration project need a detailed methodology that ensures that each migration requirement has been carefully examined and planned for proactively. Otherwise, you will remain in a reactive posture. The result will be that your project will fall victim to the same mistakes of migration that so many others have.

About the Author

Joseph Hudicka is the founder of the Information Architecture Team, an organization that specializes in data quality, data migration, and ETL. Winner of the ODTUG Best Speaker award for the Spring 1999 conference, Joseph is an internationally recognized speaker at ODTUG, OOW, IOUG-A, and many local user groups. Joseph has extensive knowledge in the areas of information architecture, metadata, business intelligence and ETL/data migration, with substantial experience in pharmaceutical, retail, public, and e-commerce applications. Joseph coauthored Oracle8 Design Using UML Object Modeling for Osborne/McGraw-Hill & Oracle Press and has also written or contributed to several articles for publication in DMReview, Intelligent Enterprise and The Data Warehousing Institute (TDWI)l. You can contact Joseph at jhudicka@ia-team.com