Data Warehouse Success
By Sid Adelman and Larissa Moss
This article is excerpted from a book titled Data Warehouse Project Management, scheduled to be published by Addison Wesley Longman in the first half of 2000.
Addison Wesley Longman, Sid Adelman, Larissa Moss
Measures of Success and Failure
There has been much heated discussion over the failure rate of data warehouses. Luminaries disagree on the percentage of those that have succeeded. The problem may be with the definition of success and failure. In fact, very few organizations have identified, up front, what for them will be success or failure. That being the case, any industry-wide numbers on failure are meaningless.
We have all seen projects that are grossly over budget, years behind schedule, and delivering a fraction of the promised function. For various reasons some of these projects have been declared a success with promotions, industry and publication awards and bonuses for all involved. There is a clear problem in that we have no accepted definition for success.
So that everyone is working off the same set of goals, and so that you are legitimately able to declare victory, we suggest that you, the project manager, list your measures of success and have that list approved by both IT and by the business sponsor.
Measures of Success
There are a number of examples of success indicators. Let’s take a look at some measures of success.
- Return on Investment (ROI)
ROI can be achieved a number of ways:
- Lower cost
Costs could be lowered through better inventory management, fewer dollars spent on unproductive measures, product promotions, etc.
- Improved productivity
Greater productivity could be expected from both IT and the user. Today user analysts may spend 80% of their time gathering data and only 20% analyzing the data. The data warehouse should reverse those numbers. IT will still be responsible for developing complex reports as well as writing reports for production systems. The data warehouse can provide reporting tools with a well documented, clean, and easily accessible database. This capability should significantly improve IT productivity.
- Increased revenue
This could be as a result of greater market share and increased sales as marketing is able to more effectively target customers and provide the right products at the right time to the right market.
The effects of costs and revenues may be difficult to assign to the impact of the data warehouse. As the data warehouse is being implemented, the organization is not standing still. There are both internal and external factors that impact both costs and revenues, and so the actual benefits of the data warehouse may be difficult to determine.
- Lower cost
- The Data Warehouse is used
One of the easiest categories to understand can be measured by the number of users and the total number of queries and reports generated. If queries and reports are run regularly, it’s a good indication that the users are achieving some benefit.
- The Data Warehouse is useful
The data warehouse may be used, but the users may find the benefits to be marginal and illusive. It’s important to ask the users what they see as the benefits of the data warehouse, how it has changed the way they do business, how it may have improved their productivity, and how it may have improved the quality of their decisions.
- Project delivered on time
This measure is problematic, as schedules are often set without an understanding of what is involved and how long each project task will take. "On time" is only relevant if a realistic schedule is the base for comparison.
- Project delivered within budget
This criterion may be difficult to achieve since the total costs of a data warehouse are difficult to determine. Initially, you may not have known how many users to expect, how many queries and reports they would be generating, and the complexity and resources used by the queries and reports. You did not know how large the data warehouse would be, or how many indexes and summary tables would be required and desired. You may not have anticipated needing a larger CPU. You may not have known that the software was more difficult than the vendors represented, and that teams of software consultants would be required. You may not have anticipated needing to upgrade our network to support the increased line traffic. You may not have anticipated needing to raise the salaries of the data warehouse team nor the increased cost of recruiting the talent required to make the project a success. All these factors will contribute to severely underestimating the budget. "Within budget" is only relevant if a realistic budget is the basis for comparison.
- Improved user satisfaction
Users may be internal, external or both. In all cases, the goal is to have users who are happy with the features and capabilities, the performance, the quality of data and the level of support.
- Additional requests for data warehouse functions and data
You’ll know you were successful if other user departments are beating down your door with requests for access to the data warehouse, and current users are requesting new data and functions to be added to the existing data warehouse.
- Business performance-based benchmarks
This is the most subjective of all the measures and will become the most controversial. Most industries have sets of industry averages as well as benchmark (the best) companies against which they make comparisons. For example, the industry average for building a car may be ten worker days. With better information, a car manufacturer in the middle of the pack, may have a goal to manufacture a mid-size sedan using eight worker days. The data warehouse may be able to provide improved, more complete and timelier information and, with this information, the auto manufacturer may be able to achieve their productivity goals.
- Goals and objectives met
Success will be defined by how well these goals and objectives were met. No doubt, not all were met or were only partially met. A scorecard will give you an initial and then an ongoing measure of your project’s success.
- Business problems solved
The data warehouse was developed for some specific reason. Perhaps marketing was unable to identify customer demographics for target marketing. If the data warehouse now provides this capability, it should be considered a success.
- Business opportunity realized
The identified opportunity might have been the ability to provide information to suppliers through the web so they would be able to respond more quickly to your demands for components you need for your manufacturing process. If the supplier now has fewer stock-outs, the project is successful.
- The data warehouse has become an agent of change
The world is changing, and the rate of change is accelerating dramatically. Successful organizations must be able to respond and respond quickly. Decisions must be made more quickly, but this can only happen with better and more timely information. There can be some fundamental changes to the business in the manner and speed in which decisions are made and, the data warehouse can be the vehicle for that change.
Measures of Failure
We may not be sure if the data warehouse is a success but we will always know when we have failed. Some of the indications of failure are:
- Funding has dried up
This could be because the sponsor has moved on or is no longer interested in the project. There could be other factors that have nothing to do with the success of the project such as the company being bought by Philistines who have no appreciation of your efforts or the data warehouse possibilities. Real failure results if the project is perceived as having no real benefit to the organization.
- Users are unhappy with the quality of the data
If you took a shortcut and decided not to cleanse the data, users will reject the data warehouse. The users may even have told you not to spend the time to understand and clean the data, that they want the data warehouse as soon as possible, and they don’t care if the data is dirty. Don’t believe them. Even if the initial users don’t care if the data is dirty, other users in the organization who don’t know how to interpret the dirty data do care. These new users would have to learn how to decipher the dirty data or they would get the wrong results if they don’t realize it’s dirty. If the quality of the data is poor, the data warehouse will be a failure.
- Users are unhappy with the query tools
The notion of "one size fits all" is inappropriate for selecting tools. Power users need and want different tools than those fitted to the inexperienced and technologically intimidated users. The needs of each user segment must be met with the appropriate tool and the appropriate training and environment. Users will tell when they don’t like the query tool you have foisted on them.
- Only a small percentage of users take advantage of the data warehouse
You may have expected everyone in marketing to be actively writing queries. They all went through training and graduated with a query tool certificate. However, it turns out that whenever anyone needs some information, they all go to only one person in the department who is a frequent and knowledgeable user.
- Your sponsor avoids you in the hallway
When the only available seat in the lunchroom was at your table and the sponsor chose to eat standing up you may have a problem.
- Poor performance
Response time is so bad that users only launch queries right before leaving on extended vacations. It’s not only that response time is bad, but also there does not seem to be any means of improving performance that anyone is aware of.
- Inability to expand the data warehouse (it’s not scalable)
You expect the data warehouse to grow, but if the technical architecture does not allow the expansion of the number of users and the size of the data base, the infrastructure must then be recast at enormous cost and lost opportunity.
- Data that is not integrated
One of the goals for the data warehouse is the ability to integrate data coming from many and heterogeneous source files and databases. You wanted departments to be able to share data and to have data of record for the organization; data that everyone in the organization would accept as correct. A proliferation of data marts that do not have common keys or common data definitions is an indication of islands of data marts that cannot be shared.
- Extract, Transform and Load (ETL) steps do not fit in the batch window
ETL can gobble up machine resources and time. A number of data warehouses have failed because of the very lengthy time required for ETL. This problem is especially critical to global companies who have to be up and running in several time zones and are therefore starting with a reduced batch window.
Measuring Results
The only way we will know if we are successful is to monitor and measure the project. The measurements are both subjective and objective. Just like certain medical tests, some of these measures are invasive and may have negative consequences, such as an impact on performance or the stability of the system. Some measures are costly; they require knowledgeable people and machine resources to carry them out. The clever project manager will select the appropriate metrics by evaluating both cost and impact. The metrics that should be considered are:
- Functional quality
Do the capabilities of the data warehouse satisfy the user requirements? Does the data warehouse provide the information necessary for the users to do his or her their job?
- Data quality
If the data warehouse data is of poor quality, the users will reject it. There are two means of measuring quality:
- Ask the users if their reports are accurate,
- Use a software tool to provide a scorecard on the quality of the data.
Be aware that the software tools cannot evaluate all types of data quality.
- Computer performance
There are four indicators of performance we should consider:
- Query response time,
- Report response time,
- Time to load/update/refresh the data warehouse,
- Machine resource.
Some organizations have established benchmark performance numbers for known queries and reports, and they exercise and measure these benchmarks periodically looking for impending performance problems. There are a number of tools that measure performance. Most of the database management systems have imbedded capabilities to measure database performance. Third-party utilities supplement this capability. A number of the query and report tools have response time metrics.
- Network performance
The ability of the network to handle the data traffic will directly impact response time. Network software measures line load, line traffic and indicates conditions where an activity was waiting for line availability. Besides the software, network administrators must be available to analyze the results and take appropriate action.
- User satisfaction
Users must be polled shortly after being given the data warehouse capability and then polled periodically to identify changes in their level of satisfaction and to watch trends.
- Number of queries
Many of the query tools provide metrics on the number of queries executed by department and by individual.
- What data is accessed
Many organizations have data that is never accessed. This is the result of inaccurate or incomplete requirements gathering or of the users changing their minds. Sometimes IT loads all of the source data fearing the user will ask for something they did not anticipate in the requirements gathering phase. IT has been beaten up so often by the users who want "all the data, they want to keep it forever and they want the system delivered yesterday". There are tools that will identify what data is actually being accessed and how often.
- Satisfies scope agreement
The scope agreement documents what functions the users will be getting and when. It’s appropriate to review the scope document and determine which functions might not have been satisfied and why.
- Benefits achieved
Before the project began, you estimated the benefits, both tangible and intangible for your project. Now you need to measure the tangible benefits and make some approximations for the intangibles. Since the benefits will not materialize the first day the system is installed, measurement should wait at least two months after implementation.