Knowing how to measure success and failure, and qualifying results for a data warehouse or analytics project is essential for all project managers.
This article is excerpted from a book titled Data Warehouse Project Management (published by Addison Wesley Longman (© 2000), Sid Adelman, Larissa Moss)
There has been much heated discussion over the failure rate of data warehouses and decision support / analytical systems. Luminaries disagree on the percentage of projects that have succeeded. The problem may be with the definition of success and failure. In fact, very few organizations have identified, up front, what for them will be success or failure with a data warehouse or decision support initiative. Therefore, any industry-wide numbers on data warehouse failure are meaningless.
We have all seen projects that are grossly over budget, years behind schedule, and delivering a fraction of the promised function. For various reasons some of these projects have been declared a success with promotions, industry and publication awards and bonuses for all involved. There is a clear problem in that we have no accepted definition for success in large-scale analytics and decision support applications development.
So that everyone is working from the same set of goals, and so that you are able to declare victory legitimately, we suggest that you, the project manager, list your measures of success and have that list approved by both IT and by the business sponsor.
Measures of Success
There are a number of examples of success indicators for data warehouse projects, and for any analytical applications.
- Return on Investment (ROI)
ROI can be achieved a number of ways:
- Lower cost
Costs could be lowered through better inventory management, fewer dollars spent on unproductive measures, product promotions, etc.
- Improved productivity
Greater productivity could be expected from both IT and the user. Today user analysts may spend 80% of their time gathering data and only 20% analyzing the data. The data warehouse should reverse those numbers. IT will still be responsible for developing complex reports as well as writing reports for production systems. The data warehouse can provide reporting tools with a well documented, clean, and easily accessible database. This capability should significantly improve IT productivity.
- Increased revenue
This could be as a result of greater market share and increased sales as marketing is able to more effectively target customers and provide the right products at the right time to the right market.
The effects of costs and revenues may be difficult to assign to the impact of the data warehouse. As the data warehouse is implemented, the organization is not standing still. There are internal and external factors that affect both costs and revenues, and so the actual benefits of the data warehouse may be difficult to determine. However, benefits can be articulated through careful interviews with a cross-section of stakeholders at all levels of the enterprise, focused on analytics and decisions that can be improved through faster access to more accurate data.
- The Data Warehouse is used
One of the easiest categories to understand can be measured by the number of users and the total number of queries and reports generated. If queries and reports are run regularly, it is a good indication that the users are achieving some benefit.
- The Data Warehouse is useful
The data warehouse may be used, but the users may find the benefits to be marginal and illusive. It’s important to ask the users what they see as the benefits of the data warehouse, how it has changed the way they do business, how it may have improved their productivity, and how it may have improved the quality of their decisions.
- Project delivered on time
This measure is problematic, since schedules are often set without an understanding of what is involved and how long each project task will take. “On time” is only relevant if a realistic schedule is the base for comparison.
- Project delivered within budget
This criterion may be difficult to achieve since the total costs of a data warehouse or other decision support application are difficult to determine. Initially, you may not have known how many users to expect, how many queries and reports they would be generating, and the complexity and resources used by the queries and reports. You did not know how large the data warehouse would be, or how many indexes and summary tables would be required and desired. You may not have anticipated needing a larger CPU. You may not have known that the software was more difficult than the vendors represented, and that teams of software consultants would be required. You may not have anticipated needing to upgrade our network to support the increased line traffic. You may not have anticipated needing to raise the salaries of the data warehouse team nor the increased cost of recruiting the talent required to make the project a success. All these factors will contribute to severely underestimating the budget. “Within budget” is only relevant if a realistic budget is the basis for comparison.
- Improved user satisfaction
Users may be internal, external or both. In all cases, the goal is to have users who are happy with the features and capabilities, the performance, the quality of data and the level of support.
- Additional requests for data warehouse functions and data
You will know you were successful if other user departments are beating down your door with requests for access to the data warehouse, and current users are requesting new data and functions to be added to the existing data warehouse.
- Business performance-based benchmarks
This is the most subjective of all the measures and will become the most controversial in many organizations. Most industries have sets of industry averages for measuring performance, and will have identified benchmark (the best) companies against which they want to make comparisons. For example, the industry average for building a car may be ten worker days. With better information, a car manufacturer in the middle of the pack may have a goal to manufacture a mid-size sedan using eight worker days. The data warehouse may be able to provide improved, more complete and timelier information and, with this information, the auto manufacturer may be able to achieve their productivity goals.
- Goals and objectives met
Success will be defined by how well these goals and objectives were met. Inevitably, not all of the objectives were met or were only partially met. Using a scorecard will give you an initial and then an ongoing measure of your project’s success.
- Business problems solved
The data warehouse was developed for some specific reason. Perhaps marketing was unable to identify customer demographics for target marketing. If the data warehouse now provides this capability, it should be considered a success.
- Business opportunity realized
The identified opportunity might have been the ability to provide information to suppliers through the web so they would be able to respond more quickly to your demands for components you need for your manufacturing process. If the supplier now has fewer stock-outs, the project is successful.
- The data warehouse has become an agent of change
The world is changing, and the rate of change is accelerating dramatically. Successful organizations must be able to respond and respond quickly. Decisions must be made more quickly, but this can happen only with better and more timely information. There can be some fundamental changes to the business in the manner and speed in which decisions are made and, the data warehouse can be the vehicle for that change.
Measures of Failure
We may not be sure if the data warehouse is a success but we will always know when we have failed. Some of the indications of failure are:
- Funding has dried up
This could be because the sponsor has moved on or is no longer interested in the project. There could be other factors that have nothing to do with the success of the project such as the company being bought by Philistines who have no appreciation of your efforts or the data warehouse possibilities. Real failure results if the project is perceived as having no real benefit to the organization.
- Users are unhappy with the quality of the data
If you took a shortcut and decided not to cleanse the data, users will reject the data warehouse. The users may even have told you not to spend the time to understand and clean the data; that they want the data warehouse as soon as possible, and they don’t care if the data is dirty. Don’t believe them. Even if the initial users don’t care if the data is dirty, other users in the organization who don’t know how to interpret the dirty data do care. These new users would have to learn how to decipher the dirty data or they would get the wrong results if they do not realize it is dirty. If the quality of the data is poor, the data warehouse will be a failure.
- Users are unhappy with the query tools
The notion of “one size fits all” is inappropriate for selecting tools. Power users need and want different tools than those fitted to the inexperienced and technologically intimidated users. The needs of each user segment must be met with the appropriate tool and the appropriate training and environment. Users will complain loudly when they don’t like the query tool you have forced them to accept.
- Only a small percentage of users take advantage of the data warehouse
You may have expected everyone in marketing to be actively writing queries. They all went through training and graduated with a query tool certificate. However, it turns out that whenever anyone needs some information, they all go to only one person in the department who is a frequent and knowledgeable user.
- Your sponsor avoids you in the hallway
When the only available seat in the lunchroom was at your table and the sponsor chose to eat standing up, you may have a problem.
- Poor performance
Response time is so bad that users only launch queries right before leaving on extended vacations. The problem is more than poor response time, but also there does not seem to be any means of improving performance, or at least no one is searching for solutions actively.
- Inability to expand the data warehouse (it’s not scalable)
You expect the data warehouse to grow, but if the technical architecture does not allow the expansion of the number of users and the size of the database, the infrastructure must be recast at enormous cost and lost opportunity.
- Data that is not integrated
One of the goals for the data warehouse is the ability to integrate data coming from many heterogeneous source files and databases. You wanted departments to be able to share data and to have data of record for the organization; data that everyone in the organization would accept as correct. However, due to a lack of proper design, you have a proliferation of data marts that do not have common keys or common data definitions. This is an indication of islands of data marts that cannot be shared, and the result is the inability to share or integrate data.
- Extract, Transform and Load (ETL) steps do not fit in the batch window
ETL can gobble up machine resources and time. A number of data warehouses have failed because of the time required for ETL. This problem is especially critical to global companies who have to be up and running in several time zones and are therefore starting with a reduced batch window.
The only way we will know if we are successful is to monitor and measure the project. Appropriate measurements are subjective and objective. Just like certain medical tests, some of these measures are invasive and may have negative consequences, such as an impact on performance or the stability of the system. Some measures are costly; they require knowledgeable people and machine resources to execute them. The clever project manager will select the appropriate metrics by evaluating cost and effect. The metrics that should be considered are:
- Functional quality
Do the capabilities of the data warehouse satisfy the user requirements? Does the data warehouse provide the information necessary for the users to do their jobs?
- Data quality
If the data warehouse data is of poor quality, the users will reject it. There are two means of measuring quality:
- Ask the users if their reports are accurate,
- Use a software tool to provide a scorecard on the quality of the data.
Be aware that the software tools cannot evaluate all types of data quality. Some data quality assessments must be done by people who know the data and the usage expectations, after data quality profiling has been performed.
- Computer performance
There are four indicators of performance we should consider:
- Query response time
- Report response time
- Time to load/update/refresh the data warehouse
- Machine resource.
Some organizations have established benchmark performance numbers for known queries and reports, and they exercise and measure these benchmarks periodically looking for impending performance problems. There are a number of tools that measure performance. Most database management systems have imbedded capabilities to measure database performance. Third-party utilities supplement this capability. A number of the query and report tools have response time metrics. An experienced database administrator will use all these capabilities to manage resources appropriately for performance monitoring.
- Network performance
The ability of the network to handle the data traffic will directly affect response time. Network software measures line load, line traffic and indicates conditions where an activity was waiting for line availability. Complementing the software, network administrators must be available to analyze the results and take appropriate action.
- User satisfaction
Users must be polled shortly after receiving access to the data warehouse, and then polled periodically to identify changes in their level of satisfaction. The data warehouse team should watch user satisfaction trends carefully and respond to issues promptly.
- Number of queries
Many of the query tools provide metrics on the number of queries executed by department and by individual, along with other categories.
- What data is accessed
Many organizations have data that is never accessed. This is the result of inaccurate or incomplete requirements gathering or of the users changing their minds. Sometimes IT loads all of the source data fearing the user will ask for something they did not anticipate in the requirements gathering phase. IT has been beaten up so often by the users who want “all the data, they want to keep it forever and they want the system delivered yesterday”. There are tools that will identify what data is actually being accessed and how often.
- Satisfies scope agreement
The scope agreement documents what functions the users will be getting and when. It is appropriate to review the scope document and determine which functions might not have been satisfied and why.
- Benefits achieved
Before the project began, you estimated the benefits, both tangible and intangible for your project. Now you need to measure the tangible benefits and make some approximations for the intangibles. Since the benefits will not materialize immediately, measurement should wait at least two months after implementation.
In the final analysis, only those organizations that adopt measurements as part of the planning phase in the data warehouse project will be able to determine whether their data warehouse is a success or a failure. As Lord Kelvin wrote, “You can’t manage what you have not measured.”