Developing a process for cost justifying a data warehouse and metrics for measuring various costs associated with a data warehouse project are essential for success
This article is excerpted from a book titled Data Warehouse Project Management (published by Addison Wesley Longman (© 2000), Sid Adelman, Larissa Moss)
Not every data warehouse project will be cost justified initially. Many have been implemented without really knowing how much they will cost. Many will be authorized without even estimating the potential benefits. Projects are often started because a sponsor has the need for information and has the budget. As a result, many of these projects are not completed when the sponsors learn the total costs. Often, the projects collapse under the weight of unexpected charges due to a lack of understanding of the variety of sources that contribute to the total costs, direct and indirect, starting and continuing.
The need for cost justification
Budgeting – Without cost justification, projects will always be in jeopardy. During future budget cycles, management will be looking for ways to reduce cost and if there is no documented reason for completing the project, they are likely to forget the flush of excitement that accompanied the project’s initiation.
Staffing – Without cost justification, staffing with the right people may be difficult. By having some real dollar numbers to back up a request, the request is more likely to be satisfied.
Prioritization – Without a cost/benefit analysis, project prioritization is difficult and management has little to compare projects other than a gut-feel that one is more important than another. Without cost/benefit analysis, the line-of-business manager with the most power is likely to get her project approved. The project that is most important to the enterprise may never be implemented.
Costs can and must be controlled in every project. It is the project manager who has the responsibility for controlling costs along with the other responsibilities. Adhering to the Project Agreement / Scope Document is a major start for controlling costs, according to the Project Management Institute. The Project Agreement (Scope Document) specifies the data that will be in the data warehouse, the periods for which the data is kept, the number of users and predefined queries and reports. Any one of these factors, if not held in check, will increase the cost and possibly change the schedule of the project. A primary role of the data warehouse project manager will be to control scope creep, which affects project costs.
Consultants and Contractors
Consultant and contractor expenses can escalate a project’s cost rapidly. Consultants are used to supplement the lack of experience of the project team, contractors are used to supplement the lack of skilled personnel. There are two types of consultants / contractors:
- Product specific contractors – These persons are brought in because they know the product. They can either help or actually install the product, and they can tune the product. They will customize the product, if it is necessary. The product-specific consultants may either be in the employ of the tool vendor or may be independent. An example of their services would be installing and using an ETL tool to extract, transform and load data from your source files to the data warehouse. In this activity they may be generating the ETL code on their own or working with your people in this endeavor. To control costs, the most important thing is to have the right person(s) on your staff work with the consultant/contractors and absorb their knowledge and understand the process. The acquired knowledge would allow your staff to perform this work in the future and to maintain what has been done. Your goal is to make your staff self-sufficient as soon as possible. This should be the goal of the contractors as well.
- General data warehouse consultants – These consultants may have a specific niche such as data modeling, performance, data mining, tool selection, requirements gathering or project planning They will typically be involved for a shorter period of time than the product-specific consultant/contractor. They have two roles that are equally important. The first is working with your people to complete a task such as selecting a query tool or developing a project plan. The second is the knowledge transfer to your staff so they can perform the activity the next time on their own. Just as in the case of the product-specific consultant/contractor, your goal is to make your staff as self-sufficient as soon as possible. As with the contractors, this should be the consultants’ goal.
Conversely, some data warehouse consultants may be engaged to serve as strategic support for your team, perhaps leading the effort supplying the data warehouse project manager and other essential team roles. These types of data warehouse consultants, like those provided by EWSolutions, may remain with the project for the first complete iteration to ensure that all activities are performed according to best practices and that sufficient industry standard knowledge has been transferred to the client’s staff.
Very often, contractors and consultants are asked to participate in activities beyond their original statement of work. These may or may not be useful activities and they may or may not contribute to the success of your project. Keep the contractors and consultants focused on their stated objectives and on your project and, if they are working on something other than your project, be sure some other department is paying for them.
You didn’t marry the contractors and consultants. They are not there “until death do you part” (or when the project is cancelled). Don’t let them get too comfortable. Watch their time, their activities and their deliverables. By having specific deliverables associated with their contracts, they are more likely to complete their projects on time and within your budget. Do not bring them in too early and do not keep them beyond the time when they no longer make significant contributions to your project.
The software products that support the data warehouse can be very expensive. The first thing to consider is which categories of tools you need. Do not bring in more categories of products than you need. Do not try to accomplish everything with your first implementation. Be very selective.
Many organizations have someone experienced in dealing with vendors and understanding their contracts. If your organization has such a person or team, you are fortunate. You will be working closely with this person. They will know the things to watch out for in a contract, but you will need to give them some help to acquaint them with data warehousing. You will also have to give them some warning if you heard anything negative about the vendor. Your contract people will know how to include protection in the contract to keep the vendor from arbitrarily raising their prices. They will know how to control maintenance costs. They will know about protection if the vendor sells out to a less-accommodating company. They will know about enforcing satisfactory service from the vendor. If you do not have anyone in your company who can perform these functions, engage a procurement consultant.
When you talk to the references, you will want to ask them what they planned to buy from the vendor, and what they had to buy eventually. There often are surprises with product purchases requiring layers of capabilities.
Anticipate growing your environment. The growth will be in the number of users, the size of the database, much more machine resources necessary to perform the more complex queries, an extension to users beyond your enterprise (customers and suppliers), and possibly, the need to migrate to a more robust and better performing platform. Be sure your contract allows for these types of growth factors, and that the growth, while it may increase, does not explode your costs.
Your organization most likely already has an RDBMS. Should you have to pay for it as part of your data warehouse project? If there is a site license, there may be no charge to your department or you may have to pay a portion of the site license. You may have to pay if the data warehouse will be on another CPU, and if the RDBMS is charged by CPU. You may have to pay an upgrade if the data warehouse requires going to a larger CPU, and if there is an additional cost for the larger CPU.
What if you already have a site license for a query tool or report writer? You may have to pay a proportional percentage of the cost or at least a proportional percentage of the yearly maintenance fee. A reasonable method of assigning proportionality is by the number of users your project supports.
Capacity planning for a data warehouse is extremely difficult because:
- The actual amount of data that will be in the warehouse is very difficult to anticipate,
- The number of users will also be difficult to estimate,
- The number of queries each user will run is difficult to anticipate,
- The time of day and the day in the week when the queries will be run is difficult to guess (we know there will not be an even distribution, expecting more activity at month-end, quarter-end, etc.), and
- The nature of the queries, the number of I/Os, the internal processing is almost impossible to estimate.
All these unknowns mean that whatever hardware is chosen, it must be scalable. It must be able to scale and grow to at least three times the largest anticipated size.
The turnpike effect may cause you to under-estimate the resources needed. (The number of lanes needed for turnpikes was based on the traffic in the area. When the turnpikes were built, more traffic was attracted because of the road’s convenience. This meant the turnpikes were under-built). If the data warehouse is successful, it will be used more than anticipated, and an increasing number of users will want to use it. It will be used for more functions than were originally considered. More data will be desired. The data will be expected to be delivered at a more detailed level and for more historical periods (five years instead of two years).
For the data warehouse, you will need CPUs, disks, networks and desktop workstations. The hardware vendors can help size the machines and disks. Be aware that unanticipated growth of the data, increased number of users and increased usage will explode the hardware costs. Existing desktop workstations may not be able to support the query tool. Do not ask the query tool vendor for the minimum desktop configuration. Ask for the recommended configuration. Call references to find out if and how they had to upgrade their desktop workstations.
Controlling Hardware Costs
Your ability to control hardware costs will depend primarily on whether your organization has a chargeback system. Even though department heads are supposed to have the best interests of the organization at heart, what they care most about is meeting their performance objectives. These, of course, include the costs assigned to their department. If department heads are paying for what they get, they will be more thoughtful about asking for resources that may not be cost justified. We had an experience with a user asking to store ten years worth of detailed data. When he was presented with the bill (an additional $1.5 million), he decided that two years worth of data was adequate.
How should you account for existing hardware that can be used for the data warehouse? It may mean you do not have to buy any additional hardware. Should that be included in our data warehouse cost? It is a safe assumption that your organization will need additional hardware in the future. By using the redundant hardware for the data warehouse, it means that additional hardware for non-data warehouse purposes must be purchased sooner. You may be able to defer the cost of the redundant hardware. At the time the hardware is purchased, it may be less than today’s cost.
Raw Data Multiplier
There are many debates over how much disk is needed as a multiplier of the raw data. Besides the raw data itself, space is needed for indexes, summary tables and working space. Additional space may be needed for replicated data that may be required for both performance and security reasons. The actual space is dependent on how much is indexed and how many summary tables are needed. The summary tables may be created as you learn more about what the users are asking for and how often they ask. From this information, you will be creating summary tables, one at a time. The RDBMS vendors should be able to help you with estimates of how much working space you will need. The authors did a study of what other data warehouse installations were using as a multiplier. It was highly varied but we arrived at a five times multiplier– if you have 100 gigabytes of raw data you should provide for 500 gigabytes of disk space.
Internal People Costs
These people are getting paid anyway regardless of whether we use them on this project or not. Why should we have to include their costs in our budget? We have to assume these people would be working on other productive projects. Otherwise, there is no reason for the organization to keep them employed. Count on having to include the fully burdened costs of the people on your project. Keep in mind that you are much better off with a small team of highly skilled and dedicated workers than with a larger team of the type of people to avoid for your project.
Additional Support Costs
User Support staff or the Help Desk staff will be the users’ primary contact when there are problems. Providing adequate User Support will require more people, and more training of those people, to answer questions and help the users through difficult situations. The cost for the additional people, the training and possibly an upgrade in the number and knowledge-level of the staff answering user issues must be added into the data warehouse costs.
For optimal effectiveness, user training is done on the premises and not at a vendor site. There are four cost areas for user training that must be considered.
- The cost to engage a trainer from the outside or the time it takes your in-house trainer to develop and teach the class.
- The facilities required, including the desktop workstations for the workshop
- The time the users spend away from the job being in class, and the time it takes them to become proficient with the tool.
- If not all the users are in the same location, travel expenses for either the users or the trainer must be included.
Generic training may be appropriate for various IT staff members in selected skills related to data warehousing / decision support / analytics. Examples are classes in logical data modeling, data warehouse project management or star schema database designs. Data warehouse conferences and seminars can provide an overall perspective as well as training in specific areas. IT staff may need to attend training on the complex tools and products chosen for the platform. IT will also need enough time to work with the products to become proficient. The cost of training is sometimes included in the price of the software and / or hardware.
Most organizations focus on the cost to implement the initial data warehouse application and give little thought to continuing costs of its operation. Over a period of years, the continuing cost will very likely exceed the cost of the initial application. The data warehouse will grow in size, in the number of users and in the number of queries and reports. The database will not remain static. New data will be added, sometimes more data than for the initial implementation, and the design may change, and the database must be tuned periodically. New software will be introduced, new releases will be installed and some interfaces will have to be rewritten. As the data warehouse grows, the hardware and network must be upgraded.
User support of the data warehouse will be a continuing expense for the maintenance team and the Help Desk. Desktop workstations must be upgraded, as will the software on those workstations.
Total Cost of Ownership
The total cost of ownership (TCO) includes all the direct and the indirect costs associated with the data warehouse / decision support / analytics initiative. TCO includes costs that are normally hidden from a budget such as the additional time the users spend working with the data warehouse directly, assuming that in the past they sent their requests to IT. TCO includes costs you incur initially as well as costs in future periods. Insurance, tax, floor space, utility costs are part of the TCO. The TCO should be calculated for the initial implementation and for projected costs in subsequent years for the enhancement and sustenance of the data warehouse.
Creating the process for justifying the costs involved in starting and sustaining any data warehouse project can require several people and some length of time. However, neglecting this essential activity often results in the project not being funded, or the initial project not continuing, due to a lack of understanding of the direct and indirect costs that form the data warehouse effort.