Preparing a data warehouse testing strategy can ensure the successful development and completion of end-to-end testing of any data warehouse, data mart, or analytical environment.
Organizations need to learn how to build an end-to-end data warehouse testing strategy. This is most often necessary because the success of a data warehousing project is highly dependent upon the team’s ability to plan, design, and execute a set of effective tests that expose all issues with data inconsistency, data quality, data security, the ETL process, performance, business flow accuracy, and the end user experience.
Why Careful Planning for DWH Tests is Important
Data warehousing projects are becoming larger, more complex, and increasingly important to the organizations that implement them.
- Data is loaded from a growing number of diverse sources across the enterprise to create larger, richer assemblages of both text and numerical information.
- Data is loaded into either a high-volume test area or in the user acceptance testing (UAT) environments.
- Regression testing: ensures that existing functionality remains intact each time a new release of ETL code and data is completed.
- Performance, load, and scalability tests: ensure that data loads and queries perform within expected periods and that the technical architecture is scalable.
- Acceptance testing: includes verifications of data model completeness to meet the reporting needs of the project, reviewing table designs, validation of data to be loaded in the production data warehouse, a review of the periodic data upload procedures, and finally application reports.
Documentation Needed for Data Warehouse QA Planning
Few organizations discard the databases on which new or changed applications are based, hence it is important to maintain reliable database models and data mappings when a data warehouse is first developed and keeping them current when changes are identified.
Business and Technical Requirements
Requirements describe what data must be available in the data warehouse, how it is organized, and how often it is updated. Business users and their requirements affect almost every decision made throughout the implementation of a data warehouse.
Testing requires that well-established and well-maintained requirements are available. When planning tests, look for requirements that fall into (at minimum) the following categories and prepare tests accordingly:
Business requirements, BI application needs, source data needs, data quality requirements, regulatory and compliance requirements, technical requirements (infrastructure, technology), performance requirements, and more.
Source-to-Target Mapping Documents
The backbone of a successful BI solution is an accurate and well-defined source-to-target mapping of each metric and the dimensions used. Source-to-target data mapping helps designers, developers, and testers understand where each data source is and how it transitioned to its final displayed form. Source-to-target mappings should identify the original column names for each source table and file, any filter conditions or transformation rules used in the ETL processes, the destination column names in the data warehouse or data mart, and the definitions used in the repository (RPD file) for the metric or dimension. This helps you derive a testing strategy focused more on the customized elements of the solution.
Consequently, it is essential that businesses rigorously verify the quality of data throughout their data warehouses. If an organization fails to do so, users will make faulty decisions based on incorrect data. Over time, confidence in the data can erode to the point where they will not utilize the business intelligence tools and other applications that rely on the data warehouse.
Unfortunately, not all companies have adequate safeguards in place for their data warehouses. They may have conventional tools and processes in place for validating certain types of data (such as customer names and addresses) once they are in the data warehouse, but they often lack the controls necessary for…
- Preventing bad data from being loaded initially and incrementally
- Properly validating financial and other numerical data
- Discovering and remediating the root causes of chronic data quality problems
- Documenting data quality management measures to third parties (such as auditors and regulators)
The best practices and the test methodology presented here are based on research and practical experiences verifying data warehouse and business intelligence applications.
Identifying Tests Needed for Data Warehousing QA
Verifications that need a strategy: For the reason that data warehouse testing is different from most software testing, a best practice is to break the testing and validation process into several well-defined, high-level focal areas for data warehouse projects. Doing so allows targeted planning for each focus area, such as integration and data validation.
- Data validations: includes reviewing the ETL mapping encoded in the ETL tool as well as reviewing samples of the data loaded into the test environment.
- Data integration tests: tasks include reviewing and accepting the logical data model captured with a data modeling tool (e.g., ERWin), converting the models to actual physical database tables in the test environment, creating the proper indexes, documenting the relevant metadata, and testing the ETL programs created by your ETL tool or stored procedures.
- System testing: involves increasing the volume of the test data to be loaded, estimating and measuring load times and loading errors, placing data
Figure 1: Source-to-Target Mapping Example
Data warehouse models are crucial for the success of a data warehouse. If they are incorrect or non-existent, the warehouse effort likely will lose credibility. All project leaders should take the necessary time to develop the data warehouse data models. For most data warehouses, a multi-month building effort with a highly experienced data warehouse modeler may be needed after comprehensive business requirements are defined. Again, only a very experienced data warehouse modeler should build the model. It may be the single most important skill on the data warehouse team.
The data architecture and model is the blueprint of any data warehouse; understanding it helps the project team to grasp the bigger picture of a data warehouse. Data models help stakeholders understand the key relationships between the major and critical data sources.
The importance of modeling the data warehouse correctly cannot be overstated. Requirements are at the heart of the data model, so devoting sufficient time to identifying the data warehouse’s requirements will help ensure the validity of the data warehouse data model. Errors in the data model will cause errors in the mapping and in the DW’s database. These errors will delay the project’s implementation, and result in additional costs.
Figure 2: Sample Section of a Data Warehouse Data Model
Preparing an approach to data warehouse testing is an important starting point for data warehouse QA. Doing so helps define the testing scope and strategy that will be employed to test all project software and data.
A data warehouse test approach process can serve as a high-level framework to fully develop testing that will be used as living document throughout early phases of the data warehouse/BI project.
A testing strategy:
- Identifies the elements that will be targeted by the tests and the documentation required.
- Outlines the testing types to be used.
- Targets required resources and tools.
- Outlines the approach that QA will use to test the product, and the deliverables
- Data is leveraged by a wider range of users to support a greater number of decisions that affect the bottom line every day.