Affiliated with:

Planning for Data Mapping Projects

image1 13

It is essential to develop a plan and use established processes for data mapping in advance of the project. Doing so will ensure success in ETL, data integration, and business intelligence / analytics efforts

Introduction

Planning is arguably the most important stage of the entire data mapping project.

Documentation needed for a project’s sources should include data maps and data dictionaries to deliver a complete definition and the intended use of each source data component. Metadata, often found in data dictionaries help to assure that the interpreted meaning of each source data element is correct. For example, a field for “Provider ID” could have many different definitions over several elements such as billing identification number, national provider identifier, social security number, etc. Data definitions with similar or exact names could differ considerably in meaning.

Analogous to any complex project, planning for data maps requires:

  • Define objectives for the mapping project
  • Gain IT and business management support
  • Define specific mapping deliverables
  • Assign mapping roles and responsibilities

Meeting the Challenges of Data Mapping Projects

Source to target mappings describe how one or more attributes in source data sets are related to one or more attributes in a target data set. These source-to-target mappings are derived from the ETL transformation rules described in a requirements specification document, comments inside the transformation scripts, spreadsheets, ER data model diagrams, or SQL scripts.

Data mapping is complex and challenging. So what makes data mapping so difficult? Following are common challenges and shortcomings associated with data mapping and how they can be mitigated. 

  • The time, people, and tools needed to build data maps can be substantial

The process of connecting data sources, building mappings for data transformation and integration, and validating the metadata and transformed data often require significant resources, particularly when the entire process is done manually.

There are means to ease the data mapping burden significantly. It starts by defining the process for gathering information to be documented for each source and target. In most cases, systematic interviews with data stewards are the most efficient way to collect info for a data map. Interviews with subject matter experts (SME’s) should be direct using data mapping templates. Metadata and data mapping tools should be used to automate as much as possible.

  • The information needed is not always available for building data maps

A common mistake organizations make with data maps is that they omit important information and therefore render the data map far less useful than it should be. Before data mapping initiatives start, project organizers should assemble key stakeholders and gather feedback on what information needs to be included in mappings for sources and targets. For example, consider retention schedules, litigation risk profiles, and accessibility constraints of particular data sources. Privacy officers will want to know which data sources contain sensitive customer information that must be carefully protected.

  • Substantial efforts needed to maintain data maps

As with all important project documents, data maps should be constantly evaluated, updated and assessed for quality. One method to ensure the data mappings are maintained is to make sure the process is fully integrated into the organization’s master data management program. With every change to requirements, data maps should be reviewed to assess the impact. 

  • Data mapping with spreadsheets can pose long-term issues

Creating manual mappings using spreadsheets is often difficult and time-consuming for many reasons:

Mappings specifications built using spreadsheets cannot be managed easily

Data mappings cannot be easily versioned and auditability of what and who has changed mappings remains a constant issue.

Creating maps internally and using unqualified personnel for map development compromises the integrity of results. Use skilled data management personnel familiar with data mapping requirements, along with the limitations and obstacles of the data to ensure reliable results.

Steps to Success in the Data Mapping Planning Process

  • Determine which data sources are needed to meet requirements for the target system

General steps to data discovery:

A. Identify the data needed to meet required business tasks
B. Identify potential internal and external sources of that data and its metadata
C. Assure that each source meets the privacy and regulatory requirements
D. Assure that each source will be adequately available and accessible according to required frequencies

  • Identify tools for data analysis, data preparation, and data mapping

It will be necessary to load data sources (i.e., frequently sample data sets) into an environment of data preparation (DP) tools where the data can be analyzed and manipulated. It is important to get the data into an environment where it can be examined and readied for the next steps.

  • Conduct data profiling on potential and selected source data

This is the vital (but often discounted) step in data preparation. The project team must analyze source data before it can be properly prepared for downstream consumption. Beyond simple visual examination, projects often need to profile data, detect outliers, and find null values and other unwanted data among sources. This can provide an insight into the state of data quality.

  • Cleanse and screen source data

Based on the knowledge of the business goals, experiment with different data cleansing strategies that will get the relevant data into a usable format. Start with a small, statistically-valid sample to iteratively experiment with different data prep strategies, refine data record filters, and discuss with business stakeholders. 

Planning Considerations for Data Mapping Projects

A typical plan begins with meeting representatives from the IT department to gain an understanding of systems, assets, retention policy and practices, employee separation procedures, archives, backup system and outsourcing of data storage or management. IT is the primary authority on sources such as corporate email and backups.

A next step is to consult with business unit leaders about needs and general data practices. They will identify the seemingly inevitable data repositories and associated software programs IT does not know about. In addition, do not forget to meet with records managers about specific document management systems, databases and file rooms.

At this point in the planning process, enough information has been gathered to build the data map in outline form. Request information (technical metadata) about the format, volume, security information, etc. from IT – continue to narrow the focus by gradually filling in gaps and resolving inconsistencies.

Enterprise data mapping software solutions automate some parts of the process and can be used to generate the map instead of manually creating a spreadsheet. Large corporations with complex IT systems and companies in highly regulated industries should evaluate investing in data mapping software.

Processes and procedures must be clearly defined and documentation prepared to explain how the map was developed, and tested, to work correctly for its intended purpose.

All data maps of any kind must be identified, inventoried, maintained with schema changes, and verified. Poorly designed and out-of-date mappings create significant data integrity problems. Undetected errors in data maps have the potential to introduce many problems, including current and “up the line” as data is propagated down-stream to other systems.

When evaluating data integrity issues that involve mapping, it is critical to understand the elements of all the code sets or data sets that will be mapped. The characteristics of each source or code set, their intended use, and how the map is created are all important to building a successful data map. Using a map for a purpose not intended or misunderstanding the construct of the source and target can lead to incomplete, incorrect, and inappropriate maps.

Conclusion

Planning for data mappings should begin after project requirements are “ready” and after all data sources have been identified and data appropriately “prepared” to meet the needs of requirements and target data. Plan enough time to evaluate the source data, to compare the available data and metadata to the needed data, and to drill down to the detail needed for source-to-target mapping. One goal is to ensure that data migration development is the shortest task in the project plan. The source data and target data should be well-understood before coding begins. Project leaders do not want to experience the very costly surprise of learning that the source data is not “fit to use” at integration testing or implementation.

LinkedIn
Facebook
Twitter

Wayne Yaddow

Wayne Yaddow is an experienced technical architect, leading data migration/integration/ETL testing projects at businesses such as J.P. Morgan Chase, Credit Suisse, Standard and Poor’s, AIG, and IBM. Wayne has written extensively on the subject and taught IIST (International Institute of Software Testing) courses on data warehouse, ETL, and data integration testing. After writing the book, “Testing the Data Warehouse” (Trafford Press), he continues to lead ETL testing and coaching projects as an independent consultant.  Wayne studied computer science at Technical University in Germany.You can contact Wayne at wyaddow@gmail.com.

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved

Subscribe To DMU

Be the first to hear about articles, tips, and opportunities for improving your data management career.