Affiliated with:

Data Models Ensure Successful Migrations

Type of data model

When migrating data across platforms, it is critically important to have three full data models to support the effort, with the relevant metadata.  Having these models will reduce time, costs, and effort needed to perform the migration successfully.

Introduction

Migrating data from one platform (e.g., legacy applications) to another (e.g., DW, ERP, MDM) is a difficult endeavor requiring understanding of the enterprise’s business rules, how to navigate semantics (e.g., what the source calls X, what the target calls X, what the enterprise means by X, and how get from XS to XT), and the source and target data structures.

Without effectively leveraging and/or designing data models (especially the Conceptual Data Model) and having solid metadata, the migration effort will encounter many additional issues and will be at risk of exceeding its schedule and budget.  The three models needed are: 1) Conceptual / Business Concept Model, 2) Source Data Model, and 3) Target Data Model.  Ideally, for the source and target you will have both logical and physical models.  Ideally as well, a business logical model (at the attribute level) would be created – but at least the Business Conceptual Model should be developed, if not it does not exist already.

Types Of Models Needed To Support Migration

Figure 1 – Types of models needed to support migration

Source-Target Mapping

The primary artifact driving data migrations is the source-target mapping (STM), usually a spreadsheet which identifies the columns/fields in the source and how these map to the target columns.  A major risk in performing a mapping is for a junior resource (e.g., data analyst) with limited time to map something that looks like a match (may have the same logical name and data values might look similar), but could actually be homonyms with no correlation, or might not take certain business conditions / rules into consideration for the mapping.  This can be disastrous and the metadata errors will lead to significant rework and retesting.

For example, what might have looked like a direct move between the source and target might actually involve joining multiple tables to get the correct values migrated; it is essential to understand those tables and the cardinality of relationships and join conditions.  One source can require diligent efforts in logical and physical mapping with the attendant understanding of the metadata of multiple possible connections.  Considering the dozens or hundreds of fields that may need to be mapped from usually multiple data sources – it becomes obvious that a new approach is needed.

It is common to encounter situations where there are no useful existing data models.  Perhaps, the source may have been operational for decades and no one knows where the models (diagrams) and metadata are, or if there ever was a diagram developed.  A diagram with just tables and columns is of very limited value, although this can easily be acquired from looking at the database catalog.  The key requirement is the identification of the relationships between tables.

Data Relationships

Data relationships are bidirectional in nature (it is important to know how a parent relates to the child and how the child relates to the parent).  Even with a model diagram that has relationships, there are cases where ALL the relationships are the same (e.g., all non-identifying 1 to 1 or many relationships) or all many-to- many relationships.  There might be a valid reason for this (primarily at the physical model level) – but typically this indicates a lack of modeling expertise, and in a conceptual data model such homogeneity is highly suspect.  Almost no business has such a simple model when all the business conditions and rules have been taken into account.

Even if the physical data model has proper relationships, these may be just physical (based on the database requirements), so the true nature of the business relationship may be hidden.  For example, in a data vault model, relationships between business entities (e.g., Customer and Order) are always through a link table (associative entity).  Therefore, they appear to be many-to-many relationships.  This approach is used so that a change in the business will not necessarily require a change in the model.

In a dimensional model, a default foreign key value (e.g., 0) might represent “Unknown”.  Therefore, a corresponding row in the dimension with this key value enables outer joins to be eliminated.  In this case, the relationship in the physical model would be 1 to 1:M, but logically the relationship might be 1 to 0:M.  Of course, misunderstanding how data relates can have a severe effect on the quality of the model and the data in the resulting database.

Business Conceptual Model

If there is no Enterprise Business Conceptual Data Model (aka Business Concept Model) start the migration efforts with the development of this model.  In this model (see example in Figure 2 below), key business objects and their relationships are identified.  In addition to supporting the migration effort, the Business Concept Model can be the foundation for the target model.  Starting here with a full Business Concept Model may not be possible if migrating to a packaged application, but developing a basic conceptual model will help bridge the semantic gap between source, enterprise, and target as well as document business rules expressed as relationships.  Many, but not all, business rules can be expressed via a data relationship on a model.  Writing out the relationships helps to cover some of the nuances in the notations, along with helping reviewers understand the model better.

Conceptual Model

Figure 2 – Example Conceptual Model

After completing the Business Conceptual Data Model, start mapping source tables/entities to the conceptual entity and target tables/entities to the conceptual entity so that it is easier to decompose, understand, and scope the level of migration effort.  For example, assume that the effort was scoped at 10 source tables to be needed to map into 5 target tables.  However, after the conceptual model development and the source-target mapping, the team learns that those 10 tables require 7 additional tables to connect the needed data properly; the level of effort becomes significantly greater than anticipated.  This can be magnified if the data structures are not easy to work with or if navigation is needed across multiple systems/databases.  It is critical to understand the cardinality of the relationships in all the sources and targets – otherwise there is a tremendous risk of picking the right column from the wrong row!  Sufficient training in data modeling and data architecture is essential for all data modeling professionals.

Conclusion

Solid data models, diagrams with business metadata, should form the foundation for any data migration effort.  If these are lacking prior to the start of the effort, the project manager should add time and budget for a data modeler to produce these.  If this is not possible, the project manager should plan to double the amount of time for the entire initiative to account for the inevitable rework and data quality issues.  At least three types of models are needed – the source model, the target model, and the conceptual data model to help bridge the gap between sources and target.

LinkedIn
Facebook
Twitter

Peter Stiglich, CBIP

Pete Stiglich, CBIP, is a Principal Consultant with Data-Principles, LLC and has written and presented extensively on data architecture, data management, and Big Data. He is an AWS Technical Professional and a Hortonworks Architecture Professional.  Pete also is an experienced trainer in data architecture and data modeling, and has a background in data governance and metadata management.

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved

Subscribe To DMU

Be the first to hear about articles, tips, and opportunities for improving your data management career.