Documenting Data Lineage: Practical Implementation Steps

All data lineage initiatives should follow a set of essential steps for effective implementation and use of the metadata.

Implementing a data lineage initiative is a complex but necessary project. An established data management framework and collaboration between data management professionals and stakeholders are prerequisites for successful implementation of data lineage. Successful data lineage projects share a common set of steps, and each step is essential.

Step 1: Identify key business drivers for data lineage

A company should have compelling reasons to start documenting metadata such as data lineage. Some of these reasons might be:

  • legislative requirements
  • business changes
  • data quality initiatives
  • supervisory and audit requirements

If one or more of these reasons is crucial for meeting business goals, then a company is ready to start discussing data lineage documentation.

Step 2: Ensure support and involvement of senior management

Neither data management nor data lineage should be implemented just for the sake of it, since both require substantial human and financial resources and consume a lot of time. Without the dedication and active support of senior management, such initiatives have no future. Two key groups of benefits might convince a company's management to support a metadata management initiative. These are:

  • Improved work efficiency and increased revenue in 3-6 months. This can be achieved by improving data quality. In more concrete terms, improving data quality can lead to:
    • increased revenue by 15-20%
    • reduced operational costs by 40%
    • decreased IT maintenance costs by 40-50%.

These monetary benefits result from reducing the cost of many manual operations with data, optimizing the application landscape, and similar improvements.

  • Compliance with regulations, e.g. GDPR (the EU General Data Protection Regulation), the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA). A company that fails to comply with these regulations, for example through a data breach, can face a variety of fines.

Step 3: Scope the data lineage initiative

Once senior management has approved the data lineage initiative, the next step is to think about the scope of the initiative.

Each business driver a company has chosen points to corresponding data sets. This is the first filter that reduces the scope of the initiative. For example, GDPR focuses on personal data. If a company has just started a data quality initiative, chances are high that the first data set to be reviewed will be customer data.

The second filter is identification of critical data elements (CDE) within these data sets. CDEs are data elements that have the biggest impact on the organization’s performance and customer experience. Usually, these are the key performance indicators (KPIs) used to manage the company.

The techniques to identify these CDEs (KPIs) are simple. The first step is to choose the most critical management reports and their KPIs, since each report will contain one or more CDEs. The difficulties start with identifying the data elements needed to calculate these KPIs and resolving the primary sources of each CDE. This is where data lineage documentation begins. Once the scope of the data lineage initiative has been agreed upon, the scope of data lineage implementation can be defined.
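The two filters can be sketched as simple set operations. This is only an illustrative sketch: the inventories, driver names, report names, KPIs, and data elements below are all hypothetical, and a real initiative would draw them from its own report catalogue and data dictionary.

```python
# Hypothetical inventories: every name below is invented for illustration.
DATASETS_BY_DRIVER = {
    "GDPR compliance": {"customer", "employee"},
    "data quality initiative": {"customer"},
}

KPIS_BY_REPORT = {
    "monthly management report": {"net_revenue", "churn_rate"},
    "risk report": {"exposure_at_default"},
}

# Data elements needed to calculate each KPI, as (data set, element) pairs.
ELEMENTS_BY_KPI = {
    "net_revenue": {("sales", "invoice_amount"), ("sales", "discount")},
    "churn_rate": {("customer", "customer_id"), ("customer", "contract_end_date")},
    "exposure_at_default": {("loans", "outstanding_balance")},
}


def scope_initiative(drivers, critical_reports):
    """Return (data sets in scope, critical data elements in scope)."""
    # First filter: data sets that correspond to the chosen business drivers.
    datasets = set()
    for driver in drivers:
        datasets |= DATASETS_BY_DRIVER[driver]

    # Second filter: elements behind the KPIs of the most critical reports,
    # kept only if they belong to a data set that passed the first filter.
    cdes = set()
    for report in critical_reports:
        for kpi in KPIS_BY_REPORT[report]:
            for dataset, element in ELEMENTS_BY_KPI[kpi]:
                if dataset in datasets:
                    cdes.add((dataset, element))
    return datasets, cdes


datasets, cdes = scope_initiative(
    ["GDPR compliance"], ["monthly management report"]
)
```

In this toy inventory, only the customer elements behind the churn KPI survive both filters; the hard part in practice is filling in the KPI-to-element mapping, which is exactly the lineage documentation work.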

Step 4: Define the scope for data lineage

Data lineage can be scoped using the concepts of ‘horizontal and vertical data lineage’.

Horizontal data lineage is the path data follows from its original sources to the point of final usage. In large companies, especially those with many subsidiaries, such chains can be long and complicated. Therefore, a company often starts with a limited ‘length’ of data lineage, beginning at some point of data aggregation rather than at the original sources.

Vertical data lineage links the different levels of data models on which lineage can be documented: conceptual, logical and physical. The number of levels chosen for documentation refines the scope of data lineage as well.
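As a rough sketch of how these two scoping decisions combine, the snippet below treats the ‘length’ of the chain as one dimension and the data model levels as the other. The system names, the `Hop` record, and the `scoped_hops` helper are hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass

LEVELS = ("conceptual", "logical", "physical")


@dataclass(frozen=True)
class Hop:
    """One documented lineage link between two systems at one model level."""
    source: str
    target: str
    level: str


# Hypothetical chain, ordered from original source to final usage.
CHAIN = [
    "branch_system",
    "subsidiary_ledger",
    "group_warehouse",
    "finance_mart",
    "board_report",
]


def scoped_hops(chain, start_at, levels):
    """List the hops to document for a shortened chain and chosen levels."""
    for level in levels:
        if level not in LEVELS:
            raise ValueError(f"unknown data model level: {level}")
    # Limit the 'length': begin at the chosen aggregation point.
    shortened = chain[chain.index(start_at):]
    # One hop per adjacent pair of systems, per documented level.
    return [
        Hop(source, target, level)
        for source, target in zip(shortened, shortened[1:])
        for level in levels
    ]


hops = scoped_hops(CHAIN, "group_warehouse", ["logical", "physical"])
```

Starting at the hypothetical `group_warehouse` and documenting two levels yields four hops instead of the twelve that the full chain on all three levels would require, which shows how quickly scope decisions change the workload.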

Step 5: Prepare the business requirements for data lineage

Different groups of stakeholders have different requirements and expectations for data lineage.

There are at least two key groups: business stakeholders, e.g. auditors, business and data analysts, and financial controllers; and technical stakeholders, e.g. IT developers and database managers.

If a company has little experience with data lineage, the topic can remain abstract, so some education and training should accompany the first few data lineage projects.

Some specific features can be identified for these two general groups of stakeholders.

Business stakeholders are mostly interested in:

  • the ability to run root-cause analysis, starting from the end reports and going back to the ‘golden’ source
  • the value of data lineage rather than its design
  • data lineage on conceptual or logical data model levels.

On the other hand, the technical stakeholders focus on:

  • impact analysis, starting from the source of a data element and following its path to its final destination
  • metadata design lineage
  • data lineage on the physical level.
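The two analyses in the lists above are the same traversal run in opposite directions over the documented lineage. A minimal sketch, assuming lineage is stored as (source, target) pairs; the element names and helper functions are hypothetical:

```python
from collections import deque

# Hypothetical lineage edges: (source element, target element).
EDGES = [
    ("crm.customer", "warehouse.customer"),
    ("billing.invoice", "warehouse.revenue"),
    ("warehouse.customer", "mart.churn_rate"),
    ("warehouse.revenue", "mart.net_revenue"),
    ("mart.churn_rate", "report.management"),
    ("mart.net_revenue", "report.management"),
]


def _reachable(start, edges, forward):
    """Breadth-first search along the edges, downstream or upstream."""
    neighbours = {}
    for source, target in edges:
        key, value = (source, target) if forward else (target, source)
        neighbours.setdefault(key, []).append(value)

    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbours.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def root_cause_candidates(element, edges=EDGES):
    """Business view: walk back from a report towards its sources."""
    return _reachable(element, edges, forward=False)


def impacted_by(element, edges=EDGES):
    """Technical view: walk forward from a source to everything it feeds."""
    return _reachable(element, edges, forward=True)
```

For example, `root_cause_candidates("mart.churn_rate")` returns the two customer elements upstream of the KPI, while `impacted_by("billing.invoice")` returns everything downstream of the invoice data, including the management report.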

Experts recommend that the data lineage team spend some time talking to different groups of business stakeholders to clarify their expectations, refine the expectations into requirements, and align all the requirements in a unified document.

At the completion of this step, the decision on how to document data lineage can be made.

Step 6: Choose the method to document data lineage

There are two methods for documenting data lineage: descriptive and automated. Each approach has its benefits and challenges, and the optimal choice can differ from one data lineage project to the next within the same organization. It is important to stress that data lineage documentation is a task that consumes significant time and resources (human, technology, funds, etc.).

It is advisable to assess which of the existing methods is most feasible for the organization, its goals, and its resources.

The level of documentation of data lineage will also affect the decision regarding the method. Regardless of the method chosen, manual work will be required to document data lineage.

Once the method has been chosen, the organization can choose suitable software.

Step 7: Choose a suitable application to document data lineage

Not surprisingly, even large companies document data lineage using Microsoft applications such as Excel, Word, PowerPoint, and Visio. If the organization can purchase a dedicated application, there is a variety of specialized data lineage products to explore. Should an automated solution be the goal, there is an additional set of products to examine.

Conclusion

Documenting data lineage is a complex effort, but it is an essential initiative for any organization that wants to understand its data fully and use it effectively. Documenting data lineage can be accomplished by following these steps consistently.


Irina Steenbeek, Ph.D.

Irina Steenbeek, Ph.D. is a Senior Data Management professional with many years’ experience in several multinational banks and in other complex industries.

She is also experienced in project management for software implementation, business consultancy and control, and data science, and serves as a consultant through Data Crossroads.

Irina is the author of two books, The Data Management Toolkit and The Data Management Cookbook, along with various white papers and articles on data management and its implementation. As a result of her experience, she developed a generic data management implementation model applicable to medium-sized companies.

© Since 1997 to the present – Enterprise Warehousing Solutions, Inc. (EWSolutions). All Rights Reserved
