The purpose of data documentation is to collect information about the data to facilitate understanding, interpretation, transformation, use, etc.
The term “documenting” means to “record or report in detail, support with citations or references, support with evidence or proof, provide information”. Applied to data, the purpose of data documentation becomes “to collect information about the data to facilitate understanding, interpretation, transformation, use, transmission, etc.”.
Data documentation has proved to be a necessity, and sometimes an obligation, for any organization that handles a large variety of data or shares data increasingly widely. It is a foundational function of enterprise data management.
Documenting data raises two questions: why document, and what to document. What to document depends on the attributes relevant to the intended uses (regulatory reporting, personal data processing, 360° customer view, sales reporting, financial reporting, etc.), and these attributes vary with the motivations.
Why document data?
In the course of its business, a company builds relationships with customers, suppliers, partners, administrations, other authorities, etc. As part of these relationships, the company is responsible for creating and delivering data, or for receiving and using data. It acts as “data producer” in the first case and as “data consumer” in the second. The creation, delivery, reception and use of data all take the form of data processing.
Prior to any use, the company may be obliged (by the legal and regulatory framework) to document the data, or may find it in its interest to do so, in order to:
- As a data consumer, state its uses, requirements and desired levels of service;
- As a data producer, specify the authorizations, restrictions or prohibitions of use, the characteristics of the data and actual service levels.
During implementation, the documentation serves as the basis for the data consumer's acceptance of the data delivery and for the discharge of the data producer for the service delivered (see Figure 1).
Figure 1: Simplified relationship between data producer and data consumer – actor to actor
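The acceptance step described above can be illustrated with a minimal sketch, assuming the documented service levels are expressed as comparable indicators. The ServiceLevel fields, their units and the sample values are hypothetical; they are not taken from any standard or from the figure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    """Hypothetical service-level indicators; names and units are illustrative."""
    completeness_pct: float  # share of mandatory fields that are populated
    max_delay_hours: float   # time between data creation and delivery


def acceptance_gaps(required: ServiceLevel, actual: ServiceLevel) -> list[str]:
    """Return the documented requirements that the delivery fails to meet."""
    gaps = []
    if actual.completeness_pct < required.completeness_pct:
        gaps.append("completeness")
    if actual.max_delay_hours > required.max_delay_hours:
        gaps.append("delay")
    return gaps


# What the data consumer documented as desired service levels (assumed values).
consumer_requirement = ServiceLevel(completeness_pct=98.0, max_delay_hours=24.0)
# What the data producer documented as actual service levels (assumed values).
producer_delivery = ServiceLevel(completeness_pct=99.2, max_delay_hours=36.0)

print(acceptance_gaps(consumer_requirement, producer_delivery))  # ['delay']
```

An empty list would correspond to acceptance of the delivery and discharge of the producer; a non-empty list names the commitments to discuss between the two parties.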
For the company, documenting data can appear as a necessity induced by:
- External relations (Client – Supplier, Company – Supervisory Authority, Company – Administration, etc.), which take place in a legal, regulatory, economic, social, and cultural environment that governs the authorizations, restrictions and prohibitions of use related to the data.
- Internal relationships, which are materialized by data processing between actors, roles, organizational units, etc.
In the latter context, the data consumer may find it useful to document data to state its intended uses, requirements, and service levels (see Figure 2). The data producer may also find it useful to specify the authorizations, restrictions or prohibitions of use, the data characteristics and actual service levels. Finally, the company may find it in its interest to coordinate these individual interests to help meet its legal and regulatory commitments.
Figure 2: Simplified relationship between data producer and data consumer – activity to activity
What should be documented?
Data documentation consists in setting up a documentary system, that is “a structured and organized set of documents of different natures” within the meaning of the AFNOR FD S 99-31 standard. It has the advantage of:
- Preserving knowledge and facilitating its transmission;
- Building an accessible, valid, timely and shared information base;
- Preventing gaps, risks and malfunctions;
- Providing an information and knowledge basis for the harmonization of practices.
Despite this apparent and immutable utility, the development of a corporate data documentation strategy is a recent practice. Several reasons contributed to this change:
The first is a productivity challenge (personal or collective) brought about by the digital revolution. Digital transformation has given the company capabilities for representing, collecting and processing data, such as the ability to digitize a character, sound, shape, color, word, text, photograph, piece of music, film, etc. The acquisition of these capabilities was accompanied by the need to document the data collected, to facilitate identification, localization, search or use.
In this context, the useful documentary attributes are:
Equipped with thorough data documentation, users are more productive: they devote their time to exploiting the data rather than to evaluating it beforehand. In addition, all users benefit from the same quality of documentation, in contrast to the case where the evaluation is left to the discretion of each user.
Risk control is the second issue. It takes shape between regulation, norms, and standardization.
Indeed, many regulators have been concerned about the risks to data subjects’ rights and freedoms. Various texts have emerged (HIPAA on the processing and exchange of health data in the United States, GDPR in Europe, etc.), with the same desire to empower businesses and data subjects. Other initiatives (Solvency II in the insurance sector, Basel II and III or CRD IV for the banking sector, etc.) have taken shape to address prudential risk, develop a culture of self-assessment and promote the monitoring of major risks. Finally, others, based on the use of standards (IFRS, etc.) or the promotion of a common language (BCBS 239), have been developed to control data quality issues and risk assessment and to facilitate reconciliations and comparisons.
In this context, the useful documentary attributes are:
Shared semantics aim to improve understanding and to reduce errors in interpretation or implementation; they also limit the effort needed to reconcile the data. The prescriptive attributes shed light on the conditions of lawfulness or fairness of the processing of personal data and make it possible to justify its purpose; they also make it possible to specify the requirements (quality, security, protection or life cycle) that weigh on the data. Knowledge of the path followed by the data (processing steps) allows for rapid impact or dependency analyses, which meet the expectations of the supervisory authorities and the internal actors of the company.
Traceability is the third key issue. Indeed, the continuing evolution of regulations, data-driven decision-making and digital transformation are all elements that promote the emergence of new requirements for data traceability. In this context, the company must document data to control:
- The origin, especially in its identification, semantics, quality levels at source (which may be non-contractual), etc.;
- The destination, in its identification, the result, its quality levels at the target;
- The transformation cycle that leads to this result, in the rules applied, the controls carried out and the responsibilities involved.
This documentation plays an important role in monitoring and assessing levels of service. It allows the company to:
- Act appropriately to rectify detected defects (incident management);
- Conduct an upstream analysis (dependency analysis) prior to corrective actions;
- Conduct a downstream analysis (impact analysis) prior to a change;
- Integrate relevant documentary attributes into the design in advance (requirement management);
- Take legal action against a data producer that has caused harm to the company, or defend its position as a data producer in the event of a data breach by a data consumer.
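The upstream (dependency) and downstream (impact) analyses listed above can be sketched as traversals of a lineage graph, assuming the documented transformation cycle is recorded as source-to-target links. The dataset names and the EDGES list are hypothetical examples, not a reference model.

```python
from collections import defaultdict

# Hypothetical lineage links: (source dataset, derived dataset).
EDGES = [
    ("crm.customers", "staging.customers"),
    ("erp.orders", "staging.orders"),
    ("staging.customers", "mart.sales_report"),
    ("staging.orders", "mart.sales_report"),
    ("mart.sales_report", "regulatory.filing"),
]


def build_graph(edges, reverse=False):
    """Index the links by source (or by target when reverse=True)."""
    graph = defaultdict(set)
    for source, target in edges:
        if reverse:
            graph[target].add(source)
        else:
            graph[source].add(target)
    return graph


def reachable(graph, start):
    """Collect every dataset reachable from start by following the links."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for neighbour in graph.get(node, ()):
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return seen


def impact_analysis(edges, dataset):
    """Downstream analysis: what is affected if this dataset changes?"""
    return reachable(build_graph(edges), dataset)


def dependency_analysis(edges, dataset):
    """Upstream analysis: what does this dataset depend on?"""
    return reachable(build_graph(edges, reverse=True), dataset)


print(sorted(impact_analysis(EDGES, "staging.orders")))
print(sorted(dependency_analysis(EDGES, "mart.sales_report")))
```

In this sketch a change to staging.orders would be flagged as affecting the sales report and the regulatory filing, while a defect in the sales report would be traced back to its four upstream datasets.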
Data documentation brings transparency to data processing activities. It informs the data consumer of the impacts associated with exploiting the data and gives them more confidence in their actions and decisions.
The documentary attributes useful in this context are:
- Structural, to characterize the composition and relationships between data;
- Location, to define the origination and destination of the data;
- Processes, to determine the stages of transformation since the origination;
- Rules, to specify the transformations;
- Commitments, to characterize levels of service;
- Administrative, to define the data lifecycle.
Compliance is the fourth issue; more than a necessity, it is an obligation. Organizations face a proliferation of regulations, accompanied by various requirements:
- Enhanced requirements for reporting and evidence management at the functional level;
- Tighter requirements in terms of data granularity and depth as well as service levels (notification period, data completeness, etc.) at the operational level;
- Increased requirements in terms of means (roles and responsibilities), production (processes) and control (policies and procedures) at the organizational level.
To propose appropriate answers, the company needs to identify and locate the data subject to the regulatory provisions. In this case, useful documentary attributes are:
- Descriptive, to characterize the content of the data;
- Prescriptive, to state what is authorized, restricted or prohibited by a regulatory provision;
- Location, to characterize the origination, the milestones, and the destination of the data;
- Stewardship, to clarify data rules and ensure that they are enforced;
- Administrative, to control the data lifecycle.
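As an illustration of how these attribute families could help identify and locate regulated data, here is a minimal sketch of a catalog record and two lookups. The CatalogEntry class, its field names and the sample entries are assumptions made for this example; they do not describe any particular catalog product.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """Hypothetical documentation record, one field per attribute family."""
    name: str
    description: str                                     # descriptive
    regulations: set[str] = field(default_factory=set)   # prescriptive
    source_system: str = ""                              # location
    steward: str = ""                                    # stewardship
    retention_years: Optional[int] = None                # administrative


def subject_to(catalog, regulation):
    """Names of the datasets documented as covered by a regulation."""
    return [entry.name for entry in catalog if regulation in entry.regulations]


def regulated_without_steward(catalog):
    """Regulated datasets for which no steward has been documented."""
    return [e.name for e in catalog if e.regulations and not e.steward]


# Sample entries with assumed values.
catalog = [
    CatalogEntry("customer_contacts", "Names and e-mail addresses",
                 regulations={"GDPR"}, source_system="crm",
                 steward="sales-data-office", retention_years=3),
    CatalogEntry("patient_claims", "Health insurance claims",
                 regulations={"HIPAA"}, source_system="claims"),
    CatalogEntry("product_prices", "Public price list",
                 source_system="erp", steward="pricing-team"),
]

print(subject_to(catalog, "GDPR"))         # ['customer_contacts']
print(regulated_without_steward(catalog))  # ['patient_claims']
```

The second lookup shows one way documentation gaps themselves can be detected: a dataset flagged as regulated but lacking a steward is a candidate for follow-up.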
Every organization should develop a strategy for documenting its data. The strategy should serve as the baseline for describing the data and for supporting data stewardship, processes, rules, commitments, etc. In addition, the company needs to support the actual use of data and the management of evidence on data lineage.