A new definition of data integrity focuses on adherence to an ethical approach to managing and using data, one that takes an objective and unbiased view.
One of the main areas of responsibility for anyone concerned with the management of data is the enforcement of data integrity. Most data management texts define data integrity as “attention to the consistency, accuracy and correctness of data stored in a database or other electronic file”. Commonly, data integrity refers to the validity of data in all its incarnations (electronic, paper, etc.). Data integrity can also be defined as the extent to which all data are complete, consistent, and accurate throughout the data lifecycle.
This approach is primarily reactive. It focuses on the rules used to create and store data values, so that the “right” value is created and stored for each data element. Many data management professionals use the term “data integrity” in this sense.
Definition of Integrity and Data Management
However, the use and management of data in its most robust and active form should be concerned with much more than simply enforcing rules for creating and storing “right” data values. At a minimum, everyone in the data management professions is responsible for the proper use and welfare of the assets that represent the people, places, and things that are of value to the organization. This view of data management and the integrity of data draws on the base definition of the word “integrity”: “firm adherence to a code of especially moral or artistic values” (Merriam-Webster dictionary, 2012). This definition serves as the foundation for a view of data management that includes an alternate definition of “data integrity”.
The goal of this alternate definition is the use and presentation of data without bias, refusing to allow data to support one point of view to the exclusion of any competing view. This is the most ethical approach. It uses the ideas of honesty and incorruptibility found in the definition of integrity to guide the activities of everyone who manages any aspect of data, including those responsible for data governance, master data management, and data quality management. An ethical approach to data management means caring for and using data with integrity, without bias or predetermined results, and without any action that would cause harm to, or negatively affect, the storage or use of that data.
Therefore, “data integrity” can be defined as “managing and using data according to a code or set of values, with honesty”. Data quality professionals can serve as the vanguard of this new approach to data integrity through their oversight of the data rules, values, and application of standards to the data in their areas of responsibility.
Ethical Uses of Data and Prohibitions
However, data quality professionals are not the only ones responsible for creating and maintaining an ethical approach to data management. By teaching all data management professionals about the ethical and honest uses of the data in their areas, organizations can formulate and articulate the acceptable uses of their data, and the consequences of unacceptable uses. Some commonly accepted prohibitions on the use of data, taken from Cornell University’s Information Use in the 21st Century project, include:
- Do not use information (even if authorized to access it) to support actions by which individuals might profit (e.g., a change in salary, title, or similar administrative category). Do not disclose information about individuals without prior supervisor authorization.
- Do not engage in what might be termed “administrative voyeurism” (e.g., tracking the pattern of salary raises; determining the source and/or destination of telephone calls or Internet protocol addresses; exploring race and ethnicity indicators; tracking internal stock purchases; perusing server-stored email), unless authorized to conduct such analyses for stated business efforts that conform to ethical norms.
- Do not circumvent the nature or level of data access given to others by providing access to data sets that are broader than those available to them via their own approved levels of access (e.g., providing a company-wide data set of human resource information to a coworker who only has approved access to a single human resource department), unless authorized by superiors with stated business goals that conform to ethical norms.
- Do not facilitate another’s illegal access to the company’s administrative systems, or compromise the integrity of the systems’ data, by sharing your passwords or other information that would affect the security of the data and its management or usage.
- Do not choose data to support a predetermined outcome for analysis. Research should always be performed from an unbiased perspective, without preconceived conclusions or intentions to sway the reader or end user of the results. Selectively choosing data or data sets that are suspected to support an expected conclusion is unprofessional behavior for anyone in any research or analytical field.
Applying appropriate data and information security is an essential part of supporting ethical data use.
Data Management Code of Ethics
Various organizations have developed codes of ethics for their data management professionals. These codes define a high standard of professional excellence, highlight a set of desired behaviors, and provide a framework to guide daily decision-making about the ethical use of the processes and tools that data quality professionals apply in their data management activities. The codes attempt to show how any single action can influence other actions and shape the course of one’s general conduct. One example is the Accenture “Principles of Data Ethics”. Another comes from the International Data Quality Association “Code of Ethics and Professional Conduct”.
Ethical data use applies to the creation, analysis, and interpretation of data in reports or similar documents, especially those drawn from a data warehouse or other big data source. Presenting data from a predetermined view (e.g., looking for evidence to support an already-chosen outcome or result) is common in some organizations, even those that espouse the traditional definition of “data integrity”.
Impartial Presentation of Data
Reports are written by analysts who can be considered “custodians” of data, defined as “anyone who has access to, receives or dispenses data”. As custodians, these analysts should be expected to present data in an unbiased format, so that the final recipient of the results or report can draw their own conclusions. Giving the final users of the data the opportunity and freedom to make decisions and form conclusions from unbiased data should be a goal of all data management staff, in both transactional systems and decision support and data warehousing / big data systems.
Why is the impartial presentation of data an issue for data warehouse / big data management? Data drawn from a data warehouse or other big data source can be combined in ways not anticipated in traditional systems, since the dimensional view of data in a data warehouse allows users and analysts to relate previously unrelated values. Data from big data sources can be combined in a myriad of ways, unknown to the final reader and without the knowledge of those who created the original sources. These new relationships can produce presentations slanted toward or against a particular outcome. Because this bias may be invisible to the eventual report reader, the reader may act on skewed results without ever suspecting the skew.
Allowing analysts to display or present data impartially is considered a core integrity value in many organizations that have adopted Peter Block’s “Organizational Stewardship” approach, since it gives the report reader (i.e., the final user) the opportunity to use the data as the reader wishes, without having to filter the analysis through a distant analyst’s prism, one the final reader cannot know. Block’s view of stewardship can be defined succinctly as “giving order to the dispersion of power, moving choice, resources and control to the edges of the organization where actual activity occurs”.
Recently, some investment firms have used this unbiased approach to overcome the impression that their analysts had presented past results to external customers in less-than-impartial terms. The impartial presentation of the facts, with the opportunity for the final user to draw fact-based conclusions, could become one of the performance measures for data warehousing and other decision support or big data systems. Such an objective could enhance the integrity of the organizations that use decision support systems to generate data for multiple levels of analysis.
However, attempts to present data without bias can be taken to an extreme, and can result in analysts abdicating their responsibility to provide advice and guidance in interpreting complicated data. “Impartial” does not have to mean “without any attempt at clarification or examination”, since many uses of data require some interpretation and deduction to be actionable. Analysts, and data custodians in general, should strive for a balance between the raw presentation of data and the tendency to slant the presentation toward a predetermined outcome or decision. Data management professionals such as data governance and data quality professionals can help analysts develop this balanced approach by drafting guidelines for data integrity that call for this impartial yet advisory presentation of data. Data management and data governance leaders can organize and champion this effort by educating all members of the organization about the need to use data with integrity, and by facilitating discussions on the importance of presenting unbiased yet examined analysis of the organization’s data to internal and external customers.
Adopting a new definition of “data integrity” could expand awareness of the need for active data management within organizations, especially among data quality professionals. All data management professionals can foster this new, ethical approach to data integrity by communicating that data can be presented and used impartially, and by applying the principles of true, ethically based data integrity when they develop standards, definitions, and guidelines for data usage.