One of the main responsibilities of anyone who manages data is enforcing data integrity. Most data management texts define data integrity as “attention to the consistency, accuracy and correctness of data stored in a database or other electronic file”. More broadly, data integrity refers to the validity of data in all its incarnations (electronic, paper, etc.). It can also be defined as the extent to which all data are complete, consistent, and accurate throughout the data lifecycle.

Data ethics matters alongside data integrity. As data processes evolve, particularly with big data, ethical practices become critical to avoiding breaches and bias, building trust with customers, ensuring regulatory compliance, and preventing data misuse.

The traditional definition of data integrity is primarily reactive: it focuses on the rules used to create and store the “right” value for each data element. Many data management professionals use the term “data integrity” in this manner.

What is Data Ethics?

Definition and Importance

Data ethics refers to the moral principles and guidelines that govern the responsible collection, processing, and use of data. In today’s digital age, where data drives decision-making processes across various sectors, understanding the importance of data ethics is paramount. Ethical data practices ensure that data is used in a manner that respects individual rights, promotes fairness, and minimizes potential harm.

Data ethics is crucial for businesses that aim to build trust with their customers, comply with regulations, and avoid the misuse of data. It helps prevent privacy breaches, bias, and reputational damage. By prioritizing data ethics, organizations can maintain a strong reputation, foster customer loyalty, and contribute to a more responsible and ethical data ecosystem. In essence, data ethics is the foundation upon which trustworthy and sustainable data practices are built.

Definition of Integrity and Data Management

However, the use and management of data in its most robust and active form should be concerned with much more than simply enforcing rules for creating and storing “right” data values. At the least, all those who are part of the data management professions are responsible for the proper use and welfare of the assets that represent the people, places and things of value to the organization. This view of data management and the integrity of data encompasses the base definition of the word “integrity”: “firm adherence to a code of especially moral or artistic values” (Merriam-Webster dictionary, 2012). This definition serves as the foundation for developing a view of data management that includes an alternate definition for “data integrity”.

This alternate definition has as its goal the use and presentation of data without bias, refusing to allow data to support only one point of view to the exclusion of any competing view. It takes the most ethical approach, using the ideas of honesty and incorruptibility found in the definition of integrity to support the activities of all who manage any aspect of data, including those responsible for data governance, master data management, and data quality management. An ethical approach to data management is the care and use of data with integrity: without bias or predetermined results, and without any action that would harm or negatively affect the storage or use of that data. Data analysis in particular must be conducted with an awareness of its ethical implications, since it can inadvertently introduce harm or bias that affects individuals or communities.

Therefore, “data integrity” can be defined as “managing and using data according to a code or set of values, with honesty”.  Data quality professionals can serve as the vanguard of this new approach to data integrity through their oversight of the data rules, values, and application of standards to the data in their areas of responsibility. 

Key Principles of Data Ethics

Data Ethics Principles

The key principles of data ethics include:

  1. Informed Consent: Obtaining explicit permission from individuals before collecting their personal data. This ensures that individuals are aware of and agree to the data collection processes and purposes.
  2. Transparency: Clearly communicating how data is collected, stored, and used. Transparency builds trust by allowing individuals to understand the data processes and the reasons behind them.
  3. Data Minimization: Collecting and retaining only the minimum data necessary for the intended purpose. This principle helps reduce the risk of data breaches and data misuse by limiting the amount of data collected.
  4. Data Security: Implementing appropriate measures to protect data against unauthorized access, alteration, destruction, or disclosure. Robust data security practices are essential to safeguard sensitive information and maintain data integrity.
  5. Accuracy and Quality: Ensuring the accuracy, completeness, and reliability of data throughout its life cycle. High-quality data is crucial for making informed decisions and maintaining the credibility of data-driven insights.
  6. Accountability: Establishing mechanisms for accountability and oversight to ensure compliance with ethical data practices. Accountability ensures that organizations adhere to ethical standards and are held responsible for their data practices.

These principles provide a structured approach to ensuring that data is used in a responsible and ethical manner. By adhering to these principles, organizations can build trust with their stakeholders, protect individual rights, and contribute to a more responsible and ethical data ecosystem. Ethical data practices ensure that data processes are aligned with moral values, fostering a culture of integrity and respect in data management.
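As a minimal sketch of how the informed consent and data minimization principles might be enforced in code, the Python example below releases only the fields that a stated purpose needs, and only when consent for that purpose is on record. The purpose names, field names, and the `Consent` and `minimize` helpers are hypothetical illustrations, not part of any standard or library.

```python
from dataclasses import dataclass

# Hypothetical mapping from a processing purpose to the fields it needs.
REQUIRED_FIELDS = {
    "order_fulfilment": {"name", "shipping_address"},
    "newsletter": {"email"},
}


@dataclass(frozen=True)
class Consent:
    subject_id: str
    purposes: frozenset  # purposes the individual explicitly agreed to


def minimize(record: dict, purpose: str, consent: Consent) -> dict:
    """Return only the fields needed for `purpose`, and only with consent."""
    if purpose not in consent.purposes:
        raise PermissionError(f"no consent recorded for purpose {purpose!r}")
    allowed = REQUIRED_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}


# Illustrative record; all values are made up.
record = {
    "name": "A. Example",
    "email": "a@example.org",
    "shipping_address": "1 Example St",
    "date_of_birth": "1990-01-01",
}
consent = Consent("subject-1", frozenset({"newsletter"}))
newsletter_view = minimize(record, "newsletter", consent)
```

Here `newsletter_view` contains only the email address, and a request for the `order_fulfilment` purpose fails because no consent for it was recorded. A real system would also need to store and version the consent records themselves, which this sketch leaves out.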

Ethical Uses of Data and Prohibitions

However, data quality professionals are not the only ones responsible for creating and maintaining an ethical approach to data management. By teaching all data management professionals about the ethical and honest uses of the data in their areas, organizations can formulate and articulate the acceptable uses of their data, along with the consequences of unacceptable uses. Some commonly accepted prohibitions on the use of data, taken from Cornell University’s Information Use in the 21st Century project, include:

  • Do not use information (even if authorized to access it) to support actions by which individuals might profit (e.g., a change in salary, title, or similar administrative category). Do not disclose information about individuals without prior supervisor authorization.
  • Do not engage in what might be termed “administrative voyeurism” (e.g., tracking the pattern of salary raises; determining the source and/or destination of telephone calls or Internet protocol addresses; exploring race and ethnicity indicators; tracking internal stock purchases; perusing server-stored email), unless authorized to conduct such analyses for stated business efforts that conform to ethical norms.
  • Do not circumvent the nature or level of data access given to others by providing access to data sets that are broader than those available to them via their own approved levels of access (e.g., providing a company-wide data set of human resource information to a coworker who only has approved access to a single human resource department), unless authorized by superiors with stated business goals that conform to ethical norms.
  • Do not facilitate another’s illegal access to the company’s administrative systems or compromise the integrity of the systems’ data by sharing your passwords or other information that would affect the security of the data and its management or usage.
  • Do not choose data to support a predetermined outcome for analysis. Research should always be performed from an unbiased perspective, without preconceived conclusions or intentions to sway the reader or final user of the results. Selectively choosing data, or sets of data, that are suspected to support an expected conclusion is considered unprofessional behavior for any person in any research or analytical field.

Applying appropriate data and information security is essential to supporting ethical data use.
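The prohibition on handing a coworker a broader data set than their approved access allows can be partly backed by code. The Python sketch below filters records to the recipient's approved scope before sharing. The `APPROVED_SCOPE` table, the user and department names, and the `authorized_override` flag are assumptions made for illustration.

```python
# Hypothetical approved access levels: user -> departments they may see.
APPROVED_SCOPE = {
    "analyst_a": {"hr_east"},
    "hr_director": {"hr_east", "hr_west"},
}


def share_records(records, recipient, authorized_override=False):
    """Return only the records the recipient is already approved to see.

    `authorized_override` stands in for a documented approval by superiors
    with a stated business goal; recording that approval is out of scope.
    """
    if authorized_override:
        return list(records)
    # Unknown recipients get an empty scope, so nothing is shared.
    scope = APPROVED_SCOPE.get(recipient, set())
    return [r for r in records if r["department"] in scope]


records = [
    {"employee": "e1", "department": "hr_east"},
    {"employee": "e2", "department": "hr_west"},
]
shared = share_records(records, "analyst_a")
```

In this sketch `analyst_a` receives only the `hr_east` record, and a recipient with no entry in the table receives nothing. Defaulting to an empty scope is a deliberate choice: access that has not been approved is treated as access denied.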

Trust, Privacy, and Data Governance

In an increasingly data-driven world, prioritizing data privacy and governance is vital for organizations aiming to build trust. Individuals have ownership over their personal information, making it imperative for organizations to respect these rights through transparent practices. Explicit consent must be obtained before collecting personal data, ensuring individuals are informed about the data’s purpose and usage.

To protect sensitive information, organizations should adopt robust security measures, such as encryption and access controls, minimizing the risk of data breaches. Additionally, regulatory compliance, such as adhering to GDPR or CCPA, strengthens an organization’s credibility by demonstrating accountability. Regular audits of data practices not only ensure adherence to ethical norms but also prevent unintended consequences of data misuse. These measures collectively support responsible innovation and safeguard the interests of all stakeholders.

Ethical Challenges in Data Science and AI

As data-driven technologies continue to reshape the digital age, ethical challenges in data science and artificial intelligence have gained prominence. Data scientists play a pivotal role in ensuring that data collection methods align with ethical principles and that data infrastructures are designed to protect sensitive data. Training data must be scrutinized for biases, as they can significantly affect the outcomes of machine learning models.
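One simple way to begin scrutinizing training data for bias is to compare how well each group is represented and how often each group carries the positive label. The Python sketch below does this with the standard library only. The column names and sample rows are invented for illustration, and a large gap between groups is a prompt for further investigation, not proof of bias.

```python
from collections import Counter


def group_summary(rows, group_key, label_key):
    """Per group: share of the data set and rate of positive labels."""
    totals = Counter(r[group_key] for r in rows)
    positives = Counter(r[group_key] for r in rows if r[label_key] == 1)
    n = len(rows)
    return {
        g: {"share": totals[g] / n, "positive_rate": positives[g] / totals[g]}
        for g in totals
    }


# Made-up training rows: a group attribute and a binary label.
rows = [
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 1},
    {"group": "A", "approved": 0},
    {"group": "B", "approved": 0},
]
summary = group_summary(rows, "group", "approved")
```

In this toy data set group A makes up three quarters of the rows and group B has no positive examples at all, which is the kind of imbalance a reviewer would want to examine before a model is trained on it.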

To address unintended consequences, organizations should establish mechanisms for evaluating ethical implications during data projects. This includes balancing societal values, legal consequences, and security risks. Clear and accessible information about data practices fosters transparency, helping data subjects understand their rights. As part of responsible innovation, integrating ethical considerations into big data projects and financial transactions safeguards both individual and societal well-being. By embracing these practices, businesses can protect data and promote trust in an increasingly interconnected world.

Data Management Code of Ethics

Various organizations have developed codes of ethics for their data management professionals. These codes define a high standard of professional excellence, highlight a set of desired behaviors, and provide a framework to guide daily decision-making for the ethical use of the processes and tools that data quality professionals apply in their data management activities. The codes attempt to demonstrate how the performance of any action can influence all other actions and can affect the course of one’s general behavior. One example is the Accenture “Principles of Data Ethics”. Another example comes from the International Data Quality Association “Code of Ethics and Professional Conduct”.

The ethical use of data also applies to the creation, analysis, and interpretation of data in reports or similar documents, especially those drawn from a data warehouse or other big data source. Presenting data from a predetermined view (e.g., looking for evidence to support an already-chosen outcome or result) is a common activity in some organizations, even in organizations that espouse the traditional definition of “data integrity”.

Impartial Presentation of Data

Reports are written by analysts who can be considered “custodians” of data, defined as “anyone who has access to, receives or dispenses data”. As custodians, these analysts should be expected to present data in an unbiased format, so the final recipient of the results or report can draw their own conclusions. Giving the final users of the data the opportunity and freedom to make decisions and form conclusions with unbiased data should be a goal of all data management staff, in transactional systems as well as in decision support and data warehousing / big data systems.

Why is the impartial presentation of data an issue for data warehouse / big data management? Data drawn from a data warehouse or other big data source can be combined in ways not expected with traditional systems, since the dimensional view of data in a data warehouse allows users and analysts to relate previously unrelated values. Data from big data sources can be combined in a myriad of ways, unknown to the final reader and without the knowledge of those who created the original sources. These new relationships can produce a presentation of data that is slanted toward or against a particular outcome. Because the bias may be invisible to the eventual report reader, the reader may rely on skewed results without ever suspecting the skew.
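One way to make such a slant visible is for every reported figure to carry a description of the filter that produced it and a count of the rows it excluded. The Python sketch below illustrates the idea with an average. The `summarize` helper, its field names, and the sample figures are hypothetical.

```python
from statistics import mean


def summarize(values, keep=lambda v: True, description="no filter"):
    """Average `values`, disclosing which filter was applied and its effect.

    Note: statistics.mean raises StatisticsError if the filter keeps no rows.
    """
    kept = [v for v in values if keep(v)]
    return {
        "mean": mean(kept),
        "filter": description,
        "rows_used": len(kept),
        "rows_excluded": len(values) - len(kept),
    }


# Made-up period returns, in percent.
returns = [-4.0, -1.0, 2.0, 3.0, 5.0]
full = summarize(returns)
selective = summarize(
    returns, keep=lambda v: v > 0, description="positive returns only"
)
```

The selective summary reports a noticeably higher average than the full one, but because it also reports that two rows were excluded and why, the reader can see the choice the analyst made instead of inheriting it unknowingly.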

Allowing analysts to present data impartially is considered a core integrity value in many organizations that have adopted Peter Block’s “Organizational Stewardship” approach, since it gives the report reader (i.e., the final user) the opportunity to use the data as the reader wishes, without filtering the analysis through a distant analyst’s prism that the reader cannot know. Block’s view of stewardship can be defined succinctly as “giving order to the dispersion of power, moving choice, resources and control to the edges of the organization where actual activity occurs”. Recently, this unbiased approach has been used by some investment firms to overcome the impression that their analysts have presented past results in less-than-impartial terms to external customers. The impartial presentation of the facts, coupled with appropriate training and the opportunity for the final user to draw fact-based conclusions, could become a performance objective for data warehousing and other decision support or big data systems. Such an objective could enhance the integrity of the organizations that use decision support systems to generate data used in multiple levels of analysis.

However, attempts to present data without bias can be taken to an extreme, with analysts abdicating their responsibility to provide advice and guidance in interpreting complicated data. “Impartial” does not have to mean “without any attempt at clarification or examination”, since many uses of data require some interpretation and deduction to be workable. Analysts, and data custodians in general, should strive for a balance between the raw presentation of data and the tendency to slant the presentation to serve a predetermined outcome or decision. Data management professionals, such as data governance and data quality professionals, can help analysts develop this balanced approach by working on guidelines for data integrity that include this impartial yet advisory presentation of data. Data management and data governance leaders can organize and champion this effort by educating all members of the organization about the need to use data with integrity, and by facilitating discussions on the importance of unbiased yet examined analysis of the organization’s data for internal and external customers.

Conclusion

Adopting a new definition of “data integrity” could expand awareness of the need for active data management within organizations, especially among data quality professionals. All data management professionals can foster this ethical approach to data integrity by communicating that the impartial presentation and use of data is possible, and by exhibiting the principles of ethically based data integrity in the standards, definitions and guidelines they develop for data usage.