Change is the Only Constant

Data ethics is a branch of ethics that deals with the moral obligations and responsibilities surrounding the collection, sharing, and use of data, particularly in the digital age. It means understanding the ethical implications of data practices and ensuring they are fair, transparent, and respectful of individual privacy. Data ethics matters because it fosters trust and accountability in organizations that handle data while also ensuring that data is used for the greater good. One of the core principles of the General Data Protection Regulation (GDPR) and the California Consumer Privacy Act (CCPA) is that data must be handled ethically.

Data ethics is about making morally sound judgments on data that balance technological possibilities with human rights. Data should benefit society without causing harm or unfairness. Organizations adopting ethical data practices build trust, comply with government regulations, and support responsible innovation in the use of data. However, implementing a data ethics framework within a company is not easy. The ancient Greek philosopher Heraclitus once said that “change is the only constant,” and this is a good concept to keep in mind when implementing data ethics company-wide.

There have been several highly embarrassing data ethics episodes over the past few decades. These include the unethical OkCupid data scrape and the Cambridge Analytica Facebook scandal, which caused such massive reputational damage that Cambridge Analytica went defunct. Other scandals include Amazon’s AI recruiting tool debacle, Google’s Project Nightingale patient privacy nightmare, and Clearview AI’s clear violation of personal facial recognition data. Each incident created its own unique ethical problem, showing how complicated the subject of data ethics can be.

An Eye for an Eye: From Hippocrates to HIPAA

Ethics is not new, of course; it’s part of the human condition. It goes back to the beginning of human history, but it has always been a somewhat ethereal thing. Philosophically, Study.com defines it as “the systematic approach to theorizing about what is morally good and how judgments about the good are made. What ‘the good’ means for various philosophers differs in their emphasis on actions or consequences, rules or duties, or one’s disposition or general mode of life. These emphases are not necessarily mutually exclusive and can inform each other within a given ethical theory.”

In Ancient Egypt, the concept of Maat, which represented truth, balance, order, harmony, law, morality, and justice, became the bedrock of Egyptian ethics. The pharaoh’s duty was to uphold Maat on Earth. In the afterlife, a person’s heart was weighed against the feather of Maat; a life lived in accordance with Maat meant passing this judgment and achieving immortality. Ethics was about living in harmony within a divine cosmic order.

In Mesopotamia, the Code of Hammurabi is one of the oldest and most detailed legal codes. The famous principle of “an eye for an eye” attempted to establish proportional justice and prevent endless blood feuds. While not a philosophical treatise, it established a public, standardized concept of justice and responsibility.

Not until the mid-20th century did the concepts of privacy and confidentiality gain traction. The origins of data ethics can be traced back to concerns arising from the advent of computers and automated data processing. Early discussions centered around protecting individuals’ privacy and ensuring the confidential handling of personal information.

Data Ethics in the Modern World

In the early 1970s, a U.S. government advisory committee, seeking to address growing concerns about the misuse of computerized data systems, developed the Fair Information Practice Principles (FIPPs), a foundational framework for privacy policy and data protection. Dubbed the “Bill of Rights” for personal data, the FIPPs emphasized transparency, accountability, and individual control over personal data. They form the ethical and conceptual backbone of nearly all modern privacy laws, including Europe’s GDPR and California’s CCPA.

The rise of big data, data mining, and analytics saw the value of data skyrocket. This raised new ethical dilemmas about anonymization, consent, algorithmic bias, and the commodification of personal data.

Today, data ethics has evolved to encompass fairness, consent, stewardship, transparency, and harm avoidance. This is especially so when it comes to AI and machine learning. Discussions now integrate ethical challenges posed by sophisticated data use, algorithmic decision-making, and societal impacts. Data ethics is now seen as an ongoing ethos guiding responsible behavior around data. Companies must maintain a delicate balancing act between innovation and safeguarding human rights, social equity, and corporate trust.

Legislation

In 1996, President Clinton signed the Health Insurance Portability and Accountability Act (HIPAA) into law. It aimed to improve the portability and accountability of health insurance coverage, protect sensitive health information, and reduce healthcare fraud and abuse. It was one of the first data privacy acts to target a specific industry. Other sector-specific rules include the Gramm-Leach-Bliley Act (GLBA), which requires financial institutions to protect their customers’ personal financial data, and the Fair Credit Reporting Act (FCRA), which governs the collection, use, and sharing of consumer credit information by credit reporting agencies. The Payment Card Industry Data Security Standard (PCI DSS) sets security standards for organizations that handle credit card information. Currently, legislators have AI in their sights, with multiple states pushing through legislation that addresses this fast-moving technology.

However, finding the right balance between strong government control and letting companies do as they please is difficult. As the leading data ethicists Luciano Floridi and Mariarosaria Taddeo explain in their paper “What is data ethics?”: “On the one hand, overlooking ethical issues may prompt negative impact and social rejection…On the other hand, overemphasizing the protection of individual rights in the wrong contexts may lead to regulations that are too rigid, and this in turn can cripple the chances to harness the social value of data science.”

Understanding Data Ethics

The Core Principles of Data Ethics

“Data ethics builds on the foundation provided by computer and information ethics, but, at the same time, it refines the approach endorsed so far in this research field, by shifting the level of abstraction of ethical enquiries, from being information-centric to being data-centric. This shift brings into focus the different moral dimensions of all kinds of data, even data that never translates directly into information but can be used to support actions or generate behaviours, for example,” say Floridi and Taddeo.

Data ethics emphasizes trust, fairness, privacy, transparency, accountability, and security in how data is handled. It can be broken down into the following core principles:

1. Consent: Individuals must provide informed and voluntary consent before any of their data is collected or used. Consent must be transparent, freely given, and revocable at all times (see the sketch after this list).

2. Transparency: Organizations must clearly communicate how data is collected, stored, used, and shared to build trust and allow informed choices.

3. Fairness: Data practices should avoid bias and discrimination, ensuring equitable treatment of all individuals in data-driven decisions.

4. Privacy and Confidentiality: Sensitive and personal information must be protected from unauthorized access, breaches, and misuse.

5. Accountability: Organizations must take responsibility for their data practices and ensure compliance with legal and ethical standards.

6. Data Minimization: Only the minimum data absolutely necessary should be collected and retained, limiting exposure and risk.

7. Accuracy and Integrity: Data should be accurate, reliable, and used truthfully, without misrepresentation.

8. Security: Strong protective measures like encryption and access controls must safeguard data from harm and improper access.

9. Continuous Monitoring: Ethical frameworks require regular audits and improvements to address emerging risks and violations.

10. Stakeholder Engagement: Data governance should include the perspectives of data subjects and all other affected parties.
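
Principles like consent can be made concrete in code. Below is a minimal sketch, in Python, of what a purpose-specific, revocable consent record might look like; the `ConsentRecord` class, its fields, and `has_consent` are hypothetical illustrations, not any particular library’s API.

```python
from dataclasses import dataclass
from datetime import datetime, timezone

@dataclass
class ConsentRecord:
    """Hypothetical consent record: tied to one purpose, revocable at any time."""
    user_id: str
    purpose: str                        # e.g. "marketing_email"
    granted_at: datetime
    revoked_at: datetime | None = None  # None means consent is still active

    def revoke(self) -> None:
        # Principle 1: consent must be revocable at all times.
        self.revoked_at = datetime.now(timezone.utc)

    @property
    def active(self) -> bool:
        return self.revoked_at is None

def has_consent(records: list[ConsentRecord], user_id: str, purpose: str) -> bool:
    """Gate that any processing step should pass before touching the data."""
    return any(r.active and r.user_id == user_id and r.purpose == purpose
               for r in records)

# Usage: consent is granted per purpose, and revocation takes effect immediately.
rec = ConsentRecord("u42", "marketing_email", datetime.now(timezone.utc))
assert has_consent([rec], "u42", "marketing_email")
rec.revoke()
assert not has_consent([rec], "u42", "marketing_email")
```

The point of the sketch is that consent is checked per purpose at the moment of use, not assumed once at signup.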

The High Cost of Ethical Blind Spots

As noted above, data ethics is about balancing technological possibilities with human rights. There is a growing body of use cases that show data ethics gone wrong, including:

Scraping OkCupid’s Personal Data

In 2016, a group of researchers broke the cardinal rule of social science research ethics by releasing about 70,000 OkCupid profiles onto the Open Science Framework, an online community where people share and collaborate on raw data sets. Two Danish researchers scraped the data from the OkCupid website and released potentially identifying information, such as users’ ages, genders, and sexual orientations. When confronted, the two researchers claimed their actions weren’t ethically wrong because the data was already public. However, the release raised hard questions about the ethics of republishing “already public” data: even though the information was public, that didn’t mean anyone had consented to it being published in an online forum.

According to Vox’s Brian Resnick, “The data dump did not reveal anyone’s real name. But it’s entirely possible to use clues from a user’s location, demographics, and OkCupid user name to determine their identity.” The Open Science Framework (OSF) is a free, open-source online platform and project management tool created by the Center for Open Science to support and streamline collaborative research. Built partly as a response to traditional academic publishing gatekeepers, the OSF lets anyone publish data. The hope is that freely accessible information spurs innovation and keeps scientists accountable for their analyses; in return, users must ensure the integrity of the information they upload.
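
Resnick’s warning about re-identification is easy to demonstrate. The following toy sketch links an “anonymized” dump against auxiliary public data on shared quasi-identifiers (age, gender, city); every name, username, and record here is fabricated for illustration.

```python
# Toy demonstration of re-identification via quasi-identifiers.
# All data below is made up.

released = [  # "anonymized" dump: no real names
    {"username": "stargazer_88", "age": 29, "gender": "F", "city": "Aarhus"},
    {"username": "quietriver",   "age": 41, "gender": "M", "city": "Odense"},
]

public_profiles = [  # auxiliary data gathered from elsewhere on the web
    {"name": "Jane Doe", "age": 29, "gender": "F", "city": "Aarhus"},
]

def reidentify(dump, aux):
    """Link records that agree on all quasi-identifiers."""
    matches = []
    for row in dump:
        for person in aux:
            if all(row[k] == person[k] for k in ("age", "gender", "city")):
                matches.append((row["username"], person["name"]))
    return matches

print(reidentify(released, public_profiles))
# [('stargazer_88', 'Jane Doe')]  -> the "anonymous" user is identified
```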

If Emil Kirkegaard, one of the researchers, violated either OkCupid’s or the OSF’s terms of use, the data will be removed, says Brian Nosek, executive director of the Center for Open Science. This seems likely, but it might be too late: as of May 2016, the dataset had been downloaded almost 500 times. The data ethics genie has been released from the bottle, and it’s probably too late to put it back in.

Cambridge Analytica & Facebook

Probably the best known of all data ethics scandals is the Cambridge Analytica and Facebook affair, which involved informed consent, data privacy, transparency, and psychological manipulation. Cambridge Analytica, a British political consulting company, acquired personal data from millions of Facebook users without their explicit consent by harvesting information from users who had completed a personality quiz. Approximately 270,000 people took the quiz, which doesn’t seem like a huge number, but Cambridge Analytica also collected data from all of the quiz takers’ Facebook friends. The sophisticated psychological profiles built on this data were used to target voters with personalized political advertisements during the 2016 US presidential election, as well as the Brexit referendum.

The lack of informed consent meant users never understood how their data would be utilized. The case also involved data misuse, since Facebook’s platform policies were violated, and manipulation on a massive scale, raising questions about the health of democratic processes.

The result was a massive public outcry, a $5 billion FTC fine for Facebook, the demise of Cambridge Analytica, and increased global scrutiny of social media platforms. The scandal was a catalyst for regulations like the CCPA, and it pushed platforms to give users more control over their data.

One key takeaway is that companies must ensure informed consent is explicit, not buried deep within terms of service or complicated legalese. Companies are also ethically responsible for how third parties use data collected on their platforms.

Amazon’s AI Recruiting Tool

In 2018, news broke that Amazon had built an experimental AI recruiting tool to help it search for top talent. The project turned into a PR nightmare that showed how models are susceptible to algorithmic bias and can discriminate unfairly. Trained on a decade’s worth of resumes submitted to the company, mostly by male candidates, the system penalized resumes that included the word “women’s” (e.g., “women’s soccer captain”) and downgraded graduates of all-women’s colleges. It effectively systematized historical gender bias.

The model demonstrated how biased training data leads to biased algorithms, perpetuating societal inequalities under the guise of objective, data-driven decision-making. Once the bias was recognized, Amazon scrapped the project; the company knew it couldn’t guarantee the tool would not discriminate. The key data ethics takeaway puts a twist on the old modeler’s adage “garbage in, garbage out”: bias in, bias out. Models are a reflection of their training data, and auditing for fairness and bias is not optional; it’s a critical step in the ML development lifecycle.
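
One simple, widely used bias audit is the “four-fifths rule”: compare selection rates across groups and flag any group selected at less than 80% of the highest group’s rate. Here is a minimal sketch with hypothetical screening outcomes; a production audit would use a dedicated fairness library and multiple metrics.

```python
from collections import defaultdict

def selection_rates(outcomes):
    """outcomes: list of (group, selected) pairs; returns selection rate per group."""
    totals, hits = defaultdict(int), defaultdict(int)
    for group, selected in outcomes:
        totals[group] += 1
        hits[group] += int(selected)
    return {g: hits[g] / totals[g] for g in totals}

def four_fifths_check(outcomes, threshold=0.8):
    """Flag groups whose selection rate falls below 80% of the best group's rate."""
    rates = selection_rates(outcomes)
    best = max(rates.values())
    return {g: rate / best >= threshold for g, rate in rates.items()}

# Hypothetical screening results: (applicant group, advanced to interview?)
outcomes = ([("men", True)] * 60 + [("men", False)] * 40
            + [("women", True)] * 30 + [("women", False)] * 70)

print(selection_rates(outcomes))    # {'men': 0.6, 'women': 0.3}
print(four_fifths_check(outcomes))  # {'men': True, 'women': False} -> bias flag
```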

Google’s Project Nightingale Nightmare

In 2019, it emerged that Google, partnering with Ascension, a major US healthcare system, on “Project Nightingale,” had gained access to the personal health records of millions of patients across 21 states. The data included names, dates of birth, lab results, doctor diagnoses, and hospitalization records, but neither patients nor doctors were notified about the access.

Google claimed that what it was doing was perfectly legal because it fell under HIPAA’s “business associate” provision. However, the project raised huge transparency issues. Patients had no idea their most sensitive data was being shared with a tech giant that had committed a slew of privacy violations in the past. The secondary use of their data for purposes like developing AI tools was never communicated, clearly violating the principle of ethical transparency.

Internal whistleblowers informed the media, which led to a federal inquiry. The episode sparked a national debate about patient privacy and the role of big tech in healthcare. The most important takeaway: legality does not equal ethics. Just because you can use data under the law doesn’t mean you should. Without transparent communication and consideration of patient expectations, companies should resist the temptation to use data that users haven’t explicitly given them permission to use.

Clearview AI

In 2020, it was revealed that Clearview AI had built a facial recognition tool by scraping billions of public images from websites like Facebook, YouTube, and Venmo without asking consent from the individuals or even from the websites where the data resided. Clearview AI then sold access to the database to US law enforcement and government agencies, including U.S. Immigration and Customs Enforcement (ICE), the FBI, the Department of Homeland Security, the Border Patrol, and the Department of Defense.

The case challenged the notion of “public” data. Although one could argue that the images were publicly available online, individuals had a reasonable expectation that their photos would not be collected into a perpetual global facial database for identification purposes. Clearview AI’s actions represent a massive violation of contextual integrity: data collected in one context (social sharing) was used in a completely different one. This sets a dangerous precedent for mass surveillance, and now that the facial recognition genie is out of the bottle, there’s no way to put it back in.

What works in the United States doesn’t always translate elsewhere. Clearview AI has been fined millions of dollars in Europe and the UK, has been banned from operating in several countries, and faces multiple lawsuits in the US. It remains a subject of intense legal and ethical debate. As with the earlier OkCupid case, the lesson is that publicly available data can still be unethically sourced; the context and the potential for harm in how data is used and aggregated must be central to any ethical evaluation.

Establishing a Robust Data Ethics Framework

As the examples above show, there are a lot of ways for organizations to get into ethical trouble, and getting caught on the wrong side of the law can be enormously costly. Silicon Valley’s cavalier “ask forgiveness, not permission” philosophy sounds great in theory, but it can come with a huge price tag, as Facebook found out after the Cambridge Analytica scandal.

Implementing a strong data ethics framework starts with data collection: gathering and storing user data from various sources. This essential process must be done ethically and responsibly. Informed consent is mandatory, and individuals must be made aware of how their data will be used and shared. Data analysts must also weigh the ethical implications of collection itself, including the potential for biased algorithms and privacy violations.

In today’s big data world, the amount of data collected can be overwhelming, but it must be managed in a way that respects individual privacy and promotes ethical use. Data security, protecting sensitive information, and promoting responsible innovation are all paramount. One practical discipline here is data minimization, sketched below.
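
Here is a minimal sketch of how data minimization might be enforced at the point of collection, assuming a hypothetical per-purpose allow-list of fields; the purposes and field names are invented for illustration.

```python
# Hypothetical per-purpose allow-lists: collect only what each purpose needs.
ALLOWED_FIELDS = {
    "shipping":  {"name", "street", "city", "postal_code"},
    "analytics": {"country", "signup_month"},
}

def minimize(record: dict, purpose: str) -> dict:
    """Drop every field not strictly required for the stated purpose."""
    allowed = ALLOWED_FIELDS[purpose]
    return {k: v for k, v in record.items() if k in allowed}

raw = {"name": "Jane Doe", "street": "1 Main St", "city": "Oslo",
       "postal_code": "0150", "birthdate": "1994-05-02",
       "country": "NO", "signup_month": "2024-11"}

print(minimize(raw, "analytics"))
# {'country': 'NO', 'signup_month': '2024-11'}  -> birthdate is never stored
```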

AI, machine learning, and deep learning models are trained on massive data sets, which makes them particularly susceptible to bias in that data. When building these types of models, corporations need to be especially cognizant of the ethical concerns inherent in their data, including the potential for biased algorithms and unintended consequences.

The Zero Trust Mindset

Data privacy and security are among the most important aspects of ethical data management because they involve protecting individual privacy while preventing unauthorized access to data. Data professionals must ensure that data is stored and transmitted securely and that access is restricted to authorized personnel only.

Keeping data safe in the modern digital landscape requires a layered approach, often called “defense in depth.” No single solution is foolproof, but combining multiple strategies creates a robust shield. Users should have strong, unique passwords. Operating systems, applications, web browsers, and firmware should be kept up to date, since most breaches exploit known vulnerabilities that have already been patched. Multi-factor or two-factor authentication should be required for user access.
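
On the password point, strong passwords also need strong storage: never plaintext, only salted, slow hashes. A minimal sketch using Python’s standard-library scrypt follows; the cost parameters are illustrative, and a real deployment should follow current published guidance.

```python
import hashlib, hmac, os

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Return (salt, digest) using the memory-hard scrypt KDF."""
    salt = os.urandom(16)  # a unique random salt per password
    digest = hashlib.scrypt(password.encode(), salt=salt,
                            n=2**14, r=8, p=1,  # illustrative cost settings
                            maxmem=2**25)       # allow ~32 MiB for the KDF
    return salt, digest

def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    candidate = hashlib.scrypt(password.encode(), salt=salt,
                               n=2**14, r=8, p=1, maxmem=2**25)
    return hmac.compare_digest(candidate, digest)  # constant-time comparison

salt, digest = hash_password("correct horse battery staple")
assert verify_password("correct horse battery staple", salt, digest)
assert not verify_password("wrong guess", salt, digest)
```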

Companies should embrace a “Zero Trust” mindset: a security model that assumes a breach is inevitable or has already happened. It requires strict identity verification for every person and device trying to access resources on a private network, and it leans heavily on encryption and other protective measures. Data breaches can have serious consequences, including financial losses and reputational damage.
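
Encryption at rest, a natural companion to Zero Trust, is straightforward to prototype. Below is a minimal sketch using the Fernet recipe (authenticated symmetric encryption) from the third-party cryptography package; in a real Zero Trust deployment the key would come from a secrets manager, never from source code.

```python
# pip install cryptography
from cryptography.fernet import Fernet

key = Fernet.generate_key()      # in practice, fetch this from a secrets manager
fernet = Fernet(key)

record = b'{"patient_id": "p-001", "diagnosis": "..."}'
token = fernet.encrypt(record)   # ciphertext is authenticated and timestamped

assert fernet.decrypt(token) == record
# Without the key, a database admin or an attacker sees only the opaque token.
```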

Security is a never-ending process. By layering security strategies one atop the other, companies can create a formidable defense that protects their most valuable digital assets while also providing the framework for a powerful data ethics structure.

Common Themes and Lessons Learned

A common thread runs through all of the above examples. Users must be clearly informed about how their data is collected and used. Proactive auditing is essential to ensure models don’t encode and scale human bias. And context always matters: using data for a purpose other than the one for which it was originally collected is a major ethical violation.
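
That contextual lesson maps onto what regulators call purpose limitation, and it can be enforced mechanically. A minimal sketch, with hypothetical purposes echoing the Clearview case, that tags a dataset with the purposes declared at collection time and refuses any other use:

```python
class PurposeViolation(Exception):
    """Raised when data is used outside its declared collection purposes."""

class Dataset:
    def __init__(self, records, declared_purposes):
        self._records = records
        self._purposes = frozenset(declared_purposes)  # fixed at collection time

    def use(self, purpose):
        """Release the records only for a purpose the subjects agreed to."""
        if purpose not in self._purposes:
            raise PurposeViolation(
                f"'{purpose}' was never consented to; "
                f"allowed: {sorted(self._purposes)}")
        return self._records

photos = Dataset(["img1.jpg", "img2.jpg"], declared_purposes={"social_sharing"})
photos.use("social_sharing")            # fine: matches the original context

try:
    photos.use("facial_recognition")    # a Clearview-style secondary use
except PurposeViolation as err:
    print("Blocked:", err)
```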

These cases often involved powerful organizations making decisions that significantly impacted less powerful individuals. Ethical data practice frequently requires going beyond what is strictly legal and considering what is morally right.

Core Ethical Principles Enforced by Legislation

While the laws around data ethics vary, the following tenets are consistently promoted:

Ethical Principle | What It Means | Legislation That Enforces It
Transparency | Being open about what data is collected and how it is used. | GDPR, CCPA, EU AI Act
Fairness & Non-Discrimination | Ensuring algorithms and data processes do not produce biased or discriminatory outcomes. | EU AI Act, NYC Local Law 144, ECOA
Accountability | Organizations are responsible for complying with principles and must be able to demonstrate that compliance. | GDPR, EU AI Act
Privacy by Design | Integrating data protection into the design of projects and processes from the outset. | GDPR, PIPEDA
Human-in-the-Loop | Ensuring human oversight of significant automated decisions. | GDPR, EU AI Act

Data Ethics: The Moral Compass for Data Handling

In his article “Aspects of Data Ethics in a Changing World: Where Are We Now?”, David J. Hand states, “In considering ethical matters, we must consider both current and future uses of data. Progress in data science and technology is often described as if it were a question of reaching a new status quo: as if, once we have developed and implemented tools for handling the vast data sets and real-time issues, we can relax. However, that is to misunderstand the nature of the changes we are witnessing.”

Hand believes these changes will be ongoing; with data ethics, as the famous saying goes, the only constant is change. “We are not approaching a plateau but are on the slopes of doubtless even more dramatic changes,” he writes. Current technology is good, but new data technologies will need to be created, Hand argues, adding that current examples such as blockchain, homomorphic computation, and quantum computing are a good start. Data ethics will be a constant challenge for corporations. It will never be solved once and for all; it will continually require updates and new takes.

However, it is a fight worth having. Ethical data usage promotes trust and accountability between an organization and its customers. It also helps prevent data breaches and other security incidents, and it can promote responsible innovation. The benefits include improved decision-making, increased efficiency, and an enhanced corporate reputation. Above all, it ensures that data is used for the greater good, a battle that has never been more important.