1. Consent: Individuals must provide informed, voluntary consent before any of their data is collected or used. Consent must be transparent, freely given, and revocable at any time.
2. Transparency: Organizations must clearly communicate how data is collected, stored, used, and shared to build trust and allow informed choices.
3. Fairness: Data practices should avoid bias and discrimination, ensuring equitable treatment of all individuals in data-driven decisions.
4. Privacy and Confidentiality: Sensitive and personal information must be protected from unauthorized access, breaches, and misuse.
5. Accountability: Organizations must take responsibility for their data practices and ensure compliance with legal and ethical standards.
6. Data Minimization: Only the data that is absolutely necessary should be collected and retained, limiting exposure and risk.
7. Accuracy and Integrity: Data should be accurate, reliable, and used truthfully, without misrepresentation.
8. Security: Strong protective measures such as encryption and access controls must safeguard data from harm and improper access.
9. Continuous Monitoring: Ethical frameworks require regular audits and improvements to address emerging risks and violations.
10. Stakeholder Engagement: Data governance should include the perspectives of data subjects and all affected parties.
The High Cost of Ethical Blind Spots
Data ethics is about making morally sound, data-related judgments that balance technological possibilities with human rights. The goal is to ensure data benefits society without causing harm or unfairness. Organizations that adopt ethical data practices build trust, comply with regulations, and support responsible innovation in data use. There is a growing body of cases showing data ethics gone wrong, including the following:
Scraping OkCupid’s Personal Data
In 2016, a group of researchers broke the cardinal rule of social science research ethics by releasing about 70,000 OkCupid profiles onto the Open Science Framework, an online community where people share and collaborate on raw data sets. Two Danish researchers scraped the data from the OkCupid website and released users’ personal details, such as age, gender, and sexual orientation. When confronted, the two researchers claimed their actions weren’t ethically wrong because the data was already public. However, the huge data release raised questions about the ethics of publishing “already public” data: even though the information was public, that didn’t mean anyone had consented to having it republished in an online forum.
According to Brian Resnick, “The data dump did not reveal anyone’s real name. But it’s entirely possible to use clues from a user’s location, demographics, and OkCupid user name to determine their identity.” The Open Science Framework (OSF) is a free, open-source online platform and project management tool created by the Center for Open Science to support and streamline collaborative research. Built partly in response to traditional academic publishing gatekeepers, the OSF lets anyone publish data, in the hope that freely accessible information spurs innovation and keeps scientists accountable for their analyses. However, users must ensure the integrity of the information they upload.
If Kirkegaard, one of the two researchers, has violated either OkCupid’s or the OSF’s terms of use, the data will be removed, said Brian Nosek, executive director of the Center for Open Science. This seems likely, but it might be too late: as of May 2016, the dataset had been downloaded almost 500 times. The data ethics genie has been released from the bottle, and it’s probably too late to put it back in.
Cambridge Analytica & Facebook
Probably the best known of all data ethics scandals, the Cambridge Analytica and Facebook affair involved informed consent, data privacy, transparency, and psychological manipulation. Cambridge Analytica, a British political consulting company, acquired personal data from millions of Facebook users without their explicit consent when it harvested information from users who had completed a personality quiz. Approximately 270,000 people took the quiz, which doesn’t seem like a huge number, but Cambridge Analytica also collected data from all of the quiz takers’ Facebook friends. The sophisticated psychological profiles built upon this data were used to help target voters with personalized political advertisements during the 2016 US presidential election, as well as the Brexit referendum.
The lack of informed consent meant users never understood how their data would be used. The case also involved data misuse, as Facebook’s platform policies were violated, and manipulation on a massive scale, raising questions about the health of democratic processes.
This resulted in a massive public outcry, a $5 billion FTC fine for Facebook, the demise of Cambridge Analytica, and increased global scrutiny of social media platforms. It was a catalyst for regulations like the California Consumer Privacy Act (CCPA), and it pushed platforms to give users more control over their data.
One of the key takeaways from this scandal is that companies must ensure informed consent is explicit, not buried deep within terms of service or complicated legalese. Companies are also ethically responsible for how third parties use data collected on their platforms.
Amazon’s Biased Recruiting Tool
In 2018, it emerged that Amazon had developed an experimental AI recruiting tool to help it search for top talent. The project turned into a PR nightmare that showed how susceptible models are to algorithmic bias, discrimination, and unfairness. Trained on a decade’s worth of resumes submitted to the company, mostly by male candidates, the system penalized resumes that included the word “women’s” (e.g., “women’s soccer captain”) and downgraded graduates of all-women’s colleges. It effectively systematized historical gender bias.
The model demonstrated how biased training data leads to biased algorithms, perpetuating societal inequalities under the guise of objective, data-driven decision-making. Once the bias was recognized, Amazon scrapped the project; the company knew it couldn’t guarantee the tool would not discriminate. The key data ethics takeaway produced a twist on the old modeler’s adage, “garbage in, garbage out”: “bias in, bias out.” Models are a reflection of their training data. Auditing for fairness and bias is not optional; it’s a critical step in the ML development lifecycle.
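To make that auditing step concrete, here is a minimal, illustrative sketch of one common fairness check: comparing a model’s selection rates across groups and computing a disparate-impact ratio. The column names and the toy data are hypothetical, and the 0.8 threshold is only a widely cited rule of thumb, not a legal or universal standard.

```python
# Minimal fairness-audit sketch: compare a model's selection rates across groups.
# Column names ("gender", "hired_prediction") are hypothetical examples.
import pandas as pd

def disparate_impact_ratio(df: pd.DataFrame, group_col: str, outcome_col: str) -> float:
    """Ratio of the lowest group selection rate to the highest (1.0 = parity)."""
    rates = df.groupby(group_col)[outcome_col].mean()
    return rates.min() / rates.max()

# Toy data standing in for model predictions on a held-out evaluation set.
predictions = pd.DataFrame({
    "gender": ["F", "F", "F", "M", "M", "M", "M", "M"],
    "hired_prediction": [0, 1, 0, 1, 1, 0, 1, 1],
})

ratio = disparate_impact_ratio(predictions, "gender", "hired_prediction")
print(f"Disparate impact ratio: {ratio:.2f}")
# A common (though debatable) rule of thumb flags ratios below 0.8 for review.
if ratio < 0.8:
    print("Potential adverse impact -- investigate before deployment.")
```

A check like this is only a starting point; a real audit would look at multiple metrics, intersecting groups, and the training data itself.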
Google’s Project Nightingale Nightmare
Partnering with Ascension, a major US healthcare system, on “Project Nightingale”, Google gained access to the personal health records of millions of patients across 21 states. The data included names, dates of birth, lab results, doctor diagnoses, and hospitalization records, but patients and doctors weren’t notified about the access.
Google claimed that what they were doing was perfectly legal as it fell under HIPAA’s “business associate” clause. However, the project raised huge transparency issues. Patients had no idea their most sensitive data was being shared with a tech giant that had committed a slew of privacy violations in the past. The secondary use of their data for purposes like developing AI tools was not communicated, clearly violating the principle of ethical transparency.
Internal whistleblowers alerted the media, which led to a federal inquiry. The episode sparked a national debate about patient privacy and the role of big tech in healthcare. The most important takeaway: legality does not equal ethics. Just because you can use data under the law doesn’t mean you should. Without explicit, transparent communication and consideration of patient expectations, companies should resist the temptation to use data that patients haven’t given them permission to use.
Clearview AI
In 2020, it was revealed that Clearview AI had built a facial recognition tool by scraping billions of public images from websites like Facebook, YouTube, and Venmo without asking for consent from the individuals pictured or from the websites where the data resided. Clearview AI then sold access to the database to US law enforcement and government agencies, including U.S. Immigration and Customs Enforcement (ICE), the FBI, the Department of Homeland Security, U.S. Border Patrol, and the Department of Defense.
This case challenged the notion of “public” data. Although one could argue that the images were publicly available online, individuals had a reasonable expectation that their photos would not be collected into a global perpetual facial database for identification purposes. Clearview AI’s actions represent a massive violation of contextual integrity, i.e., data collected in one context (social sharing) was used in a completely different one. This sets a dangerous precedent for mass surveillance, and now that the facial recognition genie is out of the bottle, there’s no way to put it back in.
What works in the United States doesn’t always translate elsewhere. Clearview AI has been fined millions of dollars in Europe and the UK, has been banned from operating in several countries, and faces multiple lawsuits in the US. It remains a subject of intense legal and ethical debate. As in the earlier OkCupid case, publicly available data can still be unethically sourced; the context and potential for harm in how data is used and aggregated must be central to any ethical evaluation of data use.
Establishing a Robust Data Ethics Framework
As the examples above show, there are many ways for organizations to get into ethical trouble, and getting caught on the wrong side of the law can be enormously costly. Silicon Valley’s cavalier “ask forgiveness, not permission” philosophy sounds great in theory, but it can come with a huge price tag, as Facebook found out after the Cambridge Analytica scandal.
Implementing a strong data ethics framework starts with data collection, which involves gathering and storing user data from various sources. This essential process must be done in an ethical and responsible manner. Informed consent is mandatory. Individuals must be made aware of how their data will be used and shared. Data analysts must take into account the ethical implications of data collection, including the potential for biased algorithms and privacy violations.
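As an illustration of what informed, revocable consent can look like in code, here is a minimal sketch of a purpose-bound consent record. The class and field names are hypothetical, and a real system would also need audit logging, versioned consent text, and durable storage.

```python
# Minimal sketch of a revocable, purpose-bound consent record (illustrative only).
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

@dataclass
class ConsentRecord:
    user_id: str
    purpose: str                 # e.g. "email_marketing" -- one record per purpose
    granted_at: datetime
    revoked_at: Optional[datetime] = None

    def is_active(self) -> bool:
        return self.revoked_at is None

    def revoke(self) -> None:
        self.revoked_at = datetime.now(timezone.utc)

def may_process(consents: list, user_id: str, purpose: str) -> bool:
    """Process data only if the user holds an active consent for this exact purpose."""
    return any(c.user_id == user_id and c.purpose == purpose and c.is_active()
               for c in consents)

# Usage: consent is checked per purpose, and revocation takes effect immediately.
consents = [ConsentRecord("u123", "email_marketing", datetime.now(timezone.utc))]
print(may_process(consents, "u123", "email_marketing"))  # True
consents[0].revoke()
print(may_process(consents, "u123", "email_marketing"))  # False
```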
In today’s big data world, the amount of data collected can be overwhelming, but it is essential to manage it in a way that respects individual privacy and promotes ethical data use. Big data ethics requires that collection, storage, and usage all meet this standard; data security, the protection of sensitive data, and responsible innovation are paramount.
AI, machine learning, and deep learning models are trained on massive data sets and are particularly susceptible to encoding bias when that data is biased. When building these types of models, corporations need to be especially cognizant of the ethical concerns inherent in their data, including the potential for biased algorithms and unintended consequences.
The Zero Trust Mindset
Data privacy and security are some of the most important aspects of ethical data management because they involve protecting individual privacy while preventing unauthorized access to data. Data professionals must ensure that data is stored and transmitted securely, and that access to data is restricted to authorized personnel only.
Keeping data safe in the modern digital landscape requires a layered approach, often called “defense in depth.” No single solution is foolproof, but combining multiple strategies creates a robust shield. Users should have strong, unique passwords. Operating systems, applications, web browsers, and firmware should be kept up to date, since most breaches exploit known vulnerabilities that have already been patched. Multi-factor (or at least two-factor) authentication should be required for user access.
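As one concrete illustration of the credential layer, the sketch below shows salted password hashing with a constant-time comparison, using only the Python standard library. The iteration count and function names are illustrative, not a vetted production configuration.

```python
# Sketch of one defense-in-depth layer: never store passwords in plain text.
import hashlib
import hmac
import secrets

ITERATIONS = 600_000  # illustrative work factor; tune to current guidance

def hash_password(password: str):
    """Return (salt, derived_key) for storage; the plain password is never kept."""
    salt = secrets.token_bytes(16)
    key = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    candidate = hashlib.pbkdf2_hmac("sha256", password.encode(), salt, ITERATIONS)
    return hmac.compare_digest(candidate, stored_key)  # constant-time comparison

salt, key = hash_password("correct horse battery staple")
print(verify_password("correct horse battery staple", salt, key))  # True
print(verify_password("guess123", salt, key))                      # False
```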
Companies should embrace a “Zero Trust” mindset. This is a security model that assumes a breach is inevitable or has already happened. It requires strict identity verification for every person and device trying to access resources on a private network. The use of encryption and other security measures can help to protect data privacy and security. Data breaches can have serious consequences, including financial losses and reputational damage.
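The zero-trust idea can be summarized in a small sketch: every request is re-authenticated and re-authorized, regardless of where it originates on the network. The lookups below (VALID_TOKENS, COMPLIANT_DEVICES, POLICY) are hypothetical stand-ins for an identity provider, a device inventory, and a policy engine.

```python
# Zero-trust sketch (illustrative): no request is trusted by default -- every call
# re-verifies identity, device posture, and a least-privilege grant.
from dataclasses import dataclass

@dataclass
class Request:
    user_token: str
    device_id: str
    resource: str
    action: str

VALID_TOKENS = {"token-abc": "analyst_1"}                # identity provider stand-in
COMPLIANT_DEVICES = {"laptop-42"}                        # patched, encrypted, MFA-enrolled
POLICY = {("analyst_1", "customer_data", "read")}        # least-privilege grants

def authorize(req: Request) -> bool:
    user = VALID_TOKENS.get(req.user_token)              # 1. verify identity every time
    if user is None:
        return False
    if req.device_id not in COMPLIANT_DEVICES:           # 2. verify device posture
        return False
    return (user, req.resource, req.action) in POLICY    # 3. explicit, scoped grant

print(authorize(Request("token-abc", "laptop-42", "customer_data", "read")))    # True
print(authorize(Request("token-abc", "laptop-42", "customer_data", "delete")))  # False
```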
Security is a never-ending process. By layering security strategies one atop the other, companies can create a formidable defense that protects their most valuable digital assets while also providing the framework for a powerful data ethics structure.
Common Themes and Lessons Learned
A common thread runs through all of the above examples. Users must be clearly informed about how their data is collected and used. Proactive auditing is essential to ensure models don’t encode and scale human bias. Context always matters: using data for a purpose other than the one for which it was originally collected is a major ethical violation.
The above cases often involved powerful organizations making decisions that significantly impacted less powerful individuals. Ethical data practice often requires going beyond what is strictly legal and taking into account what is morally right.
Core Ethical Principles Enforced by Legislation
While the laws around data ethics vary, the following tenets are consistently promoted:
Transparency: Being open about what data is collected and how it is used. (GDPR, CCPA, EU AI Act)
Fairness and Non-Discrimination: Ensuring algorithms and data processes do not produce biased or discriminatory outcomes. (EU AI Act, NYC Local Law 144, ECOA)
Accountability: Organizations are responsible for complying with these principles and must be able to demonstrate that compliance. (GDPR, EU AI Act)
Privacy by Design: Integrating data protection into the design of projects and processes from the outset. (GDPR, PIPEDA)
Human-in-the-Loop: Ensuring human oversight of significant automated decisions. (GDPR, EU AI Act)
Data Ethics: The Moral Compass for Data Handling
In his article, Aspects of Data Ethics in a Changing World: Where Are We Now?, David J. Hand states, “In considering ethical matters, we must consider both current and future uses of data. Progress in data science and technology is often described as if it were a question of reaching a new status quo: as if, once we have developed and implemented tools for handling the vast data sets and real-time issues, we can relax. However, that is to misunderstand the nature of the changes we are witnessing.”
Hand believes these changes will be ongoing. With data ethics, the only constant is change, as the famous saying goes. “We are not approaching a plateau but are on the slopes of doubtless even more dramatic changes,” Hand says. Current technology is good, he believes, but new data technologies will need to be created; emerging examples such as blockchain, homomorphic computation, and quantum computing are a good start. Data ethics will be a constant challenge for corporations. It will never be solved once and for all; it will continually require updates and fresh thinking.
However, it is a fight worth having. Ethical data usage promotes trust and accountability between an organization and its customers. It also helps prevent data breaches and other security incidents and can promote responsible innovation. The benefits of ethical data usage include improved decision-making, increased efficiency, and enhanced corporate reputation. It ensures that data will be used for the greater good, a battle that has never been more important.