Personal Data in context of AI Models - Excerpt of EDPB Opinion 28/2024

The European Data Protection Board (EDPB) has published Opinion 28/2024 on the processing of personal data in the context of AI models. Here is an excerpt:

The Opinion outlines various scenarios concerning the unlawful processing of personal data by controllers when developing AI models, and the implications under the General Data Protection Regulation (GDPR).

In Scenario 1, a controller unlawfully processes personal data to develop an AI model, retaining this data within the model for subsequent processing during its deployment phase. The EDPB emphasizes that supervisory authorities (SAs) have the power to impose corrective measures on the initial unlawful processing, which can directly affect subsequent processing. This includes the possibility of deleting unlawfully processed data, thereby restricting further processing actions linked to that data. The EDPB asserts that whether the development and deployment of the model constitute separate processing activities must be analyzed on a case-by-case basis. It highlights that if the subsequent processing is predicated on legitimate interests, the unlawful nature of the initial data handling should influence the assessment of such interests.

Scenario 2 addresses a situation where personal data unlawfully processed during model development is then processed by another controller. The EDPB notes the critical need for establishing the roles and responsibilities of different actors under the GDPR framework. Each controller must ensure the lawfulness of its data processing and SAs should evaluate not only the initial processing by the original controller but also the subsequent processing undertaken by the second controller. SAs may also consider whether the second controller conducted sufficient due diligence regarding the model's compliance with lawful processing during development.

In Scenario 3, a controller unlawfully processes personal data but subsequently anonymizes it before deployment. Here, if the anonymization is verifiable and successful, the GDPR would not apply to the model's operation, thereby shielding subsequent processing of anonymized data from the implications of the initial unlawful activity. However, mere claims of anonymization do not suffice; SAs need to ensure that such assertions are substantiated through a thorough case-specific evaluation.


This summary focuses on the EDPB's opinion on the processing of personal data in the development and deployment of artificial intelligence (AI) models. The key points are:

1. The EDPB was requested by the Irish supervisory authority to provide guidance on several issues related to AI and data protection, including when an AI model can be considered anonymous, how controllers can demonstrate the appropriateness of using legitimate interest as a legal basis, and the consequences of unlawful processing during the development phase on the subsequent use of the AI model.

2. Regarding anonymity, the EDPB states that AI models trained on personal data cannot always be considered anonymous, and any claims of anonymity should be assessed on a case-by-case basis. Controllers must demonstrate that the likelihood of directly or indirectly extracting personal data from the model is insignificant.

3. For using legitimate interest as a legal basis, the EDPB provides guidance on the three-step test that supervisory authorities should apply: identifying a legitimate interest, assessing the necessity of the processing, and balancing the legitimate interest against the data subjects' rights and freedoms. Specific risks to fundamental rights in both the development and deployment phases must be considered.

4. Regarding unlawful processing during development, the EDPB outlines three scenarios and the potential consequences. If personal data is retained in the model, the lawfulness of subsequent processing may be impacted. Controllers deploying the model must assess whether the initial development involved unlawful processing. If the model is truly anonymized, the unlawfulness of the initial processing may not impact the subsequent use.

5. The EDPB emphasizes the need for a case-by-case assessment by supervisory authorities, considering the specific context and characteristics of the AI model and its intended use.

Background

On September 4, 2024, the Irish supervisory authority (IE SA) submitted a request to the EDPB for an opinion concerning the processing of personal data in the context of AI models, seeking clarity on several key issues under the GDPR. After a review period, the EDPB acknowledged the request as complete by September 13, 2024. The IE SA emphasized the importance of establishing a coherent position on these matters, given the lack of harmonization among national supervisory authorities and the significant implications for data subjects in the European Economic Area (EEA).

The IE SA outlined several questions related to AI model operations. The first query sought to determine whether AI models trained on personal data can ever be classified as not processing personal data, including considerations on how to demonstrate compliance in this context. Further questions explored how data controllers can ensure that their reliance on legitimate interests as a legal basis for processing is appropriately validated, particularly regarding third-party and first-party data. The request also inquired about the ramifications of using unlawfully processed personal data in the creation or development of an AI model and its subsequent operation or processing.

The EDPB recognized the request as a matter of general application under Article 64(2) GDPR, emphasizing that the dynamics of AI technologies pose new challenges for interpreting and applying existing GDPR provisions. It asserted that the rapid proliferation of AI has significant implications for data subjects throughout the EEA. Consequently, the opinion aims to support national supervisory authorities in their assessments while remaining consistent with the accountability principle outlined in the GDPR.

Importantly, the opinion clarified that while many aspects relevant to AI models are addressed, additional considerations such as the processing of special categories of data, automated decision-making, purpose compatibility, and data protection impact assessments (DPIAs) were acknowledged as relevant but not exhaustively examined in this specific opinion. The EDPB's guidance intends to bolster compliance efforts while championing responsible innovation in AI.


Key Notions

The EDPB provides clarifications on key terminology for the Opinion, defining "first-party data" as personal data collected directly from data subjects, while "third-party data" refers to data obtained from external sources, such as data brokers or web scraping. Web scraping, commonly used to collect publicly available information, can inadvertently include personal data. The EDPB outlines the “life-cycle” of AI models, distinguishing between the “development phase” (which includes creation, training, and fine-tuning) and the “deployment phase” (which involves operational usage). The Opinion emphasizes that both phases can involve personal data processing.

The EU Artificial Intelligence Act defines an "AI system" as a machine-based system that operates with varying levels of autonomy and infers from the input it receives how to generate outputs, indicating that the capacity to infer is central to AI systems. Although AI models are crucial to these systems, they are not deemed AI systems on their own; they require integration within a broader framework. AI models undergo training processes to learn from data, employing techniques like machine learning to identify patterns.

In discussing personal data in relation to AI models, the EDPB references the GDPR definition of personal data and the context of anonymity. AI models, even those not specifically designed to generate identifiable information, can still retain personal data, absorbed within their parameters. Therefore, such models may not be considered anonymous as identifiable information can potentially be extracted.

The EDPB indicates that the determination of whether an AI model trained on personal data is anonymous requires a case-by-case assessment based on specific criteria. The Opinion highlights the dynamic nature of research regarding the extraction of personal data from AI models, noting potential vulnerabilities that may allow for the unintended recovery of personal information. Overall, while the EDPB offers general guidance, it emphasizes the need for a thorough evaluation of each model's specifics in relation to anonymity and personal data processing.

General considerations regarding anonymization in the context at hand

The Opinion discusses general considerations regarding anonymization in the context of AI models. The key points are:

1. The definition of 'personal data' in the GDPR has a wide scope, encompassing all kinds of information that relate to an identified or identifiable natural person, even if the information is technically organized or encoded in a way that does not make the relation immediately apparent. This is particularly relevant for AI models, where the parameters represent statistical relationships that may allow for the extraction of accurate or inaccurate personal data.

2. To determine if an AI model can be considered anonymous, supervisory authorities (SAs) should check whether personal data related to the training data cannot be extracted from the model, and whether any output produced when querying the model does not relate to the data subjects whose personal data was used to train the model (a minimal testing sketch of this idea follows the list).

3. SAs should evaluate the measures implemented by controllers to ensure and prove the anonymity of an AI model, considering factors like whether the model is publicly available or only accessible to employees. The presence or absence of certain elements is not a conclusive criterion, but SAs should assess methodological choices and the use of privacy-preserving techniques.

4. SAs should evaluate the documentation provided by controllers, including evidence of the model's theoretical resistance to re-identification techniques and the controls designed to limit or assess the success and impact of main attacks.

5. The assessment should also consider the measures regarding the outputs of the model, to lower the likelihood of obtaining personal data related to the training data from queries, even if these measures do not have an impact on the risk of direct extraction of personal data from the model.

6. Overall, the evaluation of the anonymity of an AI model should be done on a case-by-case basis, considering the context of development and deployment, the state of the art, and the effectiveness of the measures implemented by the controller.
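To make point 2 above concrete, here is a minimal sketch of one way a controller might probe a model for training-data memorization using planted "canary" strings. This illustrates a well-known testing idea, not a method prescribed by the Opinion; the `generate` callable stands in for whatever completion API the deployed model exposes, and all names are hypothetical.

```python
import secrets

def make_canaries(n: int = 5) -> list[str]:
    # Unique synthetic "secrets" planted in the training corpus; if the
    # trained model later reproduces one verbatim, that is evidence of
    # memorization rather than anonymity.
    return [f"canary-{secrets.token_hex(8)}" for _ in range(n)]

def extraction_rate(generate, canaries: list[str], prompts: list[str]) -> float:
    # Query the model with probing prompts and measure how often the
    # planted canaries leak into its outputs.
    outputs = [generate(p) for p in prompts]
    leaked = {c for c in canaries if any(c in out for out in outputs)}
    return len(leaked) / len(canaries)
```

A non-zero leak rate would weigh against any claim of anonymity; a zero rate is necessary but not sufficient, since more sophisticated extraction attacks may still succeed.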

Technical measures:

The Opinion discusses technical measures that can be taken to mitigate risks while developing AI systems or models, without resulting in the anonymization of the model or violating other GDPR obligations.

One set of measures, described under Section 3.2.2 of the Opinion, is suitable for this purpose, though the specific details are not provided in this excerpt. In addition, the Opinion outlines several other relevant technical measures that can be considered.

One such measure is pseudonymization, which could involve preventing the combination of data based on individual identifiers. However, the paper notes that this may not be appropriate if the controller can demonstrate a reasonable need to gather different data about a particular individual for the development of the AI system or model.
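As one illustration of the linkage point above, the sketch below replaces each record's identifier with an independent random token, so records about the same individual can no longer be combined via that identifier. The record shape and the `user_id` field name are assumptions for illustration; the Opinion does not prescribe any particular technique.

```python
import secrets

def unlink_identifiers(records: list[dict], id_field: str = "user_id") -> list[dict]:
    # Give every record an independent random token so that records
    # belonging to one individual cannot be re-linked via the identifier.
    return [{**r, id_field: secrets.token_hex(8)} for r in records]
```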

Another measure suggested is masking or substituting personal data in the training set, such as replacing names and email addresses with fake information. This could be particularly useful when the actual content of the data is not relevant to the overall processing, such as in the case of training large language models.
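A minimal sketch of such masking follows, assuming identifiers that simple regular expressions can detect; a production pipeline would more likely rely on a dedicated PII-detection tool, since names and free-text identifiers require entity recognition rather than patterns.

```python
import re

# Illustrative patterns for two common direct identifiers.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def mask_text(text: str) -> str:
    # Substitute matched personal data with fixed fake stand-ins.
    text = EMAIL_RE.sub("user@example.com", text)
    text = PHONE_RE.sub("+00 000 000 000", text)
    return text

print(mask_text("Reach jane.doe@mail.org or +44 20 7946 0958."))
# -> Reach user@example.com or +00 000 000 000.
```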

The Opinion emphasizes that the selection and implementation of appropriate technical measures should be made on a case-by-case basis, taking into account the specific risks involved, the necessity of the data processing, and the GDPR's requirements. The goal is to find ways to mitigate risks without resorting to complete anonymization of the model or violating other data protection obligations.

Measures that facilitate the exercise of individuals' rights:

The Opinion discusses measures that can be taken to facilitate individuals' exercise of their rights in the context of AI development and deployment. The key points are:

a. Observing a reasonable period of time between the collection of a training dataset and its use, allowing data subjects to exercise their rights during this period.

b. Providing an unconditional 'opt-out' option for data subjects to object to the processing before it takes place, beyond the conditions in Article 21 GDPR (a filtering sketch follows this list).

c. Allowing data subjects to exercise their right to erasure even when the specific grounds in Article 17(1) GDPR do not apply.

d. Allowing data subjects to submit claims of personal data regurgitation or memorization, and requiring controllers to assess relevant unlearning techniques.
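For points b and c, here is a minimal sketch of how a controller-maintained objection and erasure registry might be applied before training begins. The record shape and the `data_subject_id` field name are assumptions for illustration only.

```python
def apply_objections(records: list[dict], opted_out_ids: set[str],
                     id_field: str = "data_subject_id") -> list[dict]:
    # Drop every record belonging to a data subject who opted out or
    # requested erasure before the dataset is used for training.
    return [r for r in records if r.get(id_field) not in opted_out_ids]
```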

Conclusion

The Opinion also discusses transparency measures that could help overcome information asymmetry, such as releasing information beyond GDPR requirements, using alternative communication channels, and providing transparency labels and reports.

In the context of web scraping, specific mitigating measures are suggested, such as excluding sensitive data content, excluding certain websites, and respecting robots.txt or ai.txt files. An opt-out list managed by the controller is also proposed.
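As a sketch of the robots.txt point, Python's standard library can check a site's crawling rules before any content is collected. The user-agent string and the opt-out domain list here are hypothetical; ai.txt has no standard-library parser, so it is not covered.

```python
from urllib import robotparser
from urllib.parse import urlsplit

OPT_OUT_DOMAINS = {"opted-out.example"}  # hypothetical controller-managed list

def may_collect(url: str, user_agent: str = "example-ai-crawler") -> bool:
    # Respect both the controller's own opt-out list and the site's robots.txt.
    parts = urlsplit(url)
    if parts.netloc in OPT_OUT_DOMAINS:
        return False
    rp = robotparser.RobotFileParser(f"{parts.scheme}://{parts.netloc}/robots.txt")
    rp.read()  # fetches and parses the file over the network
    return rp.can_fetch(user_agent, url)
```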

For the deployment phase, additional measures are discussed, including technical measures to prevent storage, regurgitation or generation of personal data, and measures to facilitate the exercise of rights like the right to erasure.
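One possible shape for such a deployment-phase guard is sketched below: suppress replies that verbatim reproduce strings from a controller-maintained list of personal data (for example, data covered by erasure requests). This is a simplistic illustration rather than the Opinion's prescription; a real system would combine it with broader PII detection on outputs.

```python
def guard_output(model_reply: str, blocked_terms: set[str]) -> str:
    # Withhold replies that verbatim reproduce known personal-data strings,
    # e.g. items on an erasure-request list maintained by the controller.
    lowered = model_reply.lower()
    if any(term.lower() in lowered for term in blocked_terms):
        return "[response withheld: possible personal data in output]"
    return model_reply
```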

The Opinion emphasizes that supervisory authorities have discretionary power to assess the lawfulness of processing and to impose appropriate, necessary, and proportionate corrective measures, taking into account the circumstances of each case.
