EU: EDPB Opinion on AI Provides Important Guidance though Many Questions Remain
https://privacymatters.dlapiper.com/2025/01/eu-edpb-opinion-on-ai-provides-important-guidance-though-many-questions-remain/ (14 January 2025)

A much-anticipated Opinion from the European Data Protection Board (EDPB) on AI models and data protection has not resulted in the clear or definitive guidance that businesses operating in the EU had hoped for. The Opinion emphasises the need for case-by-case assessments to determine GDPR applicability, highlighting the importance of accountability and record-keeping, while also flagging ‘legitimate interests’ as an appropriate legal basis under specific conditions. In rejecting the proposed Hamburg thesis, the EDPB has stated that AI models trained on personal data should be considered anonymous only if personal data cannot be extracted or regurgitated.

Introduction

On 17 December 2024, the EDPB published a much-anticipated Opinion on AI models and data protection.  The Opinion includes the EDPB’s view on the following key questions: does the development and use of an AI model involve the processing of personal data; and if so, what is the correct legal basis for that processing?

As is sometimes the case with EDPB Opinions, which necessarily represent the consensus view of the supervisory authorities of 27 different Member States, the Opinion does not provide many clear or definitive answers.  Instead, the EDPB offers indicative guidance and criteria, calling for case-by-case assessments of AI models to understand whether, and how, they are impacted by the GDPR.  In this context, the Opinion repeatedly highlights the importance of accountability and record-keeping by businesses developing or using AI, so that the applicability of data protection laws, and the business’ compliance with those laws, can be properly assessed. 

Whilst the equivocation of the Opinion might be viewed as unhelpful by European businesses looking for regulatory certainty, it is also a reflection of the complexities inherent in this intersection of law and technology.

In summary, the answers given by the EDPB to the four questions in the Opinion are as follows:

  1. Can an AI model, which has been trained using personal data, be considered anonymous?  Yes, but only in some cases.  It must be impossible, using all means reasonably likely to be used, to obtain personal data from the model, either through attacks which aim to extract the original training data from the model itself, or through interactions with the AI model (i.e., personal data provided in responses to prompts / queries). 
  2. Is ‘legitimate interests’ an appropriate legal basis for the training and development of an AI model? In principle yes, but only where the processing of personal data is necessary to develop the AI model, and where the ‘balancing test’ can be resolved in favour of the controller.  In particular, the issue of data minimisation, and the related issue of web-scraping / indiscriminate capture of data, will be relevant here. 
  3. Is ‘legitimate interests’ an appropriate legal basis for the deployment of an AI model? In principle yes, but only where the processing of personal data is necessary to deploy the AI model, and where the ‘balancing test’ can be resolved in favour of the controller.  Here, the impact on the data subject of the use of the AI model is of predominant importance.
  4. If an AI Model has been found to have been created, updated or developed using unlawfully processed personal data, how does this impact the subsequent use of that AI model?  This depends in part on whether the AI model was first anonymised before being disclosed to the deployer of that model (see Question 1).  Otherwise, the deployer of the model may need to assess the lawfulness of the development of the model as part of its accountability obligations.

Background

The Opinion was issued by the EDPB under Article 64 of the GDPR, in response to a request from the Irish Data Protection Commission.  Article 64 requires the EDPB to publish an opinion on matters of ‘general application’ or which ‘produce effects in more than one Member State’. 

In this case, the Irish DPC asked the EDPB to provide an opinion on the above-mentioned questions – a request that is not surprising given the general importance of AI models to businesses across the EU, and the large number of technology companies developing those models that have established their European operations in Ireland. 

In order to understand the Opinion, it helps to be familiar with certain concepts and terminology relating to AI. 

First, the Opinion distinguishes between an ‘AI system’ and an ‘AI model’. For the former, the EDPB relies on the definition given in the EU AI Act. In short: a machine-based system operating with some degree of autonomy that infers, from inputs, how to produce outputs such as predictions, content, recommendations, or decisions.  An AI model, meanwhile, is a component part of an AI system. Colloquially, it is the ‘brain’ of the AI system – an algorithm, or series of algorithms (such as in the form of a neural network), that recognises patterns in data. AI models require the addition of further components, such as a user interface, to become AI systems. To take a common example – the generative AI system known as ChatGPT is a software application composed of an AI model (the GPT Large Language Model) connected to a chatbot-style user interface that allows the user to submit queries (or ‘prompts’) to the model in the form of natural language questions. Whilst the Opinion is notionally concerned only with AI models, at times it appears to blur the distinction between the model and the system, in particular when discussing the significance of model outputs that are only rendered comprehensible to the user through an interface that sits outside of the model.
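
To make the distinction concrete, the minimal sketch below separates the ‘model’ from the ‘system’ that wraps it in a user interface. It is purely illustrative: a trivial function stands in for a trained model, and nothing here reflects any particular vendor’s architecture.

```python
# Illustrative only: a trivial stand-in "model" and the thin "system" layer
# around it. The point is the architectural split, not the model itself.

def model_predict(prompt: str) -> str:
    """The 'AI model': a function that maps an input to an output.
    A real model would be a trained neural network; a placeholder rule stands in here."""
    return f"[model output for: {prompt!r}]"

def chat_system() -> None:
    """The 'AI system': the model plus the surrounding components
    (here, a minimal text interface) that make it usable in practice."""
    while True:
        user_prompt = input("You: ")
        if user_prompt.strip().lower() in {"quit", "exit"}:
            break
        print("Assistant:", model_predict(user_prompt))

if __name__ == "__main__":
    chat_system()
```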

Second, the Opinion relies on an understanding of a typical ‘AI lifecycle’, pursuant to which an AI model is first developed by training the model on large volumes of data.  This training may happen in a number of phases which become increasingly refined (referred to as ‘fine-tuning’). Only after an AI model is developed can it be used, or ‘deployed’, in a live setting, as part of an AI system.  Often, the developer of an AI model will not be the same person as the deployer.  This is relevant because the Opinion variously addresses both development and deployment phases.

The significance of the ‘Hamburg thesis’

With respect to the key question of whether AI models can be considered anonymous, the Opinion follows in the wake of a much-discussed paper published in July 2024 by the data protection authority for the German state of Hamburg.  The paper took the position that AI models (specifically, Large Language Models) are, in isolation, anonymous – they do not involve the processing of personal data. 

In order to reach that conclusion, the paper decoupled the model itself from: (i) the prior training of the model (which may involve the collection and further processing of personal data as part of the training dataset); and (ii) the subsequent use of the model, whereby a prompt/input may contain personal data, and an output may be used in a way that means it constitutes personal data.

Looking only at the AI model itself, the paper decided that the tokens and values which make up the ‘inner workings’ of a typical AI model do not, in any meaningful way, relate to or correspond with information about identifiable individuals.  Consequently, the model itself was found to be anonymous, even if the development and use of the model involves the processing of personal data. 

The Hamburg thesis was welcomed for several reasons, not least because it resolved difficult questions such as how data subject rights could be understood in relation to an AI model (if someone asks for their personal data to be deleted, then what can this mean in the context of an AI model?), and the question of the lawful basis for ‘storing’ personal data in an AI model (as distinct from the lawful basis for collecting and preparing data to train the model).

However, as we go on to explain, the EDPB Opinion does not follow the relatively simple and certain framework presented by the Hamburg thesis.  Instead, it introduces uncertainty by asserting that there are, in fact, scenarios where an AI model contains personal data, but that this must be determined on a case-by-case basis.

Are AI models anonymous?

First, the Opinion is only concerned with AI models that have been trained using personal data.  Therefore, AI models trained using solely non-personal data (such as statistical data, or financial data relating to businesses) can, for the avoidance of doubt, be considered anonymous.  However, in this context the broad scope of ‘personal data’ under the GDPR must be remembered, and the Opinion does not suggest any de minimis level of personal data that needs to be involved in the training of the AI model for the question of GDPR applicability to arise.

Where personal data is used in the training phase, the next question is whether the model is specifically designed to provide personal data regarding individuals whose personal data were used to train the model.  If so, the AI model will not be anonymous.  For example, an AI model that is trained to provide a user, on request, with biographical information and contact details for directors of public companies, or a generative AI model that is trained on the voice recordings of famous singers so that it can, in turn, mimic the voices of those singers.  In each case, the model is trained on personal data of specific individuals, in order to be able to produce other personal data about those individuals as an output. 

Finally, there is the intermediary case of AI models that are trained on personal data, but that are not designed to provide personal data related to the training data as an output.  It is this use case that the Opinion focuses on.  The conclusion is that AI models in this category may be anonymous, but only if the developer of the model can demonstrate that information about individuals whose personal data was used to train the model cannot be ‘obtained from’ the model, using all means reasonably likely to be used.  Notwithstanding that personal data used for training the model no longer exists within the model in its original form (but rather it is “represented through mathematical objects“), that information is, in the eyes of the EDPB, still capable of constituting personal data.

The following question then arises: how does someone ‘obtain’ personal data from an AI model? In short, the Opinion posits two possibilities.  The first is that training data is ‘extracted’ via deliberate attacks.  The Opinion refers to an evolving field of research in this area and makes reference to techniques such as ‘model inversion’, ‘reconstruction attacks’, and ‘attribute and membership inference’.  These are techniques that can be deployed to trick the model into revealing training data, or otherwise reconstruct that training data, in some cases relying on privileged access to the model itself.  The second is the risk of accidental or inadvertent ‘regurgitation’ of personal data as part of an AI model’s outputs. 

Consequently, a developer must be able to demonstrate that its AI model is resistant both to attacks that extract personal data directly from the model, as well as to the risk of regurgitation of personal data in response to queries:  “In sum, the EDPB considers that, for an AI model to be considered anonymous, using reasonable means, both (i) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to train the model; as well as (ii) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant for any data subject“. 
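
Purely by way of illustration (and not as a substitute for the formal testing the EDPB expects), a crude regurgitation check might prompt a model with the prefix of a known training record and test whether the completion reproduces the remainder. In the sketch below, `generate` is an assumed placeholder for the model under assessment, and the sample record is fictitious.

```python
# A naive, illustrative regurgitation check. `generate` is a placeholder
# (an assumption) for whatever model is being assessed; it is not a real API,
# and this is not a prescribed testing methodology.

def generate(prompt: str) -> str:
    """Stand-in for a call to the model under assessment."""
    return ""

def regurgitation_rate(training_records: list[str], prefix_len: int = 30) -> float:
    """Fraction of sampled training records whose remainder is reproduced
    verbatim when the model is prompted with the record's prefix."""
    if not training_records:
        return 0.0
    leaked = 0
    for record in training_records:
        prefix, remainder = record[:prefix_len], record[prefix_len:]
        if remainder and remainder.strip() in generate(prefix):
            leaked += 1
    return leaked / len(training_records)

# Example: a low rate is one piece of evidence, not proof of anonymity.
sample = ["Jane Doe, 12 Example Street, Dublin, phone 085 123 4567"]
print(regurgitation_rate(sample))
```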

Which criteria should be used to evaluate whether an AI model is anonymous?

Recognising the uncertainty in its conclusion that AI models may or may not be anonymous, the EDPB provides a list of criteria that can be used to assess the likelihood of a model being found to contain personal data.  These include:

  • Steps taken to avoid or limit the collection of personal data during the training phase.
  • Data minimisation or masking measures (e.g., pseudonymisation) applied to reduce the volume and sensitivity of personal data used during the training phase (see the illustrative sketch after this list).
  • The use of methodologies during model development that reduce privacy risks (e.g., regularisation methods to improve model generalisation and reduce overfitting, and appropriate and effective privacy-preserving techniques, such as differential privacy).
  • Measures that reduce the likelihood of obtaining personal data from queries (e.g., ensuring the AI system blocks the presentation to the user of outputs that may contain personal data).
  • Document-based audits (internal or external) undertaken by the model developer that include an evaluation of the chosen measures and of their impact in limiting the likelihood of identification.
  • Testing of the model to demonstrate its resilience to different forms of data extraction attacks.
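
As a concrete (and deliberately simplified) illustration of the data minimisation and masking criterion above, the sketch below replaces direct identifiers in a training record with keyed hashes before the data reaches the training pipeline. The field names, identifier list and hashing choice are assumptions for the example only, and pseudonymised data generally remains personal data under the GDPR.

```python
# Illustrative pseudonymisation of direct identifiers before training data
# enters the pipeline. Field names and the keyed-hash approach are assumptions
# for this example, not a prescribed method.
import hashlib
import hmac

SECRET_KEY = b"replace-with-a-securely-stored-key"
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

def pseudonymise(value: str) -> str:
    """Replace an identifier with a stable keyed hash, so records can still be
    linked within the training pipeline without exposing the raw identifier."""
    return hmac.new(SECRET_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()[:16]

def prepare_record(record: dict[str, str]) -> dict[str, str]:
    """Pseudonymise direct identifiers and leave other fields untouched."""
    return {
        field: pseudonymise(value) if field in DIRECT_IDENTIFIERS else value
        for field, value in record.items()
    }

print(prepare_record({"name": "Jane Doe", "email": "jane@example.com", "query": "refund status"}))
```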

What is the correct legal basis for AI models?

When using personal data to train an AI model, the preferred legal basis is normally the ‘legitimate interests’ of the controller, under Article 6(1)(f) GDPR. This is for practical reasons. Whilst, in some circumstances, it may be possible to obtain GDPR-compliant consent from individuals authorising the use of their data for AI training purposes, in most cases this will not be feasible. 

Helpfully, the Opinion accepts that legitimate interests is, in principle, a viable legal basis for processing personal data to train an AI model. Further, the Opinion also suggests that it should be straightforward for businesses to identify a lawful legitimate interest. For example, the Opinion cites “developing an AI system to detect fraudulent content or behaviour” as a sufficiently precise and real interest. 

However, where businesses may have more difficulty is in showing that the processing of personal data is necessary to realise their legitimate interest, and that their legitimate interest is not outweighed by any impact on the rights and freedoms of data subjects (the ‘balancing test’). Whilst this is fundamentally just a restatement of existing legal principles, the following sentence should nevertheless cause some concern for businesses developing AI models, in particular Large Language Models: “If the pursuit of the purpose is also possible through an AI model that does not entail processing of personal data, then processing personal data should be considered as not necessary“. Technically speaking, it may often be the case that personal data is not essential for the training of an AI model – however, this does not mean that it is straightforward to systematically remove all personal data from a training dataset, or otherwise replace all identifying elements with ‘dummy’ values. 

With respect to the balancing test, the EDPB asks businesses to consider a data subject’s interest in self-determination and in maintaining control over their own data when considering whether it is lawful to collect personal data for model training purposes.  In particular, it may be more difficult to satisfy the balancing test if a developer is scraping large volumes of personal data (especially including any sensitive data categories) against data subjects’ wishes, without their knowledge, or otherwise in contexts that they would not reasonably expect. 

When it comes to the separate purpose of deploying an AI model, the EDPB asks businesses to consider the impact on the data subject’s fundamental rights that arises from the purpose for which the AI model is used.  For example, AI models that are used to block content publication may adversely affect a data subject’s fundamental right to freedom of expression.  Conversely, the EDPB recognises that the deployment of AI models may have a positive impact on a data subject’s rights and freedoms – for example, an AI model that is used to improve accessibility to certain services for people with disabilities. In line with Recital 47 GDPR, the EDPB reminds controllers to consider the ‘reasonable expectations’ of data subjects in relation to both training and deployment uses of personal data.

Finally, the Opinion discusses a range of ‘mitigating measures’ that may be used to reduce risks to data subjects and therefore tip the balancing test in favour of the controller.  These include:

  • Technical measures to reduce the volume or sensitivity of the personal data used (e.g., pseudonymisation, masking).
  • Measures to facilitate the exercise of data subject rights (e.g., providing an unconditional right for data subjects to opt-out of the use of their personal data for training or deploying the model; allowing a reasonable period of time to elapse between collection of training data and its use).
  • Transparency measures (e.g., public communications about the controller’s practices in connection with the use of personal data for AI model development).
  • Measures specific to web-scraping (e.g., excluding publications that present particular risks; excluding certain data categories or sources; excluding websites that clearly object to web scraping – see the illustrative sketch below).

Notably, the EDPB observes that, to be effective, these mitigating measures must go beyond mere compliance with GDPR obligations (for example, providing a GDPR compliant privacy notice, which a controller would in any case be required to do, would not be an effective transparency measure for these purposes). 
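
On the web-scraping measures specifically, one concrete (and purely illustrative) way of honouring a website’s objection to scraping is to check its robots.txt before collecting anything from it. The sketch below uses Python’s standard library; the user agent string and example URL are assumptions for the example, and a real crawler would layer further exclusions (data categories, risky sources) on top of this.

```python
# Illustrative check that a training-data crawler respects a website's
# robots.txt before scraping a URL, using only the Python standard library.
from urllib import robotparser
from urllib.parse import urlparse

USER_AGENT = "example-training-data-crawler"

def allowed_to_scrape(url: str) -> bool:
    """Return True only if the site's robots.txt permits this user agent to
    fetch the URL; treat an unreadable robots.txt as a refusal."""
    parsed = urlparse(url)
    parser = robotparser.RobotFileParser()
    parser.set_url(f"{parsed.scheme}://{parsed.netloc}/robots.txt")
    try:
        parser.read()
    except OSError:
        return False  # cautious default: skip sites we cannot check
    return parser.can_fetch(USER_AGENT, url)

print(allowed_to_scrape("https://example.com/articles/page-1"))
```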

When are companies liable for non-compliant AI models?

In its final question, the DPC sought clarification from the EDPB on how a deployer of an AI model might be impacted by any unlawful processing of personal data in the development phase of the AI model. 

According to the EDPB, such ‘upstream’ unlawful processing may impact a subsequent deployer of an AI model in the following ways:

  • Corrective measures taken against the developer may have a knock-on effect on the deployer – for example, if the developer is ordered to delete personal data unlawfully collected for training purposes, the deployer would not be allowed to continue processing this data. However, this raises an important practical question about how such data could be identified in, and deleted from, the AI model, taking into account the fact that the model does not retain training data in its original form.
  • Unlawful processing in the development phase may impact the legal basis for the deployment of the model – in particular, if the deployer of the AI model is relying on ‘legitimate interests’, it will be more difficult to satisfy the balancing test in light of the deficiencies associated with the collection and use of the training data.

In light of these risks, the EDPB recommends that deployers take reasonable steps to assess the developer’s compliance with data protection laws during the training phase.  For example, can the developer explain the sources of data used, the steps taken to comply with the minimisation principle, and any legitimate interest assessments conducted for the training phase?  For certain AI models, the transparency obligations imposed in relation to AI systems under the AI Act should assist a deployer in obtaining this information from a third-party AI model developer.

While the Opinion provides a useful framework for assessing GDPR issues with AI systems, businesses operating in the EU may be frustrated by the lack of certainty or definitive guidance on many key questions relating to this new era of technology innovation.

Ireland: Increased regulatory convergence of AI and data protection: X suspends training of AI chatbot with EU user data after Irish regulator issues High Court proceedings
https://privacymatters.dlapiper.com/2024/08/ireland-increased-regulatory-convergence-of-ai-and-data-protection-x-suspends-training-of-ai-chatbot-with-eu-user-data-after-irish-regulator-issues-high-court-proceedings/ (19 August 2024)

The Irish Data Protection Commission (DPC) has welcomed X’s agreement to suspend its processing of certain personal data for the purpose of training its AI chatbot tool, Grok. This comes after the DPC issued suspension proceedings against X in the Irish High Court.  The DPC described this as the first time that any Lead Supervisory Authority had taken such an action, and the first time that it had utilised these particular powers.

Section 134 of the Data Protection Act 2018 allows the DPC, where it considers there is an urgent need to act to protect the rights and freedoms of data subjects, to make an application to the High Court for an order requiring a data controller to suspend, restrict, or prohibit the processing of personal data.

The High Court proceedings were issued on foot of a complaint to the DPC raised by the consumer rights organisations Euroconsumers and Altroconsumo on behalf of data subjects in the EU/EEA. The complainants argued that the Grok chatbot was being trained with user data in a manner that did not sufficiently explain the purposes of the data processing, and that more data than necessary was being collected. They further argued that X may have been handling sensitive data without sufficient reasons for doing so.

Much of the complaint stemmed from X’s initial approach of having data sharing automatically turned on for users in the EU/EEA, which it later mitigated by adding an opt-out setting. X claimed that it had relied on the lawful basis of legitimate interest under the GDPR, but the complainants argued that X’s privacy policy – dating back to September 2023 – was insufficiently clear as to how this applied to the processing of user data for the purposes of training AI models such as Grok.

This development follows a similar chain of events involving Meta in June. Complaints from privacy advocacy organisation NOYB were made against Meta’s reliance on ‘legitimate interest’ in relation to the use of data to train AI models. This led to engagement with the DPC and the eventual decision in June by Meta to pause relevant processing (without the need for the authority to invoke s134).

The DPC and other European supervisory authorities strive to emphasise the principles of lawfulness, fairness and transparency at the heart of the GDPR, and their actions illustrate that any activities which appear to threaten these values will be dealt with directly.

The DPC has previously taken the approach of making informal requests and has stated that the exercise of its powers in this case comes after extensive engagement with X on its model training. The High Court proceedings highlight the DPC’s willingness to escalate action where there remains a perceived risk to data subjects.

The DPC has, in parallel, stated that it intends to refer the matter to the EDPB although there has been no confirmation of such referral as of this date.

Such a referral will presumably form part of a thematic examination of AI processing by data controllers. The topic is also the subject of debate among individual DPAs, as evidenced by the Discussion Paper on Large Language Models and Personal Data recently published by the Hamburg DPA.

The fact that much of the high-profile activity relating to the regulation of AI is coming from the data protection sphere will no doubt bolster the EDPB’s recommendation, in a statement last month, that Data Protection Authorities (DPAs) are best placed to regulate high-risk AI.

It is expected that regulatory scrutiny and activity will only escalate and accelerate as ‘big tech’ players increasingly integrate powerful AI models into existing services to enrich data. This is particularly the case where it is perceived that data sets are being re-purposed and further processing is taking place. In such circumstances, it is essential that an appropriate legal basis is relied upon – noting the significant issues that can arise if there is an over-reliance on legitimate interest. The DPC and other regulators are likely to investigate, engage and ultimately intervene where they believe that data subjects’ rights under the GDPR are threatened. Perhaps in anticipation of more cross-border enforcement activity, last month the European Commission proposed a new law to streamline cooperation between DPAs when enforcing the GDPR in such cases.

A fundamental lesson from these developments is that, in the new AI paradigm, ensuring that there is a suitable legal basis for any type of processing, and that the principles of fairness and transparency are complied with, should be an absolute priority.

Europe: The EU AI Act’s relationship with data protection law: key takeaways
https://privacymatters.dlapiper.com/2024/04/europe-the-eu-ai-acts-relationship-with-data-protection-law-key-takeaways/ (25 April 2024)

Disclaimer: The blog post below is based on a previously published Thomson Reuters Practical Law practice note (EU AI Act: data protection aspects (EU)) and only presents a short overview of, and key takeaways from, this practice note. This blog post has been produced with the permission of Thomson Reuters, who holds the copyright over the full version of the practice note. Interested readers may access the full version of the practice note through this link (paywall).

On 13 March 2024, the European Parliament plenary session formally adopted the EU AI Act at first reading. The remaining steps of formal adoption are expected to be completed in a few weeks’ time. Following publication in the Official Journal of the European Union, the EU AI Act will enter into force 20 days later.

Artificial intelligence (“AI”) systems rely on data inputs from initial development, through the training phase, and in live use. Given the broad definition of personal data under European data protection laws, AI systems’ development and use will frequently result in the processing of personal data.

At its heart, the EU AI Act is a product safety law that provides for the safe technical development and use of AI systems.  With a couple of exceptions, it does not create any rights for individuals.  By contrast, the GDPR is a fundamental rights law that gives individuals a wide range of rights in relation to the processing of their data.  As such, the EU AI Act and the GDPR are designed to work hand-in-glove, with the latter ‘filling the gap’ in terms of individual rights for scenarios where AI systems use data relating to living persons.

Consequently, as AI becomes a regulated technology through the EU AI Act, practitioners and organisations must understand the close relationship between the EU data protection law and the EU AI Act.

1. EU data protection law and AI systems

1.1 The GDPR and AI systems
  • The General Data Protection Regulation (“GDPR”) is a technology-neutral regulation. As the definition of “processing” under the GDPR is broad (and in practice includes nearly all activities conducted on personal data, including data storage), it is evident that the GDPR applies to AI systems, to the extent that personal data is present somewhere in the lifecycle of an AI system.
  • It is often technically very difficult to separate personal data from non-personal data, which increases the likelihood that AI systems process personal data at some point within their lifecycle.
  • While AI is not explicitly mentioned in the GDPR, the automated decision-making framework (article 22 GDPR) serves as a form of indirect control over the use of AI systems, on the basis that AI systems are frequently used to take automated decisions that impact individuals.
  • In some respects, there is tension between the GDPR and AI. AI typically entails the collection of vast amounts of data (in particular, in the training phase), while many AI systems have a broad potential range of applications (reflecting the imitation of human-like intelligence), making the clear definition of “processing purposes” difficult.
  • At the same time, there is a clear overlap between many of the data protection principles and the principles and requirements established by the EU AI Act for the safe development and use of AI systems. The relationship between AI and data protection is expressly recognised in the text of the EU AI Act, which states that it is without prejudice to the GDPR. In developing the EU AI Act, the European Commission relied in part on article 16 of the Treaty on the Functioning of the European Union (“TFEU”), which mandates the EU to lay down the rules relating to the protection of individuals regarding the processing of personal data.
1.2 Data protection authorities’ enforcement against AI systems
  • Before the EU AI Act, the EU data protection authorities (“DPA”) were among the first regulatory bodies to take enforcement action against the use of AI systems. These enforcement actions have been based on a range of concerns, in particular, lack of legal basis to process personal data or special categories of personal data, lack of transparency, automated decision-making abuses, failure to fulfil data subject rights and data accuracy issues.
  • The list of DPA enforcement actions is already lengthy. The most notable examples include the Italian DPA’s temporary ban decision on OpenAI’s ChatGPT; the Italian DPA’s Deliveroo fine in relation to the company’s AI-enabled automated rating of rider performance; the French DPA’s fine against Clearview AI, a facial recognition platform that scrapes billions of photographs from the internet; and the Dutch DPA’s fine on the Dutch Tax and Customs Administration for various GDPR infringements in relation to an AI-based fraud notification facility.
  • As the DPAs shape their enforcement policies based in part on public concerns, and as public awareness of and interest in AI continues to rise, it is likely that DPAs will continue to sharpen their focus on AI (also see section 6 for DPAs as a potential enforcer of the EU AI Act).

2. Scope and applicability of the GDPR and EU AI Act

2.1 Scope of the GDPR and the EU AI Act
  • The material scope of the GDPR is the processing of personal data by wholly or partly automated means, or manual processing of personal data where that data forms part of a relevant filing system (article 2 GDPR). The territorial scope of the GDPR is defined in article 3 GDPR and covers different scenarios.
  • Consequently, the GDPR has an extraterritorial scope, meaning that: (i) controllers and processors established in the EU, processing in the context of that establishment, must comply with the GDPR even if the processing of personal data occurs in a third country; and (ii) non-EU controllers and processors have to comply with the GDPR if they target or monitor individuals in the EU.
  • On the other hand, the material scope of the EU AI Act is based around its definition of an AI system. Territorially, the EU AI Act applies to providers, deployers, importers, distributors, and authorised representatives (see, section 2.2 for details).
  • Unlike the GDPR, the EU AI Act has a robust risk categorisation and imposes different obligations on the different AI risk categories. Most obligations under the EU AI Act apply to high-risk AI systems only (covered in article 6 and Annex III EU AI Act). Various types of AI are also subject to specific obligations (such as general-purpose AI models) and transparency obligations (such as emotional categorisation systems).
2.2  Interplay between roles under the GDPR and the EU AI Act
  • Just as the GDPR distinguishes between controllers and processors, so the EU AI Act distinguishes between different categories of regulated operators.
  • The provider (the operator who develops an AI system or has an AI system developed) and the deployer (the operator under whose authority an AI system is used) are the most significant in practice.
  • Organisations that process personal data in the course of developing or using an AI system will need to consider the roles they play under both the GDPR and the EU AI Act. Some examples follow.
Example 1: provider (EU AI Act) and controller (GDPR) – A company (A) that processes personal data in the context of training a new AI system will be acting as both a provider under the EU AI Act and as a controller under the GDPR. This is because the company is developing a new AI system and, as part of that development, is taking decisions about how to process personal data for the purpose of training the AI system.

Example 2: deployer (EU AI Act) and controller (GDPR) – A company (B) that purchases the AI system described in Example 1 from company A and uses it in a way that involves the processing of personal data (for example, as a chatbot to talk to customers, or as an automated recruitment tool) will be acting as both a deployer under the EU AI Act and as a separate controller under the GDPR for the processing of its own personal data (that is, it is not the controller for the personal data used to originally train the AI system, but it is for any data it uses in conjunction with the AI).
  • More complex scenarios may arise when companies offer services that involve the processing of personal data and the use of an AI system to process that data. Depending on the facts, the customers of such services may qualify as controllers or processors (under the GDPR) although they would typically be deployers under the EU AI Act.
  • These examples raise important questions about how roles under the EU AI Act map onto roles under the GDPR, many of which are still to be resolved in practice. Companies that develop or deploy AI systems should carefully analyse their roles under the respective laws, preferably prior to the kick-off of relevant development and deployment projects.

3. Relationship between the GDPR principles and the EU AI Act

  • The GDPR is built around the data protection principles set out in article 5 GDPR. These principles are lawfulness, fairness, transparency, purpose limitation, data minimisation, accuracy, storage limitation, integrity and confidentiality.
  • On the other hand, the first intergovernmental standard on AI, the recommendation on artificial intelligence issued by the OECD (OECD Recommendation of the Council on Artificial Intelligence, “OECD AI Principles”), introduces five complementary principles for responsible stewardship of trustworthy AI that have strong links to the principles in the GDPR: inclusive growth, sustainability and well-being; human-centred values and fairness; transparency and explainability; robustness, security and safety; and accountability.
  • The EU AI Act also refers to general principles applicable to all AI systems, as well as specific obligations that give effect to those principles in particular ways. The EU AI Act principles are set out in recital 27 and are influenced by the OECD AI Principles and the seven ethical principles for AI developed by the independent High-Level Expert Group on AI (HLEG). Although recitals do not have the same legally binding status as the operative provisions which follow them and cannot overrule an operative provision, they can help with interpretation and to determine meaning.
  • Recital 27 EU AI Act refers to the following principles: human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, non-discrimination and fairness; and social and environmental wellbeing. Some of these principles already materialise through specific EU AI Act obligations: article 10 EU AI Act prescribes data governance practices for high-risk AI systems; article 13 EU AI Act deals with transparency; articles 14 and 26 EU AI Act introduce human oversight and monitoring requirements; and article 27 EU AI Act introduces the obligation to conduct fundamental rights impact assessments for some high-risk AI systems.
  • Understanding the synergies and differences between the GDPR principles and the EU AI Act principles will allow organisations to leverage their existing knowledge of GDPR and their existing GDPR compliance programmes. This is therefore a crucial step to lower compliance costs. The full practice note includes comprehensive tables that compare the practicalities in this regard.

4. Human oversight under the EU AI Act and automated decision-making under the GDPR

  • Under article 22 GDPR, data subjects have the right not to be subject to solely automated decisions involving the processing of personal data that result in legal or similarly significant effects. Where such decisions are taken, they must be based on one of the grounds set out in article 22(2) GDPR.
  • Like the GDPR, the EU AI Act is also concerned with ensuring that fundamental rights and freedoms are protected by allowing for appropriate human supervision and intervention (the so called “human-in-the-loop” effect).
  • Article 14 EU AI Act requires high-risk AI systems to be designed and developed in such a way (including with appropriate human-machine interface tools) that they can be effectively overseen by natural persons during the period in which the AI system is in use. In other words, providers must take a “human-oversight-by-design” approach to developing AI systems.
  • According to article 26(1) EU AI Act, the deployer of an AI system must take appropriate technical and organisational measures to ensure its use of an AI system is in accordance with the instructions of use accompanying the system, including with respect to human oversight.
  • The level of human oversight and intervention exercised by a user of an AI system may be determinative in bringing the system in or out of scope of the automated decision-making framework under the GDPR. In other words, a meaningful intervention by a human being at a key stage of the AI system’s decision-making process may be sufficient to ensure that the decision is no longer wholly automated for the purposes of article 22 GDPR. Perhaps more likely, AI systems will be used to make wholly automated decisions, but effective human oversight will operate as a safeguard to ensure that the automated decision-making process is fair and that an individual’s rights, including their data protection rights, are upheld.
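
As a purely illustrative sketch of what a “human-oversight-by-design” safeguard might look like in an automated decision pipeline, the snippet below routes a decision to a human reviewer whenever the model’s confidence is low or the decision would significantly affect the individual. The threshold, field names and review mechanism are assumptions for the example, not requirements drawn from the GDPR or the EU AI Act.

```python
# Illustrative human-in-the-loop gate for an automated decision pipeline.
# The confidence threshold and the notion of "significant effect" are
# assumptions for this sketch only.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Decision:
    outcome: str              # e.g. "approve" or "reject"
    confidence: float         # model confidence in the range [0, 1]
    significant_effect: bool  # would the decision significantly affect the person?

CONFIDENCE_THRESHOLD = 0.9

def finalise(decision: Decision, human_review: Callable[[Decision], str]) -> str:
    """Apply the automated outcome only when confidence is high and the decision
    has no significant effect; otherwise defer to a human reviewer."""
    if decision.significant_effect or decision.confidence < CONFIDENCE_THRESHOLD:
        return human_review(decision)  # meaningful human intervention
    return decision.outcome

# Example usage with a trivial reviewer callback.
print(finalise(Decision("reject", 0.95, True), human_review=lambda d: "escalated to case worker"))
```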

5. Conformity assessments and fundamental rights impact assessments under the EU AI Act and the DPIAs under the GDPR

  • Under the EU AI Act, the conformity assessment is designed to ensure accountability by the provider with each of the EU AI Act’s requirements for the safe development of a high-risk AI system (as set out in Title III, Chapter 2 EU AI Act). Conformity assessments are not risk assessments but rather demonstrative tools that show compliance with the EU AI Act’s requirements.
  • The DPIA, on the other hand, is a mandatory step required under the GDPR for high-risk personal data processing activities.
  • Consequently, there are significant differences in terms of both purpose and form between a conformity assessment and a DPIA. However, in the context of high-risk AI systems, the provider of such systems may also need to conduct a DPIA in relation to the use of personal data in the development and training of the system. In such a case, the technical documentation drafted for conformity assessments may help establish the factual context of a DPIA. Similarly, the technical information may be helpful to a deployer of the AI system that is required to conduct a DPIA in relation to its use of the system.
  • The requirement under the EU AI Act to conduct a fundamental rights impact assessment (“FRIA”) is similar, conceptually, to a DPIA. As with a DPIA, the purpose of a FRIA is to identify and mitigate risks to the fundamental rights of natural persons, in this case arising from the deployment of an AI system. For more details regarding the FRIA, see Fundamental Rights Impact Assessments under the EU AI Act: Who, what and how?.
  • Practically speaking, organisations generally already have governance mechanisms in place to bring legal, IT and business professionals together for impact assessments such as the DPIA. When it comes to a FRIA, such mechanisms can be leveraged. As with a DPIA, the first step is likely to consist of a pre-FRIA screening to identify the use of an in-scope high-risk AI system (recognising that, as a good practice step, organisations may choose to conduct FRIAs for a wider range of AI systems than is strictly required by the EU AI Act).

6. National competent authorities under EU AI Act and DPAs

  • Under the EU AI Act, each member state is required to designate one or more national competent authorities to supervise the application and implementation of the EU AI Act, as well as to carry out market surveillance activities.
  • The national competent authorities will be supported by the European Artificial Intelligence Board and the European AI Office. The most notable duty of the European AI Office is to enforce and supervise the new rules for general purpose AI models.
  • In member states where DPAs are appointed as enforcers of the EU AI Act, that appointment will solidify the close relationship between the GDPR and the EU AI Act.
Clearview AI -v- Information Commissioner
https://privacymatters.dlapiper.com/2023/10/clearview-ai-v-information-commissioner/ (23 October 2023)

Summary

A UK court has reversed a fine imposed on the provider of a facial image database service, Clearview AI, on the basis that the (UK) GDPR did not apply to the processing of personal data by the company. In so doing, the court has provided helpful judicial interpretation of both the territorial and material scope of UK data protection law.  

The key takeaways were:

  • A controller or processor may be caught by the extra-territorial scope of the UK GDPR on the basis that its processing activities relate to the monitoring of the behaviour of data subjects in the UK, even where that entity is not itself monitoring data subjects, but where its activities enable its customers to conduct such monitoring.
  • A reminder that processing activities that are carried out for, or connected to, law enforcement purposes – for example, where a company provides its services solely to law enforcement agencies – will fall outside of the scope of the UK GDPR. If those law enforcement agencies are in the UK, then the processing will instead be subject to the parallel law enforcement processing regime under the Data Protection Act. However, if the law enforcement agencies are outside of the UK (as Clearview AI’s customers were), then UK data protection law will not be engaged.

Background 

On 18 May 2022, the Information Commissioner’s Office brought twin-track enforcement action against Clearview AI in the form of: (1) an Enforcement Notice; and (2) a Monetary Penalty Notice (i.e., a fine) in the amount of GBP 7.5 million.

The ICO had concluded that Clearview AI:

  1. was a controller of personal data under the GDPR as applied in the EU, and under the UK GDPR and Data Protection Act 2018 with respect to the UK data protection framework; and
  2. was or had been processing personal data of UK residents within the scope of the GDPR (in respect of processing taking place up to the end of the Brexit transition period of 23:00 on 31 December 2020) and the UK GDPR (in respect of subsequent processing).

The ICO concluded that Clearview AI had infringed a whole raft of provisions of the UK GDPR and the GDPR in respect of the requirements:

  • represented by the core data protection principles under Article 5;
  • as to lawfulness of processing under Article 6;
  • around the processing of special category data under Article 9;
  • represented by the transparency obligations under Article 14;
  • represented by the data subject rights to subject access (under Article 15), to rectification (under Article 16), to erasure (under Article 17), and the right to object (under Article 21);
  • as to automated decision making under Article 22; and
  • to undertake a data protection impact assessment under Article 35.

Preliminary Issue

Clearview AI appealed the enforcement notice and the monetary penalty to the First-tier Tribunal (Information Rights). The matters before the Tribunal did not relate to whether Clearview AI had infringed the GDPR or UK GDPR. The issue under consideration was solely relating to the jurisdictional challenge brought by Clearview AI.

It primarily considered three questions as to:

  1. whether as a matter of law, Article 3(2)(b) can apply where the monitoring of behaviour is carried out by a third party rather than the data controller;
  2. whether as a matter of fact, processing of data by Clearview AI was related to monitoring by either Clearview AI itself or by its customers; or
  3. whether the processing by Clearview AI was beyond the material scope of the GDPR by operation of Article 2(2)(a) GDPR and/or was not relevant processing for the purposes of Article 3 of the UK GDPR thereby removing the processing from the material scope of the UK GDPR.

Clearview AI argued that the data processing undertaken by it in the context of the services was outside the territorial scope of the GDPR and UK GDPR, with the consequence that the ICO had no jurisdiction to issue the notices.

Clearview AI services

Clearview AI provides law enforcement agencies with access to a database of facial images scraped from public sources. It uses biometric processing to match a customer’s image with an image and identity in its database. In more detail, its services were broadly delivered through the following phases of activity:

Activity 1

  • It copied and scraped facial images in photographs that it found across the public internet and stored them in a series of connected databases and sub-databases, linked to the source of the image, the date and time it was made, and information around associated social media profiles.
  • That material was then used for the creation of a set of vectors for each facial image, using the Clearview AI machine learning facial recognition algorithm.
  • The facial vectors were then stored in a neural-network database, clustered together according to closeness in facial similarities.

Activity 2

  • If a customer wished to use the service, they would upload the facial image being searched for to the Clearview AI system. Vectors would be created of the facial features of that image, which would then be compared to the facial vectors of the stored images on the database using the Clearview AI machine learning facial recognition algorithm. Up to 120 matching facial images would be returned, along with an assessment of the degree of similarity. The service was found to achieve over 99% accuracy.
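
Mechanically, this kind of matching is a nearest-neighbour search over embedding vectors. The sketch below is illustrative only: random vectors stand in for the stored facial vectors, and real deployments use trained face-recognition embeddings and approximate nearest-neighbour indexes at far larger scale.

```python
# Illustrative nearest-neighbour search over facial embedding vectors.
# Random vectors stand in for stored facial vectors; the similarity logic
# is the only point being demonstrated.
import numpy as np

rng = np.random.default_rng(0)
database = rng.normal(size=(10_000, 128))                    # stored facial vectors
database /= np.linalg.norm(database, axis=1, keepdims=True)  # normalise for cosine similarity

def top_matches(probe: np.ndarray, k: int = 120) -> list[tuple[int, float]]:
    """Return the indices and cosine similarities of the k closest stored vectors."""
    probe = probe / np.linalg.norm(probe)
    similarities = database @ probe
    best = np.argsort(similarities)[::-1][:k]
    return [(int(i), float(similarities[i])) for i in best]

query_vector = rng.normal(size=128)  # stands in for vectors computed from the uploaded image
print(top_matches(query_vector, k=5))
```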

Further Activity

  • The returned images would allow a customer to then view additional, non-Clearview AI-derived information (such as by visiting the source page from which the image was scraped) as to:
    • the person’s name;
    • the person’s relationship status, whether they have a partner and who that may be;
    • whether the person is a parent;
    • the person’s associates;
    • the place the photo was taken;
    • where the person is based/lives/is currently located;
    • what social media is used by the person;
    • whether the person smokes/drinks alcohol;
    • the person’s occupation or pastime(s);
    • whether the person can drive a car;
    • what the person is carrying/doing and whether that is legal; and/or
    • whether the person has been arrested.

The Tribunal considered that it was reasonably likely that the database would contain the images of UK residents and/or images taken within the UK of persons resident elsewhere. It was therefore found that the Clearview AI service could have an impact on UK residents, irrespective of whether it was used by UK customers.

The Clearview AI service was used by customers for commercial purposes prior to 2020 and is not currently used by customers in the UK or in the EU at all. Its customers are in the United States, as well as other countries globally (including Panama, Brazil, Mexico, and the Dominican Republic). It was acknowledged that investigators in one country may be interested in behaviour happening in another country, given that criminal activity is not limited by national boundaries.

Clearview AI had offered its service on a trial basis to law enforcement and government organisations within the UK between June 2019 and March 2020. An overseas law enforcement agency could use the service as part of an investigation into the alleged criminal activity of a UK resident.

Conclusions of the Tribunal

The Tribunal considered that Clearview AI was the sole controller responsible for Activity 1 (as described above) and that Clearview AI was joint controller with its customers for Activity 2. The Further Activity was then processing for which Clearview AI was not a controller at all.

The ICO submitted that the Clearview AI service was being used to monitor the behaviour of the data subjects. The Tribunal concluded that Clearview AI did not monitor behaviour itself but that its customers used the service to monitor the behaviour of data subjects. Consequently, for the purposes of Article 3(2)(b), Clearview AI’s services were related to the monitoring of the behaviour of data subjects. Clearview AI’s status as a joint controller with its customer for the purposes of Activity 2 may have been a significant factor in establishing a sufficiently close nexus between Clearview AI, as the ‘service provider’, and its customer, as the entity actually conducting the behavioural monitoring.

However, whilst the processing activities may in theory have been within the territorial scope of the (UK) GDPR, what was decisive was that they fell outside of its material scope. The Tribunal accepted that Clearview AI offered its services exclusively to non-UK/EU law enforcement and national security agencies, and their contractors, in support of the discharge of their respective criminal law enforcement and national security functions. Such activities fall outside of the scope of the GDPR and the UK GDPR. Whilst the UK has data protection regimes under the Data Protection Act 2018 that apply to both law enforcement and intelligence agencies, those regimes only bind processing activities relating to UK law enforcement or intelligence agencies.

We’re now seamlessly global. Here’s what to expect.
https://privacymatters.dlapiper.com/2023/09/privacy-matters-update/ (12 September 2023)

Dear subscriber,

Thank you for subscribing and being a part of DLA Piper’s Data Protection, Privacy and Cybersecurity community. We appreciate your continued engagement with our insights and the evolving nature of the landscape.

Our goal for this blog is to help you navigate all aspects of data protection, privacy, and cybersecurity laws, while considering the ever-expanding geographic footprint of businesses. Here at DLA Piper, we understand how compliance across jurisdictions makes it even harder to solve today’s most pressing privacy problems and prepare for future cybersecurity threats.

We’re excited to announce that Privacy Matters will be bringing you more on privacy and cybersecurity issues at home and abroad.

  • Find quickly shared updates on relevant topics on global data protection, privacy, and cybersecurity issues 
  • Get perspectives from leading DLA Piper professionals from around the world 
  • Easily navigate our improved, user-friendly layout 

With this change, you’ll need to add privacymatters@comms.com to your contacts to ensure blog updates make it to your inbox.

Thanks for reading,

The Data Privacy team at DLA Piper 

A Pro-Innovation Approach: UK Government publishes white paper on the future of governance and regulation of artificial intelligence
https://privacymatters.dlapiper.com/2023/03/a-pro-innovation-approach-uk-government-publishes-white-paper-on-the-future-of-governance-and-regulation-of-artificial-intelligence/ (31 March 2023)

Authors: James Clark, Coran Darling, Andrew Dyson, Gareth Stokes, Imran Syed & Rachel de Souza

In November 2021, the UK Government (“Government”) issued the National Artificial Intelligence (AI) Strategy, with the ambition of making the UK a global AI superpower over the next decade. The strategy promised a thriving ecosystem, supported by Government policy that would look at establishing an effective regulatory framework; a new governmental department focussed on AI and other innovative technologies; and collaboration with national regulators.

On 29 March 2023 the Government published the long-awaited white paper (“Paper”) setting out how the UK anticipates it will achieve the first, and most important, of these goals – the creation of a blueprint for future governance and regulation of AI in the UK. The Paper is open for consultation until 21 June 2023.

The Paper, headed “A pro-innovation approach”, recognises the importance of building a framework that engenders trust and confidence in responsible use of AI (noting the key risks to health, security, privacy, and more, that can arise through an unregulated approach), but cautions against ‘overbearing’ regulation which may adversely impact innovation and investment.

This theme runs throughout the Paper and expands into recommendations that support a relatively light touch, and arguably a more organic regulatory approach, than we have seen in other jurisdictions. This is most notably the case when compared to the approach of the EU, where the focus has been on development of a harmonizing AI-specific law and supporting AI-specific regulatory regime.

The Paper contends that effective AI regulation can be constructed without the need for new cross-sectoral legislation. Instead, the UK is aiming to establish “a deliberately agile and iterative approach” that avoids the risk of “rigid and onerous legislative requirements on businesses”. This ambition should be largely achieved by co-opting regulators in regulated sectors to effectively take direct responsibility for the establishment, promotion, and oversight of responsible AI in their respective regulated domains. This would then be supported by the development of non-binding assurance schemes and technical standards.

Core Principles

This approach may be different in execution from the proposals we are seeing come out of Europe with the AI Act. If we look beneath the surface, however, we find the Paper committing the UK to core principles for responsible AI which are consistent across both regimes:

  • Safety, security, and robustness: AI should function in a secure, safe, and robust manner, where risks can be suitably monitored and mitigated;
  • Appropriate transparency and explainability: organisations developing and deploying AI should be able to communicate the method in which it is used and be able to adequately explain an AI system’s decision-making process;
  • Fairness: AI should be used in ways that comply with existing regulation and must not discriminate against individuals or create unfair commercial outcomes;
  • Accountability and governance: appropriate governance measures should be taken to ensure there is effective oversight of AI systems, with clear lines of accountability; and
  • Contestability and redress: there must be clear routes to dispute harmful outcomes or decisions generated by AI.

The Government intends to use the principles as a universal guardrail to guide the development and use of AI by companies in the UK. This approach aligns with international thinking that can be traced back to the OECD AI Principles (2019), the Council of Europe’s 2021 paper on a legal framework for artificial intelligence, and the recent Blueprint for an AI Bill of Rights proposed by the White House’s Office of Science and Technology Policy.

Regulator Led Approach

The UK does not intend to codify these core principles into law, at least for the time being. Rather, the UK intends to lean on the supervisory and enforcement powers of existing regulatory bodies, charging them with ensuring that the core principles are followed by organisations for whom they have regulatory responsibility.

Regulatory bodies, rather than lawmakers or any ‘super-regulator’, will therefore be left to determine how best to promote compliance in practice. This means, for example, that the FCA will be left to regulate AI across financial services; the MHRA to consider what is appropriate in the field of medicines and medical devices; and the SRA for legal service professionals. This approach is already beginning to play out in some areas. For example, in October 2022, the Bank of England and FCA jointly released a Discussion Paper on Artificial Intelligence and Machine Learning (DP5/22), which is intended to progress the debate on how regulation and policy should play a role in use of AI in financial services.

To enable this to work, the Paper contemplates a new statutory duty on regulators which requires them to have due regard to the principles in the performance of their tasks. Many of these duties already exist in other areas, such as the so-called ‘growth duty’ that came into effect in 2017 which requires regulators to have regard to the desirability of promoting economic growth. Regulators will be required by law to ensure that their guidance, supervision, and enforcement of existing sectoral laws takes account of the core principles for responsible AI. Precisely what that means in practice remains to be seen.

Coordination Layer

The Paper recognises that there are risks with a de-centralised framework. For example, regulators may establish conflicting requirements, or fail to address risks that fall between gaps.

To address this, the Paper announces the Government’s intention to create a ‘coordination layer’ that will cut across sectors of the economy and allow for central coordination on key issues of AI regulation. The coordination layer will consist of several support functions, provided from within Government, including:

  • assessment of the effectiveness of the de-centralised regulatory framework – including a commitment to remain responsive and adapt the framework if necessary;
  • central monitoring of AI risks arising in the UK;
  • public education and awareness-raising around AI; and
  • testbeds and sandbox initiatives for the development of new AI-based technologies.

The Paper also recognises the likely importance of technical standards as a way of providing consistent, cross-sectoral assurance that AI has been developed responsibly and safely. To this end, the Government will continue to invest in the AI Standards Hub, formed in 2022, whose role is to lead the UK’s contribution to the development of international standards for the development of AI systems.

This standards-based approach may prove particularly useful for those deploying AI in multiple jurisdictions and has already been recognised within the EU AIA, which anticipates compliance being established by reference to common technical standards published by recognised standards bodies. It seems likely that over time this route (the use of commonly recognised technical standards) will become the de facto default route to securing practical compliance with emerging regulatory regimes. This would certainly help address the concerns many will have about the challenge of meeting competing regulatory regimes across national boundaries.

International comparisons

EU Artificial Intelligence Act

The proposed UK framework will inevitably attract comparisons with the different approach taken by the EU AIA. Where the UK intends to take a sector-by-sector approach to regulating AI, the EU has opted for a horizontal, cross-sector, regulation-led approach. Further, the EU clearly intends exactly the same single set of rules to apply EU-wide: the EU AIA is framed as a directly effective Regulation, applying as law across the bloc, rather than as a Directive, which would require Member States to enact domestic legislation to give effect to the framework.

The EU and UK approaches each have potential benefits. The EU’s single horizontal approach of regulation across the bloc ensures that organisations engaging in regulated AI activities will, for the most part, only be required to understand and comply with the AI Act’s single framework and apply a common standard based on the use to which AI is being put, regardless of sector.

The UK’s approach provides a less certain legislative framework, as companies may find that they are regulated differently in different sectors. While this should be mitigated through the ‘coordination layer’, it will likely lead to questions about exactly which rules apply when, and to the risk of conflicting areas of regulatory guidance. This additional complexity will no doubt be a potential detractor for the UK but, if implemented effectively, a regime that is agile to evolving needs and technologies could prove more adaptable than the EU’s more codified approach. In theory, it should be much easier for the UK to implement changes via regulatory standards, guidance, or findings than it would be for the EU to push amendments through a relatively static legislative process.

US Approach

There are clear parallels between the UK and the likely direction of travel in the US, where a sector-by-sector approach to the regulation of AI is the preferred choice. In October 2022, the White House Office of Science and Technology Policy published a Blueprint for an AI Bill of Rights (“Blueprint”). Much like the Paper, the Blueprint sets out an initial framework for how US authorities, technology companies, and the public can work to ensure AI is implemented in a safe and accountable manner. The US anticipates setting out principles to help guide organisations in managing and (self-)regulating the use of AI, but without the level of directional control that the UK anticipates passing down to sector-specific regulators. Essentially, the US position is to avoid direct intervention, leaving regulation at state or federal level to be decided by others. It remains to be seen how the concepts framed in the Blueprint might eventually translate into powers for US regulators.

A Push for Global Interoperability

While the Government seeks to capitalise on the UK’s strategic position as third in the world for the number of domestic AI companies, it also recognises the importance of collaboration with international partners. Focus moving forward will therefore be directed to supporting global opportunities while protecting the public against cross-border risks. The Government intends to promote interoperability between the UK approach and the differing standards and approaches of other jurisdictions. This should ensure that the UK’s regulatory framework encourages the development of a compatible system of global AI governance that allows organisations to pursue ventures across jurisdictions, rather than being isolated by jurisdiction-specific regulations. The approach is expected to leverage existing, proven, and agreed-upon assurance techniques and international standards, which will play a key role in the wider regulatory ecosystem. It is therefore expected to support cross-border trade by setting out internationally accepted ‘best practices’ that can be recognised by external trading partners and regulators.

Next steps

The Government acknowledges that AI continues to develop at pace, and new risks and opportunities continue to emerge. To continue to strengthen the UK’s position as a leader in AI, the Government is already working in collaboration with regulators to implement the Paper’s principles and framework. It anticipates that it will continue to scale up these activities at speed in the coming months.

In addition to allowing for responses to its consultation (until 21 June 2023), the Government has staggered its next steps into three phases: i) within the first 6 months from publication of the Paper; ii) 6 to 12 months from publication; and iii) beyond 12 months from publication.

Find out more

You can find out more on AI and the law and stay up to date on the UK’s push towards regulating AI at Technology’s Legal Edge, DLA Piper’s tech-sector blog.

For more information on AI and the emerging legal and regulatory standards, visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Act and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry across the world. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.

Keeping an ‘AI’ on your data: UK data regulator recommends lawful methods of using personal information and artificial intelligence https://privacymatters.dlapiper.com/2022/11/keeping-an-ai-on-your-data-uk-data-regulator-recommends-lawful-methods-of-using-personal-information-and-artificial-intelligence/ Tue, 08 Nov 2022 12:25:21 +0000 Authors: Jules Toynton, Coran Darling

Data is often the fuel that powers AI used by organisations. It tailors search parameters, spots behavioural trends, and predicts future possible outcomes (to highlight just a few uses). In response, many of these organisations seek to accumulate and use as much data as possible, in order to make their systems work that little bit faster or more accurately.

In many cases, provided the data is not subject to copyright or other such restrictions, this raises few issues – organisations are able to amass large quantities of data that can be used initially to train their AI systems or, after deployment, to refresh their datasets so that the latest and most accurate data is used.

This becomes a potential issue when the data being collected and used is personal information. For example, the principle of ‘data minimisation’ requires that only the necessary amount and type of personal data is used to develop an AI system. This is at odds with the ‘data hoarding’ corporate mentality described above, which seeks to know as much detail as possible. Furthermore, the principle of ‘purpose limitation’ places several restrictions on the re-use of historic datasets to train AI systems. This may cause particular headaches when working with an AI vendor that wishes to further commercialise an AI system which has benefited from learnings derived from your data, in a way that goes beyond the purpose for which the data was originally provided.

It is, however, acknowledged by the Information Commissioner’s Office (“ICO”), the UK’s data regulator, that AI and personal data will forever be interlinked – unavoidably so in certain situations. In response, in November 2022, the ICO released a set of guidance on how organisations can use AI and personal data appropriately and lawfully, in accordance with the UK’s data privacy regime. The guidance is supplemented by answers to a number of frequently raised questions about combining AI with personal data, including: whether an impact assessment should be carried out, whether outputs need to comply with the principle of accuracy, and whether organisations need permission to analyse personal data.

In this article we discuss some of the key recommendations in the context of the wider regulatory landscape for data and AI.

Key Recommendations:

The guide offers eight methods organisations can use to improve their handling of AI and personal information.

Take a risk-based approach when developing and deploying AI:

A first port of call for organisations should be an assessment of whether AI is needed for what is sought to be deployed. Most AI that engages with personal information will typically fall within the ‘high-risk’ category for the purposes of the proposed EU AI Regulation (“AI Act”) (and likely a similar category within the developing UK framework). This will result in additional obligations and measures that the organisation will be required to follow in its deployment of the AI. A less technical and more privacy-preserving alternative is therefore recommended by the ICO where possible.

Should AI be chosen after this, a data privacy impact assessment should be carried out to identify and minimise the data risks that the AI poses to data subjects, as well as to mitigate the harm it may cause. At this stage the ICO also recommends consulting different groups who may be impacted by the use of AI in this context, to better understand the potential risks.

Consider how decisions can be explained to the individuals affected:

As the ICO notes, it can be difficult to explain how AI arrives at certain decisions and outputs, particularly in the case of machine learning and complex algorithms, where input values and trends change as the AI learns and teaches itself from the data it is fed.

Where possible, the ICO recommends that organisations:

  • be clear and open with subjects on how and why personal data is being used;
  • consider what explanation is needed in the context that the AI will be deployed;
  • assess what explanations are likely to be expected;
  • assess the potential impact of AI decisions to understand the detail required in explanations; and
  • consider how individual rights requests will be handled.

The ICO has acknowledged that this is a difficult area of data privacy and has provided detailed guidance, co-badged with the Alan Turing Institute, on “Explaining decisions made with AI”.

Limit data collection to only what is needed:

Contrary to beliefs commonly held by organisations, the ICO recommends that data is kept to a minimum where possible. This does not mean that data cannot be collected, but rather that appropriate consideration must be given to the data that is collected and retained.

Organisations should therefore:

  • ensure that the personal data used is accurate, adequate, relevant and limited, based on the context in which the AI is used; and
  • consider which techniques can be used to preserve privacy as much as practical. For example, as the ICO notes, synthetic data or federated learning could be used to minimise the personal data being processed.

It should be noted that data protection’s accuracy principle does not mean that an AI system needs to be 100% statistically accurate (which is unlikely to be practically achievable). Instead organisations should factor in the possibility of inferences/decisions being incorrect, and ensure that there are processes in place to ensure fairness and overall accuracy of outcome.

Address risks of bias and discrimination at an early stage:

A persistent concern throughout many applications of AI, particularly those interacting with sensitive data, is bias and discrimination. This is made worse in instances where the training data over-represents one group or pattern, as the biases present in such data will form part of the essential decision-making process of the AI, thereby ‘hardwiring’ bias into the system. All reasonable steps should therefore be taken to ensure that the data used to train AI systems is as varied as possible, to the extent that it accurately reflects the wider population concerned.

To better understand this issue, the ICO recommends that organisations:

  • assess whether the data gathered is accurate, representative, reliable, relevant, and up-to-date with respect to the population or different sets of people to which the AI will be applied; and
  • map out consequences of the decisions made by the AI system for different groups and assess whether these are acceptable from a data privacy regulatory standpoint as well as internally.

Where AI does produce biased or discriminatory decisions, this is likely to conflict with the requirement for processing of personal data to be fair, as well as with the obligations of several other more specific regulatory frameworks. A prime example is the Equality Act, which prohibits discrimination on the grounds of protected characteristics, whether by AI or otherwise. Organisations should take care to ensure that decisions are made in a way that does not breach the wider data privacy and AI regimes, as well as those specific to the sectors and activities in which they are involved.
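By way of illustration only, the short Python sketch below shows one simple way an organisation might compare outcomes across groups before deployment. The dataset, column names and metric are hypothetical, and real-world fairness testing will be considerably more involved.

```python
import pandas as pd

def outcome_rates_by_group(df: pd.DataFrame, group_col: str, outcome_col: str) -> pd.Series:
    """Share of positive outcomes (e.g. loan approvals) per group."""
    return df.groupby(group_col)[outcome_col].mean()

def disparate_impact_ratio(rates: pd.Series) -> float:
    """Ratio of the lowest to the highest group outcome rate (1.0 means parity)."""
    return rates.min() / rates.max()

# Hypothetical model outputs: one row per individual
df = pd.DataFrame({
    "group":    ["A", "A", "A", "B", "B", "B", "B"],
    "approved": [1,   1,   0,   1,   0,   0,   0],
})

rates = outcome_rates_by_group(df, "group", "approved")
print(rates)                          # approval rate per group
print(disparate_impact_ratio(rates))  # a value far below 1.0 warrants review
```

A check of this kind does not itself establish compliance, but it is the sort of evidence that supports the mapping of consequences for different groups described above.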

Dedicate time and resources to preparing data:

As noted above, the quality of an AI’s output is only going to be as good as the data it is fed and trained with. Organisations should therefore ensure sufficient resources are dedicated to preparing the data to be used.

As part of this process, organisations should expect to:

  • create clear criteria and lines of accountability about the labelling of data involving protected characteristics and/or special category data;
  • consult members of protected groups where applicable to define the labelling criteria; and
  • involve multiple human labellers to ensure consistency of categorisation and delineation and to assist with fringe cases.

Ensure AI systems are made and kept secure:

It should be of little surprise that the addition of new technologies can create new security risks (or exacerbate existing ones). In the context of the AI Act and UK data privacy regulation (and indeed once a more established UK AI regime emerges), organisations are, or will be, legally required to implement appropriate technical and organisational measures to ensure a level of security commensurate with the risk associated with the information.

In order to do this, organisations could:

  • complete security risk assessments to create a baseline understanding of where risks are present;
  • carry out model debugging on a regular basis; and
  • proactively monitor the system and investigate any anomalies (in some cases, the AI Act and any future UK AI framework may require human oversight as an additional protective measure regardless of the data privacy requirement).

Human review of AI outcomes should be meaningful:

Depending on the purpose of the AI, it should be established early on whether the outputs are being used to support a human decision-maker or whether decisions are made solely by automated means. As the ICO highlights, data subjects are entitled to know whether decisions about them have been made purely autonomously or with the assistance of AI. In instances where AI outputs are being used to assist a human, the ICO recommends that they are reviewed in a meaningful way.

This would therefore require that reviewers are:

  • adequately trained to interpret and challenge outputs made by AI systems;
  • sufficiently senior to have the authority to override automated decisions; and
  • able to take account of additional factors that were not included as part of the initial input data.

Data subjects have the right under the UK GDPR not to be subject to a solely automated decision, where that decision has a legal or similarly significant effect, and also have the right to receive meaningful information about the logic involved in the decision. Therefore, although worded as a recommendation, where AI is making significant decisions, meaningful human review becomes a requirement (or at least must be available on request).

Work with external suppliers involved to ensure that AI is used appropriately:

A final recommendation offered by the ICO is that, where AI is procured from a third party, compliance is addressed with that supplier’s involvement. While it is usually the organisation’s responsibility (as controller) to comply with all regulations, this can be achieved more effectively with the involvement of those who create and supply the technology.

In order to comply with the obligations of both the AI Act and relevant data privacy regulations, organisations would therefore be expected to:

  • choose a supplier by carrying out the appropriate due diligence ahead of procurements;
  • work with the supplier to carry out assessments prior to deployment, such as impact assessments;
  • agree and document roles and responsibilities with the external supplier, such as who will answer individual rights requests;
  • request documentation from the external supplier that demonstrates they have implemented a privacy-by-design approach; and
  • consider any international transfers of personal data.

When working with some AI providers, for example, with larger providers who may develop AI for a large range of applications as well as offer services to tailor their AI solutions for particular customers (and to commercialise these learnings), it may not be clear whether they are a processor or controller (or even a joint controller with the client for some processing). Where that company has enough freedom to use its expertise to decide what data to collect and how to apply its analytic techniques, it is likely to be a data controller as well.

Get in touch 

For more information on AI and the emerging legal and regulatory standards visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Regulation and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

You can find more on AI, technology, data privacy, and the law at Technology’s Legal Edge, DLA Piper’s tech-sector blog and Privacy Matters, DLA Piper’s Global Privacy and Data Protection resource.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry in the UK and abroad. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.

EUROPE: Data protection regulators publish myth-busting guidance on machine learning https://privacymatters.dlapiper.com/2022/10/europe-data-protection-regulators-publish-myth-busting-guidance-on-machine-learning/ Mon, 10 Oct 2022 08:31:58 +0000 Authors: Coran Darling, James Clark

In its proposed AI Regulation (“AI Act”), the EU recognises AI as one of the most important technologies of the 21st century. It is often forgotten, however, that AI is not one specific type of technology. Instead, it is an umbrella term for a range of technologies capable of imitating certain aspects of human intelligence and decision-making – ranging from basic document processing software through to advanced learning algorithms.

One branch of the AI family is machine learning (“ML”), which uses models trained on datasets to resolve an array of complicated problems. The specific form and function of an ML system depends on the tasks it is intended to complete. For example, an ML system could be used to determine the likelihood that certain categories of persons will default on loan agreements by processing financial default information. During the development and training of their algorithms, ML systems begin to adapt and recognise patterns within their data. They can then use this training to interpret new data and form outputs based on the intended process.

The use of ML systems gives rise to several questions which lawyers and compliance professionals may be uncertain about answering. For example: How is the data interpreted? How can an outcome be verified as accurate? Does the use of large datasets remove any chance of bias in my decision making?

In an attempt to resolve some of these issues, the Agencia Española de Protección de Datos (Spain’s data regulator) (“AEPD”) and the European Data Protection Supervisor (“EDPS”) have jointly published a report addressing a number of notable misunderstandings about ML systems. The joint report forms part of a growing trend among European data protection regulators to openly grapple with AI issues, recognising the inextricable link between AI systems – which are inherently creatures of data – and data protection law.

In this article we elaborate on some of these clarifications in the context of a world of developing AI and data privacy frameworks.

Causality requires more than finding correlations

As a brief reminder to ourselves, causality is “the relationship that exists between cause and effect” whereas correlation is “the relationship between two factors that occur or evolve with some synchronisation”.

ML systems are very good at identifying correlations within datasets but typically lack the ability to accurately infer a causal relationship between data and outcomes. The example given in the report is that, given certain data, a system could reach the conclusion that tall people are smarter than shorter people simply by finding a correlation between height and IQ scores. As we are all aware, correlation does not imply causation (despite the spurious inferences that can be drawn from data, for example from the correlation between the per capita consumption of mozzarella cheese and the number of civil engineering doctorates awarded).
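For readers who want a concrete picture, the following simplified Python sketch (our own illustration, not drawn from the report) shows how two entirely unrelated, independently generated series can still exhibit a very high correlation coefficient – a reminder that correlation statistics alone say nothing about causation.

```python
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(2000, 2020)

# Two unrelated quantities that both happen to trend upwards over time
cheese_consumption = 9 + 0.2 * (years - 2000) + rng.normal(0, 0.1, years.size)
doctorates_awarded = 480 + 15 * (years - 2000) + rng.normal(0, 10, years.size)

correlation = np.corrcoef(cheese_consumption, doctorates_awarded)[0, 1]
print(f"Correlation: {correlation:.2f}")  # typically very close to 1, yet no causal link
```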

It is therefore necessary to ensure data is suitably vetted throughout the initial training period, and at the point of output, to ensure that the learning process within the ML system has not resulted in it attributing certain outcomes to correlated, but non-causal, information. Some form of human supervision to determine when certain variables are being overweighted within the decision process may assist in doing so and allow intervention when bias is detected at an early stage in processing.

Training datasets must meet accuracy and representativeness thresholds

Contrary to popular belief, greater variety in data does not necessarily make for a better dataset or one better able to mitigate bias. It is instead better to have a focused dataset that accurately reflects the trend being investigated. For example, having data on all types of currency in relation to their conversion to dollars is not helpful when seeking to find patterns and trends in the fluctuation of the conversion between dollars and pounds sterling.

Furthermore, the addition of too much of certain data may lead to inaccuracies and bias in outcomes. For example, as noted in the report, adding further light-skinned male images to a dataset used to train facial recognition software will be largely unhelpful in correcting any existing ethnicity or gender biases within the system.

The General Data Protection Regulation (“GDPR”) requires that processing of personal data be proportionate to its purpose. Care should therefore be taken when seeking to increase the amount of data in a dataset. Substantial increases in the data used to produce a minimal correction in a training dataset, for example, may not be deemed proportionate and could lead to a breach of the requirements of the Regulation.

Well-performing machine learning systems require datasets above a certain quality threshold

It is equally not necessary that training datasets be completely error-free. Often, this is not possible or commercially feasible. Instead, datasets should be held to a quality standard that allows for a comprehensive and sufficiently accurate description of the data. Provided that the average result is accurate to the overall trend, ML systems are typically able to deal with a low level of inaccuracy.

As the report notes, some models are even created and trained using synthetic data (artificially generated datasets – described in greater detail in our earlier article on the subject) that replicates the characteristics of real data. In some cases, synthetic data may even be derived from aggregated real datasets, retaining the benefit of accurate data trends while removing many of the compliance issues associated with personally identifiable information.
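As a purely illustrative sketch (ours, with made-up figures), the Python snippet below shows the basic idea behind one simple form of synthetic data: only aggregate statistics are taken from the ‘real’ dataset, and artificial records are then sampled from those statistics, preserving the overall trend without reproducing any individual record. Production-grade synthetic data generation is considerably more sophisticated.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical "real" data: two correlated attributes (e.g. income and loan size)
real = rng.multivariate_normal(mean=[35_000, 12_000],
                               cov=[[4e7, 1.5e7], [1.5e7, 1e7]],
                               size=1_000)

# Fit only aggregate statistics, then sample synthetic records from them
mean, cov = real.mean(axis=0), np.cov(real, rowvar=False)
synthetic = rng.multivariate_normal(mean, cov, size=1_000)

print(np.corrcoef(real[:, 0], real[:, 1])[0, 1])       # correlation in the real data
print(np.corrcoef(synthetic[:, 0], synthetic[:, 1])[0, 1])  # broadly preserved in the synthetic data
```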

This is not to say that organisations should not strive to attain an accurate data set and, in fact, under the AI Act it is a mandatory requirement that a system’s data be accurate and robust. Ensuring accuracy and, where relevant, currency of personal data is also a requirement under the GDPR. However, it is important to remember that ‘accuracy’ in the context of ML need not be an absolute value.

Federated and distributed learning allow the development of systems without sharing training datasets

One approach proposed for the development of accurate ML systems, in the absence of synthetic data, is the creation of large data-sharing repositories, often held in substantial cloud computing infrastructure. Under another limb of the EU’s digital strategy – the Data Governance Act – the Commission is attempting to promote data sharing frameworks through trusted and certified ‘data intermediation services’. Such services may have a role to play in supporting ML. The report highlights that while this means of centralised learning is an effective way of collating large quantities of data, it comes with its own challenges.

For example, in instances of personal data, the controller and processor of the data must consider the data in the context of their obligations under the GDPR and other data protection regulations. Requirements regarding purpose limitation, accountability, and international transfers may all therefore become applicable. Furthermore, the collation of sensitive data increases the interest of other parties, particularly those with malevolent intent, in gaining access. Without suitable protections put in place, a centralised dataset with large quantities of data may become a honeypot for hackers and corporate parties seeking to gain an upper hand.

The report offers, as an alternative, the use of distributed on-site and federated learning. Distributed on-site learning involves the data controller downloading a generic or pre-trained model to a local server. The server then uses its own dataset to train and improve the downloaded model. After this is completed, there is no further need for the generic model. By comparison, with federated learning the controller trains a model with its own data and then sends only its parameters to a central server for aggregation. It should be noted, however, that this is often not the most efficient method and may even be a barrier to entry or development for smaller organisations in the ML sector, due to cost and expertise restrictions.
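The simplified Python sketch below (our own illustration, under heavily simplifying assumptions) shows the federated pattern described above: each controller fits a simple model on its own local data and shares only the learned parameters, which a central server averages; no raw training data leaves the local environment.

```python
import numpy as np

rng = np.random.default_rng(1)

def train_local_model(X: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Ordinary least-squares fit on local data; only the weights are returned."""
    X1 = np.column_stack([np.ones(len(X)), X])   # add an intercept term
    weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return weights

def federated_average(weight_sets: list) -> np.ndarray:
    """Central server aggregates parameters without ever seeing the underlying data."""
    return np.mean(weight_sets, axis=0)

# Three hypothetical controllers, each with its own local dataset
local_weights = []
for _ in range(3):
    X = rng.normal(size=(200, 2))
    y = X @ np.array([2.0, -1.0]) + 0.5 + rng.normal(0, 0.1, 200)
    local_weights.append(train_local_model(X, y))

global_model = federated_average(local_weights)
print(global_model)  # approximately [0.5, 2.0, -1.0]
```

Even in this toy form, the design choice is visible: the central server only ever handles model parameters, which is why the approach is attractive where sharing the training data itself would raise GDPR concerns.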

Once deployed, a machine learning model’s performance may deteriorate until it is further trained

Unlike other technologies, ML models are not ‘plug in and forget’ systems. The nature of ML is that the system adapts and evolves over time. Consequently, once deployed, an ML system must be consistently tested to ensure it remains capable of solving the problems for which it was created. Once mature, a model may no longer provide accurate results if it does not evolve with its subject matter. For example, an ML model aimed at predicting futures prices of coffee beans will deteriorate if it is not fed new and refreshed data.

The result, should the data not be updated for some time, is an inaccurate model that will produce tainted, biased, or completely incorrect judgements and outcomes (a situation known as data drift). Deterioration may also occur where the relationship between the input data and the outcome changes while the general distribution of the data does not (known as concept drift). As the report notes, it is therefore necessary to monitor the ML system to detect any deterioration in the model and act on its decay.
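By way of example only, the short Python sketch below (our illustration, using the SciPy library and made-up data) shows one common way of monitoring for data drift: comparing the distribution of a feature in live data against the distribution seen at training time and flagging a statistically significant shift for review.

```python
import numpy as np
from scipy.stats import ks_2samp

rng = np.random.default_rng(7)

training_feature = rng.normal(loc=100.0, scale=10.0, size=5_000)   # reference data at training time
live_feature     = rng.normal(loc=108.0, scale=10.0, size=1_000)   # live data that has drifted

statistic, p_value = ks_2samp(training_feature, live_feature)
if p_value < 0.01:
    print(f"Possible data drift detected (KS={statistic:.3f}, p={p_value:.2g}); "
          "review or retraining may be required.")
```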

A well-designed machine learning model can produce decisions understandable to relevant stakeholders

Perhaps owing to popular media, there is a recurring belief that the automated decisions taken by ML algorithms cannot be explained. While this may be the case for a select few models, a well-designed model will typically produce decisions that can be readily understood by stakeholders.

Key factors for explainability include understanding which parameters were considered and how they were weighted in decision-making. The degree of ‘explainability’ demanded from a model is likely to vary based on the data involved and the likelihood that a decision will impact the lives of data subjects (if any). For example, far greater explainability would be expected from a model that deals with credit scoring or employment applications than from those tasked with predicting futures markets.
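To illustrate the point, the brief Python sketch below (our own example, using the scikit-learn library and entirely hypothetical feature names) shows how, for a simple model, the parameters considered and their relative weighting can be read directly – the kind of information that supports an explanation of an individual decision.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
features = ["income", "existing_debt", "years_at_address"]

X = rng.normal(size=(500, 3))
# Hypothetical rule used only to generate training labels for this sketch
y = (1.2 * X[:, 0] - 2.0 * X[:, 1] + 0.3 * X[:, 2] + rng.normal(0, 0.5, 500)) > 0

model = LogisticRegression().fit(X, y)

# The learned weights show which parameters drove the decision, and by how much
for name, weight in zip(features, model.coef_[0]):
    print(f"{name}: {weight:+.2f}")
```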

It is possible to provide meaningful transparency to users without harming IP rights

A push towards transparency and explainability has naturally led many to question how to effectively protect trade secrets and IP when everyone can see how their models and ML systems behave. As the report highlights, transparency and the protection of IP are not incompatible, and there are several methods of providing transparency to users without harming proprietary know-how or IP. While users should be provided with sufficient information to know what their data (particularly personal data) is being used for, this does not necessarily mean that specific technical details need to be disclosed.

The report compares the requirement to the advisory leaflets provided with medicines. It is necessary to alert users to what may happen when using the medicine (or model/system) without providing an explanation of how this is specifically achieved. In cases involving personal information, further explanation may be required to comply with the principles set out in applicable data protection regulation. At a minimum, data processors and controllers should properly inform users and subjects of the impacts of the ML system and its decision-making on their daily lives.

Further protections for individuals may be achieved through certification in accordance with international standards, overt limitations of system behaviour, or the use of human moderators with appropriate technical knowledge.

Machine learning systems are subject to different types of biases

It is often assumed that bias is an inherently human thing. While it is correct to say that an ML system is not in itself biased, the system will perform as it is taught. This means that while ML systems can be free from human bias in many cases, this is entirely dependent on their inherent and learned characteristics. Where training or subsequent data is heavily one-sided, or too much weight is ascribed to certain data points, the model may interpret this ‘incorrectly’, leading to ‘biased’ results.

The inherent lack of ‘humanity’ in these systems does, however, have its drawbacks. As the report notes, ML systems have a limited ability to adapt to soft contextual changes and unforeseen circumstances, such as changes in market trends due to new legislation or social norms. This point further highlights the need for appropriate human oversight of the functioning of ML systems.

Predictions are only accurate when future events reproduce past trends

‘ML systems are capable of predicting the future’ is perhaps one of the most common misconceptions about the technology. Rather, they can only predict possible future outcomes to the extent that those outcomes reflect the trends of previous data. The fact that you have habitually bought coffee on a Monday morning since starting your job makes it likely that you will do so this coming Monday, but it does not guarantee that you will, nor does it rule out an unforeseen event preventing you from doing so.

Applying this to the context of commerce, an ML system may be able to predict, with relative accuracy, the long-term trend of a particular futures market, but it cannot guarantee with absolute certainty that market behaviour will follow suit, particularly in the case of black swan events, such as droughts or unexpected political decisions.

To increase the chances of an accurate outcome, organisations should seek to obtain as large a dataset as possible, with as many relevant variables as obtainable, while remaining faithful to the trends in the data they utilise. This will allow the ML system to better predict behavioural responses to certain data and therefore produce more accurate predictions.

A system’s ability to find non-evident correlations in data can result in the discovery of new data, unknown to the data subject

A simultaneous advantage and risk of ML systems is their capacity to map data points and establish correlations previously unanticipated by the system’s human designers. In short, it is therefore not always possible to anticipate the outcomes they may produce. Consequently, systems may identify trends in data that were not previously sought, such as predispositions to diseases. While this may be beneficial in certain circumstances, such as health, it may equally be unnecessary or inappropriate in other contexts.

Where these ML systems begin processing personal data beyond the scope of their original purpose, considerations of lawfulness, transparency, and purpose limitation under the GDPR will be engaged. Processing personal data in this manner without a clear purpose and appropriate justification may result in a breach of the Regulation and the penalties that accompany it.

Get in touch 

For more information on AI and the emerging legal and regulatory standards visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Regulation and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

You can find more on AI, technology, data privacy, and the law at Technology’s Legal Edge, DLA Piper’s tech-sector blog and Privacy Matters, DLA Piper’s Global Privacy and Data Protection resource.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry in the UK and abroad. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.

UK: New National Strategy for Health Data https://privacymatters.dlapiper.com/2022/07/uk-new-national-strategy-for-health-data/ Wed, 13 Jul 2022 12:59:07 +0000 Author: James Clark

The UK’s Department for Health and Social Care (“DHSC”) has published a major strategy document (‘Data saves lives: reshaping health and social care with data’) outlining the government’s plans for the regulation and use of data in healthcare.

In this post, we look at some of the most interesting proposals outlined in the strategy and consider what they might mean for the future regulation of data and technology in UK healthcare.

Secure Data Environments

The NHS will step up its investment in and use of ‘secure data environments’ (sometimes referred to as ‘trusted research environments’).  In simple terms, these are specially designated, secure servers on which a third party researcher’s access to health data can be properly controlled and monitored. These will become the default route for NHS organisations to provide access to their de-identified data for research and analysis.  This creates opportunities for providers of secure data platforms and the privacy enhancing technologies on which these platforms depend.  It also highlights the need for companies working with the NHS to increase their own familiarisation with, and investment in, secure data environments.

Secure data environments are a hot topic in data circles.  For example, they also emerge in the EU’s new Data Governance Act, in the form of its creation of ‘data intermediation services’ – i.e., services that provide a secure environment in which companies or individuals can share data.

Fair Terms for Data Partnerships

The strategy also contains proposals for the data sharing agreements that NHS bodies use when providing access to health data.   Supposedly responding to public concerns about data sharing partnerships with the private sector, the Government will:

  • Require data sharing arrangements to embody 5 core principles (for example, any use of NHS data not available in the public domain must have an explicit aim to improve the health, welfare or care of patients in the NHS, or the operation of the NHS, and any data sharing arrangement must be transparently and clearly communicated to the public).
  • Develop commercial principles to ensure that partnerships for access to data contain appropriate contractual safeguards. This will lead to a review and likely update of NHS Digital’s template data sharing and data access agreements by December 2023.

Consequently, those organisations accessing NHS datasets are likely to see changes in the contractual terms on which that access is provided, and greater scrutiny of the overall arrangement to ensure adherence with principles designed to encourage public trust and confidence in such arrangements.

Trust and Transparency

On a similar theme, the strategy contains a range of other proposals designed to improve the public’s trust in the use of health data.

Alongside the investment in secure data environments, the Government also publicly commits to increase investment in a wider range of privacy enhancing technologies (or ‘PETs’), such as homomorphic encryption (a technology that allows functions to be performed on encrypted data without ever having to decrypt it) and synthetic data (artificially manufactured data which strongly mimics real-world data, but without the privacy consequences).  The ICO has written supportively about some of these technologies in its updated draft guidance on anonymisation, and consequently there seems to be a concerted push towards the adoption of technical solutions to privacy concerns in an ever more data-dependent world.
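For illustration only, the short Python sketch below shows the additive form of homomorphic encryption in action, assuming the open-source ‘phe’ (python-paillier) library is available; the figures are invented, and real deployments involve considerably more than this.

```python
from phe import paillier

# The data holder generates the key pair and keeps the private key
public_key, private_key = paillier.generate_paillier_keypair()

# e.g. patient measurements encrypted before being shared for analysis
readings = [120, 135, 128]
encrypted = [public_key.encrypt(r) for r in readings]

# A third party can compute on the ciphertexts without the private key:
# Paillier encryption is additively homomorphic, so encrypted values can be summed
encrypted_total = sum(encrypted[1:], encrypted[0])

# Only the data holder, using the private key, can recover the result
print(private_key.decrypt(encrypted_total))  # 383
```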

The Government also plans to further improve transparency and understanding around how it uses health data (public confusion surrounding changes to the National Data Opt-Out regime in 2021 is admitted as an example of the sort of failing the Government wants to avoid in the future).  Developments on this front will include a ‘Data Pact’ (a high-level charter outlining core guarantees towards the public in terms of fair use of health data) and an online hub, with a transparency statement explaining how publicly held health and care data is used in practice.

Improving Access to Health Data

Alongside the focus on public trust and transparency, the strategy is also concerned with promoting greater access to health data in the public interest.   This is a theme that has been prominent internationally following the Covid pandemic – a renewed understanding of the importance of health data for research and development purposes, leading to a demand to break down unnecessary barriers to accessing and combining datasets for these purposes.

The Government plans to do this partly through major investment (of up to £200 million) in NHS data infrastructure to make research-ready data available to researchers.  DHSC envisages a ‘vibrant hub of genomics, imaging, pathology, and citizen generated data, where AI-enabled tools and technologies can be deployed’.

On the legislative front, it’s likely that this part of the strategy will also be supported by the Government’s impending Data Reform Bill, which, amongst other things, is making changes to the research provisions of UK data protection law to, for example, provide a clearer definition of scientific research, a broader form of consent where used as a lawful basis for research, and a more concrete privacy notice exemption where data is repurposed for scientific research purposes.  All of these changes are expressly intended to promote greater use of personal data, including health data, for responsible research purposes.

There are strong parallels here with the EU’s proposals for a European Health Data Space, which will promote access to electronic health data for secondary purposes.

Encouraging AI Innovation

No data strategy in 2022 would be complete without consideration of Artificial Intelligence (AI).  On this front, DHSC:

  • Commits to working with the Office of AI (OAI) on its developing plans for the regulation of AI in the United Kingdom. The OAI’s White Paper on the governance and regulation of AI is expected imminently and will be closely scrutinised as the UK’s response to the EU’s draft AI Act.  The health sector is one of the most sensitive and important in an AI context and the NHS’ work on this will be led by a newly created NHS AI Lab.
  • Will develop unified standards for the efficacy and safety testing of AI systems, working with the Medicines and Healthcare products Regulatory Agency (MHRA) and the National Institute for Health and Care Excellence (NICE). Safety standards that can be used by development teams building AI systems are an important part of the regulatory framework for safe AI, and this is likely to be a welcome step.
  • Will, through the NHS AI Lab, develop a methodology for evaluating the AI safety of market-authorised products in healthcare.

In summary, the strategy contains an ambitious set of proposals that are intended to cement the UK’s position as a world leader in healthcare informatics and data-driven health research.  Notably, they are clearly designed to balance and reconcile competing demands for greater access to and use of health data, with the protection of trust, privacy and security in that data.
