Privacy Matters – DLA Piper's Global Privacy and Data Protection Resource
https://privacymatters.dlapiper.com/category/artificial-intelligence/

EU: EDPB Opinion on AI Provides Important Guidance though Many Questions Remain
https://privacymatters.dlapiper.com/2025/01/eu-edpb-opinion-on-ai-provides-important-guidance-though-many-questions-remain/ (14 January 2025)

A much-anticipated Opinion from the European Data Protection Board (EDPB) on AI models and data protection has not resulted in the clear or definitive guidance that businesses operating in the EU had hoped for. The Opinion emphasises the need for case-by-case assessments to determine GDPR applicability, highlighting the importance of accountability and record-keeping, while also flagging ‘legitimate interests’ as an appropriate legal basis under specific conditions. In rejecting the proposed Hamburg thesis, the EDPB has stated that AI models trained on personal data should be considered anonymous only if personal data cannot be extracted or regurgitated.

Introduction

On 17 December 2024, the EDPB published a much-anticipated Opinion on AI models and data protection.  The Opinion includes the EDPB’s view on the following key questions: does the development and use of an AI model involve the processing of personal data; and if so, what is the correct legal basis for that processing?

As is sometimes the case with EDPB Opinions, which necessarily represent the consensus view of the supervisory authorities of 27 different Member States, the Opinion does not provide many clear or definitive answers.  Instead, the EDPB offers indicative guidance and criteria, calling for case-by-case assessments of AI models to understand whether, and how, they are impacted by the GDPR.  In this context, the Opinion repeatedly highlights the importance of accountability and record-keeping by businesses developing or using AI, so that the applicability of data protection laws, and the business’ compliance with those laws, can be properly assessed. 

Whilst the equivocation of the Opinion might be viewed as unhelpful by European businesses looking for regulatory certainty, it is also a reflection of the complexities inherent in this intersection of law and technology.

In summary, the answers given by the EDPB to the four questions in the Opinion are as follows:

  1. Can an AI model, which has been trained using personal data, be considered anonymous?  Yes, but only in some cases.  It must be impossible, using all means reasonably likely to be used, to obtain personal data from the model, either through attacks which aim to extract the original training data from the model itself, or through interactions with the AI model (i.e., personal data provided in responses to prompts / queries). 
  2. Is ‘legitimate interests’ an appropriate legal basis for the training and development of an AI model? In principle yes, but only where the processing of personal data is necessary to develop the AI model, and where the ‘balancing test’ can be resolved in favour of the controller.  In particular, the issue of data minimisation, and the related issue of web-scraping / indiscriminate capture of data, will be relevant here. 
  3. Is ‘legitimate interests’ an appropriate legal basis for the deployment of an AI model? In principle yes, but only where the processing of personal data is necessary to deploy the AI model, and where the ‘balancing test’ can be resolved in favour of the controller.  Here, the impact on the data subject of the use of the AI model is of predominant importance.
  4. If an AI Model has been found to have been created, updated or developed using unlawfully processed personal data, how does this impact the subsequent use of that AI model?  This depends in part on whether the AI model was first anonymised before being disclosed to the deployer of that model (see Question 1).  Otherwise, the deployer of the model may need to assess the lawfulness of the development of the model as part of its accountability obligations.

Background

The Opinion was issued by the EDPB under Article 64 of the GDPR, in response to a request from the Irish Data Protection Commission.  Article 64 requires the EDPB to publish an opinion on matters of ‘general application’ or which ‘produce effects in more than one Member State’. 

In this case, the Irish DPC asked the EDPB to provide an opinion on the above-mentioned questions – a request that is not surprising given the general importance of AI models to businesses across the EU, and the large number of technology companies developing those models that have established their European operations in Ireland. 

In order to understand the Opinion, it helps to be familiar with certain concepts and terminology relating to AI. 

First, the Opinion distinguishes between an ‘AI system’ and an ‘AI model’. For the former, the EDPB relies on the definition given in the EU AI Act. In short: a machine-based system operating with some degree of autonomy that infers, from inputs, how to produce outputs such as predictions, content, recommendations, or decisions.  An AI model, meanwhile, is a component part of an AI system. Colloquially, it is the ‘brain’ of the AI system – an algorithm, or series of algorithms (such as in the form of a neural network), that recognises patterns in data. AI models require the addition of further components, such as a user interface, to become AI systems. To take a common example – the generative AI system known as ChatGPT is a software application comprising an AI model (the GPT Large Language Model) connected to a chatbot-style user interface that allows the user to submit queries (or ‘prompts’) to the model in the form of natural language questions. Whilst the Opinion is notionally concerned only with AI models, at times the Opinion appears to blur the distinction between the model and the system, in particular when discussing the significance of model outputs that are only rendered comprehensible to the user through an interface that sits outside of the model.
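To make the distinction more concrete, the following minimal Python sketch separates a stand-in ‘model’ from the ‘system’ that wraps it with a user interface. All class and method names are illustrative assumptions, not a description of any real product.

```python
# Illustrative only: an "AI model" as a bare input-to-output component,
# and an "AI system" as the model plus the interface logic around it.

class LanguageModel:
    """Stands in for the trained model: learned parameters mapping inputs to outputs."""

    def generate(self, tokens: list[str]) -> list[str]:
        # A real model would run inference over learned weights; this stub
        # simply echoes the input to keep the example self-contained.
        return ["(model", "output", "for:)"] + tokens


class ChatSystem:
    """The AI system: the model wrapped with a user-facing interface."""

    def __init__(self, model: LanguageModel) -> None:
        self.model = model

    def ask(self, prompt: str) -> str:
        tokens = prompt.split()               # interface: turn the user's text into model input
        output = self.model.generate(tokens)  # model: the inference step
        return " ".join(output)               # interface: render the output for the user


if __name__ == "__main__":
    print(ChatSystem(LanguageModel()).ask("What is the GDPR?"))
```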

Second, the Opinion relies on an understanding of a typical ‘AI lifecycle’, pursuant to which an AI model is first developed by training the model on large volumes of data.  This training may happen in a number of phases which become increasingly refined (referred to as ‘fine-tuning’). Only after an AI model is developed can it be used, or ‘deployed’, in a live setting, as part of an AI system.  Often, the developer of an AI model will not be the same person as the deployer.  This is relevant because the Opinion variously addresses both development and deployment phases.

The significance of the ‘Hamburg thesis’

With respect to the key question of whether AI models can be considered anonymous, the Opinion follows in the wake of a much-discussed paper published in July 2024 by the data protection authority for the German state of Hamburg.  The paper took the position that AI models (specifically, Large Language Models) are, in isolation, anonymous – they do not involve the processing of personal data. 

In order to reach that conclusion, the paper decoupled the model itself from: (i) the prior training of the model (which may involve the collection and further processing of personal data as part of the training dataset); and (ii) the subsequent use of the model, whereby a prompt/input may contain personal data, and an output may be used in a way that means it constitutes personal data.

Looking only at the AI model itself, the paper decided that the tokens and values which make up the ‘inner workings’ of a typical AI model do not, in any meaningful way, relate to or correspond with information about identifiable individuals.  Consequently, the model itself was found to be anonymous, even if the development and use of the model involves the processing of personal data. 

The Hamburg thesis was welcomed for several reasons, not least because it resolved difficult questions such as how data subject rights could be understood in relation to an AI model (if someone asks for their personal data to be deleted, then what can this mean in the context of an AI model?), and the question of the lawful basis for ‘storing’ personal data in an AI model (as distinct from the lawful basis for collecting and preparing data to train the model).

However, as we go on to explain, the EDPB Opinion does not follow the relatively simple and certain framework presented by the Hamburg thesis.  Instead, it introduces uncertainty by asserting that there are, in fact, scenarios where an AI model contains personal data, but that this must be determined on a case-by-case basis.

Are AI models anonymous?

First, the Opinion is only concerned with AI models that have been trained using personal data.  Therefore, AI models trained using solely non-personal data (such as statistical data, or financial data relating to businesses) can, for the avoidance of doubt, be considered anonymous.  However, in this context the broad scope of ‘personal data’ under the GDPR must be remembered, and the Opinion does not suggest any de minimis level of personal data that needs to be involved in the training of the AI model for the question of GDPR applicability to arise.

Where personal data is used in the training phase, the next question is whether the model is specifically designed to provide personal data regarding individuals whose personal data were used to train the model.  If so, the AI model will not be anonymous.  For example, an AI model that is trained to provide a user, on request, with biographical information and contact details for directors of public companies, or a generative AI model that is trained on the voice recordings of famous singers so that it can, in turn, mimic the voices of those singers.  In each case, the model is trained on personal data of specific individuals, in order to be able to produce other personal data about those individuals as an output. 

Finally, there is the intermediary case of AI models that are trained on personal data, but that are not designed to provide personal data related to the training data as an output.  It is this use case that the Opinion focuses on.  The conclusion is that AI models in this category may be anonymous, but only if the developer of the model can demonstrate that information about individuals whose personal data was used to train the model cannot be ‘obtained from’ the model, using all means reasonably likely to be used.  Notwithstanding that personal data used for training the model no longer exists within the model in its original form (but rather it is “represented through mathematical objects”), that information is, in the eyes of the EDPB, still capable of constituting personal data.

The following question then arises: how does someone ‘obtain’ personal data from an AI model? In short, the Opinion posits two possibilities.  First, training data may be ‘extracted’ via deliberate attacks.  The Opinion refers to an evolving field of research in this area and makes reference to techniques such as ‘model inversion’, ‘reconstruction attacks’, and ‘attribute and membership inference’.  These are techniques that can be deployed to trick the model into revealing training data, or otherwise reconstruct that training data, in some cases relying on privileged access to the model itself.  Second, there is the risk of accidental or inadvertent ‘regurgitation’ of personal data as part of an AI model’s outputs. 

Consequently, a developer must be able to demonstrate that its AI model is resistant both to attacks that extract personal data directly from the model and to the risk of regurgitation of personal data in response to queries: “In sum, the EDPB considers that, for an AI model to be considered anonymous, using reasonable means, both (i) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to train the model; as well as (ii) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant for any data subject”. 
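As a purely illustrative sketch of the kind of test this implies, the snippet below implements a very simplified confidence-based membership inference check on a toy scikit-learn classifier: if the model is systematically more confident on records it was trained on than on unseen records, an attacker could exploit that gap to infer training-set membership. The dataset, model and threshold logic are assumptions for illustration only; real anonymity evaluations are considerably more rigorous than this.

```python
# Simplified confidence-based membership inference check (illustrative only).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(400, 8))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

# Half the records are used to train the model ("members"), half are held out.
X_member, X_nonmember, y_member, y_nonmember = train_test_split(
    X, y, test_size=0.5, random_state=0
)
model = LogisticRegression().fit(X_member, y_member)

def true_label_confidence(model, X, y):
    """Probability the model assigns to each record's true label."""
    proba = model.predict_proba(X)
    return proba[np.arange(len(y)), y]

gap = (
    true_label_confidence(model, X_member, y_member).mean()
    - true_label_confidence(model, X_nonmember, y_nonmember).mean()
)
# A persistent gap means an attacker guessing "member" above a confidence
# threshold would beat chance, i.e. the model leaks information about the
# individuals whose data was used to train it.
print(f"confidence gap between training and unseen records: {gap:.3f}")
```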

Which criteria should be used to evaluate whether an AI model is anonymous?

Recognising the uncertainty in its conclusion that AI models may or may not be anonymous, the EDPB provides a list of criteria that can be used to assess the likelihood of a model being found to contain personal data.  These include:

  • Steps taken to avoid or limit the collection of personal data during the training phase.
  • Data minimisation or masking measures (e.g., pseudonymisation) applied to reduce the volume and sensitivity of personal data used during the training phase.
  • The use of methodologies during model development that reduce privacy risks (e.g., regularisation methods to improve model generalisation and reduce overfitting, and appropriate and effective privacy-preserving techniques, such as differential privacy).
  • Measures that reduce the likelihood of obtaining personal data from queries (e.g., ensuring the AI system blocks the presentation to the user of outputs that may contain personal data; a simple illustrative sketch follows this list).
  • Document-based audits (internal or external) undertaken by the model developer that include an evaluation of the chosen measures and of their impact to limit the likelihood of identification.
  • Testing of the model to demonstrate its resilience to different forms of data extraction attacks.
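As a simple illustration of the output-filtering criterion above, the sketch below redacts obvious personal-data patterns from a model response before it is shown to the user. The regular expressions and redaction policy are assumptions for illustration; production systems rely on far more sophisticated, multi-layered personal-data detection.

```python
# Illustrative output filter: redact obvious personal-data patterns from a
# model response before presenting it to the user.
import re

PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.-]+\b"),
    "phone": re.compile(r"\+?\d[\d\s().-]{7,}\d"),
}

def filter_output(text: str) -> str:
    """Replace anything matching a known personal-data pattern with a placeholder."""
    for label, pattern in PII_PATTERNS.items():
        text = pattern.sub(f"[{label} removed]", text)
    return text

print(filter_output("You can reach Jane Doe at jane.doe@example.com or +44 20 7946 0958."))
```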

What is the correct legal basis for AI models?

When using personal data to train an AI model, the preferred legal basis is normally the ‘legitimate interests’ of the controller, under Article 6(1)(f) GDPR. This is for practical reasons. Whilst, in some circumstances, it may be possible to obtain GDPR-compliant consent from individuals authorising the use of their data for AI training purposes, in most cases this will not be feasible. 

Helpfully, the Opinion accepts that legitimate interests is, in principle, a viable legal basis for processing personal data to train an AI model. Further, the Opinion also suggests that it should be straightforward for businesses to identify a lawful legitimate interest. For example, the Opinion cites “developing an AI system to detect fraudulent content or behaviour” as a sufficiently precise and real interest. 

However, where businesses may have more difficulty is in showing that the processing of personal data is necessary to realise their legitimate interest, and that their legitimate interest is not outweighed by any impact on the rights and freedoms of data subjects (the ‘balancing test’). Whilst this is fundamentally just a restatement of existing legal principles, the following sentence should nevertheless cause some concern for businesses developing AI models, in particular Large Language Models: “If the pursuit of the purpose is also possible through an AI model that does not entail processing of personal data, then processing personal data should be considered as not necessary”. Technically speaking, it may often be the case that personal data is not essential for the training of an AI model – however, this does not mean that it is straightforward to systematically remove all personal data from a training dataset, or otherwise replace all identifying elements with ‘dummy’ values. 

With respect to the balancing test, the EDPB asks businesses to consider a data subject’s interest in self-determination and in maintaining control over their own data when considering whether it is lawful to collect personal data for model training purposes.  In particular, it may be more difficult to satisfy the balancing test if a developer is scraping large volumes of personal data (especially including any sensitive data categories) against data subjects’ wishes, without their knowledge, or otherwise in contexts that they would not reasonably expect. 

When it comes to the separate purpose of deploying an AI model, the EDPB asks businesses to consider the impact on the data subject’s fundamental rights that arises from the purpose for which the AI model is used.  For example, AI models that are used to block content publication may adversely affect a data subject’s fundamental right to freedom of expression.  Conversely, the EDPB recognises that the deployment of AI models may have a positive impact on a data subject’s rights and freedoms – for example, an AI model that is used to improve accessibility to certain services for people with disabilities. In line with Recital 47 GDPR, the EDPB reminds controllers to consider the ‘reasonable expectations’ of data subjects in relation to both training and deployment uses of personal data.

Finally, the Opinion discusses a range of ‘mitigating measures’ that may be used to reduce risks to data subjects and therefore tip the balancing test in favour of the controller.  These include:

  • Technical measures to reduce the volume or sensitivity of personal data used (e.g., pseudonymisation, masking).
  • Measures to facilitate the exercise of data subject rights (e.g., providing an unconditional right for data subjects to opt-out of the use of their personal data for training or deploying the model; allowing a reasonable period of time to elapse between collection of training data and its use).
  • Transparency measures (e.g., public communications about the controller’s practices in connection with the use of personal data for AI model development).
  • Measures specific to web-scraping (e.g., excluding publications that present particular risks; excluding certain data categories or sources; excluding websites that clearly object to web scraping – see the illustrative sketch below).

Notably, the EDPB observes that, to be effective, these mitigating measures must go beyond mere compliance with GDPR obligations (for example, providing a GDPR compliant privacy notice, which a controller would in any case be required to do, would not be an effective transparency measure for these purposes). 
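By way of illustration of the web-scraping measures mentioned in the list above, the sketch below checks a site's robots.txt before any page is collected for training purposes, so that sites which clearly object to scraping are skipped. The user-agent string and URL are assumptions; a real crawler would layer this check with the other exclusions the EDPB lists (risky publications, data categories, sources).

```python
# Illustrative web-scraping exclusion: respect robots.txt before collecting a page.
from urllib import robotparser
from urllib.parse import urlparse

def may_scrape(url: str, user_agent: str = "example-training-crawler") -> bool:
    """Return True only if the site's robots.txt permits fetching this URL."""
    parts = urlparse(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        # Treat an unreadable robots.txt conservatively, as a refusal.
        return False
    return rp.can_fetch(user_agent, url)

if __name__ == "__main__":
    print(may_scrape("https://example.com/profiles/jane-doe"))
```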

When are companies liable for non-compliant AI models?

In its final question, the DPC sought clarification from the EDPB on how a deployer of an AI model might be impacted by any unlawful processing of personal data in the development phase of the AI model. 

According to the EDPB, such ‘upstream’ unlawful processing may impact a subsequent deployer of an AI model in the following ways:

  • Corrective measures taken against the developer may have a knock-on effect on the deployer – for example, if the developer is ordered to delete personal data unlawfully collected for training purposes, the deployer would not be allowed to subsequently process this data. However, this raises an important practical question about how such data could be identified in, and deleted from, the AI model, taking into account the fact that the model does not retain training data in its original form.
  • Unlawful processing in the development phase may impact the legal basis for the deployment of the model – in particular, if the deployer of the AI model is relying on ‘legitimate interests’, it will be more difficult to satisfy the balancing test in light of the deficiencies associated with the collection and use of the training data.

In light of these risks, the EDPB recommends that deployers take reasonable steps to assess the developer’s compliance with data protection laws during the training phase.  For example, can the developer explain the sources of data used, steps taken to comply with the minimisation principle, and any legitimate interest assessments conducted for the training phase?  For certain AI models, the transparency obligations imposed in relation to AI systems under the AI Act should assist a deployer in obtaining this information from a third-party AI model developer.

While the Opinion provides a useful framework for assessing GDPR issues with AI systems, businesses operating in the EU may be frustrated by the lack of certainty or definitive guidance on many key questions relating to this new era of technology innovation.

Ireland: Increased regulatory convergence of AI and data protection: X suspends training of AI chatbot with EU user data after Irish regulator issues High Court proceedings
https://privacymatters.dlapiper.com/2024/08/ireland-increased-regulatory-convergence-of-ai-and-data-protection-x-suspends-training-of-ai-chatbot-with-eu-user-data-after-irish-regulator-issues-high-court-proceedings/ (19 August 2024)

The Irish Data Protection Commission (DPC) has welcomed X’s agreement to suspend its processing of certain personal data for the purpose of training its AI chatbot tool, Grok. This comes after the DPC issued suspension proceedings against X in the Irish High Court.  The DPC described this as the first time that any Lead Supervisory Authority had taken such an action, and the first time that it had utilised these particular powers.

Section 134 of the Data Protection Act 2018 allows the DPC, where it considers there is an urgent need to act to protect the rights and freedoms of data subjects, to make an application to the High Court for an order requiring a data controller to suspend, restrict, or prohibit the processing of personal data.

The High Court proceedings were issued on foot of a complaint to the DPC raised by consumer rights organisations Euroconsumers and Altroconsumo on behalf of data subjects in the EU/EEA. The complainants argued that the Grok chatbot was being trained with user data in a manner that did not sufficiently explain the purposes of data processing, and that more data than necessary was being collected. They further argued that X may have been handling sensitive data without sufficient reasons for doing so.

Much of the complaint stemmed from X’s initial approach of having data sharing automatically turned on for users in the EU/EEA, which it later mitigated by adding an opt-out setting. X claimed that it had relied on the lawful basis of legitimate interest under the GDPR, but the complainants argued that X’s privacy policy – dating back to September 2023 – was insufficiently clear as to how this applied to the processing of user data for the purposes of training AI models such as Grok.

This development follows a similar chain of events involving Meta in June. Complaints from privacy advocacy organisation NOYB were made against Meta’s reliance on ‘legitimate interest’ in relation to the use of data to train AI models. This led to engagement with the DPC and the eventual decision in June by Meta to pause relevant processing (without the need for the authority to invoke s134).

The DPC and other European supervisory authorities strive to emphasise the principles of lawfulness, fairness and transparency at the heart of the GDPR, and their actions illustrate that any activities perceived to threaten these values will be dealt with directly.

The DPC has previously taken the approach of making informal requests and has stated that the exercise of its powers in this case comes after extensive engagement with X on its model training. The High Court proceedings highlight the DPC’s willingness to escalate action where there remains a perceived risk to data subjects.

The DPC has, in parallel, stated that it intends to refer the matter to the EDPB although there has been no confirmation of such referral as of this date.

Such referral will presumably form part of a thematic examination of AI processing by data controllers. The topic is also the subject of debate from individual DPAs, as evidenced by the Discussion Paper on Large Language Models and Personal Data recently published by the Hamburg DPA.

The fact that much of the high-profile activity relating to the regulation of AI is coming from the data protection sphere will no doubt bolster the EDPB’s recommendation, in a statement last month, that Data Protection Authorities (DPAs) are best placed to regulate high-risk AI.

It is expected that regulatory scrutiny and activity will only escalate and accelerate in tandem with the increasing integration of powerful AI models into existing services by ‘big tech’ players to enrich data. This is particularly the case where it is perceived that data sets are being re-purposed and further processing is taking place. In such circumstances, it is essential that an appropriate legal basis is being relied upon – noting the significant issues that can arise if there is an over-reliance on legitimate interest. The DPC and other regulators are likely to investigate, engage and ultimately intervene where they believe that data subjects’ rights under the GDPR are threatened. Perhaps in anticipation of more cross-border enforcement activity, last month, the European Commission proposed a new law to streamline cooperation between DPAs when enforcing the GDPR in such cases.

A fundamental lesson from these developments is that, in the new AI paradigm, ensuring there is a suitable legal basis for any type of processing and the principles of fairness and transparency are complied with should be an absolute priority.

HONG KONG: Artificial Intelligence – Model Personal Data Protection Framework
https://privacymatters.dlapiper.com/2024/06/hong-kong-artificial-intelligence-model-personal-data-protection-framework/ (13 June 2024)

In the rapid development of artificial intelligence (“AI”), regulators are playing catch-up in creating frameworks to aid and regulate its development.

As the AI landscape begins to mature, different jurisdictions have begun to publish guidance and frameworks. Most recently, on 11 June 2024, Hong Kong’s Office of the Privacy Commissioner for Personal Data (“PCPD”) published the Artificial Intelligence: Model Personal Data Protection Framework (“Model Framework”) as a step to provide organisations with internationally recognised practical recommendations and best practices in the procurement and implementation of AI.

Summary of the Model Framework

The key underlying theme of Hong Kong’s Model Framework is the ethical procurement, implementation and use of AI systems, in compliance with the data protection requirements of the Personal Data (Privacy) Ordinance (“PDPO”).

The non-binding Model Framework seeks to promote organisations’ internal governance measures. As such, the Model Framework focuses on four key areas in which organisations should take measures throughout the lifecycle of their deployment of AI:

  • Establishing an AI strategy and governance – to formulate an internal strategy and governance considerations in the procurement of AI solutions.
  • Conducting risk assessment with human oversight – undergo risk assessments and tailor risk management with respect to the organisation’s use of AI, including deciding the level of human oversight in automated decision-making.
  • Customising AI models and implementation and management of AI systems – preparation and management of data in the use of AI systems to ensure data and system security.
  • Communicating and engaging with stakeholders – communicate with relevant stakeholders (e.g. suppliers, customers, regulators) to promote transparency and trust in the use of AI.

It is worth noting that the Model Framework makes reference to the 2021 Guidance on the Ethical Development and Use of Artificial Intelligence (“Guidance”), also issued by the PCPD. The Model Framework, which focuses on the procurement of AI solutions, complements the earlier Guidance which is primarily aimed at AI solution providers and vendors.

As a recap, the Guidance recommends three data stewardship values of being respectful, beneficial and fair, as well as seven ethical principles of accountability, human oversight, transparency and interpretability, data privacy, beneficial AI, reliability, robustness and security, and fairness – which are not foreign concepts for organisations from a data protection perspective.

Comparison with other jurisdictions

With different jurisdictions each grappling with their own AI regulatory framework, the common theme is the goal of ensuring the responsible use of AI. That said, there are slight nuances in the focus of each regulator.

For instance, the AI Act of the European Union considers AI systems in terms of their risk level, whereby serious AI incidents must be reported to relevant market surveillance authorities. Hong Kong’s Model Framework differs in that its approach to AI incidents mirrors the PDPO’s non-compulsory reporting of general personal data incidents.

Meanwhile, in Singapore, the regulatory framework also touches on the responsible use of AI in personal data protection. That said, compared to the Hong Kong Model Framework’s personal data protection focus, Singapore’s regulatory framework is a more general, broader governance model for generative AI applications.

Next steps

The publication of the Model Framework is a welcome move, as it provides more clarity as to the direction and focus of Hong Kong regulators on the use of AI. We expect more standards and guidance to be gradually published, with personal data protection as a central compliance theme.

Whilst different global regulators differ slightly in their focus, the central goal of responsible use of AI remains. As such, organisations currently using or considering using AI in their operations – be it for internal or external purposes – should focus on designing a global internal strategy and governance rules, in order to understand and mitigate the risks associated with their use of AI.

As a first step, organisations should understand the extent and use of AI in their operations (i.e. whether this is a procurement of AI solutions, or the implementation and training of the organisation’s own AI model). With this understanding, organisations should then conduct an internal data audit to understand the scope and extent of the information involved in the deployment of AI, in order to assess and mitigate risks accordingly.

Please contact Carolyn Bigg (Carolyn.Bigg@dlapiper.com) if you would like to discuss what these latest developments mean for your organisation.

This article was not generated by artificial intelligence.

Europe: The EU AI Act’s relationship with data protection law: key takeaways
https://privacymatters.dlapiper.com/2024/04/europe-the-eu-ai-acts-relationship-with-data-protection-law-key-takeaways/ (25 April 2024)

Disclaimer: The blogpost below is based on a previously published Thomson Reuters Practical Law practice note (EU AI Act: data protection aspects (EU)) and only presents a short overview of and key takeaways from this practice note. This blogpost has been produced with the permission of Thomson Reuters, which holds the copyright over the full version of the practice note. Interested readers may access the full version of the practice note through this link (paywall).

On 13 March 2024, the European Parliament plenary session formally adopted the EU AI Act at first reading. The EU AI Act is now expected to receive final approval from the Council in a few weeks’ time. Following publication in the Official Journal of the European Union, it will enter into force 20 days later.

Artificial intelligence (“AI”) systems rely on data inputs from initial development, through the training phase, and in live use. Given the broad definition of personal data under European data protection laws, AI systems’ development and use will frequently result in the processing of personal data.

At its heart, the EU AI Act is a product safety law that provides for the safe technical development and use of AI systems.  With a couple of exceptions, it does not create any rights for individuals.  By contrast, the GDPR is a fundamental rights law that gives individuals a wide range of rights in relation to the processing of their data.  As such, the EU AI Act and the GDPR are designed to work hand-in-glove, with the latter ‘filling the gap’ in terms of individual rights for scenarios where AI systems use data relating to living persons.

Consequently, as AI becomes a regulated technology through the EU AI Act, practitioners and organisations must understand the close relationship between EU data protection law and the EU AI Act.

1. EU data protection law and AI systems

1.1 The GDPR and AI systems
  • The General Data Protection Regulation (“GDPR”) is a technology-neutral regulation. As the definition of “processing” under the GDPR is broad (and in practice includes nearly all activities conducted on personal data, including data storage), it is evident that the GDPR applies to AI systems, to the extent that personal data is present somewhere in the lifecycle of an AI system.
  • It is often technically very difficult to separate personal data from non-personal data, which increases the likelihood that AI systems process personal data at some point within their lifecycle.
  • While AI is not explicitly mentioned in the GDPR, the automated decision-making framework (article 22 GDPR) serves as a form of indirect control over the use of AI systems, on the basis that AI systems are frequently used to take automated decisions that impact individuals.
  • In some respects, there is tension between the GDPR and AI. AI typically entails the collection of vast amounts of data (in particular, in the training phase), while many AI systems have a broad potential range of applications (reflecting the imitation of human-like intelligence), making the clear definition of “processing purposes” difficult.
  • At the same time, there is a clear overlap between many of the data protection principles and the principles and requirements established by the EU AI Act for the safe development and use of AI systems. The relationship between AI and data protection is expressly recognised in the text of the EU AI Act, which states that it is without prejudice to the GDPR. In developing the EU AI Act, the European Commission relied in part on article 16 of the Treaty on the Functioning of the European Union (“TFEU”), which mandates the EU to lay down the rules relating to the protection of individuals regarding the processing of personal data.
1.2 Data protection authorities’ enforcement against AI systems
  • Before the EU AI Act, the EU data protection authorities (“DPA”) were among the first regulatory bodies to take enforcement action against the use of AI systems. These enforcement actions have been based on a range of concerns, in particular, lack of legal basis to process personal data or special categories of personal data, lack of transparency, automated decision-making abuses, failure to fulfil data subject rights and data accuracy issues.
  • The list of DPA enforcement actions is already lengthy. The most notable examples include the Italian DPA’s temporary ban decision on OpenAI’s ChatGPT; the Italian DPA’s Deliveroo fine in relation to the company’s AI-enabled automated rating of rider performance; the French DPA’s fine against Clearview AI, a facial recognition platform that scrapes billions of photographs from the internet; and the Dutch DPA’s fine on the Dutch Tax and Customs Administration for various GDPR infringements in relation to an AI-based fraud notification facility application.
  • As the DPAs shape their enforcement policies based in part on public concerns, and as public awareness of and interest in AI continues to rise, it is likely that DPAs will continue to sharpen their focus on AI (also see section 6 for DPAs as a potential enforcer of the EU AI Act).

2. Scope and applicability of the GDPR and EU AI Act

2.1 Scope of the GDPR and the EU AI Act
  • The material scope of the GDPR is the processing of personal data by wholly or partly automated means, or manual processing of personal data where that data forms part of a relevant filing system (article 2 GDPR). The territorial scope of the GDPR is defined in article 3 GDPR and covers different scenarios.
  • Consequently, the GDPR has an extraterritorial scope, meaning that: (i) controllers and processors established in the EU that process personal data in the context of that establishment must comply with the GDPR even if the processing occurs in a third country; and (ii) non-EU controllers and processors have to comply with the GDPR if they target or monitor individuals in the EU.
  • On the other hand, the material scope of the EU AI Act is based around its definition of an AI system. Territorially, the EU AI Act applies to providers, deployers, importers, distributors, and authorised representatives (see, section 2.2 for details).
  • Unlike the GDPR, the EU AI Act is built around a risk categorisation and attaches different obligations to the different AI risk categories. Most obligations under the EU AI Act apply to high-risk AI systems only (covered in article 6 and Annex III EU AI Act). Various AI systems are also subject to specific obligations (such as general-purpose AI models) and transparency obligations (such as emotional categorisation systems).
2.2  Interplay between roles under the GDPR and the EU AI Act
  • As the GDPR distinguishes between controllers and processors, so the EU AI Act distinguishes between different categories of regulated operators.
  • The provider (the operator who develops an AI system or has an AI system developed) and the deployer (the operator under whose authority an AI system is used) are the most significant in practice.
  • Organisations that process personal data in the course of developing or using an AI system will need to consider the roles they play under both the GDPR and the EU AI Act. Some examples follow.
Example 1: provider (EU AI Act) and controller (GDPR). A company (A) that processes personal data in the context of training a new AI system will be acting as both a provider under the EU AI Act and as a controller under the GDPR. This is because the company is developing a new AI system and, as part of that development, is taking decisions about how to process personal data for the purpose of training the AI system.

Example 2: deployer (EU AI Act) and controller (GDPR). A company (B) that purchases the AI system described in Example 1 from company A and uses it in a way that involves the processing of personal data (for example, as a chatbot to talk to customers, or as an automated recruitment tool) will be acting as both a deployer under the EU AI Act and as a separate controller under the GDPR for the processing of its own personal data (that is, it is not the controller for the personal data used to originally train the AI system, but it is for any data it uses in conjunction with the AI).
  • More complex scenarios may arise when companies offer services that involve the processing of personal data and the use of an AI system to process that data. Depending on the facts, the customers of such services may qualify as controllers or processors (under the GDPR) although they would typically be deployers under the EU AI Act.
  • These examples raise important questions, still to be resolved in practice, about how roles under the EU AI Act map onto roles under the GDPR. Companies that develop or deploy AI systems should carefully analyse their roles under the respective laws, preferably prior to the kick-off of relevant development and deployment projects.

3. Relationship between the GDPR principles and the EU AI Act

  • The GDPR is built around the data protection principles set out in article 5 GDPR. These principles are lawfulness, fairness, transparency, purpose limitation, data minimisation, accuracy, storage limitation, integrity and confidentiality.
  • On the other hand, the first intergovernmental standard on AI, the recommendation on artificial intelligence issued by the OECD (OECD Recommendation of the Council on Artificial Intelligence, “OECD AI Principles”), introduces five complementary principles for responsible stewardship of trustworthy AI that have strong links to the principles in the GDPR: inclusive growth, sustainable development and well-being; human-centred values and fairness; transparency and explainability; robustness, security and safety; and accountability.
  • The EU AI Act also refers to general principles applicable to all AI systems, as well as specific obligations that give effect to those principles in particular ways. The EU AI Act principles are set out in recital 27 and are influenced by the OECD AI Principles and the seven ethical principles for AI developed by the independent High-Level Expert Group on AI (HLEG). Although recitals do not have the same legally binding status as the operative provisions which follow them and cannot overrule an operative provision, they can help with interpretation and to determine meaning.
  • Recital 27 EU AI Act refers to the following principles: human agency and oversight; technical robustness and safety; privacy and data governance; transparency; diversity, non-discrimination and fairness; and social and environmental wellbeing. Some of these principles already materialise through specific EU AI Act obligations: article 10 EU AI Act prescribes data governance practices for high-risk AI systems; article 13 EU AI Act deals with transparency; articles 14 and 26 EU AI Act introduce human oversight and monitoring requirements; and article 27 EU AI Act introduces the obligation to conduct fundamental rights impact assessments for some high-risk AI systems.
  • Understanding the synergies and differences between the GDPR principles and the EU AI Act principles will allow organisations to leverage their existing knowledge of GDPR and their existing GDPR compliance programmes. This is therefore a crucial step to lower compliance costs. The full practice note includes comprehensive tables that compare the practicalities in this regard.

4. Human oversight under the EU AI Act and automated decision-making under the GDPR

  • Under article 22 GDPR, data subjects have the right not to be subject to solely automated decisions involving the processing of personal data that result in legal or similarly significant effects. Where such decisions are taken, they must be based on one of the grounds set out in article 22(2) GDPR.
  • Like the GDPR, the EU AI Act is also concerned with ensuring that fundamental rights and freedoms are protected by allowing for appropriate human supervision and intervention (the so called “human-in-the-loop” effect).
  • Article 14 EU AI Act requires high-risk AI systems to be designed and developed in such a way (including with appropriate human-machine interface tools) that they can be effectively overseen by natural persons during the period in which the AI system is in use. In other words, providers must take a “human-oversight-by-design” approach to developing AI systems.
  • According to article 26.1 EU AI Act, the deployer of an AI system must take appropriate technical and organisational measures to ensure its use of an AI system is in accordance with the instructions of use accompanying the system, including with respect to human oversight.
  • The level of human oversight and intervention exercised by a user of an AI system may be determinative in bringing the system in or out of scope of the automated decision-making framework under the GDPR. In other words, a meaningful intervention by a human being at a key stage of the AI system’s decision-making process may be sufficient to ensure that the decision is no longer wholly automated for the purposes of article 22 GDPR. Perhaps more likely, AI systems will be used to make wholly automated decisions, but effective human oversight will operate as a safeguard to ensure that the automated decision-making process is fair and that an individual’s rights, including their data protection rights, are upheld.
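The following minimal sketch illustrates one way a deployer might wire in the kind of meaningful human intervention described above: recommendations with legal or similarly significant effects are routed to a human reviewer who can overturn them, rather than being applied automatically. The data structure, flags and reviewer callback are hypothetical and purely illustrative.

```python
# Illustrative "human-in-the-loop" gate: significant decisions are confirmed
# (or overturned) by a natural person rather than applied automatically.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Recommendation:
    subject_id: str
    outcome: str        # e.g. "reject_application"
    confidence: float

def decide(rec: Recommendation, significant_effect: bool,
           human_review: Callable[[Recommendation], str]) -> str:
    """Apply the model output directly only where the decision has no significant effect."""
    if significant_effect:
        # Meaningful human involvement: the reviewer sees the recommendation
        # (and, in practice, the reasons behind it) and may overturn it.
        return human_review(rec)
    return rec.outcome

# Example: the deployer supplies a reviewer callback; here the reviewer overrides the model.
final = decide(
    Recommendation("applicant-42", "reject_application", 0.91),
    significant_effect=True,
    human_review=lambda rec: "approve_application",
)
print(final)
```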

5. Conformity assessments and fundamental rights impact assessments under the EU AI Act and the DPIAs under the GDPR

  • Under the EU AI Act, the conformity assessment is designed to ensure accountability by the provider with each of the EU AI Act’s requirements for the safe development of a high-risk AI system (as set out in Title III, Chapter 2 EU AI Act). Conformity assessments are not risk assessments but rather demonstrative tools that show compliance with the EU AI Act’s requirements.
  • The DPIA, on the other hand, is a mandatory step required under the GDPR for high-risk personal data processing activities.
  • Consequently, there are significant differences in terms of both purpose and form between a conformity assessment and a DPIA. However, in the context of high-risk AI systems, the provider of such systems may also need to conduct a DPIA in relation to the use of personal data in the development and training of the system. In such a case, the technical documentation drafted for conformity assessments may help establish the factual context of a DPIA. Similarly, the technical information may be helpful to a deployer of the AI system that is required to conduct a DPIA in relation to its use of the system.
  • The requirement under the EU AI Act to conduct a fundamental rights impact assessment (“FRIA”) is similar, conceptually, to a DPIA. As with a DPIA, the purpose of a FRIA is to identify and mitigate risks to the fundamental rights of natural persons, in this case arising from the deployment of an AI system. For more details regarding the FRIA, see Fundamental Rights Impact Assessments under the EU AI Act: Who, what and how?.
  • Practically speaking, organisations generally already have governance mechanisms in place to bring legal, IT and business professionals together for impact assessments such as the DPIA. When it comes to a FRIA, such mechanisms can be leveraged. As with a DPIA, the first step is likely to consist of a pre-FRIA screening to identify the use of an in-scope high-risk AI system (recognising that, as a good practice step, organisations may choose to conduct FRIAs for a wider range of AI systems than is strictly required by the EU AI Act).
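As an illustration of the pre-FRIA screening step mentioned above, the sketch below triages proposed AI use cases against a non-exhaustive, purely illustrative set of high-risk areas; anything that matches, or that the organisation voluntarily treats as high risk, is flagged for a full fundamental rights impact assessment. The categories, keywords and function names are assumptions for illustration and are not a restatement of Annex III.

```python
# Illustrative pre-FRIA screening: flag AI use cases that may require a full
# fundamental rights impact assessment. The area list is a deliberately
# simplified, non-exhaustive stand-in for a proper legal analysis.
ILLUSTRATIVE_HIGH_RISK_AREAS = {
    "employment",          # e.g. recruitment or worker-management tools
    "education",           # e.g. exam scoring or admissions
    "essential_services",  # e.g. credit scoring for access to services
    "law_enforcement",
}

def fria_screening(use_case_area: str, treat_as_high_risk: bool = False) -> bool:
    """Return True if the use case should proceed to a full FRIA."""
    return treat_as_high_risk or use_case_area in ILLUSTRATIVE_HIGH_RISK_AREAS

print(fria_screening("employment"))        # True: route to a full FRIA
print(fria_screening("internal_chatbot"))  # False under this simplified screen
```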

6. National competent authorities under EU AI Act and DPAs

  • Under the EU AI Act, each member state is required to designate one or more national competent authorities to supervise the application and implementation of the EU AI Act, as well as to carry out market surveillance activities.
  • The national competent authorities will be supported by the European Artificial Intelligence Board and the European AI Office. The most notable duty of the European AI Office is to enforce and supervise the new rules for general purpose AI models.
  • The appointment of DPAs as enforcers of the EU AI Act would solidify the close relationship between the EU GDPR and the EU AI Act.
SINGAPORE: Proposed Guidelines on Use of Personal Data in AI Systems
https://privacymatters.dlapiper.com/2023/08/singapore-proposed-guidelines-on-use-of-personal-data-in-ai-systems/ (2 August 2023)

Authors: Carolyn Bigg, Lauren Hurcombe and Yue Lin Lee.

On 18 July 2023, Singapore’s Personal Data Protection Commission (“PDPC”) issued for public consultation a set of proposed guidelines for the use of personal data in AI recommendation and decision systems (“Proposed Guidelines”). The public consultation is open until 31 August 2023.

The Proposed Guidelines aim to clarify the application of the Singapore Personal Data Protection Act (“PDPA”) in the context of developing and deploying AI systems involving the use of personal data for making recommendations or predictions for human decision-makers or autonomous decision-making.

Key takeaways for businesses:

  1. Exceptions to consent may apply: Under the PDPA, businesses are required to obtain consent for the collection and use of personal data unless deemed consent or an exception applies. The Proposed Guidelines clarify that the Business Improvement Exception may be applied by the organisation when it is either developing a new product, enhancing an existing one, or using an AI system to boost operational efficiency and offer personalised services. This also extends to data sharing within company groups for these purposes. Relevant applications include social media recommendation engines and AI systems enhancing product competitiveness.

In addition, the Research Exception may also be considered by the organisation when it conducts commercial research to advance science and engineering without a product development plan. This includes collaborative research with other companies. For the Research Exception to apply, several conditions must be met, including that data in individually identifiable form is essential for the research and that there is a clear public benefit. However, it may be difficult for organisations to rely on this exception given there is traditionally a high threshold for a public benefit to accrue.

  2. Consent and Notification Obligations continue to apply: If relying on consent instead of an exception under the PDPA, organisations should craft consent language that enables individuals to give meaningful consent. The Proposed Guidelines highlight that the consent language need not be overly technical or detailed, but should be proportionate having regard to the potential harm to the individual and the level of autonomy of the AI system. For example, a social media platform providing personalised content recommendations should explain why specific content is shown and the factors affecting the ranking of posts (e.g., past user interactions or group memberships).
  3. Navigating B2B AI deployments: Where businesses engage professional service providers to provide bespoke or fully customisable AI systems, such service providers may be acting as data intermediaries / data processors and are subject to obligations under the PDPA in relation to the protection and retention of personal data. To support businesses in meeting their consent, notification and accountability obligations, service providers should adopt practices such as pre-processing stage data mapping and labelling and maintaining training data records. Service providers should familiarise themselves with the information needed to meet their customers’ PDPA obligations and design systems to facilitate the extraction of information relevant to these obligations. In addition, organisations should undertake a data protection impact assessment when deploying, using or designing AI systems.
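As a simple illustration of the record-keeping practices the Proposed Guidelines encourage, the sketch below shows one possible structure for training-data provenance records that a service provider could maintain and surface to its customers. The field names and values are assumptions for illustration, not a prescribed format.

```python
# Illustrative training-data provenance record, of the kind a provider could
# maintain to help customers meet consent, notification and accountability
# obligations. The schema is an assumption, not a prescribed format.
from dataclasses import dataclass, asdict
from datetime import date

@dataclass
class TrainingDataRecord:
    dataset_name: str
    source: str                  # where the data came from
    contains_personal_data: bool
    legal_basis: str             # e.g. "consent", "business improvement exception"
    data_categories: tuple[str, ...]
    collected_on: date
    retention_until: date

record = TrainingDataRecord(
    dataset_name="recommendation-engine-clicks-2023Q2",
    source="first-party app telemetry",
    contains_personal_data=True,
    legal_basis="business improvement exception (illustrative)",
    data_categories=("user id", "content interactions"),
    collected_on=date(2023, 6, 30),
    retention_until=date(2025, 6, 30),
)
print(asdict(record))
```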

Our observations

The Proposed Guidelines build on the PDPC’s existing Model AI Governance Framework (PDPC | Singapore’s Approach to AI Governance) (first released in 2019 and updated in 2020), and are in line with Singapore’s pro-innovation, business-friendly approach in developing AI in a lawful but pragmatic way.

In recent months, the APAC region has seen a trend of businesses harnessing data to develop and deploy AI systems, fueled by pro-innovation and pro-collaboration regulations across the region, such as the new generative AI measures in China. While countries in the region are considering their unique approach in AI regulation, a common thread is the recognition of the pivotal role that data plays in powering AI solutions.

The Draft Advisory Guidelines on use of Personal Data in AI Recommendation and Decision Systems may be accessed here.

To find out more on AI and AI laws and regulations, visit DLA Piper’s Focus on Artificial Intelligence page and Technology’s Legal Edge blog.

Please contact Carolyn Bigg (Partner), Lauren Hurcombe (Partner) or Yue Lin Lee (Senior Associate) if you have any questions or to see what this means for your organisation.

A Pro-Innovation Approach: UK Government publishes white paper on the future of governance and regulation of artificial intelligence
https://privacymatters.dlapiper.com/2023/03/a-pro-innovation-approach-uk-government-publishes-white-paper-on-the-future-of-governance-and-regulation-of-artificial-intelligence/ (31 March 2023)

Authors: James Clark, Coran Darling, Andrew Dyson, Gareth Stokes, Imran Syed & Rachel de Souza

In November 2021, the UK Government (“Government”) issued the National Artificial Intelligence (AI) Strategy, with the ambition of making the UK a global AI superpower over the next decade. The strategy promised a thriving ecosystem, supported by Government policy that would look at establishing an effective regulatory framework; a new governmental department focussed on AI and other innovative technologies; and collaboration with national regulators.

On 29 March 2023 the Government published the long-awaited white paper (“Paper”) setting out how the UK anticipates it will achieve the first, and most important, of these goals – the creation of a blueprint for future governance and regulation of AI in the UK. The Paper is open for consultation until 21 June 2023.

The Paper, headed “A pro-innovation approach”, recognises the importance of building a framework that engenders trust and confidence in responsible use of AI (noting the key risks to health, security, privacy, and more, that can arise through an unregulated approach), but cautions against ‘overbearing’ regulation which may adversely impact innovation and investment.

This theme runs throughout the Paper and expands into recommendations that support a relatively light touch, and arguably a more organic regulatory approach, than we have seen in other jurisdictions. This is most notably the case when compared to the approach of the EU, where the focus has been on development of a harmonizing AI-specific law and supporting AI-specific regulatory regime.

The Paper contends that effective AI regulation can be constructed without the need for new cross-sectoral legislation. Instead, the UK is aiming to establish “a deliberately agile and iterative approach” that avoids the risk of “rigid and onerous legislative requirements on businesses”. This ambition should be largely achieved by co-opting regulators in regulated sectors to effectively take direct responsibility for the establishment, promotion, and oversight of responsible AI in their respective regulated domains. This would then be supported by the development of non-binding assurance schemes and technical standards.

Core Principles

This approach may be different in execution from the proposals we are seeing come out of Europe with the AI Act. If we look beneath the surface, however, we find the Paper committing the UK to core principles for responsible AI which are consistent across both regimes:

  • Safety, security, and robustness: AI should function in a secure, safe, and robust manner, where risks can be suitably monitored and mitigated;
  • Appropriate transparency and explainability: organisations developing and deploying AI should be able to communicate the method in which it is used and be able to adequately explain an AI system’s decision-making process;
  • Fairness: AI should be used in ways that comply with existing regulation and must not discriminate against individuals or create unfair commercial outcomes;
  • Accountability and governance: appropriate measures should be taken to ensure there is appropriate oversight of AI systems and that adequate accountability measures are in place; and
  • Contestability and redress: there must be clear routes to dispute harmful outcomes or decisions generated by AI.

The Government intends to use the principles as a universal guardrail to guide the development and use of AI by companies in the UK. This approach aligns with international thinking that can be traced back to the OECD AI Principles (2019), the Council of Europe’s 2021 paper on a legal framework for artificial intelligence, and the recent Blueprint for an AI Bill of Rights proposed by the White House’s Office of Science and Technology Policy.

Regulator Led Approach

The UK does not intend to codify these core principles into law, at least for the time being. Rather, the UK intends to lean on the supervisory and enforcement powers of existing regulatory bodies, charging them with ensuring that the core principles are followed by organisations for whom they have regulatory responsibility.

Regulatory bodies, rather than lawmakers or any ‘super-regulator’, will therefore be left to determine how best to promote compliance in practice. This means, for example, that the FCA will be left to regulate AI across financial services; the MHRA to consider what is appropriate in the field of medicines and medical devices; and the SRA for legal service professionals. This approach is already beginning to play out in some areas. For example, in October 2022, the Bank of England and FCA jointly released a Discussion Paper on Artificial Intelligence and Machine Learning (DP5/22), which is intended to progress the debate on how regulation and policy should play a role in use of AI in financial services.

To enable this to work, the Paper contemplates a new statutory duty on regulators requiring them to have due regard to the principles in the performance of their tasks. Comparable duties already exist in other areas, such as the so-called ‘growth duty’ that came into effect in 2017, which requires regulators to have regard to the desirability of promoting economic growth. Regulators will be required by law to ensure that their guidance, supervision, and enforcement of existing sectoral laws take account of the core principles for responsible AI. Precisely what that means in practice remains to be seen.

Coordination Layer

The Paper recognises that there are risks with a de-centralised framework. For example, regulators may establish conflicting requirements, or fail to address risks that fall into the gaps between their remits.

To address this, the Paper announces the Government’s intention to create a ‘coordination layer’ that will cut across sectors of the economy and allow for central coordination on key issues of AI regulation. The coordination layer will consist of several support functions, provided from within Government, including:

  • assessment of the effectiveness of the de-centralised regulatory framework – including a commitment to remain responsive and adapt the framework if necessary;
  • central monitoring of AI risks arising in the UK;
  • public education and awareness-raising around AI; and
  • testbeds and sandbox initiatives for the development of new AI-based technologies.

The Paper also recognises the likely importance of technical standards as a way of providing consistent, cross-sectoral assurance that AI has been developed responsibly and safely. To this end, the Government will continue to invest in the AI Standards Hub, formed in 2022, whose role is to lead the UK’s contribution to the development of international standards for the development of AI systems.

This standards-based approach may prove particularly useful for those deploying AI in multiple jurisdictions and has already been recognised within the EU AIA, which anticipates compliance being established by reference to common technical standards published by recognised standards bodies. It seems likely that, over time, this route (use of commonly recognised technical standards) will become the de facto means of securing practical compliance with the emerging regulatory regimes. This would certainly help address the concerns many will have about the challenge of meeting competing regulatory requirements across national boundaries.

International comparisons

EU Artificial Intelligence Act

The proposed UK framework will inevitably attract comparisons with the different approach taken by the EU AIA. Where the UK intends to take a sector-by-sector approach to regulating AI, the EU has opted for a horizontal, cross-sector, regulation-led approach. Further, the EU clearly intends a single set of rules to apply EU-wide: the EU AIA is framed as a directly effective Regulation that applies as law across the bloc, rather than as a Directive, which would require Member States to enact domestic legislation to give effect to the adopted framework.

The EU and UK approaches each have potential benefits. The EU’s single horizontal approach of regulation across the bloc ensures that organisations engaging in regulated AI activities will, for the most part, only be required to understand and comply with the AI Act’s single framework and apply a common standard based on the use to which AI is being put, regardless of sector.

The UK’s approach provides a less certain legislative framework, as companies may find that they are regulated differently in different sectors. While this should be mitigated through the ‘coordination layer’, it will likely lead to questions about exactly what rules apply when, and to the risk of conflicting regulatory guidance. This additional complexity will no doubt be a potential drawback for the UK but, if implemented effectively, a regime that is agile to evolving needs and technologies could prove more adaptable than the EU’s more codified approach. In theory, it should be much easier for the UK to implement changes via regulatory standards, guidance, or findings than it would be for the EU to push amendments through a relatively static legislative process.

US Approach

There are clear parallels between the UK and the likely direction of travel in the US, where a sector-by-sector approach to the regulation of AI is the preferred choice. In October 2022, the White House Office of Science and Technology Policy published a Blueprint for an AI Bill of Rights (“Blueprint”). Much like the Paper, the Blueprint sets out an initial framework for how US authorities, technology companies, and the public can work to ensure AI is implemented in a safe and accountable manner. The US anticipates setting out principles to help guide organisations in managing and (self-)regulating the use of AI, but without the level of directional control that the UK anticipates passing down to sector-specific regulators. Essentially, the US position is to avoid direct intervention, leaving state or federal-level regulation to be decided by others. It remains to be seen how the concepts framed in the Blueprint might eventually translate into powers for US regulators.

A Push for Global Interoperability

While the Government seeks to capitalise on the UK’s strategic position as third in the world for the number of domestic AI companies, it also recognises the importance of collaboration with international partners. Focus moving forward will therefore be directed to supporting global opportunities while protecting the public against cross-border risks. The Government intends to promote interoperability between the UK approach and the differing standards and approaches of other jurisdictions. This should ensure that the UK’s regulatory framework encourages the development of a compatible system of global AI governance that allows organisations to pursue ventures across jurisdictions, rather than being isolated by jurisdiction-specific regulations. The approach is expected to leverage existing, proven and agreed-upon assurance techniques and international standards, which play a key role in the wider regulatory ecosystem. Doing so is expected to support cross-border trade by setting out internationally accepted ‘best practices’ that can be recognised by external trading partners and regulators.

Next steps

The Government acknowledges that AI continues to develop at pace, and that new risks and opportunities continue to emerge. To continue to strengthen the UK’s position as a leader in AI, the Government is already working in collaboration with regulators to implement the Paper’s principles and framework. It anticipates that it will continue to scale up these activities at speed in the coming months.

In addition to allowing for responses to their consultation (until 21 June 2023), the Government has staggered its next steps into three phases: i) within the first 6 months from publication of the Paper; ii) 6 to 12 months from publication; and iii) beyond 12 months from publication.

Find out more

You can find out more on AI and the law and stay up to date on the UK’s push towards regulating AI at Technology’s Legal Edge, DLA Piper’s tech-sector blog.

For more information on AI and the emerging legal and regulatory standards, visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Act and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry across the world. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.

Keeping an ‘AI’ on your data: UK data regulator recommends lawful methods of using personal information and artificial intelligence https://privacymatters.dlapiper.com/2022/11/keeping-an-ai-on-your-data-uk-data-regulator-recommends-lawful-methods-of-using-personal-information-and-artificial-intelligence/ Tue, 08 Nov 2022 12:25:21 +0000 https://blogs.dlapiper.com/privacymatters/?p=3718 Continue Reading]]> Authors: Jules Toynton, Coran Darling

Data is often the fuel that powers AI used by organisations. It tailors search parameters, spots behavioural trends, and predicts possible future outcomes (to highlight just a few uses). In response, many of these organisations seek to accumulate and use as much data as possible, in order to make their systems work that little bit faster or more accurately.

In many cases, provided the data is not subject to copyright or other such restrictions, this causes few issues – organisations are able to amass large quantities of data that can be used initially to train their AI systems or, after deployment, to keep their datasets updated so that the latest and most accurate data is used.

Where this becomes a potential issue is when the data being collected and used is personal information. For example, the principle of ‘data minimisation’ requires that only the necessary amount and type of personal data is used to develop an AI system. This is at odds with the ‘data hoarding’ corporate mentality described above, which seeks to know as much detail as possible. Furthermore, the principle of ‘purpose limitation’ places several restrictions on the re-use of historic datasets to train AI systems. This may cause particular headaches when working with an AI vendor that wishes to further commercialise an AI system which has benefited from learnings derived from your data, in a way that goes beyond the purpose for which the data was originally provided.

It is, however, acknowledged by the Information Commissioner’s Office (“ICO”), the UK’s data regulator, that AI and personal data will forever be interlinked – unavoidably so in certain situations. In response, in November 2022, the ICO released a set of guidance on how organisations can use AI and personal data appropriately and lawfully, in accordance with the UK’s data privacy regime. The guidance is supplemented by answers to a number of frequently raised questions about combining AI with personal data, including: whether an impact assessment should be carried out, whether outputs need to comply with the accuracy principle, and whether organisations need permission to analyse personal data.

In this article we discuss some of the key recommendations in the context of the wider regulatory landscape for data and AI.

Key Recommendations:

The guide offers eight methods organisations can use to improve their handling of AI and personal information.

Take a risk-based approach when developing and deploying AI:

A first port of call for organisations should be an assessment of whether AI is actually needed for what they seek to deploy. AI that engages with personal information will typically fall within the ‘high-risk’ category for the purposes of the proposed EU AI Regulation (“AI Act”) (and likely a similar category within the developing UK framework). This will result in additional obligations and measures that the organisation will be required to follow in its deployment of the AI. The ICO therefore recommends a less technical and more privacy-preserving alternative where possible.

Should AI be chosen after this assessment, a data privacy impact assessment should be carried out to identify and minimise the data risks that the AI poses to data subjects, and to mitigate the harm it may cause. At this stage, the ICO also recommends consulting different groups who may be impacted by the use of AI in this context, to better understand the potential risks.

Consider how decisions can be explained to the individuals affected:

As the ICO notes, it can be difficult to explain how AI arrives at certain decisions and outputs, particularly in the case of machine learning and complex algorithms, where input values and trends change as the AI learns from the data it is fed.

Where possible, the ICO recommends that organisations:

  • be clear and open with subjects on how and why personal data is being used;
  • consider what explanation is needed in the context that the AI will be deployed;
  • assess what explanations are likely to be expected;
  • assess the potential impact of AI decisions to understand the detail required in explanations; and
  • consider how individual rights requests will be handled.

The ICO has acknowledged that this is a difficult area of data privacy and has provided detailed guidance, co-badged with the Alan Turing Institute, on “Explaining decisions made with AI”.

Limit data collection to only what is needed:

Contrary to beliefs widely held by organisations, the ICO recommends that data is kept to a minimum where possible. This does not mean that data cannot be collected, but rather that appropriate consideration must be given to the data that is collected and retained.

Organisations should therefore:

  • ensure that the personal data they use is accurate, adequate, relevant and limited, based on the context of the use of the AI; and
  • consider which techniques can be used to preserve privacy as much as practical. For example, as the ICO notes, synthetic data or federated learning could be used to minimise the personal data being processed.

It should be noted that data protection’s accuracy principle does not mean that an AI system needs to be 100% statistically accurate (which is unlikely to be practically achievable). Instead organisations should factor in the possibility of inferences/decisions being incorrect, and ensure that there are processes in place to ensure fairness and overall accuracy of outcome.

Address risks of bias and discrimination at an early stage:

A persistent concern throughout many applications of AI, particularly those interacting with sensitive data, is bias and discrimination. This is made worse where too much of one type of data is used, as the biases present in that data will form part of the essential decision-making process of the AI, thereby ‘hardwiring’ bias into the system. Steps should therefore be taken to obtain as much variety as possible within the data used to train AI systems, to the extent that the data continues to reflect the wider trend accurately.

To better understand this issue, the ICO recommends that organisations:

  • assess whether the data gathered is accurate, representative, reliable, relevant, and up-to-date with respect to the population or different sets of people to which the AI will be applied; and
  • map out consequences of the decisions made by the AI system for different groups and assess whether these are acceptable from a data privacy regulatory standpoint as well as internally.

Where AI does produce biased or discriminatory decisions, this is likely to conflict with the requirement for processing of personal data to be fair, as well as with the obligations of several other more specific regulatory frameworks. A prime example is the Equality Act, which prohibits discrimination on the grounds of protected characteristics, whether by AI or otherwise. Organisations should take care to ensure that decisions are made in a way that avoids repercussions under the wider data privacy and AI regimes, as well as under those specific to the sectors and activities in which they are involved.

Dedicate time and resources to preparing data:

As noted above, the quality of an AI’s output is only going to be as good as the data it is fed and trained with. Organisations should therefore ensure sufficient resources are dedicated to preparing the data to be used.

As part of this process, organisations should expect to:

  • create clear criteria and lines of accountability about the labelling of data involving protected characteristics and/or special category data;
  • consult members of protected groups where applicable to define the labelling criteria; and
  • involve multiple human labellers to ensure consistency of categorisation and delineation and to assist with fringe cases.

Ensure AI systems are made and kept secure:

It should be of little surprise that the addition of new technologies can create new security risks (or exacerbate existing ones). In the context of the AI Act and UK data privacy regulation (and indeed once a more established UK AI regime emerges), organisations are, or will be, legally required to implement appropriate technical and organisational measures to ensure security protocols suitable for the risk associated with the information are in place.

In order to do this, organisations could:

  • complete security risk assessments to create a baseline understanding of where risks are present;
  • carry out model debugging on a regular basis; and
  • proactively monitor the system and investigate any anomalies (in some cases, the AI Act and any future UK AI framework may require human oversight as an additional protective measure regardless of the data privacy requirement).

Human review of AI outcomes should be meaningful:

Depending on the purpose of the AI, it should be established early on whether the outputs will be used to support a human decision-maker or whether decisions will be made solely by automated means. As the ICO highlights, data subjects deserve to know whether decisions made using their data are solely automated or involve human input. Where outputs are being used to assist a human, the ICO recommends that they are reviewed in a meaningful way.

This would therefore require that reviewers are:

  • adequately trained to interpret and challenge outputs made by AI systems;
  • sufficiently senior to have the authority to override automated decisions; and
  • able to take into account other factors that weren’t included as part of the initial input data.

Data subjects have the right under the UK GDPR not to be subject to a solely automated decision, where that decision has a legal or similarly significant effect, and also have the right to receive meaningful information about the logic involved in the decision. Therefore, although worded as a recommendation, where AI is making significant decisions, meaningful human review becomes a requirement (or at least must be available on request).

Work with external suppliers involved to ensure that AI is used appropriately:

A final recommendation offered by the ICO is that, where AI is procured from a third party, it is procured with that party’s involvement. While it is usually the organisation’s responsibility (as controller) to comply with all regulations, this can be achieved more effectively with the involvement of those who create and supply the technology.

In order to comply with the obligations of both the AI Act and relevant data privacy regulations, organisations would therefore be expected to:

  • choose a supplier by carrying out the appropriate due diligence ahead of procurements;
  • work with the supplier to carry out assessments prior to deployment, such as impact assessments;
  • agree and document roles and responsibilities with the external supplier, such as who will answer individual rights requests;
  • request documentation from the external supplier that demonstrates they implemented a privacy by design approach; and
  • consider any international transfers of personal data.

When working with some AI providers – for example, larger providers who develop AI for a wide range of applications and also offer services to tailor their AI solutions for particular customers (and to commercialise those learnings) – it may not be clear whether the provider is a processor or a controller (or even a joint controller with the client for some processing). Where the provider has enough freedom to use its expertise to decide what data to collect and how to apply its analytic techniques, it is likely to be a data controller as well.

Get in touch 

For more information on AI and the emerging legal and regulatory standards visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Regulation and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

You can find more on AI, technology, data privacy, and the law at Technology’s Legal Edge, DLA Piper’s tech-sector blog and Privacy Matters, DLA Piper’s Global Privacy and Data Protection resource.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry in the UK and abroad. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.

EUROPE: Data protection regulators publish myth-busting guidance on machine learning https://privacymatters.dlapiper.com/2022/10/europe-data-protection-regulators-publish-myth-busting-guidance-on-machine-learning/ Mon, 10 Oct 2022 08:31:58 +0000 https://blogs.dlapiper.com/privacymatters/?p=3704 Continue Reading]]> Authors: Coran Darling, James Clark

In its proposed AI Regulation (“AI Act”), the EU recognises AI as one of the most important technologies of the 21st century. It is often forgotten, however, that AI is not one specific type of technology. Instead, it is an umbrella term for a range of technologies capable of imitating certain aspects of human intelligence and decision-making – ranging from basic document processing software through to advanced learning algorithms.

One branch of the AI family is machine learning (“ML”), which uses models trained on datasets to resolve an array of complicated problems. The specific form and function of an ML system depends on the tasks it is intended to complete. For example, an ML system could be used to determine how likely different categories of persons are to default on loan agreements by processing historical financial default information. During the development and training of their algorithms, ML systems begin to adapt to and recognise patterns within their data. They can then use this training to interpret new data and form outputs based on the intended process.
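
To make that pattern concrete (this example is ours, not the regulators’; the data and feature names are invented), the short Python sketch below trains a simple model on hypothetical historical default records and then applies it to new applicants – the ‘learn patterns from training data, then interpret new data’ loop described above.

    import numpy as np
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(5)

    # Hypothetical historical records: [income, existing_debt] and whether the
    # borrower ultimately defaulted (1) or repaid (0).
    X_train = rng.normal(loc=[30_000, 10_000], scale=[8_000, 5_000], size=(1_000, 2))
    y_train = (X_train[:, 1] > X_train[:, 0] * 0.4).astype(int)  # simulated ground truth

    # Training: the model adapts to the patterns present in the historical data.
    model = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)

    # Deployment: the same learned patterns are used to interpret new, unseen applicants.
    new_applicants = np.array([[45_000.0, 5_000.0], [25_000.0, 15_000.0]])
    print(model.predict(new_applicants))  # e.g. [0 1] - predicted repay / default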

The use of ML systems gives rise to several questions which lawyers and compliance professionals may be uncertain about answering. For example: How is the data interpreted? How can an outcome be verified as accurate? Does the use of large datasets remove any chance of bias in my decision making?

In an attempt to resolve some of these issues, the Agencia Española de Protección de Datos (Spain’s data regulator) (“AEPD”) and the European Data Protection Supervisor (“EDPS”) have jointly published a report addressing a number of notable misunderstandings about ML systems. The joint report forms part of a growing trend among European data protection regulators to openly grapple with AI issues, recognising the inextricable link between AI systems – which are inherently creatures of data – and data protection law.

In this article we elaborate on some of these clarifications in the context of a world of developing AI and data privacy frameworks.

Causality requires more than finding correlations

As a brief reminder to ourselves, causality is “the relationship that exists between cause and effect” whereas correlation is “the relationship between two factors that occur or evolve with some synchronisation”.

ML systems are very good at identifying correlations within datasets but typically lack the ability to accurately infer a causal relationship between data and outcomes. The example given in the report is that, given certain data, a system could reach the conclusion that tall people are smarter than shorter people simply by finding a correlation between height and IQ scores. As we are all aware, correlation does not imply causation (despite the spurious inferences that can be drawn from data – for example, from the correlation between the per capita consumption of mozzarella cheese and the number of civil engineering doctorates awarded).

It is therefore necessary to ensure data is suitably vetted throughout the initial training period, and at the point of output, to ensure that the learning process within the ML system has not resulted in it attributing certain outcomes to correlating, but non-causal, information. Some form of human supervision to determine when certain variables are being overweighted within the decision process may assist in doing so and allow intervention when bias is detected at an early stage of processing.
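
By way of illustration (and not drawn from the report itself), the short Python sketch below simulates a confounded dataset in which a child’s age drives both height and a test score: the two variables correlate strongly even though neither causes the other. The variable names and figures are hypothetical and chosen purely to make the correlation/causation point concrete.

    import numpy as np

    # Hypothetical, simulated data: age is a hidden confounder that drives
    # both height and test score. Neither height nor score causes the other.
    rng = np.random.default_rng(0)
    age = rng.uniform(6, 16, size=1_000)                   # years
    height = 80 + 7 * age + rng.normal(0, 5, size=1_000)   # cm, grows with age
    score = 20 + 5 * age + rng.normal(0, 8, size=1_000)    # test score, also grows with age

    # A pipeline that only looks at height and score finds a strong correlation
    # and could happily treat height as a 'predictor' of ability.
    print("corr(height, score):", round(np.corrcoef(height, score)[0, 1], 2))

    # Controlling for the confounder (age) makes the spurious link largely vanish,
    # which is the kind of check a human reviewer might ask for.
    height_resid = height - np.poly1d(np.polyfit(age, height, 1))(age)
    score_resid = score - np.poly1d(np.polyfit(age, score, 1))(age)
    print("corr after controlling for age:", round(np.corrcoef(height_resid, score_resid)[0, 1], 2))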

Training datasets must meet accuracy and representativeness thresholds

Contrary to belief, greater variety in data does not necessarily mean a better dataset, or one better able to mitigate bias. It is instead better to have a focused dataset that accurately reflects the trend being investigated. For example, having conversion data for every currency against the dollar is not helpful when seeking patterns and trends in the fluctuation of the exchange rate between dollars and pounds sterling.

Furthermore, the addition of too much of certain data may lead to inaccuracies and bias in outcomes. For example, as noted in the report, adding further light-skinned male images to a dataset used to train facial recognition software will be largely unhelpful in correcting any existing ethnicity or gender biases within the system.

The General Data Protection Regulation (“GDPR”) requires that processing of personal data be proportionate to its purpose. Care should therefore be taken when seeking to increase the amount of data in a dataset. Substantial increases in data used to produce a minimal correction in a training dataset, for example, may not be deemed proportionate and lead to breach of the requirements of the Regulation.

Well-performing machine learning systems require datasets above a certain quality threshold

Equally, it is not necessary for training datasets to be completely error-free. Often, this is not possible or commercially feasible. Instead, datasets should be held to a quality standard that allows for a comprehensive and sufficiently accurate description of the data. Provided that the average result is accurate to the overall trend, ML systems are typically able to deal with a low level of inaccuracy.

As the report notes, some models are even created and trained using synthetic data (artificially generated datasets – described in greater detail in our earlier article on the subject) that replicates the properties of real data. In some cases, synthetic data may even be derived from aggregates of actual datasets, retaining the benefit of accurate data trends while removing many of the compliance issues associated with personally identifiable information.
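
As a rough illustration of the aggregation idea (ours, not the report’s, and deliberately simplified), the sketch below retains only the mean and covariance of a numeric dataset and then samples synthetic records from that summary, so no individual real record is shared onward. Real-world synthetic data generation needs far more care around re-identification risk than this.

    import numpy as np

    rng = np.random.default_rng(1)

    # Hypothetical 'real' dataset: columns might be income, loan amount, age.
    real = rng.normal(loc=[30_000, 12_000, 45], scale=[8_000, 4_000, 12], size=(500, 3))

    # Aggregate statistics are all that is retained from the real data.
    mean = real.mean(axis=0)
    cov = np.cov(real, rowvar=False)

    # Synthetic records are drawn from the aggregated summary, not copied from
    # real individuals, so overall trends are preserved without sharing raw records.
    synthetic = rng.multivariate_normal(mean, cov, size=500)

    print("real means:     ", np.round(mean, 1))
    print("synthetic means:", np.round(synthetic.mean(axis=0), 1))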

This is not to say that organisations should not strive to attain an accurate data set and, in fact, under the AI Act it is a mandatory requirement that a system’s data be accurate and robust. Ensuring accuracy and, where relevant, currency of personal data is also a requirement under the GDPR. However, it is important to remember that ‘accuracy’ in the context of ML need not be an absolute value.

Federated and distributed learning allows the development of systems without sharing training data sets

One approach proposed for developing accurate ML systems, in the absence of synthetic data, is the creation of large data-sharing repositories, often held in substantial cloud computing infrastructure. Under another limb of the EU’s digital strategy – the Data Governance Act – the Commission is attempting to promote data-sharing frameworks through trusted and certified ‘data intermediation services’. Such services may have a role to play in supporting ML. The report highlights that while centralised learning of this kind is an effective way of collating large quantities of data, the method comes with its own challenges.

For example, in the case of personal data, the controller and processor of the data must consider the data in the context of their obligations under the GDPR and other data protection regulations. Requirements regarding purpose limitation, accountability, and international transfers may all therefore become applicable. Furthermore, the collation of sensitive data increases the interest of other parties, particularly those with malevolent intent, in gaining access. Without suitable protections in place, a centralised dataset holding large quantities of data may become a honeypot for hackers and for corporate parties seeking to gain an upper hand.

The report offers, as an alternative, the use of distributed on-site and federated learning. Distributed on-site learning involves the data controller downloading a generic or pre-trained model to a local server. The server then uses its own dataset to train and improve the downloaded model. After this is completed, there is no further need for the generic model. By comparison, with federated learning the controller trains a model with its own data and then sends only the model’s parameters to a central server for aggregation. It should be noted, however, that this is often not the most efficient method and may even be a barrier to entry or development for smaller organisations in the ML sector, due to cost and expertise restrictions.
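
The federated pattern described above can be sketched in a few lines of Python. The example below is purely illustrative (hypothetical client data, a simple averaged linear model rather than any particular framework): each ‘client’ fits a model locally, and only the fitted parameters, never the underlying records, are sent to the server for averaging.

    import numpy as np

    rng = np.random.default_rng(2)

    def local_fit(X, y):
        # Each client fits an ordinary least-squares model on its own data
        # (with an intercept column) and returns only the parameter vector.
        X1 = np.column_stack([np.ones(len(X)), X])
        params, *_ = np.linalg.lstsq(X1, y, rcond=None)
        return params

    # Hypothetical local datasets held by three separate controllers.
    clients = []
    for _ in range(3):
        X = rng.normal(size=(200, 2))
        y = 1.0 + X @ np.array([2.0, -3.0]) + rng.normal(0, 0.1, size=200)
        clients.append((X, y))

    # Only parameters leave each client; the central server simply averages them.
    local_params = [local_fit(X, y) for X, y in clients]
    global_params = np.mean(local_params, axis=0)

    print("aggregated parameters:", np.round(global_params, 2))  # approx. [1.0, 2.0, -3.0]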

Once deployed, a machine learning model’s performance may deteriorate until it is further trained

Unlike other technologies, ML models are not ‘plug in and forget’ systems. The nature of ML is that the system adapts and evolves over time. Consequently, once deployed, an ML system must be consistently tested to ensure it remains capable of solving the problems for which it was created. Once mature, a model may no longer provide accurate results if it does not evolve with its subject matter. For example, an ML model aimed at predicting futures prices of coffee beans will deteriorate if it is not fed new and refreshed data.

The result, should the data not be updated for some time, is an inaccurate model that will produce tainted, biased, or completely incorrect judgements and outcomes – a situation known as data drift, where the incoming data no longer resembles the data the model was trained on. Deterioration may also occur where the relationship between the data and the outcome changes even though the general distribution of the data does not (known as concept drift). As the report notes, it is therefore necessary to monitor the ML system to detect any deterioration in the model and act on its decay.
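
A common, lightweight way to monitor for this kind of drift is to compare the distribution of incoming features against the training data. The sketch below is illustrative only (the data and the alerting threshold are invented); it uses a two-sample Kolmogorov–Smirnov test from SciPy to flag a feature whose live distribution has moved away from the training distribution.

    import numpy as np
    from scipy.stats import ks_2samp

    rng = np.random.default_rng(3)

    # Hypothetical feature values seen at training time vs. in live traffic,
    # where the live data has shifted upwards (simulating data drift).
    training_feature = rng.normal(loc=0.0, scale=1.0, size=5_000)
    live_feature = rng.normal(loc=0.4, scale=1.0, size=1_000)

    stat, p_value = ks_2samp(training_feature, live_feature)

    # An invented alerting threshold; in practice this would be tuned and
    # combined with monitoring of model accuracy on fresh labelled samples.
    if p_value < 0.01:
        print(f"Possible data drift detected (KS statistic {stat:.3f}) - consider retraining.")
    else:
        print("No significant drift detected on this feature.")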

A well-designed machine learning model can produce decisions understandable to relevant stakeholders

Perhaps owing to popular media, there is a recurring belief that the automated decisions taken by ML algorithms cannot be explained. While this may be the case for a select few models, a well-designed model will typically produce decisions that can be readily understood by stakeholders.

Important factors in terms of explainability include understanding which parameters were considered and their weighting in the decision-making. The degree of ‘explainability’ demanded of a model is likely to vary based on the data involved and the likelihood (if any) that a decision will impact the lives of data subjects. For example, far greater explainability would be expected from a model that deals with credit scoring or employment applications than from one tasked with predicting futures markets.
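
For simple model families, this kind of explanation can be read directly from the fitted parameters. The sketch below is a minimal illustration with hypothetical data and feature names: it fits a small logistic regression with scikit-learn and lists the learned weight of each input, which is one way of answering ‘which parameters were considered, and how heavily’.

    import numpy as np
    from sklearn.linear_model import LogisticRegression

    rng = np.random.default_rng(4)

    # Hypothetical applicant features and a simulated credit decision that
    # depends mainly on income and existing debt, and not at all on postcode.
    feature_names = ["income", "existing_debt", "postcode_code"]
    X = rng.normal(size=(1_000, 3))
    logits = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(0, 0.5, size=1_000)
    y = (logits > 0).astype(int)

    model = LogisticRegression().fit(X, y)

    # The coefficient magnitudes show which inputs carried weight in the decision.
    for name, weight in sorted(zip(feature_names, model.coef_[0]),
                               key=lambda item: -abs(item[1])):
        print(f"{name:>14}: {weight:+.2f}")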

It is possible to provide meaningful transparency to users without harming IP rights

A push towards transparency and explainability has naturally led many to question how to effectively protect trade secrets and IP when everyone can see how their models and ML systems behave. As the report highlights, transparency and the protection of IP are not incompatible, and there are several methods of providing transparency to users without harming proprietary know-how or IP. While users should be provided with sufficient information to know what their data (particularly personal data) is being used for, this does not necessarily mean that specific technical details need to be disclosed.

The report compares the requirement to the advisory leaflets provided with medicines: it is necessary to alert users to what may happen when using the medicine (or model/system) without providing an explanation of how this is specifically achieved. In the case of personal information, further explanation may be required to comply with the principles set out in applicable data protection regulation. At a minimum, data processors and controllers should properly inform users and data subjects of the impact of the ML system and its decision-making on their daily lives.

Further protections for individuals may be achieved through certification in accordance with international standards, overt limitations of system behaviour, or the use of human moderators with appropriate technical knowledge.

Machine learning systems are subject to different types of biases

It is often assumed that bias is an inherently human trait. While it is correct to say that an ML system is not in itself biased, the system will perform as it is taught. This means that while ML systems can be free from human bias in many cases, this is entirely dependent on their inherent and learned characteristics. Where the training data or subsequent data is heavily one-sided, or too much weight is ascribed to certain data points, the model may interpret this ‘incorrectly’, leading to ‘biased’ results.

The inherent lack of ‘humanity’ in these systems does however have its drawback. As the report notes, ML systems have a limited ability to adapt to soft-contextual changes and unforeseen circumstances, such as changes in market trends due to new legislation or social norms. This point further highlights the need for appropriate human oversight of the functioning of ML systems.

Predictions are only accurate when future events reproduce past trends

‘ML systems are capable of predicting the future’ is perhaps one of the most common misconceptions about the technology. Rather, they can only predict possible future outcomes to the extent that those outcomes reflect the trends in previous data. The fact that you habitually buy coffee on a Monday morning, and have done since starting your job, indicates that you are likely to do so this coming Monday; it does not guarantee that you will, or that an unforeseen event will not prevent you from doing so.

Applying this to the context of commerce, an ML system may be able to predict (with relative accuracy) the long-term trend of a particular futures market, but it cannot guarantee with absolute certainty that market behaviour will follow suit, particularly in the case of black swan events such as droughts or unexpected political decisions.

To increase the chances of a more accurate outcome, organisations should seek to obtain as large a dataset as possible, with as many relevant variables as can be obtained, while maintaining factual accuracy to the trends in the data they utilise. This will allow the ML system to better predict behavioural responses to certain data and therefore produce more accurate predictions.

A system’s ability to find non-evident correlations in data can result in the discovery of new data, unknown to the data subject

A simultaneous advantage and risk of ML systems is their capacity to map data points and establish correlations previously unanticipated by the system’s human designers. In short, it is therefore not always possible to anticipate the outcomes they may produce. Consequently, systems may identify trends in data that were not previously sought, such as predispositions to diseases. While this may be beneficial in certain circumstances, such as health, it may equally be unnecessary or inappropriate in other contexts.

Where these ML systems begin processing personal data beyond the scope of their original purpose, considerations of lawfulness, transparency, and purpose limitation under the GDPR will be engaged. Failure to identify a clear purpose and appropriate justification for processing personal data in this manner may result in a breach of the Regulation and the penalties that accompany it.

Get in touch 

For more information on AI and the emerging legal and regulatory standards visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Regulation and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

You can find more on AI, technology, data privacy, and the law at Technology’s Legal Edge, DLA Piper’s tech-sector blog and Privacy Matters, DLA Piper’s Global Privacy and Data Protection resource.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry in the UK and abroad. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.
