Privacy Matters | DLA Piper's Global Privacy and Data Protection Resource

EU: EDPB Opinion on AI Provides Important Guidance though Many Questions Remain
https://privacymatters.dlapiper.com/2025/01/eu-edpb-opinion-on-ai-provides-important-guidance-though-many-questions-remain/ (14 January 2025)

A much-anticipated Opinion from the European Data Protection Board (EDPB) on AI models and data protection has not resulted in the clear or definitive guidance that businesses operating in the EU had hoped for. The Opinion emphasises the need for case-by-case assessments to determine GDPR applicability, highlighting the importance of accountability and record-keeping, while also flagging ‘legitimate interests’ as an appropriate legal basis under specific conditions. In rejecting the proposed Hamburg thesis, the EDPB has stated that AI models trained on personal data should be considered anonymous only if personal data cannot be extracted or regurgitated.

Introduction

On 17 December 2024, the EDPB published a much-anticipated Opinion on AI models and data protection.  The Opinion includes the EDPB’s view on the following key questions: does the development and use of an AI model involve the processing of personal data; and if so, what is the correct legal basis for that processing?

As is sometimes the case with EDPB Opinions, which necessarily represent the consensus view of the supervisory authorities of 27 different Member States, the Opinion does not provide many clear or definitive answers.  Instead, the EDPB offers indicative guidance and criteria, calling for case-by-case assessments of AI models to understand whether, and how, they are impacted by the GDPR.  In this context, the Opinion repeatedly highlights the importance of accountability and record-keeping by businesses developing or using AI, so that the applicability of data protection laws, and the business’ compliance with those laws, can be properly assessed. 

Whilst the equivocation of the Opinion might be viewed as unhelpful by European businesses looking for regulatory certainty, it is also a reflection of the complexities inherent in this intersection of law and technology.

In summary, the answers given by the EDPB to the four questions in the Opinion are as follows:

  1. Can an AI model, which has been trained using personal data, be considered anonymous?  Yes, but only in some cases.  It must be impossible, using all means reasonably likely to be used, to obtain personal data from the model, either through attacks which aim to extract the original training data from the model itself, or through interactions with the AI model (i.e., personal data provided in responses to prompts / queries). 
  2. Is ‘legitimate interests’ an appropriate legal basis for the training and development of an AI model? In principle yes, but only where the processing of personal data is necessary to develop the AI model, and where the ‘balancing test’ can be resolved in favour of the controller.  In particular, the issue of data minimisation, and the related issue of web-scraping / indiscriminate capture of data, will be relevant here. 
  3. Is ‘legitimate interests’ an appropriate legal basis for the deployment of an AI model? In principle yes, but only where the processing of personal data is necessary to deploy the AI model, and where the ‘balancing test’ can be resolved in favour of the controller.  Here, the impact on the data subject of the use of the AI model is of predominant importance.
  4. If an AI Model has been found to have been created, updated or developed using unlawfully processed personal data, how does this impact the subsequent use of that AI model?  This depends in part on whether the AI model was first anonymised before being disclosed to the deployer of that model (see Question 1).  Otherwise, the deployer of the model may need to assess the lawfulness of the development of the model as part of its accountability obligations.

Background

The Opinion was issued by the EDPB under Article 64 of the GDPR, in response to a request from the Irish Data Protection Commission.  Article 64 requires the EDPB to publish an opinion on matters of ‘general application’ or which ‘produce effects in more than one Member State’. 

In this case, the Irish DPC asked the EDPB to provide an opinion on the above-mentioned questions – a request that is not surprising given the general importance of AI models to businesses across the EU, but also given the large number of technology companies developing those models that have established their European operations in Ireland. 

In order to understand the Opinion, it helps to be familiar with certain concepts and terminology relating to AI. 

First, the Opinion distinguishes between an ‘AI system’ and an ‘AI model’. For the former, the EDPB relies on the definition given in the EU AI Act. In short: a machine-based system operating with some degree of autonomy that infers, from inputs, how to produce outputs such as predictions, content, recommendations, or decisions. An AI model, meanwhile, is a component part of an AI system. Colloquially, it is the ‘brain’ of the AI system – an algorithm, or series of algorithms (such as in the form of a neural network), that recognises patterns in data. AI models require the addition of further components, such as a user interface, to become AI systems. To take a common example – the generative AI system known as ChatGPT is a software application comprising an AI model (the GPT Large Language Model) connected to a chatbot-style user interface that allows the user to submit queries (or ‘prompts’) to the model in the form of natural language questions. Whilst the Opinion is notionally concerned only with AI models, at times the Opinion appears to blur the distinction between the model and the system, in particular, when discussing the significance of model outputs that are only rendered comprehensible to the user through an interface that sits outside of the model.

Second, the Opinion relies on an understanding of a typical ‘AI lifecycle’, pursuant to which an AI model is first developed by training the model on large volumes of data.  This training may happen in a number of phases which become increasingly refined (referred to as ‘fine-tuning’). Only after an AI model is developed can it be used, or ‘deployed’, in a live setting, as part of an AI system.  Often, the developer of an AI model will not be the same person as the deployer.  This is relevant because the Opinion variously addresses both development and deployment phases.

The significance of the ‘Hamburg thesis’

With respect to the key question of whether AI models can be considered anonymous, the Opinion follows in the wake of a much-discussed paper published in July 2024 by the data protection authority for the German state of Hamburg.  The paper took the position that AI models (specifically, Large Language Models) are, in isolation, anonymous – they do not involve the processing of personal data. 

In order to reach that conclusion, the paper decoupled the model itself from: (i) the prior training of the model (which may involve the collection and further processing of personal data as part of the training dataset); and (ii) the subsequent use of the model, whereby a prompt/input may contain personal data, and an output may be used in a way that means it constitutes personal data.

Looking only at the AI model itself, the paper decided that the tokens and values which make up the ‘inner workings’ of a typical AI model do not, in any meaningful way, relate to or correspond with information about identifiable individuals.  Consequently, the model itself was found to be anonymous, even if the development and use of the model involves the processing of personal data. 

The Hamburg thesis was welcomed for several reasons, not least because it resolved difficult questions such as how data subject rights could be understood in relation to an AI model (if someone asks for their personal data to be deleted, then what can this mean in the context of an AI model?), and the question of the lawful basis for ‘storing’ personal data in an AI model (as distinct from the lawful basis for collecting and preparing data to train the model).

However, as we go on to explain, the EDPB Opinion does not follow the relatively simple and certain framework presented by the Hamburg thesis.  Instead, it introduces uncertainty by asserting that there are, in fact, scenarios where an AI model contains personal data, but that this must be determined on a case-by-case basis.

Are AI models anonymous?

First, the Opinion is only concerned with AI models that have been trained using personal data.  Therefore, AI models trained using solely non-personal data (such as statistical data, or financial data relating to businesses) can, for the avoidance of doubt, be considered anonymous.  However, in this context the broad scope of ‘personal data’ under the GDPR must be remembered, and the Opinion does not suggest any de minimis level of personal data that needs to be involved in the training of the AI model for the question of GDPR applicability to arise.

Where personal data is used in the training phase, the next question is whether the model is specifically designed to provide personal data regarding individuals whose personal data were used to train the model.  If so, the AI model will not be anonymous.  For example, an AI model that is trained to provide a user, on request, with biographical information and contact details for directors of public companies, or a generative AI model that is trained on the voice recordings of famous singers so that it can, in turn, mimic the voices of those singers.  In each case, the model is trained on personal data of specific individuals, in order to be able to produce other personal data about those individuals as an output. 

Finally, there is the intermediary case of AI models that are trained on personal data, but that are not designed to provide personal data related to the training data as an output.  It is this use case that the Opinion focuses on.  The conclusion is that AI models in this category may be anonymous, but only if the developer of the model can demonstrate that information about individuals whose personal data was used to train the model cannot be ‘obtained from’ the model, using all means reasonably likely to be used.  Notwithstanding that personal data used for training the model no longer exists within the model in its original form (but rather it is “represented through mathematical objects“), that information is, in the eyes of the EDPB, still capable of constituting personal data.

The following question then arises: how does someone ‘obtain’ personal data from an AI model? In short, the Opinion posits two possibilities.  First, that training data is ‘extracted’ via deliberate attacks.  The Opinion refers to an evolving field of research in this area and makes reference to techniques such as ‘model inversion’, ‘reconstruction attacks’, and ‘attribute and membership inference’.  These are techniques that can be deployed to trick the model into revealing training data, or otherwise reconstruct that training data, in some cases relying on privileged access to the model itself.  Second, there is the risk of accidental or inadvertent ‘regurgitation’ of personal data as part of an AI model’s outputs. 
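To make the extraction risk more concrete, the following is a minimal sketch (in Python, assuming a hypothetical model object and loss function supplied by the reader) of the intuition behind a simple loss-threshold membership inference test: records a model was trained on tend to produce lower loss than unseen records, so an attacker able to observe losses can guess membership. It is an illustrative toy, not the EDPB's test or any production-grade attack.

```python
import numpy as np

def membership_scores(model, records, loss_fn):
    """Per-record loss mapped so that lower loss gives a higher score;
    training-set members typically score higher than unseen records."""
    losses = np.array([loss_fn(model, r) for r in records])
    return 1.0 / (1.0 + losses)

def attack_accuracy(model, known_members, known_non_members, loss_fn, threshold=0.5):
    """Fraction of records the loss-threshold attack classifies correctly.
    A value close to 0.5 suggests the attack learns little; materially above
    0.5 suggests the model is leaking information about its training data."""
    m = membership_scores(model, known_members, loss_fn)
    n = membership_scores(model, known_non_members, loss_fn)
    correct = (m >= threshold).sum() + (n < threshold).sum()
    return correct / (len(known_members) + len(known_non_members))
```

Results of tests of this kind (and of the far more sophisticated attacks referenced by the EDPB) are the sort of evidence a developer could record to support an anonymity claim.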

Consequently, a developer must be able to demonstrate that its AI model is resistant both to attacks that extract personal data directly from the model, as well as to the risk of regurgitation of personal data in response to queries:  “In sum, the EDPB considers that, for an AI model to be considered anonymous, using reasonable means, both (i) the likelihood of direct (including probabilistic) extraction of personal data regarding individuals whose personal data were used to train the model; as well as (ii) the likelihood of obtaining, intentionally or not, such personal data from queries, should be insignificant for any data subject“. 

Which criteria should be used to evaluate whether an AI model is anonymous?

Recognising the uncertainty in its conclusion that AI models may or may not be anonymous, the EDPB provides a list of criteria that can be used to assess the likelihood of a model being found to contain personal data.  These include:

  • Steps taken to avoid or limit the collection of personal data during the training phase.
  • Data minimisation or masking measures (e.g., pseudonymisation) applied to reduce the volume and sensitivity of personal data used during the training phase.
  • The use of methodologies during model development that reduce privacy risks (e.g., regularisation methods to improve model generalisation and reduce overfitting, and appropriate and effective privacy-preserving techniques, such as differential privacy).
  • Measures that reduce the likelihood of obtaining personal data from queries (e.g., ensuring the AI system blocks the presentation to the user of outputs that may contain personal data; a minimal illustration follows this list).
  • Document-based audits (internal or external) undertaken by the model developer that include an evaluation of the chosen measures and of their impact to limit the likelihood of identification.
  • Testing of the model to demonstrate its resilience to different forms of data extraction attacks.
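As a minimal illustration of the query-focused measure above, the sketch below shows a naive output filter that withholds responses containing obvious personal-data patterns (email addresses and phone numbers). The regular expressions and function names are illustrative assumptions only; real deployments rely on much more sophisticated personal-data detection.

```python
import re

# Deliberately simple patterns; production systems use dedicated PII detectors.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def filter_model_output(text: str) -> str:
    """Withhold outputs that appear to contain personal data before they
    are presented to the user of the AI system."""
    if EMAIL_RE.search(text) or PHONE_RE.search(text):
        return "[Response withheld: possible personal data detected]"
    return text

print(filter_model_output("You can reach John at john.doe@example.com"))
# -> [Response withheld: possible personal data detected]
```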

What is the correct legal basis for AI models?

When using personal data to train an AI model, the preferred legal basis is normally the ‘legitimate interests’ of the controller, under Article 6(1)(f) GDPR. This is for practical reasons. Whilst, in some circumstances, it may be possible to obtain GDPR-compliant consent from individuals authorising the use of their data for AI training purposes, in most cases this will not be feasible. 

Helpfully, the Opinion accepts that legitimate interests is, in principle, a viable legal basis for processing personal data to train an AI model. Further, the Opinion also suggests that it should be straightforward for businesses to identify a lawful legitimate interest. For example, the Opinion cites “developing an AI system to detect fraudulent content or behaviour” as a sufficiently precise and real interest. 

However, where businesses may have more difficulty is in showing that the processing of personal data is necessary to realise their legitimate interest, and that their legitimate interest is not outweighed by any impact on the rights and freedoms of data subjects (the ‘balancing test’). Whilst this is fundamentally just a restatement of existing legal principles, the following sentence should nevertheless cause some concern for businesses developing AI models, in particular Large Language Models: “If the pursuit of the purpose is also possible through an AI model that does not entail processing of personal data, then processing personal data should be considered as not necessary“. Technically speaking, it may often be the case that personal data is not essential for the training of an AI model – however, this does not mean that it is straightforward to systematically remove all personal data from a training dataset, or otherwise replace all identifying elements with ‘dummy’ values. 

With respect to the balancing test, the EDPB asks businesses to consider a data subject’s interest in self-determination and in maintaining control over their own data when considering whether it is lawful to collect personal data for model training purposes.  In particular, it may be more difficult to satisfy the balancing test if a developer is scraping large volumes of personal data (especially including any sensitive data categories) against data subjects’ wishes, without their knowledge, or otherwise in contexts that they would not reasonably expect. 

When it comes to the separate purpose of deploying an AI model, the EDPB asks businesses to consider the impact on the data subject’s fundamental rights that arises from the purpose for which the AI model is used.  For example, AI models that are used to block content publication may adversely affect a data subject’s fundamental right to freedom of expression.  However, conversely, the EDPB recognises that the deployment of AI models may have a positive impact on a data subject’s rights and freedoms – for example, an AI model that is used to improve accessibility to certain services for people with disabilities. In line with Recital 47 GDPR, the EDPB reminds controllers to consider the ‘reasonable expectations’ of data subjects in relation to both training and deployment uses of personal data.

Finally, the Opinion discusses a range of ‘mitigating measures’ that may be used to reduce risks to data subjects and therefore tip the balancing test in favour of the controller.  These include:

  • Technical measures to reduce the volume or sensitivity of personal data at use (e.g., pseudonymisation, masking).
  • Measures to facilitate the exercise of data subject rights (e.g., providing an unconditional right for data subjects to opt-out of the use of their personal data for training or deploying the model; allowing a reasonable period of time to elapse between collection of training data and its use).
  • Transparency measures (e.g., public communications about the controller’s practices in connection with the use of personal data for AI model development).
  • Measures specific to web-scraping (e.g., excluding publications that present particular risks; excluding certain data categories or sources; excluding websites that clearly object to web scraping; see the sketch after this list).
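As a hedged example of the last web-scraping measure, a collection pipeline can honour robots.txt directives before fetching anything. The sketch below uses Python's standard-library robot parser; the crawler name and URL are placeholders, and respecting robots.txt is only one element of respecting a site's objection to scraping.

```python
from urllib import robotparser
from urllib.parse import urlsplit

def allowed_to_scrape(url: str, user_agent: str = "example-ai-crawler") -> bool:
    """Check the target site's robots.txt before collecting any content."""
    parts = urlsplit(url)
    rp = robotparser.RobotFileParser()
    rp.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    try:
        rp.read()
    except OSError:
        return False  # if the policy cannot be read, err on the side of not scraping
    return rp.can_fetch(user_agent, url)

if allowed_to_scrape("https://example.com/articles/some-page"):
    pass  # proceed with collection, subject to the other exclusions listed above
```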

Notably, the EDPB observes that, to be effective, these mitigating measures must go beyond mere compliance with GDPR obligations (for example, providing a GDPR compliant privacy notice, which a controller would in any case be required to do, would not be an effective transparency measure for these purposes). 

When are companies liable for non-compliant AI models?

In its final question, the DPC sought clarification from the EDPB on how a deployer of an AI model might be impacted by any unlawful processing of personal data in the development phase of the AI model. 

According to the EDPB, such ‘upstream’ unlawful processing may impact a subsequent deployer of an AI model in the following ways:

  • Corrective measures taken against the developer may have a knock-on effect on the deployer – for example, if the developer is ordered to delete personal data unlawfully collected for training purposes, the deployer would not be allowed to subsequently process this data. However, this raises an important practical question about how such data could be identified in, and deleted from, the AI model, taking into account the fact that the model does not retain training data in its original form.
  • Unlawful processing in the development phase may impact the legal basis for the deployment of the model – in particular, if the deployer of the AI model is relying on ‘legitimate interests’, it will be more difficult to satisfy the balancing test in light of the deficiencies associated with the collection and use of the training data.

In light of these risks, the EDPB recommends that deployers take reasonable steps to assess the developer’s compliance with data protection laws during the training phase.  For example, can the developer explain the sources of data used, steps taken to comply with the minimisation principle, and any legitimate interest assessments conducted for the training phase?  For certain AI models, the transparency obligations imposed in relation to AI systems under the AI Act should assist a deployer in obtaining this information from a third-party AI model developer.

While the Opinion provides a useful framework for assessing GDPR issues with AI systems, businesses operating in the EU may be frustrated with the lack of certainty or definitive guidance on many key questions relating to this new era of technology innovation.

Ireland: Increased regulatory convergence of AI and data protection: X suspends training of AI chatbot with EU user data after Irish regulator issues High Court proceedings
https://privacymatters.dlapiper.com/2024/08/ireland-increased-regulatory-convergence-of-ai-and-data-protection-x-suspends-training-of-ai-chatbot-with-eu-user-data-after-irish-regulator-issues-high-court-proceedings/ (19 August 2024)

The Irish Data Protection Commission (DPC) has welcomed X’s agreement to suspend its processing of certain personal data for the purpose of training its AI chatbot tool, Grok. This comes after the DPC issued suspension proceedings against X in the Irish High Court.  The DPC described this as the first time that any Lead Supervisory Authority had taken such an action, and the first time that it had utilised these particular powers.

Section 134 of the Data Protection Act 2018 allows the DPC, where it considers there is an urgent need to act to protect the rights and freedoms of data subjects, to make an application to the High Court for an order requiring a data controller to suspend, restrict, or prohibit the processing of personal data.

The High Court proceedings were issued on foot of a complaint to the DPC raised by consumer rights organisations Euroconsumers and Altroconsumo on behalf of data subjects in the EU/EEA. The complainants argued that the Grok chatbot was being trained with user data in a manner that did not sufficiently explain the purposes of data processing, and that more data than necessary was being collected. They further argued that X may have been handling sensitive data without sufficient reasons for doing so.

Much of the complaint stemmed from X’s initial approach of having data sharing automatically turned on for users in the EU/EEA, which it later mitigated by adding an opt-out setting. X claimed that it had relied on the lawful basis of legitimate interest under the GDPR, but the complainants argued that X’s privacy policy – dating back to September 2023 – was insufficiently clear as to how this applied to the processing of user data for the purposes of training AI models such as Grok.

This development follows a similar chain of events involving Meta in June. Complaints from privacy advocacy organisation NOYB were made against Meta’s reliance on ‘legitimate interest’ in relation to the use of data to train AI models. This led to engagement with the DPC and the eventual decision in June by Meta to pause relevant processing (without the need for the authority to invoke s134).

The DPC and other European supervisory authorities strive to emphasise the principles of lawfulness, fairness and transparency at the heart of the GDPR, and their actions illustrate that activities perceived to threaten these principles will be dealt with directly.

The DPC has previously taken the approach of making informal requests and has stated that the exercise of its powers in this case comes after extensive engagement with X on its model training. The High Court proceedings highlight the DPC’s willingness to escalate action where there remains a perceived risk to data subjects.

The DPC has, in parallel, stated that it intends to refer the matter to the EDPB, although there has been no confirmation of such a referral to date.

Such a referral will presumably form part of a thematic examination of AI processing by data controllers. The topic is also the subject of debate among individual DPAs, as evidenced by the Discussion Paper on Large Language Models and Personal Data recently published by the Hamburg DPA.

The fact that much of the high-profile activity relating to the regulation of AI is coming from the data protection sphere will no doubt bolster the EDPB’s recommendation, in a statement last month, that Data Protection Authorities (DPAs) are best placed to regulate high-risk AI.

Regulatory scrutiny and activity are expected to escalate and accelerate as ‘big tech’ players increasingly integrate powerful AI models into existing services to enrich data. This is particularly the case where it is perceived that data sets are being re-purposed and further processing is taking place. In such circumstances, it is essential that an appropriate legal basis is relied upon – noting the significant issues that can arise if there is an over-reliance on legitimate interest. The DPC and other regulators are likely to investigate, engage and ultimately intervene where they believe that data subjects’ rights under the GDPR are threatened. Perhaps in anticipation of more cross-border enforcement activity, the European Commission last month proposed a new law to streamline cooperation between DPAs when enforcing the GDPR in such cases.

A fundamental lesson from these developments is that, in the new AI paradigm, ensuring that there is a suitable legal basis for any type of processing, and that the principles of fairness and transparency are complied with, should be an absolute priority.

HONG KONG: Artificial Intelligence – Model Personal Data Protection Framework
https://privacymatters.dlapiper.com/2024/06/hong-kong-artificial-intelligence-model-personal-data-protection-framework/ (13 June 2024)

In the rapid development of artificial intelligence (“AI”), regulators are playing catch-up in creating frameworks to aid and regulate its development.

As the AI landscape begins to mature, different jurisdictions have begun to publish guidance and frameworks. Most recently, on 11 June 2024, Hong Kong’s Office of the Privacy Commissioner for Personal Data (“PCPD”) published the Artificial Intelligence: Model Personal Data Protection Framework (“Model Framework”) as a step to provide organisations with internationally recognised practical recommendations and best practices in the procurement and implementation of AI.

Summary of the Model Framework

The key underlying theme of Hong Kong’s Model Framework is the ethical procurement, implementation and use of AI systems, in compliance with the data protection requirements of the Personal Data (Privacy) Ordinance (“PDPO”).

The non-binding Model Framework seeks to promote organisations’ internal governance measures. As such, it focuses on four key areas in which organisations should take measures throughout the lifecycle of AI deployment:

  • Establishing an AI strategy and governance – to formulate an internal strategy and governance considerations in the procurement of AI solutions.
  • Conducting risk assessment with human oversight – conduct risk assessments and tailor risk management to the organisation’s use of AI, including deciding the appropriate level of human oversight in automated decision-making.
  • Customising AI models and implementation and management of AI systems – preparation and management of data in the use of AI systems to ensure data and system security.
  • Communicating and engaging with stakeholders – communicate with relevant stakeholders (e.g. suppliers, customers, regulators) to promote transparency and trust in the use of AI.

It is worth noting that the Model Framework makes reference to the 2021 Guidance on the Ethical Development and Use of Artificial Intelligence (“Guidance”), also issued by the PCPD. The Model Framework, which focuses on the procurement of AI solutions, complements the earlier Guidance which is primarily aimed at AI solution providers and vendors.

As a recap, the Guidance recommends three data stewardship values of being respectful, beneficial and fair, as well as seven ethical principles of accountability, human oversight, transparency and interpretability, data privacy, beneficial AI, reliability, robustness and security, and fairness – which are not foreign concepts for organisations from a data protection perspective.

Comparison with other jurisdictions

With different jurisdictions each grappling with their own AI regulatory framework, the common theme is the goal of ensuring the responsible use of AI. That said, there are slight nuances in the focus of each regulator.

For instance, the AI Act of the European Union considers AI systems in terms of their risk level, whereby serious AI incidents must be reported to relevant market surveillance authorities. Hong Kong’s Model Framework differs in that its approach to AI incidents mirrors the PDPO’s non-compulsory reporting of general personal data incidents.

Meanwhile, in Singapore, the regulatory framework also touches on the responsible use of AI in personal data protection. That said, compared to the Hong Kong Model Framework’s personal data protection focus, Singapore’s regulatory framework is a more general, broader governance model for generative AI applications.

Next steps

The publication of the Model Framework is a welcome move, as it provides more clarity as to the direction and focus of Hong Kong regulators on the use of AI. We expect more standards and guidance to be published gradually, with personal data protection as a central compliance theme.

Whilst different global regulators differ slightly in their focus, the central goal of responsible use of AI remains. As such, organisations currently using or considering using AI in their operations – be it for internal or external purposes – should focus on designing a global internal strategy and governance rules, in order to understand and mitigate the risks associated with their use of AI.

As a first step, organisations should understand the extent and use of AI in their operations (i.e. whether this is a procurement of AI solutions, or the implementation and training of the organisation’s own AI model). With this, organisations should then conduct an internal data audit to understand the scope and extent of information involved in the deployment of AI, in order to assess and mitigate risks accordingly.

Please contact Carolyn Bigg (Carolyn.Bigg@dlapiper.com) if you would like to discuss what these latest developments mean for your organisation.

This article was not generated by artificial intelligence.

We’re now seamlessly global. Here’s what to expect.
https://privacymatters.dlapiper.com/2023/09/privacy-matters-update/ (12 September 2023)

Dear subscriber, 

Thank you for subscribing and being a part of DLA Piper’s Data Protection, Privacy and Cybersecurity community. We appreciate your continued engagement with our insights and the evolving nature of the landscape.

Our goal for this blog is to help you navigate all aspects of data protection, privacy, and cybersecurity laws, while considering the ever-expanding geographic footprint of businesses. Here at DLA Piper, we understand how compliance across jurisdictions makes it even harder to solve today’s most pressing privacy problems and prepare for future cybersecurity threats.

We’re excited to announce that Privacy Matters will be bringing you more on privacy and cybersecurity issues at home and abroad.

  • Find quickly shared updates on relevant topics on global data protection, privacy, and cybersecurity issues 
  • Get perspectives from leading DLA Piper professionals from around the world 
  • Easily navigate our improved, user-friendly layout 

With this change, you’ll need to add privacymatters@comms.com to your contacts to ensure blog updates make it to your inbox.

Thanks for reading,

The Data Privacy team at DLA Piper 

A Pro-Innovation Approach: UK Government publishes white paper on the future of governance and regulation of artificial intelligence
https://privacymatters.dlapiper.com/2023/03/a-pro-innovation-approach-uk-government-publishes-white-paper-on-the-future-of-governance-and-regulation-of-artificial-intelligence/ (31 March 2023)

Authors: James Clark, Coran Darling, Andrew Dyson, Gareth Stokes, Imran Syed & Rachel de Souza

In November 2021, the UK Government (“Government”) issued the National Artificial Intelligence (AI) Strategy, with the ambition of making the UK a global AI superpower over the next decade. The strategy promised a thriving ecosystem, supported by Government policy that would look at establishing an effective regulatory framework; a new governmental department focussed on AI and other innovative technologies; and collaboration with national regulators.

On 29 March 2023 the Government published the long-awaited white paper (“Paper”) setting out how the UK anticipates it will achieve the first, and most important, of these goals – the creation of a blueprint for future governance and regulation of AI in the UK. The Paper is open for consultation until 21 June 2023.

The Paper, headed “A pro-innovation approach”, recognises the importance of building a framework that engenders trust and confidence in responsible use of AI (noting the key risks to health, security, privacy, and more, that can arise through an unregulated approach), but cautions against ‘overbearing’ regulation which may adversely impact innovation and investment.

This theme runs throughout the Paper and expands into recommendations that support a relatively light-touch, and arguably more organic, regulatory approach than we have seen in other jurisdictions. This is most notably the case when compared to the approach of the EU, where the focus has been on the development of a harmonising AI-specific law and a supporting AI-specific regulatory regime.

The Paper contends that effective AI regulation can be constructed without the need for new cross-sectoral legislation. Instead, the UK is aiming to establish “a deliberately agile and iterative approach” that avoids the risk of “rigid and onerous legislative requirements on businesses”. This ambition should be largely achieved by co-opting regulators in regulated sectors to effectively take direct responsibility for the establishment, promotion, and oversight of responsible AI in their respective regulated domains. This would then be supported by the development of non-binding assurance schemes and technical standards.

Core Principles

This approach may be different in execution from the proposals we are seeing come out of Europe with the AI Act. If we look beneath the surface, however, we find the Paper committing the UK to core principles for responsible AI which are consistent across both regimes:

  • Safety, security, and robustness: AI should function in a secure, safe, and robust manner, where risks can be suitably monitored and mitigated;
  • Appropriate transparency and explainability: organisations developing and deploying AI should be able to communicate the method in which it is used and be able to adequately explain an AI system’s decision-making process;
  • Fairness: AI should be used in ways that comply with existing regulation and must not discriminate against individuals or create unfair commercial outcomes;
  • Accountability and governance: appropriate measures should be taken to ensure there is appropriate oversight of AI systems and clear lines of accountability; and
  • Contestability and redress: there must be clear routes to dispute harmful outcomes or decisions generated by AI.

The Government intends to use the principles as a universal guardrail to guide the development and use of AI by companies in the UK. This approach aligns with international thinking that can be traced back to the OECD AI Principles (2019), the Council of Europe’s 2021 paper on a legal framework for artificial intelligence, and the recent Blueprint for an AI Bill of Rights proposed by the White House’s Office of Science and Technology Policy.

Regulator Led Approach

The UK does not intend to codify these core principles into law, at least for the time being. Rather, the UK intends to lean on the supervisory and enforcement powers of existing regulatory bodies, charging them with ensuring that the core principles are followed by organisations for whom they have regulatory responsibility.

Regulatory bodies, rather than lawmakers or any ‘super-regulator’, will therefore be left to determine how best to promote compliance in practice. This means, for example, that the FCA will be left to regulate AI across financial services; the MHRA to consider what is appropriate in the field of medicines and medical devices; and the SRA for legal service professionals. This approach is already beginning to play out in some areas. For example, in October 2022, the Bank of England and FCA jointly released a Discussion Paper on Artificial Intelligence and Machine Learning (DP5/22), which is intended to progress the debate on how regulation and policy should play a role in use of AI in financial services.

To enable this to work, the Paper contemplates a new statutory duty on regulators which requires them to have due regard to the principles in the performance of their tasks. Similar duties already exist in other areas, such as the so-called ‘growth duty’, which came into effect in 2017 and requires regulators to have regard to the desirability of promoting economic growth. Regulators will be required by law to ensure that their guidance, supervision, and enforcement of existing sectoral laws takes account of the core principles for responsible AI. Precisely what that means in practice remains to be seen.

Coordination Layer

The Paper recognises that there are risks with a de-centralised framework. For example, regulators may establish conflicting requirements, or fail to address risks that fall between gaps.

To address this, the Paper announces the Government’s intention to create a ‘coordination layer’ that will cut across sectors of the economy and allow for central coordination on key issues of AI regulation. The coordination layer will consist of several support functions, provided from within Government, including:

  • assessment of the effectiveness of the de-centralised regulatory framework – including a commitment to remain responsive and adapt the framework if necessary;
  • central monitoring of AI risks arising in the UK;
  • public education and awareness-raising around AI; and
  • testbeds and sandbox initiatives for the development of new AI-based technologies.

The Paper also recognises the likely importance of technical standards as a way of providing consistent, cross-sectoral assurance that AI has been developed responsibly and safely. To this end, the Government will continue to invest in the AI Standards Hub, formed in 2022, whose role is to lead the UK’s contribution to the development of international standards for the development of AI systems.

This standards-based approach may prove particularly useful for those deploying AI in multiple jurisdictions and has already been recognised within the EU AIA, which anticipates compliance being established by reference to common technical standards published by recognised standards bodies. It seems likely that over time this route (use of commonly recognised technical standards) will become the de facto default route to securing practical compliance with the emerging regulatory regimes. This would certainly help address the concerns many will have about the challenge of meeting competing regulatory regimes across national boundaries.

International comparisons

EU Artificial Intelligence Act

The proposed UK framework will inevitably attract comparisons with the different approach taken by the EU AIA. Where the UK intends to take a sector-by-sector approach to regulating AI, the EU has opted for a horizontal, cross-sector, regulation-led approach. Further, the EU clearly intends exactly the same single set of rules to apply EU-wide: the EU AIA is framed as a directly effective Regulation, applying as law across the bloc, rather than as an EU Directive, which would require Member States to develop domestic legislation to comply with the adopted framework.

The EU and UK approaches each have potential benefits. The EU’s single horizontal approach of regulation across the bloc ensures that organisations engaging in regulated AI activities will, for the most part, only be required to understand and comply with the AI Act’s single framework and apply a common standard based on the use to which AI is being put, regardless of sector.

The UK’s approach provides a less certain legislative framework, as companies may find that they are regulated differently in different sectors. While this should be mitigated through the ‘coordination layer’, it will likely lead to questions about exactly what rules apply when, and the risk of conflicting areas of regulatory guidance. This additional complexity will no doubt be a potential detractor for the UK, but, if adopted effectively, the benefits of having a regime that is agile to evolving needs and technologies could trump the EU’s more codified approach. In theory, it should be much easier for the UK to implement changes via regulatory standards, guidance, or findings than it would be for the EU to push amendments through a relatively static legislative process.

US Approach

There are clear parallels between the UK and the likely direction of travel in the US, where a sector-by-sector approach to the regulation of AI is the preferred choice. In October 2022, the White House Office of Science and Technology Policy published a Blueprint for an AI Bill of Rights (“Blueprint”). Much like the Paper, the Blueprint sets out an initial framework for how US authorities, technology companies, and the public can work to ensure AI is implemented in a safe and accountable manner. The US anticipates setting out principles to help guide organisations to manage and (self-)regulate the use of AI, but without the level of directional control that the UK anticipates passing down to sector-specific regulators. Essentially, the US position will be to avoid direct intervention, leaving regulation at state or federal level for others to decide. It remains to be seen how the concepts framed in the Blueprint might eventually translate into powers for US regulators.

A Push for Global Interoperability

While the Government seeks to capitalise upon the UK’s strategic position as third in the world for the number of domestic AI companies, it also recognises the importance of collaboration with international partners. Focus moving forward will therefore be directed to supporting global opportunities while protecting the public against cross-border risks. The Government intends to promote interoperability between the UK approach and differing standards and approaches across jurisdictions. This will ensure that the UK’s regulatory framework encourages the development of a compatible system of global AI governance that will allow organisations to pursue ventures across jurisdictions, rather than being isolated by jurisdiction-specific regulations. The approach is expected to leverage existing, proven and agreed-upon assurance techniques and international standards, which play a key role in the wider regulatory ecosystem. This is expected to support cross-border trade by setting out internationally accepted ‘best practices’ that can be recognised by external trading partners and regulators.

Next steps

The Government acknowledges that AI continues to develop at pace, and that new risks and opportunities continue to emerge. To continue to strengthen the UK’s position as a leader in AI, the Government is already working in collaboration with regulators to implement the Paper’s principles and framework. It anticipates that it will continue to scale up these activities at speed in the coming months.

In addition to allowing for responses to their consultation (until 21 June 2023), the Government has staggered its next steps into three phases: i) within the first 6 months from publication of the Paper; ii) 6 to 12 months from publication; and iii) beyond 12 months from publication.

Find out more

You can find out more on AI and the law and stay up to date on the UK’s push towards regulating AI at Technology’s Legal Edge, DLA Piper’s tech-sector blog.

For more information on AI and the emerging legal and regulatory standards, visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Act and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry across the world. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.

Keeping an ‘AI’ on your data: UK data regulator recommends lawful methods of using personal information and artificial intelligence
https://privacymatters.dlapiper.com/2022/11/keeping-an-ai-on-your-data-uk-data-regulator-recommends-lawful-methods-of-using-personal-information-and-artificial-intelligence/ (8 November 2022)

Authors: Jules Toynton, Coran Darling

Data is often the fuel that powers AI used by organisations. It tailors search parameters, spots behavioural trends, and predicts future possible outcomes (to highlight just a few uses). In response, many of these organisations seek to accumulate and use as much data as possible, in order to make their systems work that little bit faster or more accurately.

In many cases, provided the data is not subject to copyright or other such restrictions, this presents few issues: organisations can amass large quantities of data to train their AI systems initially or, after deployment, continue to update their datasets to ensure the latest and most accurate data is used.

Where this becomes a potential issue is when the data being collected and used is personal information. For example, the principle of ‘data minimisation’ requires that only the necessary amount and type of personal data is used to develop an AI system. This is at odds with the ‘data hoarding’ corporate mentality described above, which seeks to know as much detail as possible. Furthermore, the principle of ‘purpose limitation’ places several restrictions on the re-use of historic data sets to train AI systems. This may cause particular headaches when working with an AI vendor that wishes to further commercialise AI that has benefited from the learnings and developments derived from your data, in a way that goes beyond the purpose for which the data was originally provided.

It is, however, acknowledged by the Information Commissioner’s Office (“ICO”), the UK’s data regulator, that AI and personal data will forever be interlinked – unavoidably so in certain situations. In response, in November 2022, the ICO released a set of guidance on how organisations can use AI and personal data appropriately and lawfully, in accordance with the data privacy regime of the UK. The guidance also addresses a number of frequently raised questions about combining AI with personal data, including: should I carry out an impact assessment? Do outputs need to comply with the principle of accuracy? Do organisations need permission to analyse personal data?

In this article we discuss some of the key recommendations in the context of the wider regulatory landscape for data and AI.

Key Recommendations:

The guide offers eight methods organisations can use to improve their handling of AI and personal information.

Take a risk-based approach when developing and deploying AI:

A first port of call for organisations should be an assessment of whether AI is needed for what is sought to be deployed. Most AI that engages with personal information will typically fall within the ‘high-risk’ category for the purposes of the proposed EU AI Regulation (“AI Act”) (and likely a similar category within the developing UK framework). This will result in additional obligations and measures that the organisation will be required to follow in its deployment of the AI. A less technical and more privacy-preserving alternative is therefore recommended by the ICO where possible.

Should AI be chosen after this, a data privacy impact assessment should be carried out to identify and minimise the data risks that the AI poses to data subjects, as well as to mitigate the harm it may cause. At this stage, the ICO also recommends consulting different groups who may be impacted by the use of AI in this context, to better understand the potential risks.

Consider how decisions can be explained to the individuals affected:

As the ICO notes, it can be difficult to explain how AI arrives at certain decisions and outputs, particularly in the case of machine learning and complex algorithms where input values and trends change based on the AI’s ability to learn and teach itself based on the data it is fed.

Where possible, the ICO recommends that organisations:

  • be clear and open with subjects on how and why personal data is being used;
  • consider what explanation is needed in the context that the AI will be deployed;
  • assess what explanations are likely to be expected;
  • assess the potential impact of AI decisions to understand the detail required in explanations; and
  • consider how individual rights requests will be handled.

The ICO have acknowledged that this is a difficult area of data privacy and has provided detailed guidance, co-badged with the Alan Turing Institute, on “Explaining decisions made with AI”.

Limit data collection to only what is needed:

Contrary to beliefs held by many organisations, the ICO recommends that data is kept to a minimum where possible. This does not mean that data cannot be collected, but rather that appropriate consideration must be given to the data that is collected and retained.

Organisations should therefore:

  • ensure that the personal data you use is accurate, adequate, relevant and limited, based on the context of the use of the AI (a minimal pseudonymisation sketch follows this list); and
  • consider which techniques can be used to preserve privacy as much as practical. For example, as the ICO notes, synthetic data or federated learning could be used to minimise the personal data being processed.
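As a minimal sketch of the first point, assuming a tabular training set held in a pandas DataFrame with hypothetical column names, direct identifiers that the model does not need can be dropped outright, and a salted hash can stand in where a stable reference is still required. Note that salted hashing is pseudonymisation, not anonymisation, so the output remains personal data.

```python
import hashlib
import pandas as pd

SALT = "rotate-and-store-this-secret-separately"  # illustrative only

def minimise_and_pseudonymise(df: pd.DataFrame) -> pd.DataFrame:
    out = df.copy()
    # Drop fields the model does not need at all (data minimisation).
    out = out.drop(columns=["name", "email", "phone"], errors="ignore")
    # Replace the remaining direct identifier with a salted hash so records can
    # still be linked without exposing the raw identifier to the training process.
    out["customer_ref"] = out["customer_id"].astype(str).map(
        lambda v: hashlib.sha256((SALT + v).encode()).hexdigest()[:16]
    )
    return out.drop(columns=["customer_id"])
```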

It should be noted that data protection’s accuracy principle does not mean that an AI system needs to be 100% statistically accurate (which is unlikely to be practically achievable). Instead organisations should factor in the possibility of inferences/decisions being incorrect, and ensure that there are processes in place to ensure fairness and overall accuracy of outcome.

Address risks of bias and discrimination at an early stage:

A persistent concern throughout many applications of AI, particularly those interacting with sensitive data, is bias and discrimination. This is made worse in instances where too much of one trend of data is used, as the biases present in such data will form part of the essential decision-making process of the AI, thereby ‘hardwiring’ bias into the system. All steps should therefore be taken to ensure as much variety as possible in the data used to train AI systems, to the extent that the data still reflects the wider trend accurately.

To better understand this issue, the ICO recommends that organisations:

  • assess whether the data gathered is accurate, representative, reliable, relevant, and up-to-date with the population or different sets of people with which the AI will be applied; and
  • map out the consequences of the decisions made by the AI system for different groups and assess whether these are acceptable from a data privacy regulatory standpoint as well as internally (see the sketch after this list).
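As a simple illustration of the second recommendation, a deployer could compare decision rates across groups in its training or test data. The sketch below computes per-group positive-decision rates and their ratio to the most favoured group, a rough disparate-impact style check; the column names and dummy data are assumptions, and a real assessment would go much further.

```python
import pandas as pd

def outcome_rates_by_group(df: pd.DataFrame, group_col: str, decision_col: str) -> pd.DataFrame:
    """Positive-decision rate per group, plus the ratio to the highest-rate group."""
    report = df.groupby(group_col)[decision_col].mean().rename("positive_rate").to_frame()
    report["ratio_to_top_group"] = report["positive_rate"] / report["positive_rate"].max()
    return report.sort_values("positive_rate", ascending=False)

# Dummy data: 1 = favourable automated decision, 0 = unfavourable.
decisions = pd.DataFrame({
    "group": ["A", "A", "A", "B", "B", "B", "B"],
    "approved": [1, 1, 0, 1, 0, 0, 0],
})
print(outcome_rates_by_group(decisions, "group", "approved"))
```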

Where AI does produce biased or discriminatory decisions, this is likely to conflict with the requirement for processing of personal data to be fair, as well as obligations of several other more specific regulatory frameworks. A prime example of this is the Equality Act, which ensures that discrimination on the grounds of protected characteristics, by AI or otherwise, is prohibited. Care should be taken by organisations to ensure that decisions are made in such a way that prevents repercussions from the wider data privacy and AI regimes, as well as those specific to the sectors and activities in which they are involved.

Dedicate time and resources to preparing data:

As noted above, the quality of an AI’s output is only going to be as good as the data it is fed and trained with. Organisations should therefore ensure sufficient resources are dedicated to preparing the data to be used.

As part of this process, organisations should expect to:

  • create clear criteria and lines of accountability about the labelling of data involving protected characteristics and/or special category data;
  • consult members of protected groups where applicable to define the labelling criteria; and
  • involve multiple human labellers to ensure consistency of categorisation and delineation and to assist with fringe cases.

Ensure AI systems are made and kept secure:

It should be of little surprise that the addition of new technologies can create new security risks (or exacerbate current ones). In the context of the AI Act and UK data privacy regulation (and indeed when a more established UK AI regime emerges), organisations are, or will be, legally required to implement appropriate technical and organisational measures to ensure suitable security protocols are in place for the risk associated with the information.

In order to do this, organisations could:

  • complete security risk assessments to create a baseline understanding of where risks are present;
  • complete model debugging on a regular basis; and
  • proactively monitor the system and investigate any anomalies (in some cases, the AI Act and any future UK AI framework may require human oversight as an additional protective measure regardless of the data privacy requirement). A minimal monitoring sketch follows this list.
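As one hedged example of proactive monitoring, the sketch below flags days on which a model's average output score drifts well outside its recent history, a crude z-score check that prompts investigation rather than proving an incident. The window and threshold are illustrative assumptions and no substitute for a proper security monitoring programme.

```python
from statistics import mean, stdev

def flag_drift(daily_avg_scores: list[float], window: int = 30, z_threshold: float = 3.0) -> bool:
    """True if the latest daily average deviates strongly from the preceding window."""
    if len(daily_avg_scores) <= window:
        return False  # not enough history yet
    history = daily_avg_scores[-window - 1:-1]
    latest = daily_avg_scores[-1]
    sigma = stdev(history)
    if sigma == 0:
        return latest != mean(history)
    return abs(latest - mean(history)) / sigma > z_threshold
```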

Human review of AI outcomes should be meaningful:

Depending on the purpose of the AI, it should be established early on whether the outputs are being used to support a human decision-maker or whether decisions are made solely autonomously. As the ICO highlights, data subjects deserve to know whether decisions concerning them have been made solely by automated means or by a human with the assistance of AI. In instances where outputs are being used to assist a human, the ICO recommends that they are reviewed in a meaningful way.

This would therefore require that reviewers are:

  • adequately trained to interpret and challenge outputs made by AI systems;
  • sufficiently senior to have the authority to override automated decisions; and
  • able to account for additional factors that were not included as part of the initial input data.

Data subjects have the right under the UK GDPR not to be subject to a solely automated decision, where that decision has a legal or similarly significant effect, and also have the right to receive meaningful information about the logic involved in the decision. Therefore, although worded as a recommendation, where AI is making significant decisions, meaningful human review becomes a requirement (or at least must be available on request).

Work with external suppliers involved to ensure that AI is used appropriately:

A final recommendation offered by the ICO is that, where AI is procured from a third party, this is done with the supplier’s involvement. While it is usually the organisation’s responsibility (as controller) to comply with all regulations, this can be achieved more effectively with the involvement of those who create and supply the technology.

In order to comply with the obligations of both the AI Act and relevant data privacy regulations, organisations would therefore be expected to:

  • choose a supplier by carrying out the appropriate due diligence ahead of procurements;
  • work with the supplier to carry out assessments prior to deployment, such as impact assessments;
  • agree and document roles and responsibilities with the external supplier, such as who will answer individual rights requests;
  • request documentation from the external supplier that demonstrates they implemented a privacy by design approach; and
  • consider any international transfers of personal data.

When working with some AI providers, for example, with larger providers who may develop AI for a large range of applications as well as offer services to tailor their AI solutions for particular customers (and to commercialise these learnings), it may not be clear whether they are a processor or controller (or even a joint controller with the client for some processing). Where that company has enough freedom to use its expertise to decide what data to collect and how to apply its analytic techniques, it is likely to be a data controller as well.

Get in touch 

For more information on AI and the emerging legal and regulatory standards visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Regulation and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

You can find more on AI, technology, data privacy, and the law at Technology’s Legal Edge, DLA Piper’s tech-sector blog and Privacy Matters, DLA Piper’s Global Privacy and Data Protection resource.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry in the UK and abroad. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.

]]>
EUROPE: Data protection regulators publish myth-busting guidance on machine learning https://privacymatters.dlapiper.com/2022/10/europe-data-protection-regulators-publish-myth-busting-guidance-on-machine-learning/ Mon, 10 Oct 2022 08:31:58 +0000 https://blogs.dlapiper.com/privacymatters/?p=3704 Continue Reading]]> Authors: Coran Darling, James Clark

In its proposed AI Regulation (“AI Act”), the EU recognises AI as one of the most important technologies of the 21st century. It is often forgotten, however, that AI is not one specific type of technology. Instead, it is an umbrella term for a range of technologies capable of imitating certain aspects of human intelligence and decision-making – ranging from basic document processing software through to advanced learning algorithms.

One branch of the AI family is machine learning (“ML”), which uses models trained on datasets to resolve an array of complicated problems. The specific form and function of an ML system depends on the tasks it is intended to complete. For example, an ML system could be used to predict how likely certain categories of borrower are to default on loan agreements by processing historical financial default information. During the development and training of their algorithms, ML systems begin to adapt and recognise patterns within their data. They can then use this training to interpret new data and form outputs for the intended purpose.

The use of ML systems gives rise to several questions which lawyers and compliance professionals may be uncertain about answering. For example: How is the data interpreted? How can an outcome be verified as accurate? Does the use of large datasets remove any chance of bias in my decision making?

In an attempt to resolve some of these issues, the Agencia Española de Protección de Datos (Spain’s data protection regulator) (“AEPD”) and the European Data Protection Supervisor (“EDPS”) have jointly published a report addressing a number of notable misunderstandings about ML systems. The joint report forms part of a growing trend among European data protection regulators to openly grapple with AI issues, recognising the inextricable link between AI systems – which are inherently creatures of data – and data protection law.

In this article we elaborate on some of these clarifications in the context of a world of developing AI and data privacy frameworks.

Causality requires more than finding correlations

As a brief reminder to ourselves, causality is “the relationship that exists between cause and effect” whereas correlation is “the relationship between two factors that occur or evolve with some synchronisation”.

ML systems are very good at identifying correlations within datasets but typically lack the ability to accurately infer a causal relationship between data and outcomes. The example given in the report is that, given certain data, a system could reach the conclusion that tall people are smarter than shorter people simply by finding a correlation between height and IQ scores. As we are all aware, correlation does not imply causation (despite the spurious inferences that can be drawn from data, for example from the correlation between the per capita consumption of mozzarella cheese and the number of civil engineering doctorates awarded).

It is therefore necessary to ensure data is suitably vetted throughout the initial training period, and at the point of output, to ensure that the learning process within the ML system has not resulted in it attributing certain outcomes to correlating, but non-causal, information. Some form of human supervision to determine when certain variables are being overweighted within the decision process may assist in doing so and allow intervention when bias is detected at an early stage in processing.
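A minimal sketch of the point, using invented numbers: two entirely unrelated quantities that both drift upwards over time will show a high correlation coefficient, which says nothing about causation.

```python
import numpy as np

rng = np.random.default_rng(42)
years = np.arange(2000, 2020)

# Two unrelated series that simply trend upwards over time
cheese_consumption = 9 + 0.2 * (years - 2000) + rng.normal(0, 0.3, len(years))
doctorates_awarded = 480 + 15 * (years - 2000) + rng.normal(0, 20, len(years))

correlation = np.corrcoef(cheese_consumption, doctorates_awarded)[0, 1]
print(f"Pearson correlation: {correlation:.2f}")  # typically > 0.9, yet no causal link
```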

Training datasets must meet accuracy and representativeness thresholds

Contrary to popular belief, greater variety in data does not necessarily make for a better dataset or one better able to mitigate bias. It is instead better to have a focused dataset that accurately reflects the trend being investigated. For example, having data on all types of currency in relation to their conversion to dollars is not helpful when seeking to find patterns and trends in the fluctuation of the conversion rate between dollars and pounds sterling.

Furthermore, adding too much of certain data may lead to inaccuracies and bias in outcomes. For example, as noted in the report, adding more light-skinned male images to a dataset used to train facial recognition software will be largely unhelpful in correcting any existing ethnicity or gender biases within the system.

The General Data Protection Regulation (“GDPR”) requires that the processing of personal data be proportionate to its purpose. Care should therefore be taken when seeking to increase the amount of data in a dataset. A substantial increase in the data used to produce only a minimal correction to a training dataset, for example, may not be deemed proportionate and may lead to a breach of the Regulation’s requirements.
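By way of illustration (the column names, figures and 10% threshold below are invented), a simple pre-training check of how well each group is represented in a dataset helps target any additional data collection at under-represented groups, rather than indiscriminately enlarging the dataset:

```python
import pandas as pd

def representation_report(df: pd.DataFrame, group_col: str, min_share: float = 0.10) -> pd.DataFrame:
    """Report each group's share of the training data and flag under-represented groups.

    The 10% floor is an illustrative assumption; appropriate thresholds depend on
    the purpose of the model and the population it will be applied to.
    """
    shares = df[group_col].value_counts(normalize=True).rename("share").to_frame()
    shares["under_represented"] = shares["share"] < min_share
    return shares

# Example with invented data
training_data = pd.DataFrame({"skin_tone": ["light"] * 850 + ["medium"] * 100 + ["dark"] * 50})
print(representation_report(training_data, "skin_tone"))
```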

Well-performing machine learning systems require datasets above a certain quality threshold

Equally, it is not necessary that training datasets be completely error-free; often this is not possible or commercially feasible. Instead, datasets should be held to a quality threshold that allows for a comprehensive and sufficiently accurate description of the data. Provided that the average result is accurate to the overall trend, ML systems are typically able to deal with a low level of inaccuracy.

As the report notes, some models are even created and trained using synthetic data (artificially generated datasets – described in greater detail in our earlier article on the subject) that replicates the characteristics of real data. In some cases, synthetic data may even be derived from aggregated real datasets, retaining the benefit of accurate data trends while removing many of the compliance issues associated with personally identifiable information.

This is not to say that organisations should not strive to attain an accurate data set and, in fact, under the AI Act it is a mandatory requirement that a system’s data be accurate and robust. Ensuring accuracy and, where relevant, currency of personal data is also a requirement under the GDPR. However, it is important to remember that ‘accuracy’ in the context of ML need not be an absolute value.
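To illustrate the synthetic data point (this is not a description of any particular tool), the sketch below fits simple distributions to a small real dataset and samples artificial records that preserve the aggregate statistics without reproducing any individual record. Production-grade generators are considerably more sophisticated, and re-identification risk still needs to be assessed.

```python
import numpy as np
import pandas as pd

def naive_synthetic(df: pd.DataFrame, n_rows: int, seed: int = 0) -> pd.DataFrame:
    """Generate a crude synthetic dataset by sampling each numeric column
    from a normal distribution fitted to the original column.

    This preserves per-column means and variances but not correlations between
    columns; production-grade generators model the joint distribution.
    """
    rng = np.random.default_rng(seed)
    synthetic = {
        col: rng.normal(df[col].mean(), df[col].std(), n_rows)
        for col in df.select_dtypes("number").columns
    }
    return pd.DataFrame(synthetic)

real = pd.DataFrame({"age": [34, 45, 29, 52, 41], "balance": [1200.0, 560.0, 3100.0, 250.0, 990.0]})
print(naive_synthetic(real, n_rows=3).round(1))
```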

Federated and distributed learning allow the development of systems without sharing training datasets

One approach proposed for the development of accurate ML systems, in the absence of synthetic data, is the creation of large data-sharing repositories, often held in substantial cloud computing infrastructure. Under another limb of the EU’s digital strategy – the Data Governance Act – the Commission is attempting to promote data-sharing frameworks through trusted and certified ‘data intermediation services’. Such services may have a role to play in supporting ML. The report highlights that while this means of centralised learning is an effective way of collating large quantities of data, the method comes with its own challenges.

For example, where personal data is involved, the controller and processor of the data must consider it in the context of their obligations under the GDPR and other data protection regulations. Requirements regarding purpose limitation, accountability, and international transfers may all therefore become applicable. Furthermore, the collation of sensitive data increases the interest of other parties, particularly those with malevolent intent, in gaining access to it. Without suitable protections in place, a centralised dataset containing large quantities of data may become a honeypot for hackers and for corporate parties seeking to gain an upper hand.

The report offers, as an alternative, the use of distributed on-site and federated learning. Distributed on-site learning involves the data controller downloading a generic or pre-trained model to a local server. The server then uses its own dataset to train and improve the downloaded model; after this is completed, there is no further need for the generic model. By comparison, with federated learning the controller trains a model with its own data and then sends only the model’s parameters to a central server for aggregation. It should be noted, however, that this is often not the most efficient method and may even be a barrier to entry or development for smaller organisations in the ML sector, due to cost and expertise restrictions.
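A highly simplified sketch of the federated approach described above: each participant trains locally on its own data and shares only model parameters, which a central server averages. The linear model and plain averaging are illustrative assumptions; real federated systems, and the privacy protections layered on top (such as secure aggregation), are far more involved.

```python
import numpy as np

def local_update(X, y, global_weights, lr=0.1, epochs=50):
    """Train a simple linear model locally, starting from the global weights.
    Only the resulting weights ever leave the participant's environment."""
    w = global_weights.copy()
    for _ in range(epochs):
        grad = 2 * X.T @ (X @ w - y) / len(y)   # least-squares gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(1)
true_w = np.array([2.0, -1.0])

# Two participants, each holding their own local dataset
datasets = []
for _ in range(2):
    X = rng.normal(size=(100, 2))
    y = X @ true_w + rng.normal(0, 0.1, 100)
    datasets.append((X, y))

global_w = np.zeros(2)
for round_ in range(5):                          # federated rounds
    local_weights = [local_update(X, y, global_w) for X, y in datasets]
    global_w = np.mean(local_weights, axis=0)    # the server aggregates parameters only
print(global_w)                                  # approaches [2, -1] without pooling raw data
```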

Once deployed, a machine learning model’s performance may deteriorate unless it is further trained

Unlike other technologies, ML models are not ‘plug in and forget’ systems. The nature of ML is that the system adapts and evolves over time. Consequently, once deployed, an ML system must be consistently tested to ensure it remains capable of solving the problems for which it was created. Once mature, a model may no longer provide accurate results if it does not evolve with its subject matter. For example, an ML model aimed at predicting futures prices of coffee beans will deteriorate if it is not fed new and refreshed data.

The result, should the data not be updated for some time, is an inaccurate model that will produce tainted, biased, or completely incorrect judgements and outcomes (a situation known as data drift, where the statistical properties of the incoming data shift away from those of the training data). A related problem arises where the relationship between the input data and the outcome changes even though the general distribution of the inputs does not (known as concept drift). As the report notes, it is therefore necessary to monitor the ML system to detect any deterioration in the model and act on its decay.
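One common way of detecting this kind of drift in production is the population stability index, sketched below with invented data; the 0.2 threshold mentioned in the comments is a widely used rule of thumb rather than a regulatory requirement.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """Population Stability Index between a training-time feature distribution
    and the live distribution. A PSI above roughly 0.2 is often treated as a
    signal of material drift warranting investigation or retraining."""
    lo = min(expected.min(), actual.min()) - 1e-9
    hi = max(expected.max(), actual.max()) + 1e-9
    cuts = np.quantile(expected, np.linspace(0, 1, bins + 1))
    cuts[0], cuts[-1] = lo, hi
    e_pct = np.histogram(expected, cuts)[0] / len(expected) + 1e-6
    a_pct = np.histogram(actual, cuts)[0] / len(actual) + 1e-6
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))

rng = np.random.default_rng(7)
training_prices = rng.normal(100, 10, 5000)   # e.g. coffee futures at training time
live_prices = rng.normal(115, 14, 1000)       # the market has since moved
print(round(population_stability_index(training_prices, live_prices), 3))  # well above 0.2
```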

A well-designed machine learning model can produce decisions understandable to relevant stakeholders

Perhaps owing to popular media, there is a recurring belief that the automated decisions taken by ML algorithms cannot be explained. While this may be the case for a select few models, a well-designed model will typically produce decisions that can be readily understood by stakeholders.

Among the factors that matter for explainability are understanding which parameters were considered and the weighting given to each in the decision-making. The degree of ‘explainability’ demanded of a model is likely to vary based on the data involved and the likelihood (if any) that a decision will impact the lives of data subjects. For example, far greater explainability would be expected of a model that deals with credit scoring or employment applications than of one tasked with predicting futures markets.
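To illustrate the ‘which parameters, and with what weight’ point (using an entirely invented credit-style dataset), the sketch below trains a simple, inherently interpretable model and reports the weight each input carries in its decisions. More complex models typically require dedicated explanation techniques, such as permutation importance or SHAP values.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(3)
n = 2000
income = rng.normal(30_000, 8_000, n)
existing_debt = rng.normal(5_000, 2_000, n)
years_employed = rng.integers(0, 30, n)

# Invented ground truth: repayment depends mainly on income and existing debt
log_odds = 0.0002 * income - 0.0004 * existing_debt + 0.02 * years_employed - 4
repaid = rng.random(n) < 1 / (1 + np.exp(-log_odds))

X = np.column_stack([income, existing_debt, years_employed])
X = (X - X.mean(axis=0)) / X.std(axis=0)          # standardise so weights are comparable
model = LogisticRegression().fit(X, repaid)

for name, weight in zip(["income", "existing_debt", "years_employed"], model.coef_[0]):
    print(f"{name:>15}: {weight:+.2f}")           # sign and size show each factor's influence
```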

It is possible to provide meaningful transparency to users without harming IP rights

A push towards transparency and explainability has naturally led many to question how to effectively protect trade secrets and IP when everyone can see how their models and ML systems behave. As the report highlights, transparency and the protection of IP are not incompatible, and there are several methods of providing transparency to users without harming proprietary know-how or IP. While users should be provided with sufficient information to know what their data (particularly personal data) is being used for, this does not necessarily mean that specific technical details need to be disclosed.

The report compares the requirement to the advisory leaflets provided with medicines: it is necessary to alert users to what may happen when using the medicine (or model/system) without providing an explanation of how this is specifically achieved. Where personal information is involved, further explanation may be required to comply with the principles set out in applicable data protection regulation. At a minimum, data processors and controllers should properly inform users and data subjects of the impact of the ML system and its decision-making on their daily lives.

Further protections for individuals may be achieved through certification in accordance with international standards, overt limitations of system behaviour, or the use of human moderators with appropriate technical knowledge.

Machine learning systems are subject to different types of bias

It is often assumed that bias is an inherently human trait. While it is correct to say that an ML system is not in itself biased, the system will perform as it is taught. This means that while ML systems can be free from human bias in many cases, this is entirely subject to their inherent and learned characteristics. Where training or subsequent data is heavily one-sided, or too much weight is ascribed to certain data points, the model may interpret this ‘incorrectly’, leading to ‘biased’ results.

The inherent lack of ‘humanity’ in these systems does, however, have its drawbacks. As the report notes, ML systems have a limited ability to adapt to soft contextual changes and unforeseen circumstances, such as changes in market trends due to new legislation or social norms. This point further highlights the need for appropriate human oversight of the functioning of ML systems.
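A compact illustration of how one-sided training data produces ‘biased’ outputs (entirely synthetic data and a deliberately crude set-up): a classifier trained on a history in which one group rarely received positive outcomes simply learns to reproduce that pattern.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(5)
n = 4000
group = rng.integers(0, 2, n)        # 0 or 1, e.g. two demographic groups
skill = rng.normal(0, 1, n)          # the factor that *should* drive the outcome

# Historic, one-sided labels: group 1 was rarely given a positive outcome
# regardless of skill, and the model is trained on that history.
hired = (skill > 0) & ((group == 0) | (rng.random(n) < 0.1))

model = DecisionTreeClassifier(max_depth=3).fit(np.column_stack([group, skill]), hired)

test_skill = np.full(1000, 1.0)      # equally skilled candidates from each group
for g in (0, 1):
    rate = model.predict(np.column_stack([np.full(1000, g), test_skill])).mean()
    print(f"group {g}: predicted positive rate {rate:.2f}")   # the historic skew is reproduced
```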

Predictions are only accurate when future events reproduce past trends

‘ML systems are capable of predicting the future’ is perhaps one of the most common misconceptions about the technology. Rather, they can only predict possible future outcomes to the extent that those outcomes reflect the trends in previous data. If you have habitually bought coffee on a Monday morning since starting your job, it is certainly likely that you will do so this coming Monday; but nothing guarantees that you will, or that an unforeseen event will not prevent you from doing so.

Applying this to the context of commerce, an ML system may be able to predict, with relative accuracy, the long-term trend of a particular futures market, but it cannot guarantee with absolute certainty that market behaviour will follow suit, particularly in the case of black swan events such as droughts or unexpected political decisions.

To increase the chances of an accurate outcome, organisations should seek to obtain as large a dataset as possible (subject to the proportionality considerations noted above), with as many relevant variables as obtainable, while maintaining factual accuracy to the trends in the data they utilise. This will allow the ML system to better predict behavioural responses to certain data and therefore produce more accurate predictions.

A system’s ability to find non-evident correlations in data can result in the discovery of new data, unknown to the data subject

A simultaneous advantage and risk of ML systems is their capacity to map data points and establish correlations previously unanticipated by the system’s human designers. In short, it is therefore not always possible to anticipate the outcomes they may produce. Consequently, systems may identify trends in data that were not previously sought, such as predispositions to diseases. While this may be beneficial in certain circumstances, such as health, it may equally be unnecessary or inappropriate in other contexts.

Where these ML systems begin processing personal data beyond the scope of their original purpose, considerations of lawfulness, transparency, and purpose limitation under the GDPR will be engaged. Processing personal data in this manner without a clear purpose and appropriate justification may result in a breach of the Regulation and the penalties that accompany it.

Get in touch 

For more information on AI and the emerging legal and regulatory standards visit DLA Piper’s focus page on AI.

You can find a more detailed guide on the AI Regulation and what’s in store for AI in Europe in DLA Piper’s AI Regulation Handbook.

To assess your organisation’s maturity on its AI journey (and check where you stand against sector peers) you can use DLA Piper’s AI Scorebox tool.

You can find more on AI, technology, data privacy, and the law at Technology’s Legal Edge, DLA Piper’s tech-sector blog and Privacy Matters, DLA Piper’s Global Privacy and Data Protection resource.

DLA Piper continues to monitor updates and developments of AI and its impacts on industry in the UK and abroad. For further information or if you have any questions, please contact the authors or your usual DLA Piper contact.

]]>
UK: New National Strategy for Health Data https://privacymatters.dlapiper.com/2022/07/uk-new-national-strategy-for-health-data/ Wed, 13 Jul 2022 12:59:07 +0000 https://blogs.dlapiper.com/privacymatters/?p=3663 Continue Reading]]> Author: James Clark

The UK’s Department for Health and Social Care (“DHSC”) has published a major strategy document (‘Data saves lives: reshaping health and social care with data’) outlining the government’s plans for the regulation and use of data in healthcare.

In this post, we look at some of the most interesting proposals outlined in the strategy and consider what they might mean for the future regulation of data and technology in UK healthcare.

Secure Data Environments

The NHS will step up its investment in and use of ‘secure data environments’ (sometimes referred to as ‘trusted research environments’).  In simple terms, these are specially designated, secure servers on which a third party researcher’s access to health data can be properly controlled and monitored. These will become the default route for NHS organisations to provide access to their de-identified data for research and analysis.  This creates opportunities for providers of secure data platforms and the privacy enhancing technologies on which these platforms depend.  It also highlights the need for companies working with the NHS to increase their own familiarisation with, and investment in, secure data environments.

Secure data environments are a hot topic in data circles.  For example, they also emerge in the EU’s new Data Governance Act, in the form of its creation of ‘data intermediation services’ – i.e., services that provide a secure environment in which companies or individuals can share data.

Fair Terms for Data Partnerships

The strategy also contains proposals for the data sharing agreements that NHS bodies use when providing access to health data.   Responding in part to public concerns about data sharing partnerships with the private sector, the Government will:

  • Require data sharing arrangements to embody 5 core principles (for example, any use of NHS data not available in the public domain must have an explicit aim to improve the health, welfare or care of patients in the NHS, or the operation of the NHS, and any data sharing arrangement must be transparently and clearly communicated to the public).
  • Develop commercial principles to ensure that partnerships for access to data contain appropriate contractual safeguards. This will lead to a review and likely update of NHS Digital’s template data sharing and data access agreements by December 2023.

Consequently, those organisations accessing NHS datasets are likely to see changes in the contractual terms on which that access is provided, and greater scrutiny of the overall arrangement to ensure adherence with principles designed to encourage public trust and confidence in such arrangements.

Trust and Transparency

On a similar theme, the strategy contains a range of other proposals designed to improve the public’s trust in the use of health data.

Alongside the investment in secure data environments, the Government also publicly commits to increase investment in a wider range of privacy enhancing technologies (or ‘PETs’), such as homomorphic encryption (a technology that allows functions to be performed on encrypted data without ever having to decrypt it) and synthetic data (artificially manufactured data which strongly mimics real-world data, but without the privacy consequences).  The ICO has written supportively about some of these technologies in its updated draft guidance on anonymisation, and consequently there seems to be a concerted push towards the adoption of technical solutions to privacy concerns in an ever more data-dependent world.
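As a purely conceptual illustration of homomorphic encryption (the sketch assumes the open-source python-paillier package, phe, is installed; it is not a reference to any technology selected by the NHS), a party holding only encrypted values can compute an aggregate that only the key holder can decrypt:

```python
from phe import paillier  # python-paillier: pip install phe

# The data custodian generates the key pair and keeps the private key.
public_key, private_key = paillier.generate_paillier_keypair()

blood_pressure_readings = [118, 142, 131, 125]
encrypted = [public_key.encrypt(x) for x in blood_pressure_readings]

# An analyst can sum the encrypted values without ever seeing the readings.
encrypted_total = sum(encrypted[1:], encrypted[0])
encrypted_mean = encrypted_total * (1 / len(encrypted))  # multiplication by a plain scalar is allowed

# Only the key holder can decrypt the aggregate result.
print(private_key.decrypt(encrypted_mean))               # 129.0
```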

The Government also plans to further improve transparency and understanding around how it uses health data (public confusion surrounding changes to the National Data Opt-Out regime in 2021 is acknowledged as an example of the sort of failing the Government wants to avoid in the future).  Developments on this front will include a ‘Data Pact’ (a high-level charter outlining core guarantees to the public in terms of fair use of health data) and an online hub, with a transparency statement explaining how publicly held health and care data is used in practice.

Improving Access to Health Data

Alongside the focus on public trust and transparency, the strategy is also concerned with promoting greater access to health data in the public interest.   This is a theme that has been prominent internationally following the Covid pandemic – a renewed understanding of the importance of health data for research and development purposes, leading to a demand to break down unnecessary barriers to accessing and combining datasets for these purposes.

The Government plans to do this partly through major investment (of up to £200 million) in NHS data infrastructure to make research-ready data available to researchers.  DHSC envisages a ‘vibrant hub of genomics, imaging, pathology, and citizen generated data, where AI-enabled tools and technologies can be deployed’.

On the legislative front, it is likely that this part of the strategy will also be supported by the Government’s impending Data Reform Bill, which, amongst other things, will make changes to the research provisions of UK data protection law to, for example, provide a clearer definition of scientific research, a broader form of consent where consent is used as a lawful basis for research, and a more concrete privacy notice exemption where data is repurposed for scientific research purposes.  All of these changes are expressly intended to promote greater use of personal data, including health data, for responsible research purposes.

There are strong parallels here with the EU’s proposals for a European Health Data Space, which will promote access to electronic health data for secondary purposes.

Encouraging AI Innovation

No data strategy in 2022 would be complete without consideration of Artificial Intelligence (AI).  On this front, DHSC:

  • Commits to working with the Office of AI (OAI) on its developing plans for the regulation of AI in the United Kingdom. The OAI’s White Paper on the governance and regulation of AI is expected imminently and will be closely scrutinised as the UK’s response to the EU’s draft AI Act.  The health sector is one of the most sensitive and important in an AI context and the NHS’ work on this will be led by a newly created NHS AI Lab.
  • Will develop unified standards for the efficacy and safety testing of AI systems, working with the Medicines and Healthcare products Regulatory Agency (MHRA) and the National Institute for Health and Care Excellence (NICE). Safety standards that can be used by development teams building AI systems are an important part of the regulatory framework for safe AI, and this is likely to be a welcome step.
  • Will, through the NHS AI Lab, develop a methodology for evaluating the AI safety of market-authorised products in healthcare.

In summary, the strategy contains an ambitious set of proposals that are intended to cement the UK’s position as a world leader in healthcare informatics and data-driven health research.  Notably, they are clearly designed to balance and reconcile competing demands for greater access to and use of health data, with the protection of trust, privacy and security in that data.

]]>