
UK Copyright and AI Consultation Closes – An In-Depth Analysis

  • Writer: James Hall
  • Mar 13
  • 9 min read

On Tuesday 25th February, the UK Intellectual Property Office (the official government body responsible for administering intellectual property law) closed its public consultation on how best to shape the rules on generative AI and copyright law. At the point of the consultation’s closure, many parties found it uncertain what the future of copyright law might look like. This longer piece takes a look at the consultation and the current rules, and discusses the many problems inherent in them.


Why is copyright relevant to generative AI?


Generative AI models are those which produce novel output from a user’s prompt. The output can be literary, visual, or musical. ChatGPT, which has become the eighth-most visited website in the world, is perhaps the most famous example of a generative AI tool.


There is a very pronounced debate around whether copyright can subsist in the output of these tools – after all, they are literary, musical, or artistic works that flow from the intellectual creation of their author, and are therefore theoretically as eligible for copyright protection as any other such work under s.1(1)(a) Copyright, Designs and Patents Act 1988 (CDPA 1988). This issue around the output of generative AI, though major, is not the primary focus of this consultation. Rather, the consultation approaches the rules from a more input-based angle, looking at the process by which AI models are trained.


Generative AI models are trained on immense sets of data. The nature of this data depends on the AI model in need of training, but it generally consists of text, images, or music and sounds. This data is used to teach the AI model how to make links, spot and replicate patterns, and create content in a way that does not merely mimic existing material but creates something new. In essence, generative AI models are entirely dependent on these large data sets. The issue arises because much of the material included in these data sets is protected by copyright.


What’s the problem?


Copyright, though it has a variety of functions, is a way of guaranteeing remuneration of some kind for the authors of literary, artistic, and musical works. Because their work is protected, another party wishing to use it must negotiate a licence and a corresponding payment in order to be authorised to do so. The author therefore receives, through copyright, a kind of guarantee that the use of their work will result in payment.

Meanwhile, AI model providers are using all this data without having negotiated any licences with the holders of rights in the works used. Potentially billions of works are therefore being used without writers’, artists’, and musicians’ consent and without any payment. This has caused uproar in the creative industries, which are particularly concerned that their work is not receiving adequate remuneration while AI companies make extraordinary amounts of money from their services. Notably, on the day of the consultation’s closure, a group of musicians including Kate Bush, Eurythmics’ Annie Lennox, and Blur’s Damon Albarn released a silent album in protest at the current situation.


Are they right to complain?


The argument put forward by many AI companies and academics is that the practice of training an AI model does not actually engage copyright at all, meaning that there is simply no need to pay the authors. This is based on the idea that copyright protects the expression of ideas, not the ideas themselves. This principle has its roots in Art 2(1) Berne Convention, and has seen support in UK cases like Norowzian v Arks (No 2) [1999]. AI training works not by using or exploiting the actual expression of works, but rather by reducing them to pure data in order to spot patterns, trends, and meaning. Copyright has never protected mere data and patterns, and it could therefore sensibly be argued that nothing the AI companies are doing requires the negotiation of licences with authors to ensure their payment.


This argument is fairly persuasive, and a sensible one to make in defence of these companies. The issue lies, however, in the copyright concept of reproduction (or copying). As per s.17(1) CDPA 1988, copying a copyright work without authorisation amounts to copyright infringement – a fairly self-explanatory rule. To create these large data sets, the relevant works have to be copied into them, and an infringement is therefore potentially committed. Some data set creators, like LAION, included hyperlinks to the material rather than copying it. However, following the European Court of Justice judgment in GS Media, it is not entirely clear whether this conduct is permissible either. In that confusing and logically muddled case, the court held that whether providing a hyperlink to an infringing copy of a work online amounts to infringement depends on whether the link was posted for financial gain and with knowledge of the infringement.


Given that this is the case, it seems that AI companies are required to license the material they use to train their AI models, and authors are therefore right to complain that their interests are not being respected. However, as the AI companies have pointed out, licensing potentially billions of works is impossible, and even if such works were licensed, the amount payable to each author would be so infinitesimally small as to be pointless. A new regime is therefore necessary.


The TDM Rules


AI companies and data set creators gather the materials that they need to train their models through a process known as text and data mining (TDM), where automated ‘trawlers’ crawl through the internet collecting material to be used in training.
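In computing terms, a trawler is simply an automated program that downloads web pages and strips out the text (or images, or audio) for inclusion in a training corpus. The sketch below is a deliberately minimal, illustrative Python example – not the code any real AI company uses – and the URL is a placeholder.

    import urllib.request
    from html.parser import HTMLParser

    class TextExtractor(HTMLParser):
        """Collects the visible text of an HTML page, discarding the markup."""

        def __init__(self):
            super().__init__()
            self.chunks = []

        def handle_data(self, data):
            if data.strip():
                self.chunks.append(data.strip())

    def trawl(urls):
        """Fetch each page and add its extracted text to a growing corpus."""
        corpus = []
        for url in urls:
            with urllib.request.urlopen(url) as response:
                html = response.read().decode("utf-8", errors="ignore")
            extractor = TextExtractor()
            extractor.feed(html)
            corpus.append(" ".join(extractor.chunks))
        return corpus

    if __name__ == "__main__":
        # Placeholder URL for illustration only.
        print(trawl(["https://example.com/"]))

Real systems operate at vastly greater scale, but the basic pattern – fetch, strip, store – is the same, and nothing in it involves negotiating a licence for the material collected.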


S.29A CDPA 1988 currently provides an exception to copyright infringement for text and data mining. It provides that a person with lawful access to a work does not infringe copyright by copying it for the purpose of computational analysis, provided the analysis is carried out for non-commercial research. S.29A(5) provides that the owner of a work cannot opt out of having this exception apply to their work, meaning that any copyright holder has to allow this particular use of their work, whether they like it or not. This strengthens the argument made by the AI companies that their activities do not require them to license the works they are using.


The consultation which closed recently was aimed at providing a solution from which all stakeholders – authors, AI companies, and the general public – could benefit. It was open to anyone, and came off the back of an earlier consultation, the results of which were published in December.


The First Consultation


The first consultation proposed a significant amendment to s.29A: replacing s.29A(5) with a rule stating that authors can opt out of the exception, meaning that, where they so request, their works will not be used for TDM for the purpose of AI training.


In theory, this balances the interests of AI companies and rights holders: the former still have access to a huge corpus of work on which to train their AI models, while the latter can refuse to have their work used in a way that brings them no remuneration.


Reasons for the Second Consultation


This proposal, however, has caused a number of major issues, predominantly for rights holders. As such, the second consultation – the one that has just closed – aimed to find a solution to them. The problems with the proposal can be summarised in eight points.


First, this proposal does nothing to ensure that copyright holders receive any remuneration for their works. As mentioned above, AI companies can still perform TDM on a wealth of other material, meaning that they do not need to license works that have been opted out of the provision. Authors are therefore still left penniless; the only change is that they are now penniless by choice. AI companies can continue in largely the same way as before.


Second, it is unclear how this opt-out will actually work. Currently, as mentioned above, TDM is carried out by ‘trawlers’, computer programs that trawl the internet gathering material on which to train an AI model. At present, the only way to opt out is through the use of metadata – that is, data attached to the work – indicating whether the author of the work would like to opt out of, or in to, the TDM copyright rules. The problem is that the mechanism currently used for this, the robots.txt protocol (a file placed at the root of a website telling crawlers what they may and may not collect), is simply not well developed enough to create a proper opt-out. In the CDPA 1988 there are almost 70 other exceptions to copyright, around half of which have similar opt-out mechanisms. The problem with robots.txt lies in its binary nature: an author can either opt in to all of them or opt out of all of them. This means that if an author wants to opt out of the TDM rules to stop their work being used for AI training, they must also opt out of every other exception, and these form major sources of publicity for a work. The current state of metadata therefore leaves authors in a position where they are effectively incentivised to opt in to the TDM exception, allowing their work to be used for free by AI companies.
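For illustration, the sketch below – Python, with a hypothetical user-agent name – shows how a well-behaved trawler typically consults a site’s robots.txt before collecting a page. Two points follow from it: compliance is entirely voluntary (a trawler that ignores robots.txt faces no technical barrier), and the signal operates at the level of crawlers and site paths, not individual works or individual uses.

    from urllib.robotparser import RobotFileParser

    # Hypothetical user-agent string for an AI-training trawler.
    CRAWLER_USER_AGENT = "ExampleAIBot"

    def may_fetch(page_url, robots_url):
        """Return True if the site's robots.txt permits this crawler to fetch the page.

        The crawler only checks this if it chooses to: robots.txt is a request,
        not an enforceable technical restriction.
        """
        parser = RobotFileParser()
        parser.set_url(robots_url)
        parser.read()  # download and parse the site-level robots.txt file
        return parser.can_fetch(CRAWLER_USER_AGENT, page_url)

    if __name__ == "__main__":
        allowed = may_fetch(
            "https://example.com/poems/my-poem.html",
            "https://example.com/robots.txt",
        )
        print("robots.txt permits fetching:", allowed)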


Third, even setting this aside, there is another technical constraint with metadata that makes the exception deeply flawed. Imagine that X writes a poem and publishes it on the internet. X uses metadata to opt out of the TDM exception, because they do not want their poem to be used to train AI. Y runs a poetry blog, and copies and pastes a stanza of X’s poem for the purpose of reviewing and critiquing it. Under the law of copyright, this falls within one of the main exceptions – fair dealing for criticism or review (s.30 CDPA 1988) – and is therefore a perfectly lawful use of the copyright poem. Y publishes their review on their blog, but does not mind if their work is used to train AI, so does not opt out of the TDM exception. The problem is that the metadata used by X for the original poem does not carry over when the poem is copied. An internet trawler will therefore come across X’s original poem and understand that it cannot be used; however, when it comes across Y’s review containing part of the poem, it will use that review and pick up the quoted stanza with it, even though the poem had originally been opted out. This renders the entire rule largely useless, since it cannot be applied consistently.
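To make the problem concrete, the sketch below assumes a hypothetical per-page opt-out signal expressed as an HTML meta tag (the tag name is illustrative, not a real standard prescribed by the consultation). X’s page carries the signal; Y’s review does not. A trawler that checks only the page it is currently reading will skip X’s poem but ingest Y’s review, quoted stanza and all.

    from html.parser import HTMLParser

    # Hypothetical per-page opt-out signal, e.g. <meta name="tdm-opt-out" content="1">.
    OPT_OUT_META = "tdm-opt-out"

    class OptOutDetector(HTMLParser):
        """Scans a page's markup for the assumed opt-out meta tag."""

        def __init__(self):
            super().__init__()
            self.opted_out = False

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            if tag == "meta" and attrs.get("name") == OPT_OUT_META:
                self.opted_out = attrs.get("content") == "1"

    def page_is_opted_out(html):
        detector = OptOutDetector()
        detector.feed(html)
        return detector.opted_out

    # X's original poem, published with the opt-out signal in its metadata.
    x_page = ('<html><head><meta name="tdm-opt-out" content="1"></head>'
              "<body><p>Stanza one of X's poem...</p></body></html>")

    # Y's review quotes the stanza, but Y's page carries no opt-out signal.
    y_page = ('<html><head></head>'
              '<body><p>Review: "Stanza one of X\'s poem..." is striking.</p></body></html>')

    for name, page in [("X's poem", x_page), ("Y's review", y_page)]:
        print(name, "opted out:", page_is_opted_out(page))
    # X's page is skipped, but Y's page (and the quoted stanza) is ingested.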


Fourth, the proposal would require all authors publishing works online to use metadata in the first place, a complex and often costly process. This represents a hurdle that has to be cleared if an author wants their work not to be used by AI companies (quite apart from the fact that, as shown above, the metadata is largely ineffective anyway).


Fifth, collecting societies have traditionally represented their members’ interests and negotiated licences on their behalf; such societies exist for musicians, writers, and artists. However, there is no way they could sensibly monitor, regulate, and enforce the metadata for all of their members’ works, meaning that implementing this rule would be very challenging.


Sixth, this rule would put the UK out of step with the EU. The EU rules on this are contained in Arts 3-4 CDSM Directive, passed on the eve of Brexit and therefore never implemented in the UK. Broadly, they provide that TDM does not amount to copyright infringement if carried out for scientific research by research or cultural heritage organisations. Anyone else can carry out TDM, but must only use works which have not been opted out of the scheme. This regime is similar to, but subtly different from, the proposed UK system, since any opt-out does not apply to research and cultural heritage organisations, which can carry out TDM as they please. The EU’s AI Act, however, complicates things. Art 53(1)(c) requires AI companies to put in place a policy showing that they are respecting EU copyright law (namely the above rules from the CDSM Directive), and Recital 106 confirms that even companies carrying out their TDM in a non-EU country, like the UK, must still comply with this requirement if they put their AI tools on the EU market. By adopting an approach out of line with that of the EU, the UK would therefore place its AI companies under a greater burden to comply with a more complex pair of regimes – surely it would be simplest to follow the EU’s approach.


Seventh, the enforcement mechanisms for these rules are very poor. It is highly unlikely that a copyright holder will ever know if their work has been used wrongfully to train AI, meaning that they cannot effectively enforce their rights in court. If they do manage to do so, the remedies they would receive would be limited to an injunction (which has little real significance in this instance) or an extraordinarily small sum of damages. The enforcement of these rules is therefore fairly meaningless.


Eighth, in a related vein, the method adopted to solve this is conceptually flawed. The proposal suggests fining non-compliant AI companies where breaches are revealed, but this does not seem appropriate. Copyright is a private tool, designed for individual authorial action against an infringer of individual rights; the benefits of the copyright flow to the individual who holds it. Fining companies is a public law measure, imposing sweeping punitive sanctions for regulatory non-compliance in a way that fails to deliver justice, or remuneration, to authors. Taking a public law, regulatory approach to private copyright is surely inappropriate.


Where now?


The second consultation has hopefully gathered enough material to solve these issues, though solutions are not obviously forthcoming. Authors, AI companies, and lawyers await its results with bated breath, hoping that a sensible solution will emerge.


In the meantime, many stakeholders are left high and dry. Law firms can do their best to advise on ways to protect and exploit works, but in a landscape so uncertain and conceptually flawed there is no guarantee of a solution.


References and Further Reading


“A Deeper Look into the EU Text and Data Mining Exceptions” (Thomas Margoni and Martin Kretschmer)


“Artists release silent album in protest against AI using their work” (BBC News)


Berne Convention


Case C-160/15 GS Media v Sanoma [2016] ECLI:EU:C:2016:644


CDSM Directive


Copyright and AI: Consultation (GOV.UK)


Copyright, Designs and Patents Act 1988


EU AI Act


“Generative AI, Copyright and the AI Act” (João Pedro Quintais)


Kneschke v LAION (2024, Hamburg District Court)


Norowzian v Arks (No 2) [1999]
