
It’s coming up to three years since ChatGPT was released to the public in November 2022. In April 2024, we considered the impact of generative artificial intelligence (GenAI) on professional writers: the “double whammy” of having their copyright material used to train large language models (LLMs) without payment, whilst being made redundant as these models become increasingly powerful. In this article we’ll explore relevant developments over the past year, focusing on the threat to the copyright regime itself in the absence of any effective regulations which directly tackle this issue.
Data (Use and Access) Act 2025
One of the first major flashpoints in the debate over how to regulate AI companies’ training of models on copyright content came in a protracted game of Parliamentary “ping pong” between the Commons and Lords during the passage of the Data (Use and Access) Act. In fact, the content of the Bill itself wasn’t the source of the controversy; instead, the arguments brewed over a proposed Lords amendment which would have compelled AI companies to reveal which copyright material had been used to train their models. This “transparency” clause, tabled by crossbench peer (and sometime film director, whose credits include Bridget Jones) Lady Beeban Kidron, came on the back of protests by various high-profile figures in the creative industries, including Kate Bush and Elton John, who fear that AI is chipping away at the very fabric of copyright protection for artists. Eventually the Lords relented, albeit securing a minor concession: a government commitment to publish a report on the use of copyright works in the development of AI systems. In practice, the report will likely simply reflect the forthcoming government response to a broader consultation on AI and copyright.
Training or theft?
The core argument, heatedly debated in Parliament, hinges on whether the process of training an LLM on copyright content constitutes, in itself, a breach of copyright. In supporting Lady Kidron’s proposed amendment, the Earl of Dundee was very clear:
“It surely goes without saying that our United Kingdom copyright law has to counter the increasing theft of intellectual property by artificial intelligence companies.”
But the answer is not straightforward. One issue relates to what an LLM is trained on: many copyright works are illegally made publicly available on the internet, and if this content is crawled when gathering training data then the AI company could be implicated in taking advantage of a copyright breach. The more fundamental question, however, is how an LLM is trained: this tends to involve making a copy of any external content (such as a copyright work) in an internal database, which is then used to train (or re-train) the model. Whether this process of making an internal copy, for the sole purpose of training, breaches copyright is a question that the courts are gradually tackling. The sketch below illustrates the point.
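To make the technical step concrete, here is a minimal, hypothetical sketch in plain Python (the URL, database name and toy “model” are all invented for illustration, and bear no resemblance to any real company’s pipeline) of why training inherently involves copying: the work is fetched, stored verbatim in an internal corpus, and only then used to fit a model.

```python
# Simplified illustration of why LLM training involves copying.
# All names here (the URL, database, toy model) are hypothetical.

import sqlite3
import urllib.request
from collections import Counter

SOURCE_URL = "https://example.com/some-copyright-work.txt"  # hypothetical address

# Step 1: crawl -- fetch the work from the public internet.
with urllib.request.urlopen(SOURCE_URL) as response:
    text = response.read().decode("utf-8")

# Step 2: copy -- store the full text verbatim in an internal corpus database.
# This reproduction is the step the courts are scrutinising.
db = sqlite3.connect("training_corpus.db")
db.execute("CREATE TABLE IF NOT EXISTS corpus (url TEXT, content TEXT)")
db.execute("INSERT INTO corpus VALUES (?, ?)", (SOURCE_URL, text))
db.commit()

# Step 3: train -- read the stored copy back and fit a model on it.
# A real LLM adjusts billions of weights; this toy stand-in merely
# counts word frequencies, but it still depends on the stored copy.
(stored_text,) = db.execute("SELECT content FROM corpus").fetchone()
model = Counter(stored_text.split())
print(model.most_common(5))
db.close()
```

Even in this toy version, a complete reproduction of the work sits in the internal database before any “learning” takes place, and it is precisely this act of copying that the courts are being asked to assess.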
Two recent (June 2025) judgments by US courts shed some light on the legal dilemmas at hand. In the first case, involving Anthropic (the company behind Claude, whose CEO Dario Amodei recently predicted that AI will wipe out 50% of entry-level white-collar jobs), millions of pirated books were downloaded, alongside many purchased books (some of which overlapped with those which had been pirated), and an internal database containing all the texts (both pirated and legitimate) was created to train the LLM. The judge ruled that the training process was protected by the “fair use” doctrine (a defence to breach of copyright in the US, similar to fair dealing in the UK), although a further trial would need to consider the separate issue of the pirated copies.
The second case, brought against Meta by several authors including Sarah Silverman (covered in our 2024 article), who accused the social media company of harming their earning potential by training its LLM (Llama) on their copyright works, also saw the fair use doctrine successfully invoked. In this case, however, the judge noted that the plaintiffs had failed to capitalise on a “potentially winning argument”: that the LLM could “flood the market with similar works, causing market dilution”. He explained that the fair use defence “typically doesn’t apply to copying that will significantly diminish the ability of copyright holders to make money from their works” and warned that, in most cases, the training of LLMs on copyright material would be considered illegal.
Although both cases went in favour of the AI providers, the Silverman judgment opens the door to future litigants succeeding if they focus on the loss of future income.
Carrot and stick
Last year we wrote about how the New York Times (NYT) was suing OpenAI and Microsoft for training ChatGPT on millions of its copyright articles, seeking “billions of dollars in statutory and actual damages”. Whilst this case is still ongoing, the NYT has signed a licensing deal with Amazon, permitting the tech giant to train its own generative AI products (such as Alexa+) on the newspaper’s editorial content. This “carrot and stick” approach, which nudges AI companies to pay to train their LLMs on copyright material or face the threat of litigation, is being adopted by other mainstream media organisations. For example, News Corp is suing Perplexity AI whilst at the same time working in partnership with OpenAI.
Balancing AI gains against creative losses
As AI technology continues to make huge strides, becoming more sophisticated whilst hoovering up the totality of human knowledge and creativity, it’s increasingly important that governments implement guardrails to prevent the copyright regime from being gradually eroded to the point of irrelevance. A careful balance will need to be struck between promoting AI development for its economic and social benefits and protecting the creative industries from decline.
Due to the borderless nature of AI tools available online, it will ultimately be beneficial to implement a set of global standards on the application of copyright law to AI, but in the meantime governments will be looking to one another for ideas. For example, Denmark has recently announced that it will legislate to clamp down on AI-generated deepfakes, sending “an unequivocal message that everybody has the right to their own body, their own voice and their own facial features”. It will be interesting to see the approach of the UK government, which may be indicated in its response to the aforementioned broader consultation on AI and copyright. Watch this space!
Alex Heshmaty is technology editor for the Newsletter. He runs Legal Words, a legal copywriting agency based in the Silicon Gorge. Email alex@legalwords.co.uk.
Photo CC0 Public Domain from PxHere.