$3,000 per book: AI company Anthropic pays $1.5 billion to authors in copyright dispute
Published on: September 7, 2025 / Updated on: September 7, 2025 – Author: Konrad Wolfenstein

Anthropic and the billion-dollar settlement: A paradigm shift in AI copyright law
What does the Anthropic case mean for the AI industry?
Why did the AI company Anthropic agree to a $1.5 billion settlement with authors, even though training AI models with copyrighted works might be legal? This question is currently preoccupying the entire technology industry, as the case could mark a turning point in the conflict between AI developers and copyright holders.
The case is particularly noteworthy because Anthropic, the provider of the Claude chatbot, was not sued for using copyrighted books to train its AI, but rather for the way in which this data was obtained. The US court determined that while training an AI with copyrighted texts might, under certain circumstances, be covered by the American fair use doctrine, downloading the content from illegal sources was not. Crucially, Anthropic was demonstrably aware of the illegal origin of the data.
How did this historic agreement come about?
What were the specific allegations against Anthropic? The authors accused the company of downloading approximately 500,000 books and texts without permission from two copyright-infringing online databases. This data was then used to train the AI chatbot Claude, considered one of the main competitors to OpenAI's ChatGPT.
The settlement stipulates that Anthropic will pay approximately $3,000 in compensation for each affected work – roughly equivalent to €2,500. This sum is four times the minimum statutory damages under US copyright law. In addition, Anthropic must destroy the pirated documents and all copies, but retains the rights to legally acquired and scanned books.
Why did Anthropic agree to this settlement? The company wanted to avoid a trial that could have resulted in statutory damages of up to $150,000 per book. With 500,000 works affected, this would have meant a potential payment of up to $75 billion, a crippling sum even for a company that had recently raised $13 billion.
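The figures in this section can be checked with a few lines of arithmetic. The following Python sketch is purely illustrative: it uses the rounded numbers reported above (about 500,000 works, about $3,000 per work, a ceiling of $150,000 per work), and the $750 minimum is simply derived from the statement that $3,000 is four times the statutory minimum. The variable names are chosen for this example only.

```python
# Rounded figures as reported in the article; actual settlement terms may differ in detail.
AFFECTED_WORKS = 500_000          # approximate number of affected works
SETTLEMENT_PER_WORK = 3_000       # USD, approximate payment per work
MAX_STATUTORY_PER_WORK = 150_000  # USD, upper bound per work cited as the trial risk
MIN_STATUTORY_PER_WORK = SETTLEMENT_PER_WORK // 4  # USD, implied by "four times the minimum"

settlement_total = AFFECTED_WORKS * SETTLEMENT_PER_WORK
worst_case_total = AFFECTED_WORKS * MAX_STATUTORY_PER_WORK
risk_ratio = worst_case_total // settlement_total

print(f"Settlement total: ${settlement_total / 1e9:.1f} billion")
print(f"Worst case at trial: ${worst_case_total / 1e9:.0f} billion")
print(f"Implied statutory minimum per work: ${MIN_STATUTORY_PER_WORK}")
print(f"Worst case is {risk_ratio} times the settlement")
```

On these rounded numbers, the theoretical maximum exposure is fifty times the agreed settlement, which helps explain why settling was attractive.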
What are the differences between the legal situation in the USA and in Germany?
How would a similar case be judged in Germany? Unlike American law, German copyright law does not recognize a fair use doctrine that allows for flexible case-by-case assessment. Instead, specific limitations and exceptions are firmly defined for particular purposes, restricting the rights of copyright holders.
With the implementation of the EU Copyright Directive, Germany created Section 44b of the Copyright Act, which regulates so-called text and data mining (TDM). This provision permits the automated analysis of large datasets—whether text or images—to extract information. The training of AI generally falls under this regulation.
What restrictions apply to commercial providers? The TDM exception has one crucial limitation: copyright holders can object to the use of their works for commercial TDM. For works that are available online, this so-called usage reservation must be declared in machine-readable form, for example in the metadata or the terms of service of a website.
The EU DSM Directive distinguishes between two types of text and data mining: Article 3 permits TDM for scientific research purposes by research institutions and cultural heritage institutions, provided they have lawful access to the works. This exception is mandatory and cannot be excluded by contractual clauses. Article 4, on the other hand, permits general TDM for any purpose, including commercial ones, but with the important restriction of an opt-out procedure.
What technical aspects play a role in the legal assessment?
Why are the technical workings of AI training so important for the legal assessment? A recent study for the Copyright Initiative, conducted by Professor Tim W. Dornis and Professor Sebastian Stober, sheds light on the black box of AI training. The researchers conclude that, technically speaking, the training of generative AI models is not classic text and data mining and therefore constitutes copyright infringement rather than a use covered by the TDM exception.
What happens technically when training AI models? The process involves several steps relevant to copyright: First, the data is systematically collected, which already constitutes reproduction under copyright law. Then, the collected data is stored on servers and prepared for training. Finally, the AI model analyzes the data and extracts patterns, styles, and information.
A particularly critical point is so-called memorization: current generative models partially or completely memorize their training data, so end users can reproduce it with suitable prompts. This goes far beyond mere analysis, which is the focus of classic text and data mining.
How does Claude position itself in competition with ChatGPT?
What impact does the copyright dispute have on Anthropic's market position? Despite the legal issues, Claude has established itself as a serious competitor to ChatGPT. According to current market analyses, Anthropic now holds a 32 percent share of the enterprise market for large language models, while OpenAI is in second place with 25 percent.
Anthropic's position is particularly strong in the field of programming: with a 42 percent market share, the company is by far the largest provider, with twice the share of OpenAI at 21 percent. Claude owes this dominance primarily to its context window of 200,000 tokens, which enables the processing of complete business reports in a single pass.
What are Claude's specific strengths compared to ChatGPT? Claude is frequently praised for its more "human" communication style and nuanced understanding of complex concepts. Anthropic's focus on ethical AI development and safety has established it as a trusted provider for companies that place particular emphasis on responsible practices in sensitive applications.
Anthropic relies on Constitutional AI, a method that integrates ethical guidelines directly into the models. This helps prevent harmful or biased outputs and builds a high level of user trust. While OpenAI is also active in AI safety, Anthropic's explicit commitment to developing ethically sound AI models gives it a significant advantage.
What other lawsuits are affecting the AI industry?
Is the Anthropic case just the tip of the iceberg? In fact, over 40 lawsuits are pending in the US against AI technology providers for copyright infringement. OpenAI, for example, was sued by the New York Times, and further lawsuits against Anthropic remain pending despite this settlement, including those from music publishers and the online platform Reddit.
Apple has also recently become the target of copyright lawsuits: Authors have sued the technology company, alleging that it unlawfully used their copyrighted books to train its AI systems. The plaintiffs accuse Apple of copying the protected works without permission, attribution, or compensation.
In Germany, GEMA became the first collecting society worldwide to file a lawsuit against OpenAI for the unlicensed use of copyrighted musical works. GEMA accuses OpenAI of reproducing copyrighted song lyrics by German authors without having acquired licenses or compensated the authors.
How is the opt-out issue developing?
What does the opt-out procedure mean in practice for rights holders? Under German law, authors and rights holders can declare a machine-readable usage reservation to exclude their works from TDM use. Sony Music Group, for example, has published a “Declaration of AI Training Opt Out” to protect its content from unauthorized AI use.
However, the practical implementation of the opt-out mechanism is complex: How exactly such a reservation must be declared in a technically and legally effective manner, and how AI developers should handle it, has not yet been definitively clarified. There is concern that a widely used opt-out could significantly restrict the training data for AI models in Europe.
AI companies must respect these usage restrictions and may not circumvent them. If a work is to be included in the training data corpus despite these restrictions, the developer must enter into license negotiations with the rights holder. This leads to a new licensing market, which, however, is not yet established.
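One commonly discussed carrier for a machine-readable reservation is a website's robots.txt file. The following Python sketch shows how a crawler could check such a file before collecting a page, using the standard library's urllib.robotparser. It is a minimal illustration under stated assumptions: the robots.txt content, the crawler name "ExampleAIBot", and the URL are invented for this example, and whether a robots.txt entry is legally sufficient as a usage reservation has, as noted above, not been definitively clarified.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt of a rights holder; "ExampleAIBot" is an invented crawler name.
ROBOTS_TXT = """\
User-agent: ExampleAIBot
Disallow: /

User-agent: *
Allow: /
"""


def may_collect(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules allow this user agent to fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse the rules from text, no network access
    return parser.can_fetch(user_agent, url)


url = "https://publisher.example/books/chapter-1.html"
ai_crawler_allowed = may_collect("ExampleAIBot", url)
other_crawler_allowed = may_collect("SomeSearchBot", url)

print("AI training crawler allowed:", ai_crawler_allowed)
print("Other crawler allowed:", other_crawler_allowed)
```

A check like this only covers one possible channel. Reservations stated in metadata or terms of service would need separate handling, and a blocked work would then be a candidate for license negotiations rather than collection.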
Licensing market for AI data: Opportunity for publishers or risk for startups?
What role does the EU AI Regulation play?
How does the new EU AI Regulation affect copyright? While the AI Regulation does not contain any new provisions regarding exceptions to copyright, it clarifies that the use of copyrighted content requires the permission of the rights holder, unless a limitation applies.
All providers of general-purpose AI models must comply with comprehensive documentation requirements. This includes a detailed description of the data used for training, including the type and origin of the data and the processing methods. In particular, they must identify and comply with rights reservations declared under the DSM Directive.
The EU's AI Office monitors compliance with these provisions, but does not examine copyright infringements work by work. By referring to the DSM Directive, the AI Regulation makes clear that the legislator assumes the text and data mining exception applies to the training of generative AI models.
How do scientific and commercial uses differ?
What special regulations apply to scientific research? In a landmark ruling, the Hamburg Regional Court decided that research organizations may, under certain conditions, use copyrighted works for training artificial intelligence. The case concerned the use of a copyrighted image by a research organization that had created a large image-text dataset for training generative AI models.
The court ruled that creating an AI training dataset can fall under the freedom of research, even if commercial companies later use the data obtained in this way. The crucial factor is that the initial creation of the dataset serves the purpose of gaining knowledge. The concept of scientific research is interpreted broadly in this context.
Section 60d of the German Copyright Act (UrhG) permits scientific text and data mining by research institutions such as universities for non-commercial scientific research. Consent from the copyright holders is not required for this. This contrasts with commercial use, where an opt-out procedure applies.
What international differences exist?
How do other countries deal with the AI copyright issue? Japan is considered particularly innovation-friendly and already amended its copyright law in 2018. Article 30-4 of the Japanese Copyright Act introduces a flexible exception for uses that do not serve the “enjoyment” of the work. This is often interpreted to mean that it can also include the training of AI models, as long as the goal is data analysis and not the consumption of the work itself.
The United Kingdom has pursued its own path since Brexit. Consultations were held regarding copyright exemptions for AI developers, particularly for TDM. However, the proposals met with significant concerns from the creative industries, leaving the future direction unclear.
China amended its copyright law in 2020 and is generally strengthening intellectual property protection. Specific regulations for AI training are still under development, but the country recognizes the strategic importance of AI and is expected to seek pragmatic solutions.
What does this case mean for other AI companies?
What lessons can other AI companies learn from the Anthropic case? The settlement shows that the origin of training data is crucial. While training with legally acquired data may be covered by fair use or TDM limitations, using illegally obtained data can lead to substantial damages.
AI companies have increasingly entered into licensing agreements with copyright holders to gain access to content. OpenAI, for example, has struck deals with various media companies, and other providers are following suit. The Anthropic case could accelerate this trend and lead to an established licensing market.
For providers of AI models and AI systems, it is crucial to source training data from trustworthy suppliers that respect the intellectual property of others when compiling datasets. Legal consequences may arise even without knowledge of a copyright infringement.
How will the market for AI training data develop?
Is a new licensing market emerging for AI training data? The Anthropic case and similar lawsuits suggest that a structured market for licensing content for AI training could be developing. Publishers, authors, and other rights holders are increasingly recognizing the value of their content for AI development.
At the same time, AI companies face the challenge of acquiring high-quality and legally compliant training data. The costs for such licenses can be substantial, especially for smaller companies that lack the resources of Anthropic or OpenAI.
The development of specialized data providers that create and license legally compliant training datasets is a logical consequence of this trend. These providers could act as intermediaries between rights holders and AI developers, ensuring that all legal requirements are met.
What impact will this have on innovation and competition?
Does the stricter legal framework hinder innovation in AI development? This question is the subject of much debate. Proponents of strict copyright rules argue that creators and rights holders should be adequately compensated for the use of their works. However, the large amount of training data required and the associated licensing costs could lead to a concentration of the market in the hands of a few large providers.
Smaller companies and startups might not be able to afford the necessary licenses, which would limit their ability to develop competitive AI models. Paradoxically, this could lead to less innovation and less competition, as only well-funded companies like Anthropic, OpenAI, or Google can raise the necessary resources.
On the other hand, the need to pay licensing fees could lead to more efficient training methods. AI developers might invest more in techniques that require less data or use synthetic data to reduce their reliance on licensed content.
How do rights holders and creative professionals position themselves?
What strategies are authors, publishers, and other rights holders pursuing? The Copyright Initiative and similar organizations are calling for greater consideration of copyright in AI training. They argue that it constitutes “large-scale intellectual property theft” when AI companies use copyrighted works without consent or compensation.
Many copyright holders are increasingly relying on opt-out mechanisms to protect their works from unwanted AI use. At the same time, they are exploring ways to profit from AI development through licensing agreements. This leads to a complex mix of legal disputes and business opportunities.
GEMA's lawsuit against OpenAI shows that collecting societies also play an active role in this dispute. As collectives, they could represent the interests of their members and conduct licensing negotiations with AI companies.
What are the long-term prospects?
How might the legal landscape develop in the coming years? The Anthropic case may only be the beginning of a wave of settlements and court rulings that redefine the rules for AI training. In the US, further cases could clarify the fair use doctrine regarding AI, while in Europe the practical application of TDM limitations continues to be refined.
The EU AI Regulation will likely provide further clarification regarding documentation requirements and copyright compliance. This could lead to a harmonization of practices within the EU, but also to differences compared to other jurisdictions.
Technological development will be a key factor: If AI models can be effectively trained with less data or with synthetic data in the future, this could alleviate copyright issues. At the same time, new techniques for detecting and compensating for the use of copyrighted content could be developed.
The Anthropic case marks a significant turning point in the development of the AI industry. It demonstrates that the legal framework for training AI models is not yet fully clear and that both AI companies and rights holders must find new ways to reconcile their interests. The $1.5 billion settlement could be the beginning of a new era in which the use of copyrighted content for AI training is conducted on a fairer and more transparent basis.