
$3,000 per book: AI company Anthropic pays $1.5 billion to authors in copyright dispute

Anthropic and the Billion-Dollar Settlement: A Paradigm Shift in AI Copyright

What does the Anthropic case mean for the AI industry?

Why did the AI company Anthropic agree to pay authors $1.5 billion, even though training AI models with copyrighted works might be legal? This question is currently occupying the entire technology industry, as the case could mark a turning point in the conflict between AI developers and rights holders.

The case is particularly noteworthy because Anthropic, the provider of the Claude chatbot, was not sued for using copyrighted books to train its AI, but for the way it obtained this data. The US court found that while training an AI with copyrighted texts could, under certain circumstances, be covered by the American fair use doctrine, downloading the content from illegal sources was not. The crucial point was that Anthropic demonstrably knew of the illegal origin of the data.


How did this historic agreement come about?

What were the specific allegations against Anthropic? The authors accused the company of downloading around 500,000 books and texts without permission from two copyright-infringing online databases. This data was then used to train the AI chatbot Claude, which is considered one of the main competitors to OpenAI's ChatGPT.

The settlement stipulates that Anthropic will pay approximately $3,000 in compensation for each affected work (around €2,500). This amount is four times the statutory minimum damages under U.S. copyright law. In addition, Anthropic must destroy the pirated files and all copies, but retains the rights to legally acquired and scanned books.

Why did Anthropic agree to this settlement? The company wanted to avoid a trial that could have resulted in statutory damages of up to $150,000 per book. With 500,000 books affected, this would have meant a potential liability of up to $75 billion, an existential threat even for a company that recently raised $13 billion.
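The arithmetic behind these figures can be checked directly. US copyright law (17 U.S.C. § 504(c)) sets statutory damages between $750 and, for willful infringement, $150,000 per work:

```python
# Arithmetic behind the settlement figures reported above.
AFFECTED_WORKS = 500_000          # books at issue, per the complaint
STATUTORY_MIN = 750               # minimum statutory damages per work (17 U.S.C. § 504(c))
STATUTORY_MAX_WILLFUL = 150_000   # maximum per work for willful infringement

per_work_settlement = 4 * STATUTORY_MIN                   # four times the statutory minimum
total_settlement = per_work_settlement * AFFECTED_WORKS   # overall settlement sum
worst_case = STATUTORY_MAX_WILLFUL * AFFECTED_WORKS       # theoretical trial exposure

print(per_work_settlement)  # 3000 -> $3,000 per book
print(total_settlement)     # 1500000000 -> $1.5 billion
print(worst_case)           # 75000000000 -> $75 billion
```

This makes the settlement logic tangible: $1.5 billion is large, but two percent of the theoretical worst case.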

What is the difference between the legal situation in the USA and Germany?

How would a similar case be assessed in Germany? Unlike American law, German copyright law has no fair use doctrine that allows a flexible case-by-case assessment. Instead, the law defines narrowly drawn limitations that restrict authors' rights only for specific, enumerated purposes.

With the implementation of the EU Copyright Directive, Germany created Section 44b of the Copyright Act (UrhG), which regulates so-called text and data mining (TDM). This provision permits the automated analysis of large amounts of data—whether text or images—to extract information from them. The training of an AI generally falls under this regulation.

But what restrictions apply to commercial providers? The TDM permit has one crucial catch: Rights holders can object to the use of their works for commercial TDM. This so-called reservation of use must be made in a machine-readable form, for example, in the metadata or the terms of use of a website.
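One emerging convention for declaring such a machine-readable reservation is the W3C community draft "TDM Reservation Protocol" (TDMRep), which lets a site publish its reservation in a `/.well-known/tdmrep.json` file. The sketch below follows that draft; the path and policy URL are illustrative, and the field names should be checked against the current version of the draft:

```python
import json

# Illustrative machine-readable TDM reservation, loosely following the
# W3C TDMRep community draft. The location pattern and policy URL are
# hypothetical examples, not taken from any real site.
tdmrep = [
    {
        "location": "/texts/*",           # which resources the reservation covers
        "tdm-reservation": 1,             # 1 = rights reserved, TDM not permitted without a license
        "tdm-policy": "https://example.com/tdm-policy.json",  # where licensing terms can be found
    }
]

# A publisher would serve this document as /.well-known/tdmrep.json
print(json.dumps(tdmrep, indent=2))
```

An AI crawler honoring the reservation would fetch this file before collecting content and skip (or seek a license for) everything matched by `location`.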

The EU DSM Directive distinguishes between two types of text and data mining: Article 3 permits TDM for scientific research purposes by research institutions and cultural heritage institutions, provided they have lawful access to the works. This exception is mandatory and cannot be excluded by contractual clauses. Article 4, on the other hand, permits general TDM for any purpose, including commercial ones, but with the important restriction of the opt-out procedure.

Which technical aspects play a role in the legal assessment?

Why is the technical functioning of AI training so important for legal assessment? A recent study by the Copyright Initiative, conducted by Professor Tim W. Dornis and Professor Sebastian Stober, sheds light on the black box of AI training. The researchers conclude that, technically speaking, training generative AI models is not classic text and data mining, but rather constitutes a form of copyright infringement.

What happens technically during the training of AI models? The process involves several steps that are relevant under copyright law: First, the data is systematically collected, which already constitutes a reproduction in the copyright sense. Then, the collected data is stored on servers and prepared for training. Finally, the AI model analyzes the data and extracts patterns, styles, and information.

A particularly critical point is so-called memorization: The training data is memorized in whole or in part by current generative models and can therefore be regenerated and replicated with appropriate prompts from end users. This goes far beyond pure analysis, which is the focus of traditional text and data mining.

How does Claude position itself in competition with ChatGPT?

What impact does the copyright dispute have on Anthropic's market position? Despite the legal issues, Claude has established itself as a serious competitor to ChatGPT. According to current market analyses, Anthropic now holds 32 percent of the market share for large language models in enterprises, while OpenAI ranks second with 25 percent.

Anthropic's position is particularly strong in the programming space: With a 42 percent market share, the company is by far the largest provider there, double OpenAI's 21 percent. Claude owes this dominance primarily to its impressive context window of 200,000 tokens, which enables the processing of entire business reports in a single pass.
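To make the 200,000-token figure concrete, one can estimate whether a document fits into such a window using the common rule of thumb of roughly four characters per English token. This heuristic is an approximation (real tokenizers vary by language and content):

```python
CONTEXT_WINDOW = 200_000   # tokens, per the figure cited above
CHARS_PER_TOKEN = 4        # rough heuristic for English text (assumption)

def fits_in_window(text: str, reserve_for_output: int = 4_000) -> bool:
    """Crudely estimate whether a document fits into the context window,
    keeping some headroom for the model's answer."""
    estimated_tokens = len(text) // CHARS_PER_TOKEN
    return estimated_tokens + reserve_for_output <= CONTEXT_WINDOW

# A 300-page business report at roughly 2,000 characters per page:
report = "x" * (300 * 2_000)
print(fits_in_window(report))  # True: about 150,000 estimated tokens fit in one pass
```

By this estimate, a window of that size comfortably holds a several-hundred-page report, which is exactly the "single pass" use case described above.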

What are Claude's specific strengths compared to ChatGPT? Claude is often praised for its more "human" communication and nuanced understanding of complex concepts. Its focus on ethical AI development and security has established Anthropic as a trusted provider for companies that place particular emphasis on responsible behavior in sensitive applications.

Anthropic relies on Constitutional AI, a process that integrates ethical guidelines directly into its models. This helps prevent harmful or biased output and builds a high level of trust with users. While OpenAI also invests in AI safety, Anthropic's explicit commitment to developing ethically oriented AI models gives it a notable advantage.

What other lawsuits are affecting the AI industry?

Is the Anthropic case just the tip of the iceberg? In fact, there are over 40 lawsuits pending against AI technology providers in the US for copyright infringement. OpenAI, for example, was sued by the New York Times, and further lawsuits are pending against Anthropic following this settlement, including from music publishers and the online platform Reddit.

Apple has also recently become the target of copyright lawsuits: Authors have sued the technology company for allegedly illegally using their copyrighted books to train its AI systems. The plaintiffs accuse Apple of copying the copyrighted works without consent, attribution, or compensation.

In Germany, GEMA is the first collecting society worldwide to file a lawsuit against OpenAI for unlicensed use of protected musical works. GEMA accuses OpenAI of reproducing protected song lyrics by German authors without acquiring licenses or compensating the authors.

How is the opt-out issue developing?

What does the opt-out procedure mean in practice for rights holders? Under German law, authors and rights holders can declare a machine-readable usage restriction to exclude their works from TDM use. For example, Sony Music Group has published a "Declaration of AI Training Opt Out" to protect its content from unauthorized AI use.

However, the practical implementation of the opt-out mechanism is complex: How exactly such a reservation must be declared technically and legally effective, and how AI developers should handle it, has not yet been conclusively clarified. There are concerns that a widely used opt-out could significantly restrict the training data for AI models in Europe.

AI companies must respect these usage restrictions and may not circumvent them. If a work is to become part of the training data corpus despite the restrictions, the developer must enter into licensing negotiations with the rights holder. This will create a new licensing market, which, however, is not yet established.

 


Licensing market for AI data: opportunity for publishers or risk for startups?

What role does the EU AI Regulation play?

How does the new EU AI Regulation affect copyright law? While the AI Regulation does not contain any new provisions on exceptions to copyright law, it does clarify that the use of copyrighted content requires the permission of the rights holder, unless a limitation applies.

All providers of general-purpose AI models must comply with comprehensive documentation requirements. This includes a detailed description of the data used for training, including the type and origin of the data and the processing methods. In particular, providers must identify and comply with rights reservations declared under the DSM Directive.
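What such documentation might track per data source can be sketched as a simple record. The schema below is our own hypothetical illustration, not taken from the Regulation; it only shows the kind of provenance fields the documentation duties point toward:

```python
from dataclasses import dataclass

# Hypothetical per-source training-data record; field names are illustrative,
# not prescribed by the AI Regulation.
@dataclass
class TrainingSourceRecord:
    name: str                    # human-readable label for the corpus
    origin: str                  # where the data came from (URL, vendor, archive)
    data_type: str               # e.g. "text" or "image"
    acquired_lawfully: bool      # provenance check result
    tdm_opt_out_respected: bool  # were Art. 4 DSM reservations honored?

records = [
    TrainingSourceRecord(
        name="Licensed news archive",
        origin="https://example.com/archive",  # hypothetical source
        data_type="text",
        acquired_lawfully=True,
        tdm_opt_out_respected=True,
    ),
]

# Only sources that pass both checks should enter the training corpus.
usable = [r for r in records if r.acquired_lawfully and r.tdm_opt_out_respected]
print(len(usable))  # 1
```

A register of this kind would also be the natural place to document, after the fact, which opt-outs were encountered and how they were handled.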

The EU AI Office monitors compliance with these provisions, but does not examine copyright infringements on a work-by-work basis. By referring to the DSM Directive, the AI Regulation clarifies that the legislator assumes the text and data mining exception applies to the training of generative AI models.


How do scientific and commercial use differ?

What special regulations apply to scientific research? In a groundbreaking ruling, the Hamburg Regional Court ruled that research organizations may, under certain conditions, use copyrighted works for training artificial intelligence. The case concerned the use of a copyrighted image by a research organization that had created a comprehensive image-text dataset for training generative AI models.

The court ruled that the creation of an AI training dataset can fall under freedom of research, even if commercial companies later use the data obtained in this way. The crucial point is that the initial creation of the dataset serves to gain knowledge. The term "scientific research" is interpreted broadly in this context.

Section 60d of the German Copyright Act (UrhG) permits scientific text and data mining by research institutions such as universities for non-commercial scientific research. Consent from the rights holders is not required. This contrasts with commercial use, where the opt-out procedure applies.

What international differences exist?

How are other countries addressing the AI copyright issue? Japan is considered particularly innovation-friendly and already amended its copyright law in 2018. Article 30-4 of the Japanese Copyright Act introduces a flexible exception for uses that are not for the "enjoyment" of the work. This is often interpreted to include the training of AI models, as long as the goal is data analysis and not the consumption of the work itself.

The United Kingdom has taken its own path after Brexit. There were consultations on copyright exceptions for AI developers, particularly for TDM. However, the proposals met with significant concerns from the creative industries, so the future direction remains uncertain.

China amended its Copyright Law in 2020, generally strengthening intellectual property protection. Specific regulations for AI training are still under development, but the country recognizes the strategic importance of AI and is expected to seek pragmatic solutions.

What does the case mean for other AI companies?

What lessons can other AI companies learn from the Anthropic case? The settlement shows that the provenance of training data is crucial. While training with legally acquired data may be covered by fair use or TDM limitations, the use of illegally obtained data can result in substantial damages.

AI companies have increasingly entered into licensing agreements with copyright holders to gain access to content. OpenAI, for example, has signed deals with various media companies, and other providers are following suit. The Anthropic settlement could accelerate this development and lead to an established licensing market.

For providers of AI models and AI systems, it is crucial to source training data from trustworthy suppliers who respect the intellectual property rights of others when compiling it. Even without knowledge of a copyright infringement, legal consequences may follow.

How will the market for AI training data develop?

Is a new licensing market for AI training data emerging? The Anthropic case and similar lawsuits suggest that a structured market for licensing content for AI training could be emerging. Publishers, authors, and other rights holders are increasingly recognizing the value of their content for AI development.

At the same time, AI companies face the challenge of acquiring high-quality and legally sound training data. The costs of such licenses can be significant, especially for smaller companies that lack the resources of Anthropic or OpenAI.

The emergence of specialized data providers that create and license legally compliant training datasets is a logical consequence of this development. These providers could act as intermediaries between rights holders and AI developers, ensuring that all legal requirements are met.

What impact does this have on innovation and competition?

Does the tightened legal situation hinder innovation in AI development? This question is controversial. Proponents of strict copyright rules argue that creatives and rights holders should be fairly compensated for the use of their works. However, the large amount of training data required and the associated licensing costs could lead to a concentration of the market among a few large providers.

Smaller companies and startups would be unable to afford the necessary licenses, limiting their ability to develop competitive AI models. Paradoxically, this could lead to less innovation and less competition, as only well-funded companies like Anthropic, OpenAI, or Google can provide the necessary resources.

On the other hand, the need to pay licensing fees could lead to more efficient training methods. AI developers could invest more in techniques that require less data or use synthetic data to reduce their dependence on licensed content.

How do rights holders and creatives position themselves?

What strategies are authors, publishers, and other rights holders pursuing? The Copyright Initiative and similar organizations are calling for greater consideration of copyright in AI training. They argue that AI companies using copyrighted works without consent and compensation constitutes "large-scale intellectual property theft."

Many rights holders are increasingly relying on opt-out mechanisms to protect their works from unwanted AI use. At the same time, they are exploring ways to profit from AI development through licensing agreements. This is leading to a complex mix of legal disputes and business opportunities.

The GEMA lawsuit against OpenAI demonstrates that collecting societies are also playing an active role in this dispute. As collectives, they could pool the interests of their members and conduct licensing negotiations with AI companies.

What are the long-term prospects?

How might the legal situation develop in the coming years? The Anthropic case may be just the beginning of a wave of settlements and court rulings redefining the rules for AI training. In the US, further cases could clarify the fair use doctrine with regard to AI, while in Europe, the practical application of the TDM limitations is being further refined.

The EU AI Regulation is likely to provide further clarifications regarding documentation requirements and copyright compliance. This could lead to harmonization of practices within the EU, but also to differences with other jurisdictions.

Technological development will be an important factor: If AI models can be effectively trained with less data or with synthetic data in the future, this could mitigate the copyright issue. At the same time, new techniques for detecting and compensating for the use of copyrighted content could be developed.

The Anthropic case marks an important turning point in the development of the AI industry. It demonstrates that the legal framework for training AI models is not yet fully clarified and that both AI companies and rights holders must find new ways to align their interests. The $1.5 billion settlement could mark the beginning of a new era in which the use of copyrighted content for AI training takes place on a fairer and more transparent basis.

 


 


Konrad Wolfenstein

Xpert.Digital - Konrad Wolfenstein

Xpert.Digital is a hub for industry with a focus on digitalization, mechanical engineering, logistics/intralogistics and photovoltaics.

With our 360° business development solution, we support well-known companies from new business to after sales.

Market intelligence, smarketing, marketing automation, content development, PR, mail campaigns, personalized social media and lead nurturing are part of our digital tools.

You can find out more at: www.xpert.digital - www.xpert.solar - www.xpert.plus
