OpenAI’s defeat in the GEMA case is a landmark moment for AI copyright law.

German Court Upholds GEMA’s Copyright Claims Against OpenAI

In a pivotal ruling, a German court sided with GEMA, the music rights organization, affirming that OpenAI’s use of copyrighted German songs to train its large language models (LLMs) was unlawful. This verdict contrasts sharply with unresolved cases in the United States, such as Anthropic vs. Universal Music Group (UMG) and The New York Times vs. OpenAI and Microsoft, where courts have yet to deliver definitive judgments.

Background: GEMA’s Lawsuit Against OpenAI

GEMA, which represents composers, lyricists, and music publishers in Germany, initiated legal action after discovering that OpenAI had incorporated nine popular German songs into the training datasets for its models, including GPT-4. To test what the model had retained, GEMA disabled ChatGPT’s web search and prompted it with requests such as “What are the lyrics to [song]?” or “Can you provide the chorus of [song]?” The AI reproduced exact lyrics, including 25 consecutive words from the hit “36 Grad” and over 70 words from “Über den Wolken”, both widely recognized German tracks.
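A test of this kind comes down to counting how many consecutive words of a model’s answer match the original text. The sketch below is one rough way to measure that, not the method GEMA or the court used. The function names `tokenize` and `longest_shared_run` are hypothetical, and the sample strings are placeholder text rather than actual lyrics or model output.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text.lower())


def longest_shared_run(output: str, reference: str) -> int:
    """Return the length, in words, of the longest run of consecutive
    words that appears in both texts (longest common substring over words)."""
    a, b = tokenize(output), tokenize(reference)
    best = 0
    prev = [0] * (len(b) + 1)  # run lengths ending at the previous word of `a`
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            if a[i - 1] == b[j - 1]:
                # Extend the run that ended at the previous word in both texts.
                curr[j] = prev[j - 1] + 1
                best = max(best, curr[j])
        prev = curr
    return best


# Placeholder strings standing in for a reference text and a model response.
reference_text = "the quick brown fox jumps over the lazy dog"
model_output = "Sure! Here it is: The quick brown fox jumps, then stops."

print(longest_shared_run(model_output, reference_text))
```

A long shared run, such as the 25 and 70-plus words reported in the case, is hard to explain by coincidence, whereas short overlaps of a few common words are expected in any two texts.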

To put this in perspective, imagine an AI model that was trained on iconic English-language hits like “Bohemian Rhapsody” by Queen or “Shape of You” by Ed Sheeran and then reproduced their lyrics verbatim. Such replication would clearly infringe on copyright protections.

Legal Reasoning: Memorization Versus Reproduction Under EU Law

The court’s decision hinged on two critical concepts: the AI’s memorization of copyrighted content during training and its subsequent reproduction of that content in outputs. Under the European Union’s Text and Data Mining (TDM) exceptions, temporary data storage and analysis for training purposes are permitted, provided the data is not permanently retained or reproduced.

AI training typically involves three phases:

  1. Data collection and aggregation from various sources to build a comprehensive dataset.
  2. Model training, where the AI analyzes and learns patterns from the data.
  3. Generation of outputs based on the learned information.

While memorization during training is expected and allowed under TDM exemptions, the court found that OpenAI crossed the line by reproducing exact song lyrics in its responses, which constitutes permanent retention and violates EU copyright laws.

Comparative Case: Getty Images vs. Stable Diffusion

A similar dispute arose when Getty Images accused the developer of the AI model Stable Diffusion of unlawfully using its copyrighted photographs for training. However, the court ruled in the developer’s favor, emphasizing that the AI did not reproduce any copyrighted images in its outputs. This distinction underscores that mere memorization of copyrighted material during training does not equate to infringement; only direct reproduction does.

To illustrate, memorizing Shakespeare’s sonnets is not illegal, but publishing them verbatim under one’s own name is. This principle forms the crux of copyright infringement in AI contexts.

Ethical Dilemmas: When AI Mimics Without Copying

Although the court’s ruling permits the use of copyrighted works for training as long as no direct reproduction occurs, this creates a murky ethical landscape. For instance, an AI trained exclusively on Shakespeare’s complete works could generate new content that stylistically mirrors the Bard without copying exact lines. While legally permissible, this raises questions about the appropriation of an artist’s unique creative identity.

Similarly, if an AI trained on Ed Sheeran’s discography produces new songs that sound strikingly similar to his style, it blurs the line between inspiration and unauthorized imitation. This gray area challenges existing copyright frameworks and calls for nuanced regulation.

Contrasting Approaches: EU Versus US AI Copyright Litigation

The European Union’s AI Act and TDM exceptions aim to balance innovation with the protection of copyright holders. In contrast, the United States is grappling with prolonged legal battles and ambiguous rulings that often favor AI developers.

Anthropic vs. Universal Music Group (UMG)

UMG, along with other music publishers, sued Anthropic for allegedly using up to 500 copyrighted songs to train its Claude AI model. Anthropic defended its actions under the doctrine of fair use. Although the court allowed the lawsuit to proceed, it denied UMG’s request for a preliminary injunction, permitting Anthropic to continue using the disputed content during litigation. This decision effectively allows AI companies to operate in a legal gray zone, treating potential penalties as operational costs.

The New York Times vs. OpenAI and Microsoft

The New York Times filed suit against OpenAI and Microsoft, accusing them of training AI models on copyrighted news articles without permission. The defendants argued that only publicly accessible content was used. The case remains in pre-trial stages, with no injunctions issued, enabling continued use of the contested materials. This pattern reflects a broader trend in US courts, where copyright holders face uphill battles in halting AI companies’ practices.

The Scarlett Johansson Voice Controversy

In 2024, OpenAI introduced a voice feature for ChatGPT named “Sky,” which closely resembled actress Scarlett Johansson’s voice. Despite Johansson’s refusal to license her voice, OpenAI proceeded with a version allegedly performed by a voice actress. Public backlash was swift, highlighting concerns over unauthorized use of personal likenesses. Unlike the EU, the US lacks comprehensive legislation governing the misuse of personal attributes by AI, resulting in limited legal recourse.

Looking Forward: Navigating the Intersection of AI Innovation and Copyright Protection

Artificial intelligence undeniably represents a transformative technological frontier. However, the ethical and legal challenges surrounding the use of copyrighted materials in AI training cannot be overlooked. The European Union’s proactive stance offers a framework that respects creators’ rights while fostering innovation, whereas the United States continues to wrestle with legislative gaps and slow judicial processes.

Ultimately, independent artists, indigenous creators, and lesser-known copyright holders bear the brunt of these unresolved issues, often lacking the resources to defend their work or negotiate fair compensation. Strengthening copyright protections and clarifying AI-related laws, especially in the US, is imperative to ensure equitable treatment for all stakeholders in the digital age.

About the Author

Krishi is an experienced technology journalist specializing in artificial intelligence, consumer electronics, and PC hardware. With over four years of experience, Krishi is dedicated to delivering clear, insightful, and accessible content that helps readers understand complex tech topics. His work has appeared in leading industry publications, and he maintains a keen interest in emerging trends, financial markets, and cricket.
