When AI Meets Copyright: How the Meta Lawsuit Challenges Open-Source Software Foundations

In a landmark case with far-reaching implications for artificial intelligence, open-source development, and copyright law, Meta is facing a lawsuit from a group of authors—including Sarah Silverman and Richard Kadrey—who allege the company used pirated books to train its LLaMA large language models (LLMs). At the heart of the dispute is not only the question of whether training AI on copyrighted materials constitutes “fair use,” but also how this intersects with the open-source community’s ethos and legal frameworks.

While Meta released LLaMA under a source-available license (not truly open source per OSI definitions), the model has been widely adopted and repurposed by open-source developers. This case raises serious concerns: Is the open-source community unintentionally building on legally questionable foundations? And if so, who is responsible?

The Lawsuit: Authors vs. Meta

The class-action lawsuit, Kadrey et al. v. Meta Platforms, Inc., was filed in 2023 in the U.S. District Court for the Northern District of California. The plaintiffs argue that Meta trained its models on a dataset derived from “Books3,” a large corpus of text sourced from pirated versions of copyrighted books found on shadow libraries like Library Genesis (LibGen) and Z-Library.

The core claims center on copyright infringement. The authors assert that their works were reproduced and used without permission in a manner that was neither transformative nor fair. Meta, on the other hand, argues that training a model on large datasets falls within the boundaries of “fair use” because the model doesn’t retain or reproduce the text directly; it only “learns” statistical patterns from it.

The outcome of this case could redefine how copyright law applies to machine learning. But what makes it particularly urgent is how tightly it is now entangled with the open-source world.

The Open-Source Connection

Although Meta itself did not release LLaMA under an OSI-approved open-source license (its terms restrict commercial use), the model weights were leaked online shortly after release and quickly became central to open-source AI innovation. Developers and researchers across the globe used them as base models for open projects like Alpaca and Vicuna. This created a paradox: a flourishing open-source ecosystem built upon potentially infringing data.

This legal uncertainty threatens to undermine trust and sustainability in the open-source AI space. Developers typically rely on transparency, permissive licensing, and legal clarity to build and share their work. If foundational models are later found to be legally problematic, that undermines not only projects directly based on them but also derivative works, forks, and downstream commercial applications.

It also exposes developers and companies to potential legal risk—even if they had no role in the original data collection. In a worst-case scenario, entire open-source model ecosystems could face takedown requests, IP disputes, or retroactive compliance obligations.

Legal Implications for Open-Source Projects

If the court sides with the authors and finds Meta's training methods infringing, it could signal a broader reckoning for how datasets are collected and used in AI training—especially within the open-source context. Key consequences may include:

  • Due diligence for datasets: Open-source developers may need to scrutinize training data origins more carefully before building or fine-tuning models (a minimal metadata check is sketched after this list).
  • Increased risk for downstream users: Developers who build on open models like LLaMA or Falcon may face legal questions about the lineage of the model weights and the data behind them.
  • Calls for transparent data sourcing: There will likely be a stronger push for datasets that are open, verifiable, and copyright-compliant to serve as safe foundations for OSS-based AI.
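
Concretely, a first-pass provenance check might look like the sketch below: a minimal Python script, assuming the huggingface_hub client library, that reads the license tags a model or dataset repository declares on the Hugging Face Hub. The allow-list and the example repo ID are illustrative placeholders of my own, and a declared license tag is only a starting point for review, not proof of clean provenance.

    # Minimal license-metadata check via the Hugging Face Hub.
    # Assumptions: huggingface_hub is installed; ALLOWED is an example
    # allow-list, not legal guidance; repo IDs are placeholders.
    from huggingface_hub import HfApi

    ALLOWED = {"apache-2.0", "mit", "cc-by-4.0"}

    def declared_licenses(repo_id, repo_type="model"):
        """Collect the license:* tags the Hub reports for a repo."""
        api = HfApi()
        info = (api.model_info(repo_id) if repo_type == "model"
                else api.dataset_info(repo_id))
        # Hub repos expose licenses as tags of the form "license:<id>".
        return {t.split(":", 1)[1] for t in (info.tags or [])
                if t.startswith("license:")}

    def review(repo_id, repo_type="model"):
        tags = declared_licenses(repo_id, repo_type)
        if not tags:
            print(f"{repo_id}: no license metadata -- treat as unreviewed")
        elif tags & ALLOWED:
            print(f"{repo_id}: declared license(s) {sorted(tags)} on the allow-list")
        else:
            print(f"{repo_id}: declared license(s) {sorted(tags)} need manual review")

    if __name__ == "__main__":
        review("tiiuae/falcon-7b")  # placeholder: one of the models named above

Note that this only inspects self-reported metadata; real due diligence would also trace where the underlying training data came from, which is exactly the information cases like this one may force into the open.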

It may also pressure organizations to adopt true OSI-approved licenses and clearer data usage disclosures to protect both themselves and their contributors.

A Defining Moment for Open Source in AI

This lawsuit is more than just a copyright clash between Big Tech and authors—it’s a stress test for the open-source AI ecosystem. Can the principles of openness, legality, and collaboration survive when the underlying foundations may be tainted?

The path forward lies in transparency, responsible data sourcing, and alignment between the open-source and legal communities. As AI continues to reshape the software landscape, legally sound and ethically sourced open models become not just a preference but a necessity.


Note: The preceding text is provided for informational purposes only and does not constitute legal or business advice. The views expressed in the text are solely those of the writer and do not necessarily represent the views of any organization or entity.


#OpenSourceSoftware #AI #Copyright #Business #Technology
