Serving technology enthusiasts for more than 25 years. TechSpot is the place to go for tech advice and analysis.
A hot-button issue: Meta is embroiled in an AI lawsuit that could reshape how courts interpret copyright law. From the plaintiffs’ perspective, the case is open-and-shut. If the judge decides otherwise, the ruling could set a precedent that allows corporations to use copyrighted materials to train AI systems.
A group of writers sued Meta in California in January 2024 for using their work to train different versions of the Llama large language model. Meta has admitted to using the Books3 dataset, a 37GB compilation of 195,000 copyrighted works that developers have been using to train LLMs. The company defends its actions by citing the fair use doctrine. Court documents unsealed earlier this year showed that Meta had used torrenting to obtain AI training data.
The authors filed a motion for partial summary judgment in a California U.S. District Court on Monday, arguing that Meta’s alleged use of pirated data leaves no legal ambiguity. The plaintiffs claim that Meta’s use of torrenting to obtain copyrighted texts for artificial intelligence training is a clear copyright violation. The authors stated in their filing:
“Whatever the merits of generative artificial intelligence, or GenAI, stealing copyrighted works off the Internet for one’s own benefit has always been unlawful.”
According to the unsealed documents, Meta first tried to download pirated books individually, but that process was too slow and put excessive strain on its networks. The company then allegedly used torrenting, a file-sharing method that has been associated with copyright violations for years, to obtain terabytes of copyrighted books in bulk. This went far beyond the scope and size of the Books3 dataset.
Motion For Partial Summary Judgement via Ars Technica.
According to the authors, Meta was aware of the legal risks and took deliberate steps to conceal its activities. The company allegedly ran its torrent client on Amazon Web Services instead of Meta’s own infrastructure, which is not standard practice for the social network giant.
Ars Technica obtained the heavily redacted motion, which points out that torrent users usually both download (leech) and upload (seed) chunks of a file to allow faster downloads. If the files contain copyrighted material, both seeding and leeching are considered illegal. By seeding torrents, Meta may therefore also have actively facilitated piracy by distributing copyrighted material.
The plaintiffs feel a trial is not necessary and are seeking immediate judgment. The authors claim that Meta’s actions clearly violate copyright law and fall far outside the fair-use defense. A decision in Meta’s favor could set an extremely dangerous precedent that would extend far beyond books and allow AI developers to violate copyrights without compensating IP owners. The motion argues:
“[The court] should nevertheless grant summary judgment under the four fair use factors regarding Meta’s decision to make available to other P2P pirates millions of copyrighted books in exchange for faster download speed.”
Although it appears to be a fairly straightforward case, presiding Judge Vince Chhabria admitted he did not understand torrenting or related terminology such as seeding and leeching. Judge Chhabria could deny the summary judgment motion, preferring to hear expert testimony that explains the technology before making an informed and fair ruling.
No matter how the case ends, the final decision will be groundbreaking. If Meta wins, it will open the door for other AI developers to use pirated books, images, or videos to train models. If the authors win, it will set a precedent for similar cases, including those currently working their way through the legal system. It could also lead to further copyright reform similar to the Digital Millennium Copyright Act.