New day, new controversy surrounding artificial intelligence. This time, Meta has been accused of using pirated torrent content to train Llama, the large language model (LLM) that powers Meta AI. The lawsuit behind the accusation was one of the first copyright suits filed against a tech firm over AI training.
Documents reveal Meta AI was trained using pirated content
As reported by Wired, Meta was sued in 2023 for allegedly using pirated content to train Llama, its LLM. The case is known as “Kadrey et al. v. Meta Platforms”. Richard Kadrey, Christopher Golden and other novelists filed the lawsuit, claiming that Meta had used copyrighted material without authorization.
Meta had previously provided documents to the court with redacted information, but Judge Vince Chhabria of the United States District Court for the Northern District of California ruled that the original documents be made public. That has now happened.
The documents reveal conversations between Meta staff about Meta AI and Llama. In one conversation, an engineer said that torrenting on a Meta-owned company laptop “doesn’t feel right,” which suggests that the company used pirated material to train its AI. Another conversation suggests that “MZ”, apparently Mark Zuckerberg, authorized the use of pirated material.
Evidence indicates that Meta used content from LibGen, a large library of pirated academic articles, magazines, and books. LibGen, a Russian “piracy hub” created in 2008, has been the subject of multiple copyright lawsuits ever since. Meta also reportedly used material from other “shadow libraries” for AI training.
According to the company, it used publicly available materials in accordance with the legal doctrine of “fair use”, which allows copyrighted material to be used without permission under certain circumstances. Such uses are evaluated on a case-by-case basis. Meta claims it is simply “using text to statistically model language and generate original expression.”