OpenAI and The New York Times discuss copyright infringement by AI tech companies during the first trial arguments.

The copyright infringement trial between The New York Times and OpenAI began with a federal hearing on Tuesday.

The judge heard arguments from both parties regarding a motion to dismiss brought by OpenAI and its financial backer Microsoft. The New York Times, The New York Daily News and the Center for Investigative Reporting have all filed lawsuits against OpenAI. The publishers claim that OpenAI and Microsoft used their content to train the large language models that power the companies’ generative AI bots. They claim that by doing so, the tech companies are competing with the publishers, using the publishers’ own content to answer users’ questions. This takes away users’ incentive to visit the publishers’ sites for that information, and ultimately hurts the publishers’ ability to monetize those users through digital advertising or subscriptions.

OpenAI, Microsoft and others claim that what they are doing is covered under “fair use,” which allows the use of copyrighted materials to create something new that does not compete with the original. The outcome of the lawsuit will have a significant impact on the entire digital media industry, as it will determine whether generative AI tools can use copyrighted works for training without the publisher’s consent. Here are the main arguments that were made during the hearing.

The New York Times’ arguments

Copyrighted Content

OpenAI uses The New York Times’ content to train large language models. Sometimes this is done by making copies of the content, the plaintiffs claim. Sometimes entire articles or several paragraphs from the training dataset are returned to a user in response to a prompt. In some cases, the LLM will also respond to a prompt with fresh content that was not part of its training data (because it was published after the training cut-off). Plaintiffs provided examples of outputs containing verbatim language from, or summaries of, New York Times articles without attribution.

LLMs are unable to process information as humans can

People can read something and understand the underlying information, and that is not considered copying. The New York Times’ lawyers claim that LLMs are not able to do this because they are machines. The models absorb only the “expression” and not the underlying facts, which the lawyers argue should be considered copyright infringement.

A generative search engine is different from a conventional search engine

Instead of providing links to the original sources, which a publisher could monetize through advertising or subscriptions, a generative search engine answers questions directly and lists its sources in footnotes. The New York Times’ lawyers claim that footnotes can contain multiple sources, which makes it difficult for publishers to attract users to their websites.

Avoiding paywalls

OpenAI offers custom GPTs, including products that help users get around paywalls. OpenAI removed the SearchGPT product after it became aware that such products were being misused to infringe. “Users posted on Reddit forums and other social media about how they had gotten around paywalls using a product named SearchGPT,” said Ian Crosby, a partner at Susman Godfrey and lead counsel for The New York Times.

Time-sensitive content is stripped without attribution

The New York Times’ lawyers claimed that content from Wirecutter, The Times’ product recommendation site, was being used without appropriate attribution. This meant Wirecutter would lose revenue from people who did not click through to the site or on its affiliate links. The stripped-down content was often time-sensitive, such as product recommendations around Black Friday. The lawyers claim that this content should be protected under a “hot news” doctrine, which is part of copyright law and protects time-sensitive news from being used. They also argued that ChatGPT misrepresented some products as being endorsed by Wirecutter, damaging the brand’s image.

OpenAI’s and Microsoft’s arguments

The fair use doctrine

Lawyers for OpenAI said that their use of the copyrighted material in question is allowed under the fair use doctrine. AI companies are staunch supporters of the doctrine, which allows copyrighted material to be used without permission as long as it is used in a non-commercial context and not in a manner that would harm the owner of the copyright.

Annette Hurst, an attorney for Microsoft, said that LLMs can adapt language and ideas for “everything, from curing cancer, to national security.” She added: “The plaintiffs have alleged in their own words that this technology could be commercialized for billions of dollars, without regard to how.”

The LLMs themselves

Defense lawyers also disagreed with the plaintiffs’ attorneys about how large language models work. OpenAI’s lawyer, for example, said that the company’s LLMs do not store copyrighted material, but rely instead on the weights derived from training data.

“If I tell you, ‘Yesterday, all my troubles seemed so,’ then we will all think ‘far away’ because we’ve been exposed to this text so many times,” said Joe Gratz of Morrison & Foerster, who represented OpenAI. “That doesn’t mean you have a song somewhere in your head.”

Statute of limitations

Defense lawyers claimed that the lawsuit should not be allowed because of the three-year statute of limitations for copyright cases. The Times’ attorneys countered that it was not possible to know in April 2021 whether OpenAI would use the publishers’ content for a purpose that would harm them.

“Misleading” examples

Lawyers for the Times claim they have found millions of examples that support their case. OpenAI, however, argued that the plaintiffs’ examples of ChatGPT duplicating copyrighted material and of AI-generated answers citing the Times were misleading. Defense lawyers claim that the Times exploited ChatGPT to create AI content in ways that violated OpenAI’s terms. They also noted that OpenAI had sought to address these weaknesses.

There is no proof of harm

The Times claims that OpenAI removed copyright management information (CMI), such as mastheads and author bylines. OpenAI and Microsoft claim that the plaintiffs have not shown how their rights were violated by the removal of CMI. They also claim that the plaintiffs haven’t proven that OpenAI and Microsoft knowingly infringed copyrighted works. Plaintiff lawyers, however, said that past court rulings recognized copying of copyrighted material as infringement without the need to prove distribution or economic loss.

Gratz said, “Their biggest issue is that they don’t know how they would benefit if the CMI that they claim was removed were actually removed. … The world would not be better for the CMI they claim was removed if it was never removed.”

The Times’ lawsuit is one of many that OpenAI faces. OpenAI won one case in November. Other ongoing lawsuits include complaints from a Canadian group of news publishers, a U.S. newspaper group owned by Alden Capital, and a class-action lawsuit filed by an author group. (OpenAI, Perplexity and Microsoft were all dragged into the ongoing Google antitrust lawsuit when Google sent subpoenas.)

Other major tech giants and startups have their own legal battles relating to AI and copyright. Meta is facing a class-action lawsuit filed by a group including Sarah Silverman. Google faces a lawsuit by the Authors Guild. Perplexity was named as a defendant by News Corp in an October lawsuit.

It is not known exactly when U.S. judge Sidney Stein will decide whether to allow the case to proceed. Megan Gray, an attorney and founder of GrayMatters Law & Policy, attended the hearing and noted that Stein seemed “in it for a long haul” and was unlikely to dismiss the case this early. Gray stated that despite his age and lack of technical sophistication, Judge Stein was curious and engaged. “He is a very good judge. He knows the cases and the positions. He doesn’t normally provide an audio line for the public and the fact that he did so here indicates that he is well familiar with the import of the case and its impact on society.”

https://digiday.com/?p=565500
