By Marty Swant, 20 January 2025
Two high-profile AI legal battles were in the spotlight last week, with updates in separate copyright cases against Meta and OpenAI.
Court records unsealed in a copyright case against Meta have raised new questions about the use of e-books from the book piracy website Library Genesis (LibGen), including how much Meta CEO Mark Zuckerberg and other Meta executives knew about Meta teams using pirated content to train their Llama models.
Court documents claim that Meta employees tried to remove copyright information, including headers and identifiers, from various materials. One filing shows an internal Meta document that suggests removing lines containing words such as “ISBN,” “copyright,” and “all rights reserved.” The filing also contains messages between employees discussing the desire to compete against AI rivals, including beating OpenAI’s GPT-4, while describing French competitor Mistral as “peanuts.” Testimony from Zuckerberg’s December deposition has also been made available. Zuckerberg said that broad descriptions make the use of pirated content “seem like a bad thing” but added that Meta’s teams “think through this carefully, because there are often nuances more than is kind of obvious at first.” (Meta didn’t reply to Digiday’s request for comment about the court documents.)
The LibGen dataset includes titles by top authors such as Ta-Nehisi Coates and Sarah Silverman, who are among those who filed the lawsuit. Zuckerberg said he was not familiar with LibGen. The plaintiffs’ lawyer then asked whether Meta would do business with an organization that boasts about using pirated material.
Zuckerberg said that if someone were loudly broadcasting that they were doing something illegal, it would be a big red flag, and that he would want to look at it closely before engaging with them in any way.
When asked whether Meta should not be downloading materials from sites known to have pirated material, Zuckerberg replied that YouTube hosts “some percentage” of pirated content, even if the majority of the content is “kinda good and they have license to do.” He added: “But even then I don’t believe that I would have said that I wouldn’t have wanted people at Meta to not use YouTube at that point. So, I don’t really know.”
Documents indicate Meta executives were aware that Llama’s training data included material from LibGen and other copyrighted material from sources such as CommonCrawl. The documents also suggest Meta teams were aware of the potential for blowback and fines under the EU AI Act if the use of LibGen was discovered. One document mentioned a Meta team suggesting that datasets be red-teamed to filter out information about bioweapons or harmful stereotypes.
NYT v. OpenAI and Microsoft
The Meta case comes as tech companies face more scrutiny over the types of content they use to train large language models. In a separate case, The New York Times v. OpenAI, attorneys presented oral arguments outlining the key points each side is preparing as part of its case. Plaintiffs in both cases claim that tech companies stripped copyright information from content used to build AI models.
Steven Lieberman, a lawyer representing the New York Daily News, which filed a separate lawsuit against OpenAI and Microsoft, said: “You leave people open to massive copyright infringement, without the ability of tracing it.”
Publishers sign new AI deals beyond court
Last week, Axios announced a new partnership with OpenAI that includes funding for new local Axios newsrooms in four major cities, including Pittsburgh, Pa., and Kansas City, Mo. Axios also gains access to OpenAI’s technology to create new AI products, systems and processes. In a blog post, Axios CEO Jim VandeHei said the three-year agreement will also give all Axios employees access to OpenAI’s enterprise version.
That wasn’t last week’s only news about AI and journalism. The Associated Press announced a partnership with Google that will see the AP feed real-time news to Google’s Gemini application. The blog posts did not disclose the details of the deal, but they did note that it will help “enhance usefulness of results” in the Gemini app. According to Kristin Heitmann, the AP’s chief revenue officer, the updates are part of the ongoing relationship between the two companies, one based on working together “to provide timely and accurate news and information for global audiences.”
Another company beginning with “A” also made news, in its case for stepping back from an AI news feature. Apple announced last week that, after criticism of inaccuracies in its AI-summarized notifications, it was suspending their use. A new ‘AI-summarized notification’ has been launched.
A DoubleVerify report detailed a network of more than 200 websites that generated revenue with “AI slop,” which mimics real publishers while misleading ad-tech providers and buyers.
Other AI news and announcements — Prompts and products
- Anthrologic is a new startup founded by former MediaMonks executives with the aim of helping brands create AI agents.
- Adobe has launched a new generative tool for its Firefly platform, which aims to provide retailers with more ways to scale customized content.
- The U.S. Supreme Court upheld a law banning TikTok unless it is sold to a U.S. company.
- The U.K.’s Competition and Markets Authority (CMA) announced a new investigation to determine whether Google has “strategic status” under the newly enacted U.K. competition law. The investigation aims to ensure that AI startups can compete fairly with Google’s AI products and services.
- The FTC has referred its investigation of Snapchat’s My AI chatbot to the U.S. Justice Department. According to the FTC, the investigation covers “alleged risks and harms” to young users. The FTC decided to make the referral public, which it does not usually do.
Other AI stories from Digiday
- As influencer vetting tools evolve, agencies are also discovering the limitations of the technology
- What happens to marketers once the cultural “cheat code” of TikTok disappears?
- OpenAI, The New York Times debate copyright infringement of AI tech companies in trial arguments
- Brands are seeing an influx of traffic from ChatGPT and Google Gemini
- What the agentic AI era means for ad agencies, with Omnicom’s Jonathan Nelson
- Media Briefing: Dotdash Meredith’s Jon Roberts on the AI agenda in 2025
- CES Briefing: Agentic AI era heralds SEO overhaul, Q&A with Mastercard’s Raja Rajamannar & Dotdash Meredith’s OpenAI ad assist
- Media Buying Briefing: Looks like brand safety’s back on the menu