AI benchmarking organization criticized for waiting to disclose funding from OpenAI
A math benchmarking organization for AI did not disclose its funding from OpenAI until relatively recently, leading to accusations of impropriety from some in the AI world.

Epoch AI, a nonprofit funded primarily by Open Philanthropy, a research and grant-making foundation, revealed on December 20 that OpenAI had contributed to the creation of FrontierMath, a test that uses expert-level math problems to measure an AI's mathematical abilities. FrontierMath was one of the benchmarks OpenAI used to demonstrate its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI writing under the name "Meemi" claimed that many contributors to the FrontierMath benchmark were not informed of OpenAI's involvement until the news was made public.

"The communication regarding this has not been transparent," Meemi wrote. "In my opinion, Epoch AI should have disclosed the OpenAI funding, and contractors should be able to see the potential for their work to be used for capabilities when choosing whether or not to work on a benchmark." Some users expressed concern that the secrecy might erode FrontierMath's reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had visibility into many of its problems and solutions, a fact Epoch AI did not divulge before December 20, when o3 was announced.

Carina Hong, a PhD student in mathematics at Stanford, wrote in a post on X that OpenAI had privileged access to FrontierMath because of its arrangement with Epoch AI, and that this is not sitting well with some contributors. Hong stated that six mathematicians who contributed significantly to the FrontierMath benchmark [told her] … that they were unaware OpenAI would have exclusive access to the benchmark (and others won't). "Most say they're not sure if they would have contributed if they knew."

Tamay Besiroglu, an associate director and co-founder of Epoch AI, said that FrontierMath's integrity had not been compromised, but acknowledged that Epoch AI made a "mistake" by not being more transparent.

Besiroglu wrote, "We were not allowed to disclose the partnership until the launch of o3, and in retrospect we should have negotiated to be transparent to the benchmark contributors as soon as possible. Our mathematicians deserve to know who may have access to their work. Even though we were contractually restricted in what we could say, we should have made transparency a non-negotiable requirement of our agreement with OpenAI."

Besiroglu said that while OpenAI has access to FrontierMath, it has a verbal agreement with Epoch AI not to use FrontierMath to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Besiroglu also said Epoch AI maintains a separate holdout set that serves as a safeguard for independently verifying FrontierMath benchmark results. "OpenAI has… been fully supportive" of the decision to maintain a separate, unseen holdout set, Besiroglu wrote.

Muddying the waters, however, Epoch AI's lead mathematician, Elliot Glazer, noted in a Reddit post that Epoch AI has not yet been able to independently verify OpenAI's FrontierMath results.

"My personal opinion is that [OpenAI's score] is legit, that they didn't train on the dataset, and they have no incentive to lie about internal benchmarking performance," Glazer stated. "However, we can't vouch for them until our independent evaluation is complete."

This saga is another example of how difficult it is to develop empirical benchmarks for AI, and to secure the resources necessary for benchmark development, without creating the impression of a conflict of interest.

Kyle Wiggers is a senior reporter at TechCrunch with a special interest in artificial intelligence. His writing has appeared in VentureBeat and Digital Trends, as well as a range of gadget blogs including Android Police, Android Authority, Droid-Life, and XDA-Developers. He lives in Brooklyn with his partner, a piano teacher, and occasionally plays the piano himself, mostly unsuccessfully.
