New secret math benchmark stumps AI models and PhDs alike

Epoch AI let Fields Medal winners Terence Tao and Timothy Gowers review portions of the benchmark. Tao stated in feedback provided by Epoch: “I think that in the near term basically the only way to solve them, short of having a real domain expert in the area, is by a combination of a semi-expert like a graduate student in a related field, maybe paired with some combination of a modern AI and lots of other algebra packages.”

Credit: Epoch AI

To aid verification of correct answers during testing, FrontierMath problems must have answers that can be checked automatically through computation, either as exact integers or as other mathematical objects. The designers made the problems “guessproof” by requiring large numerical answers or complex mathematical solutions, leaving less than a 1 percent chance of a correct random guess.
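As a rough illustration of what automated checking of exact answers could look like, the sketch below compares a submission against a stored reference using exact arithmetic. The function names `check_answer` and `guess_probability`, the use of Python's `fractions.Fraction`, and the sample values are all assumptions made for illustration; they do not describe Epoch AI's actual verification harness.

```python
from fractions import Fraction


def check_answer(submitted, reference):
    """Return True only if the submitted value equals the reference exactly."""
    # Floats are rejected outright so that rounding can never count as a match.
    if isinstance(submitted, float):
        return False
    try:
        # Fraction accepts ints, Fractions, and strings such as "3/4".
        return Fraction(submitted) == Fraction(reference)
    except (TypeError, ValueError, ZeroDivisionError):
        # Anything that cannot be parsed as an exact rational is wrong.
        return False


def guess_probability(num_plausible_answers):
    """Chance that a single uniform random guess hits the one correct answer."""
    return Fraction(1, num_plausible_answers)


# Hypothetical reference answer, chosen only for this example.
REFERENCE = 367707

print(check_answer(367707, REFERENCE))    # exact integer match
print(check_answer(367707.0, REFERENCE))  # float submission is rejected
print(guess_probability(10**6) < Fraction(1, 100))
```

Under these assumptions, a large integer answer keeps the guess probability far below the 1 percent threshold, because the space of plausible answers is large.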

In a blog post, mathematician Evan Chen explained how he thinks FrontierMath differs from traditional math competitions such as the International Mathematical Olympiad (IMO). Problems in that competition, he says, require creativity and insight but avoid complex implementation and specialized knowledge. FrontierMath, Chen wrote, keeps the first requirement but reverses the other two.

While IMO problems avoid specialized knowledge and complex calculations, FrontierMath embraces both. “Because an AI system has vastly greater computational power, it’s actually possible to design problems with easily verifiable solutions using the same idea that IOI or Project Euler does--basically, ‘write a proof’ is replaced by ‘implement an algorithm in code,’” Chen explained.
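To make Chen's comparison concrete, here is a Project Euler-style task chosen purely for illustration; it is not a FrontierMath problem and is far easier than one. The task is to sum every non-negative integer below a limit that is divisible by 3 or 5. The answer is a single integer produced by running a short algorithm, so checking a submission reduces to one equality test. The function name `sum_multiples_below` is hypothetical.

```python
def sum_multiples_below(limit, divisors=(3, 5)):
    """Sum every non-negative integer below `limit` divisible by any divisor."""
    return sum(n for n in range(limit) if any(n % d == 0 for d in divisors))


# Small case that can be checked by hand: 3 + 5 + 6 + 9 = 23.
print(sum_multiples_below(10))    # 23

# The classic version of the task uses a limit of 1000.
print(sum_multiples_below(1000))  # 233168
```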

Epoch AI plans to evaluate AI models against the benchmark regularly while expanding its problem set. The organization says it will release additional sample problems in the coming months to help researchers test their systems.
