Google DeepMind has once again used large language models to find new solutions to long-standing problems in math and computer science. This time, the firm has shown that its approach can do more than solve theoretical puzzles: it can also improve a number of important real-world processes.
Google DeepMind’s new tool, called AlphaEvolve, uses the Gemini 2.0 large language models (LLMs) to produce code for a wide range of tasks. LLMs have a reputation for being hit or miss at coding, so AlphaEvolve scores each of Gemini’s suggestions, discarding the bad ones and tweaking the good ones, in an iterative process that continues until it has produced the best algorithm it can. In many cases, AlphaEvolve’s results are more accurate or efficient than the best existing human-written solutions.
“You can think of it as a super coding agent,” says Pushmeet Kohli, a vice president at Google DeepMind who leads its AI for Science team. It doesn’t just suggest a snippet or an edit, he says; it produces a result that nobody was aware of before.
For example, AlphaEvolve came up with a way to improve the software Google uses to allocate jobs to its millions of servers around the world. Google DeepMind says the company has been using this new software in all of its data centers for more than a year, freeing up 0.7% of Google’s total computing resources. That may not sound like much, but at Google’s scale it is huge.
Jakob Moosbauer, a mathematician at the University of Warwick in the UK, is impressed. He says the way AlphaEvolve searches for algorithms that produce specific solutions, rather than searching for the solutions themselves, makes it especially powerful, because the approach applies to a wide variety of problems. “AI will become a tool essential to mathematics and computer science,” he says.
With AlphaEvolve, Google DeepMind is continuing a line of research it has pursued for years: using AI to advance human knowledge in math and science. In 2022 it developed AlphaTensor, a model that found a faster way to solve matrix multiplications, a fundamental problem in computer science, beating a record that had stood for more than 50 years. In 2023 it released AlphaDev, which discovered faster ways to perform basic calculations that computers carry out trillions of times a day. AlphaTensor and AlphaDev both turn math problems into a game-like format, then search for a winning sequence of moves.
FunSearch, which arrived in 2023, swapped out the game-playing AI for LLMs capable of generating code. Because LLMs can carry out a range of tasks, FunSearch can tackle a wider variety of problems than its predecessors, which were trained to play just one type of game. The tool was used to crack a famous unsolved problem in pure mathematics.
AlphaEvolve is the next generation of FunSearch. Instead of producing short snippets of code to solve a problem, as FunSearch did, it can produce programs that are hundreds of lines long. That makes it applicable to a much wider range of problems.
In theory, AlphaEvolve can be applied to any problem that can be described in code and whose solutions can be evaluated by a computer. Algorithms are the basis of the world we live in, says Matej Balog, a Google DeepMind researcher who leads the algorithm-development team.
Survival of the fittest
Here’s how it works. AlphaEvolve can be prompted like any LLM: give it a description of a problem plus any additional hints, such as previous solutions, and it will use Gemini 2.0 Flash, the smallest and fastest version of Google DeepMind’s flagship LLM, to generate multiple blocks of code that attempt to solve the problem.
It then runs these candidate solutions to determine how accurate or efficient they are, and scores them against a range of relevant metrics: Does the code produce the correct result? Does it run faster than previous solutions? And so on.
AlphaEvolve then takes the best solutions from the current batch and asks Gemini to improve them. Sometimes it throws an old solution back into the mix to keep Gemini from hitting a dead end.
When it gets stuck, AlphaEvolve can also call on Gemini 2.0 Pro, the most powerful of Google DeepMind’s LLMs. The idea is to generate as many candidates as possible with the faster Flash, but mix in solutions from the slower Pro when needed.
These rounds of generation, scoring, and regeneration continue until Gemini can no longer improve on what it already has.
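The loop described above is, at its core, an evolutionary search. The sketch below is a toy illustration of that idea only, not AlphaEvolve’s actual code: a simple stand-in mutation function plays the role of the LLM, candidates are plain numbers rather than programs, and the "problem" is just homing in on a target value. All names and parameters here are invented for illustration.

```python
import random

TARGET = 3.14159  # toy stand-in for "the best possible algorithm"

def score(candidate):
    # Scoring step: higher is better (negative distance from the target).
    return -abs(candidate - TARGET)

def propose_improvements(parents, n=8):
    # Stand-in for the LLM: perturb the best known solutions slightly.
    return [random.choice(parents) + random.gauss(0, 0.5) for _ in range(n)]

def evolve(generations=200, population=8, seed=0):
    random.seed(seed)
    # Initial batch of random candidate "solutions".
    pool = [random.uniform(-10, 10) for _ in range(population)]
    for _ in range(generations):
        # Old solutions stay in the mix alongside new proposals.
        candidates = pool + propose_improvements(pool)
        # Keep only the highest-scoring candidates for the next round.
        pool = sorted(candidates, key=score, reverse=True)[:population]
    return pool[0]

best = evolve()
print(best)
```

The key design choice mirrored here is elitist selection: the best candidates are never discarded, so each generation can only match or improve on the last, which is why the cycle eventually stalls out when no proposal beats the incumbent.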
Playing number games
The Google DeepMind team tested AlphaEvolve on a range of problems. To compare the general-purpose AlphaEvolve with its specialized predecessor AlphaTensor, they looked at matrix multiplication. Matrices are grids of numbers, and matrix multiplication is a basic calculation used in many applications, from AI to computer graphics. Yet nobody knows the fastest way to do it. Balog says it’s “almost unbelievable” that the question is still open.
AlphaEvolve was given a description of the problem and an example of a standard algorithm for solving it. The tool produced new algorithms that could calculate 14 different sizes of matrix faster than any existing approach. It also beat AlphaTensor’s record-setting result for multiplying two four-by-four matrices.
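To see why there is room for improvement at all, it helps to look at the classic example in this area: Strassen’s 1969 trick. Multiplying two 2×2 matrices the obvious way takes eight scalar multiplications, but a clever rearrangement needs only seven, and applying it recursively to 4×4 matrices gives 49 multiplications instead of 64. The snippet below checks the 2×2 case in plain Python; it illustrates Strassen’s scheme only and makes no claim about what AlphaEvolve’s algorithm actually looks like.

```python
def naive_2x2(A, B):
    # Standard 2x2 matrix product: eight scalar multiplications.
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    return [[a*e + b*g, a*f + b*h],
            [c*e + d*g, c*f + d*h]]

def strassen_2x2(A, B):
    # Strassen's scheme: the same product with only seven multiplications.
    (a, b), (c, d) = A
    (e, f), (g, h) = B
    m1 = (a + d) * (e + h)
    m2 = (c + d) * e
    m3 = a * (f - h)
    m4 = d * (g - e)
    m5 = (a + b) * h
    m6 = (c - a) * (e + f)
    m7 = (b - d) * (g + h)
    return [[m1 + m4 - m5 + m7, m3 + m5],
            [m2 + m4,           m1 - m2 + m3 + m6]]

A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
assert naive_2x2(A, B) == strassen_2x2(A, B)
print(strassen_2x2(A, B))
```

Shaving a single multiplication matters because the saving compounds under recursion: seven instead of eight per 2×2 block is what turns 64 multiplications into 49 at the 4×4 level, and it is this kind of count that AlphaEvolve’s result reportedly pushed below Strassen’s long-standing figure.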
AlphaEvolve evaluated 16,000 candidates suggested by Gemini before finding the best solution, but that is still more efficient than AlphaTensor’s approach. AlphaTensor’s result also worked only for matrices filled with 0s and 1s; AlphaEvolve solves the problem for other numbers as well.
Moosbauer says the matrix multiplication result is impressive, and that the new algorithm could speed up real computations.
Manuel Kauers, a mathematician at Johannes Kepler University in Linz, Austria, agrees that the improvements for matrices are likely to be of practical relevance. As it happens, Kauers and a colleague had used a different computing technique to find some of the same speedups AlphaEvolve did; the pair published a paper reporting their results last week.
“It’s great to see that we are making progress with our understanding of matrix multiplication,” says Kauers.
Real-world problems
Matrix multiplication was just one breakthrough. Google DeepMind has tested AlphaEvolve on more than 50 well-known math problems, including problems in Fourier analysis (the math underlying data compression, essential for applications such as video streaming), the minimum-overlap problem (an open problem in number theory proposed by the mathematician Paul Erdős in 1955), and kissing numbers (a problem introduced by Isaac Newton that has applications in materials science, chemistry, and cryptography). AlphaEvolve matched the best existing solutions in 75% of cases and found better solutions in 20%.
Google DeepMind then applied AlphaEvolve to a few of its own real-world problems. As well as finding the better algorithm for managing computing resources across data centers, the tool found a way to reduce the power consumption of Google’s specialized chips.
AlphaEvolve even found a way to speed up the training of Gemini itself, by devising a more efficient algorithm for managing a certain type of computation used in the training process.
Google DeepMind plans to keep exploring potential applications of the tool. One limitation is that AlphaEvolve cannot be used for problems that require a human to score the solutions, such as laboratory experiments whose outcomes are a matter of interpretation.
Moosbauer also points out that while AlphaEvolve may produce impressive results across a wide range of problems, it gives little insight into how it arrived at those solutions. That is a drawback when it comes to advancing human understanding.
Even so, tools such as AlphaEvolve are set to change the way researchers work. “I don’t believe we are finished,” says Kohli. “There’s much more we can do in terms of the power of this type of approach.”