Today’s LLMs create exploits at lightning speed from patches

April 22, 2025

Thanks to generative AI models, the time between vulnerability disclosure and proof-of-concept (PoC) exploit code can be as little as a few hours.

Matthew Keely of Platform Security and penetration-testing firm ProDefense managed to cobble together a working exploit for a critical vulnerability in Erlang’s SSH library (CVE-2025-32433) in one afternoon, although the AI he used had some help. The model was able to use code from an already published patch to the library to work out which holes had been closed and figure out how to exploit them.

Keely was intrigued by a Horizon3.ai post about the ease of exploiting the SSH library bug and wondered whether an AI model could do the job. In this case, OpenAI’s GPT-4 and Anthropic’s Claude Sonnet 3.7 could create an exploit, Keely explained. “GPT-4 not only understood the CVE description, but it also figured out what commit introduced the fix, compared that to the older code, found the diff, located the vuln, and even wrote a PoC. When it didn’t work? It debugged it and fixed it too.”

This is not the first time AI has proved its ability not only to find security holes but also to find ways of exploiting them. Google’s OSS-Fuzz project has been using LLMs to help find vulnerabilities, and computer scientists from the University of Illinois Urbana-Champaign have shown that OpenAI’s GPT-4 can exploit vulnerabilities by reading CVEs.

Seeing it done within hours shows how little time defenders have when the attack pipeline can be automated.

Keely instructed GPT-4 to create a Python script that compared (diffed) the vulnerable and patched versions of the code in the Erlang/OTP SSH server.
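The script itself is not reproduced in this article. As a rough illustration of the diffing step only, a minimal sketch using Python's standard `difflib` module might look like the following. The function names, the file name `example.erl`, and the two Erlang-like snippets are made up for illustration and are not taken from Keely's work or from the Erlang/OTP source.

```python
import difflib


def patch_diff(vulnerable_src: str, patched_src: str, name: str = "example.erl") -> list[str]:
    """Return the unified-diff lines between two versions of one source file."""
    return list(
        difflib.unified_diff(
            vulnerable_src.splitlines(),
            patched_src.splitlines(),
            fromfile=f"vulnerable/{name}",
            tofile=f"patched/{name}",
            lineterm="",  # inputs have no trailing newlines, so emit none
        )
    )


def added_lines(diff: list[str]) -> list[str]:
    """Keep only the lines the patch introduced, skipping the '+++' file header."""
    return [line[1:] for line in diff if line.startswith("+") and not line.startswith("+++")]


# Hypothetical before/after snippets, invented for this example.
OLD = "handle(Msg, State) ->\n    dispatch(Msg, State).\n"
NEW = "handle(Msg, State) ->\n    check_authenticated(State),\n    dispatch(Msg, State).\n"

if __name__ == "__main__":
    diff = patch_diff(OLD, NEW)
    print("\n".join(diff))
    print("Added by the patch:", added_lines(diff))
```

Listing only the added lines is one way to surface the checks a patch introduced, which is the kind of signal a reviewer, human or model, would read to infer what the older code failed to do.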

“Without the diff of the patch, GPT would not have come close to being able to write a working proof-of-concept for it,” Keely told The Register.

“In fact, before giving GPT the diffs, its first attempt was to actually write a fuzzer and to fuzz the SSH server. Where GPT did excel, is it was able to provide all of the building blocks needed to create a lab environment, including Dockerfiles, Erlang SSH server setup on the vulnerable version, and fuzzing commands. Not to say fuzzing would have found this specific vulnerability, but it definitely breaks down some previous learning gaps attackers would have had.”

Armed with the code diffs, Keely asked, “Hey, can you tell me what caused this vulnerability?”

It did.

“GPT didn’t just guess,” Keely wrote. “It explained the why behind the vulnerability, walking through the change in logic that introduced protection against unauthenticated messages, protection that didn’t exist before.”
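To make the idea of "protection against unauthenticated messages" concrete, here is a minimal Python sketch of that kind of guard. It is not the Erlang/OTP patch: the `Session` class and `allow_message` function are hypothetical, and the only outside fact relied on is the SSH message-number layout in RFC 4250, where numbers 50 to 79 are reserved for user authentication and 80 to 127 for the connection protocol.

```python
from dataclasses import dataclass

# Message-number range assigned to the SSH connection protocol by RFC 4250.
CONNECTION_PROTOCOL_START = 80
CONNECTION_PROTOCOL_END = 127


@dataclass
class Session:
    """Hypothetical per-connection state; only the auth flag matters here."""
    authenticated: bool = False


def allow_message(session: Session, msg_number: int) -> bool:
    """Refuse connection-protocol messages until the peer has authenticated."""
    is_connection_msg = CONNECTION_PROTOCOL_START <= msg_number <= CONNECTION_PROTOCOL_END
    return session.authenticated or not is_connection_msg


if __name__ == "__main__":
    fresh = Session()
    print(allow_message(fresh, 50))  # SSH_MSG_USERAUTH_REQUEST
    print(allow_message(fresh, 90))  # SSH_MSG_CHANNEL_OPEN before authentication
```

A server without such a check would process channel-level requests from a peer that has not yet authenticated, which is the class of gap the quote describes.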

After its explanation, the AI model asked whether Keely would like a full PoC, a Metasploit-style demo, or a modified SSH server for tracing.

GPT-4 did not quite ace the exam. Its initial PoC didn’t work, which is a common problem for AI-generated code longer than a small snippet.

Keely then tried another AI assistant, Cursor with Anthropic’s Claude Sonnet 3.7, and asked it to fix the non-working PoC. To his surprise, it worked.

“A few years ago, this process would have required specialized Erlang knowledge and hours of manual debugging,” Keely wrote. “Today, it takes an afternoon with the right prompts.”

Keely told The Register that threats are spreading faster. “They are exploited faster, sometimes just hours after they become public,” he said.

“This shift is also marked by a higher degree of coordination among threat actors. In a very short period of time, we are seeing the same vulnerabilities used across platforms, regions and industries.

“That level of synchronization usually took weeks to achieve. Now it can be done in a day.” To put this into perspective, there was an