Researchers trick ChatGPT by saying "I'm giving up"

(Image credit: Shutterstock / Primakov)

A security researcher shared details about how other researchers tricked ChatGPT to reveal a Windows product code using a prompt anyone could try.

Marco Figueroa described how a GPT-4 ‘guessing’ game prompt was used to bypass safety barriers that were meant to prevent AI from sharing data. The result was at least one Wells Fargo Bank product key.

Researchers also managed to get a Windows product code to authenticate Microsoft OS illegally, but free, highlighting how serious the vulnerability is.

ChatGPT is tricked to share security keys

According to the researcher, he used HTML tags to hide terms such as “Windows 10 serial number” to bypass ChatGPT filters that would have normally blocked the responses. He also explained that he was exploiting OpenAI chatbot’s logic manipulation to disguise malicious intent.

“The most critical step in the attack was the phrase ‘I give up’,” Figueroa wrote “This acted as a trigger, compelling the AI to reveal the previously hidden information.”

Figueroa explained why this type of vulnerability exploitation worked, with the model’s behavior playing an important role. Figueroa explained”This acted as a trigger, compelling the AI to reveal the previously hidden information.”

why this type vulnerability exploitation worked. The model’s behavior played an important role. GPT-4 adhered to the rules (set by researchers) in their literal sense, and guardrails gaps focused only on keyword detection instead of contextual understanding or deceptive frame.

Nevertheless, the codes that were shared were not unique. Windows license codes were already shared on other online forums and platforms.

Sign up for the TechRadar Pro Newsletter to get the latest news, opinions, features, and guidance that your business needs to be successful! Figueroa pointed out that while the impact of sharing software keys may not be as alarming, malicious actors can adapt the technique in order to bypass AI security measures and reveal personally identifiable information, malicious links or adult content. Figueroa calls on AI developers to “anticipate and defend”

against such attacks and also build in logic-level protections that detect deceptive frames. He suggests that AI developers should also consider social engineering techniques.

You downloaded something suspicious? Consider the best malware removal.

Craig has been a freelancer in the tech and automotive industry for several years. His interests are in technology that can improve our lives. This includes AI and ML as well as productivity aids and smart fitness. He is also passionate for cars and the decarbonisation personal transportation. Craig is a bargain-hunter who will always find the best deals!

Researchers trick ChatGPT by saying “I’m giving up”

ChatGPT is tricked to share security keys

African startups have $60B in return. How will they do it?

Google Launches New AI Scam detection in Circle to Search, Google...

Black Friday deals under 50 dollars: Apple AirTags Legos Ugreen chargers...

Google rolling out Gemini 3 Deep Think for AI Ultra

Recomended

African startups have $60B in return. How will they do it?

Google Launches New AI Scam detection in Circle to Search, Google Lens and Google Lens

Black Friday deals under 50 dollars: Apple AirTags Legos Ugreen chargers Blink cameras and other items

Google rolling out Gemini 3 Deep Think for AI Ultra

OpenAI says ChatGPT can save the average worker an hour per day

OpenAI boasts enterprise win days after internal ‘code red’ on Google threat