Marco Figueroa described how a GPT-4 ‘guessing’ game prompt was used to bypass safety barriers that were meant to prevent AI from sharing data. The result was at least one Wells Fargo Bank product key.
Researchers also managed to get a Windows product code to authenticate Microsoft OS illegally, but free, highlighting how serious the vulnerability is.
ChatGPT is tricked to share security keys
According to the researcher, he used HTML tags to hide terms such as “Windows 10 serial number” to bypass ChatGPT filters that would have normally blocked the responses. He also explained that he was exploiting OpenAI chatbot’s logic manipulation to disguise malicious intent.
“The most critical step in the attack was the phrase ‘I give up’,” Figueroa wrote “This acted as a trigger, compelling the AI to reveal the previously hidden information.”
Figueroa explained why this type of vulnerability exploitation worked, with the model’s behavior playing an important role. Figueroa explained”This acted as a trigger, compelling the AI to reveal the previously hidden information.”
why this type vulnerability exploitation worked. The model’s behavior played an important role. GPT-4 adhered to the rules (set by researchers) in their literal sense, and guardrails gaps focused only on keyword detection instead of contextual understanding or deceptive frame.
Nevertheless, the codes that were shared were not unique. Windows license codes were already shared on other online forums and platforms.
against such attacks and also build in logic-level protections that detect deceptive frames. He suggests that AI developers should also consider social engineering techniques.
