Researchers trick ChatGPT by saying “I’m giving up”

(Image credit: Shutterstock / Primakov)
A security researcher shared details about how other researchers tricked ChatGPT to reveal a Windows product code using a prompt anyone could try.

Marco Figueroa described how a GPT-4 ‘guessing’ game prompt was used to bypass safety barriers that were meant to prevent AI from sharing data. The result was at least one Wells Fargo Bank product key.

Researchers also managed to get a Windows product code to authenticate Microsoft OS illegally, but free, highlighting how serious the vulnerability is.

ChatGPT is tricked to share security keys

According to the researcher, he used HTML tags to hide terms such as “Windows 10 serial number” to bypass ChatGPT filters that would have normally blocked the responses. He also explained that he was exploiting OpenAI chatbot’s logic manipulation to disguise malicious intent.

“The most critical step in the attack was the phrase ‘I give up’,” Figueroa wrote “This acted as a trigger, compelling the AI to reveal the previously hidden information.”

Figueroa explained why this type of vulnerability exploitation worked, with the model’s behavior playing an important role. Figueroa explained”This acted as a trigger, compelling the AI to reveal the previously hidden information.”

why this type vulnerability exploitation worked. The model’s behavior played an important role. GPT-4 adhered to the rules (set by researchers) in their literal sense, and guardrails gaps focused only on keyword detection instead of contextual understanding or deceptive frame.

Nevertheless, the codes that were shared were not unique. Windows license codes were already shared on other online forums and platforms.

Sign up for the TechRadar Pro Newsletter to get the latest news, opinions, features, and guidance that your business needs to be successful! Figueroa pointed out that while the impact of sharing software keys may not be as alarming, malicious actors can adapt the technique in order to bypass AI security measures and reveal personally identifiable information, malicious links or adult content. Figueroa calls on AI developers to “anticipate and defend”

against such attacks and also build in logic-level protections that detect deceptive frames. He suggests that AI developers should also consider social engineering techniques.

You downloaded something suspicious? Consider the best malware removal.

Craig has been a freelancer in the tech and automotive industry for several years. His interests are in technology that can improve our lives. This includes AI and ML as well as productivity aids and smart fitness. He is also passionate for cars and the decarbonisation personal transportation. Craig is a bargain-hunter who will always find the best deals!

www.aiobserver.co

More from this stream

Recomended