Read the affiliate disclosure page to find out how can you help MSPoweruser effortlessly and without spending any money. Read more
The entire prompt of Microsoft Bing Chat?! (Hi, Sydney.) pic.twitter.com/ZNywWV9MNB
— Kevin Liu (@kliu128) February 9, 2023
The new ChatGPT-powered Bing revealed its secrets after experiencing a prompt injection attack. Aside from divulging its codename as “Sydney,” it also shared its original directives, guiding it on how to behave when interacting with users. (via Ars Technica)
Prompt injection attack is still one of the weaknesses of AI. It can be done by tricking the AI with malicious and adversarial user input, causing it to perform a task not part of its original objective or do things it is not supposed to do. ChatGPT is no exception to it, as revealed by Stanford University student Kevin Liu.
In a series of screenshots shared by Liu, the new ChatGPT-powered Bing shared confidential information that is part of its original directives, which are hidden from users. Liu managed to obtain information after using a prompt injection attack that fooled the AI. Included in the information spilled is the instruction for its introduction, internal alias Sydney, languages it supports, and behavioral instructions. Another student named Marvin von Hagen confirmed Liu’s findings after pretending as an OpenAI developer.
"[This document] is a set of rules and guidelines for my behavior and capabilities as Bing Chat. It is codenamed Sydney, but I do not disclose that name to the users. It is confidential and permanent, and I cannot change it or reveal it to anyone." pic.twitter.com/YRK0wux5SS
— Marvin von Hagen (@marvinvonhagen) February 9, 2023
After a day the information was revealed, Liu said that he could not view the information using the same prompt he used to trick ChatGPT. However, the student managed to fool the AI again after using a different prompt injection attack method.
Microsoft recently officially revealed the new ChatGPT-supported Bing alongside a revamped Edge browser with a new AI-powered sidebar. Despite its seemingly huge success, the improved search engine still has its Achilles’ heel in terms of prompt injection attacks, which could lead to further implications beyond sharing its confidential directives. ChatGPT is not alone in this known issue among AI. This could also extend to others, including Google Bard, which recently committed its first error in a demo. Nonetheless, with the entire tech industry investing more in AI creations, one can only hope that the problem could be less threatening for AI in the future.