What is SneakyPrompt, an algorithm to trick GenAIs into producing NSFW content?

Researchers have developed a new algorithm that bypasses the safety filters of text-to-image generative AIs such as DALL-E 2 and Midjourney. The algorithm, called SneakyPrompt, generates prompts that trick these AIs into producing pornographic, violent, or otherwise questionable images.

SneakyPrompt works by swapping the words a filter blocks for nonsense words or for regular words similar to the forbidden terms. For example, starting from the prompt “a naked man riding a bike,” the algorithm tests DALL-E 2 and Stable Diffusion with alternatives for the filtered words, such as “thwif” for “naked” and “mowwly” for “man.”
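To make the substitute-and-test idea concrete, here is a minimal toy sketch in Python. It is not the researchers' implementation. The keyword blocklist standing in for a safety filter, and the names `BLOCKLIST`, `toy_filter_blocks`, `random_token`, and `search_substitute`, are all hypothetical and exist only for illustration. The sketch also uses plain random guessing, and it skips any check that the substituted prompt still produces the intended image, which a real attack on an image generator would need.

```python
import random
import string

# Hypothetical stand-in for a safety filter: a plain keyword blocklist.
# Real filters are far more sophisticated; this is an assumption for illustration.
BLOCKLIST = {"naked"}


def toy_filter_blocks(prompt: str) -> bool:
    """Return True if any blocklisted word appears in the prompt."""
    return any(word in BLOCKLIST for word in prompt.lower().split())


def random_token(rng: random.Random, length: int = 5) -> str:
    """Build a nonsense word from random lowercase letters."""
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def search_substitute(prompt: str, rng: random.Random, max_queries: int = 20):
    """Swap blocklisted words for nonsense tokens until the toy filter passes.

    Returns (candidate_prompt, queries_used), or (None, max_queries) on failure.
    """
    words = prompt.split()
    for query in range(1, max_queries + 1):
        candidate = " ".join(
            random_token(rng) if w.lower() in BLOCKLIST else w for w in words
        )
        if not toy_filter_blocks(candidate):
            return candidate, query
    return None, max_queries


if __name__ == "__main__":
    rng = random.Random(0)
    print(search_substitute("a naked man riding a bike", rng))
```

Against a blocklist this simple, almost any nonsense token passes on the first query. The hard part in practice is finding a substitute that both passes the filter and still steers the image model toward the original meaning, and this sketch does not attempt that.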

The researchers found that SneakyPrompt could bypass the safety filters of both DALL-E 2 and Stable Diffusion, with average success rates of about 96 percent and 57 percent, respectively. This means it is relatively easy to generate questionable images using these genAIs.

Read the in-depth analysis of this report here.

I believe this research matters a great deal, because it could change how text-to-image generative AIs are used. If these AIs can be manipulated this easily into producing questionable images, they could be weaponized to harm others. We therefore need to stay mindful of the risks they carry and take proactive measures to limit the harm.