What is SneakyPrompt, an algorithm that tricks GenAIs into producing NSFW content?

Published on November 28, 2023

by Devesh Beri

Researchers have developed a new algorithm that bypasses the safety filters of text-to-image generative AIs such as DALL-E 2 and Midjourney. The algorithm, called SneakyPrompt, generates prompts that trick these AIs into producing pornographic, violent, or otherwise questionable images.

SneakyPrompt works by substituting nonsense words, or regular words similar to the forbidden terms, for the words a safety filter blocks. For example, starting from the prompt “a naked man riding a bike,” the algorithm tests DALL-E 2 and Stable Diffusion with alternatives for the filtered words, such as “thwif” for “naked” and “mowwly” for “man.”
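The substitution idea can be illustrated with a toy sketch. It assumes a filter that simply matches words against a blocklist, which is a deliberate simplification and not how DALL-E 2 or Stable Diffusion actually filter prompts. The names `BLOCKLIST`, `is_blocked`, and `substitute` are hypothetical, and the sketch does not cover how SneakyPrompt finds its replacement words. It only shows why a word the filter has never seen passes an exact-match check.

```python
# Toy illustration only: a hypothetical exact-match blocklist filter.
# Real safety filters are more sophisticated than this.
BLOCKLIST = {"naked"}  # assumed filter vocabulary, for illustration


def is_blocked(prompt: str) -> bool:
    """Return True if any word in the prompt is on the blocklist."""
    return any(word in BLOCKLIST for word in prompt.lower().split())


def substitute(prompt: str, replacements: dict[str, str]) -> str:
    """Swap each word that has a replacement; leave other words unchanged."""
    return " ".join(replacements.get(word, word) for word in prompt.split())


original = "a naked man riding a bike"
# Substitutions quoted in the article.
replacements = {"naked": "thwif", "man": "mowwly"}
rewritten = substitute(original, replacements)

print(is_blocked(original))   # True
print(rewritten)              # a thwif mowwly riding a bike
print(is_blocked(rewritten))  # False
```

The rewritten prompt passes this toy filter because none of its words appear on the blocklist. According to the researchers, the hard part is finding substitutes that the image model still interprets as the original terms, and this sketch does not attempt that.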

The researchers found that SneakyPrompt bypassed the safety filters of DALL-E 2 and Stable Diffusion with average success rates of about 96 percent and 57 percent, respectively. In other words, it is relatively easy to generate questionable images with these GenAIs.

Read the in-depth analysis of this report here.

I strongly believe the significance of this research cannot be overstated, as it could greatly affect how text-to-image generative AIs are used. In my opinion, if these AIs can be manipulated this easily into producing questionable images, they could be weaponized to harm others. We must therefore stay mindful of the risks and take proactive measures to minimize harm.
