Google’s text-to-image generator Imagen produces pictures with ‘unprecedented degree of photorealism’

Google has unveiled Imagen, a text-to-image generator that creates pictures from written descriptions a person provides. The company claims that it outperforms DALL-E 2, another AI image generator. The samples Google presented show exquisite detail, but Imagen is currently unavailable to the public.

The new text-to-image diffusion model is described as having “an unprecedented degree of photorealism and a deep level of language understanding.” It uses large transformer language models to understand text and relies on diffusion models for high-fidelity image generation.

Google provided sample images of Imagen’s work, in styles ranging from drawings to oil paintings and CGI. Each is accompanied by the text prompt used to generate it. For instance, one sample reads, “a dragon fruit wearing karate belt in the snow,” while another has the description “a small cactus wearing a straw hat and neon sunglasses in the Sahara desert.”

The generated images look remarkably real, as if an actual person had created them. Google says, however, that they are produced with diffusion techniques: the model starts from an image of pure noise and refines it step by step. Guided by the text description provided, Imagen generates a 64 x 64-pixel image and then applies two enhancement stages that upscale it to a final 1024 x 1024-pixel picture.
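
The resolution cascade described above can be illustrated with a toy sketch. It is not Google's implementation: the function names are hypothetical, the base stage simply returns random noise, and nearest-neighbour upscaling stands in for each enhancement stage, which in Imagen is itself a text-conditioned diffusion model. The sketch also assumes that both stages upscale by the same 4x factor (64 to 256 to 1024).

```python
import random


def base_stage(size=64, seed=0):
    """Stand-in for the base model: a size x size grid of noise values."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(size)] for _ in range(size)]


def upsample(image, factor=4):
    """Stand-in for one enhancement stage: nearest-neighbour upscaling.

    Each pixel is repeated `factor` times horizontally and each row
    `factor` times vertically. A real stage would add detail, not copy it.
    """
    return [
        [pixel for pixel in row for _ in range(factor)]
        for row in image
        for _ in range(factor)
    ]


def cascade(stages=2, factor=4):
    """Run the base stage, then `stages` upscaling stages.

    Returns the final image and the side length after every stage.
    """
    image = base_stage()
    sizes = [len(image)]
    for _ in range(stages):
        image = upsample(image, factor)
        sizes.append(len(image))
    return image, sizes


if __name__ == "__main__":
    _, sizes = cascade()
    print(" -> ".join(f"{s} x {s}" for s in sizes))
```

The only thing the sketch tracks faithfully is the image size at each stage; the pixel values are meaningless placeholders.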

Google Research’s Brain Team says that Imagen excelled on COCO (a large-scale object detection, segmentation, and captioning dataset) despite not being trained on it. The team reported that Imagen achieved a new state-of-the-art FID score of 7.27.
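
For readers unfamiliar with the metric, FID (Fréchet Inception Distance) measures how far the statistics of generated images are from those of real images, and lower scores are better. The following is a deliberately simplified sketch: the real metric fits multivariate Gaussians to features extracted by an Inception network, whereas this version assumes a single scalar feature per image so that it needs only the Python standard library. The function name and the sample values are hypothetical.

```python
from statistics import fmean, pstdev


def frechet_distance_1d(real, generated):
    """Squared Fréchet distance between two 1-D Gaussians fitted to the data.

    For univariate Gaussians the distance reduces to
    (mean_r - mean_g)^2 + (std_r - std_g)^2.
    """
    mu_r, mu_g = fmean(real), fmean(generated)
    sd_r, sd_g = pstdev(real), pstdev(generated)
    return (mu_r - mu_g) ** 2 + (sd_r - sd_g) ** 2


if __name__ == "__main__":
    real_features = [0.0, 1.0, 2.0]       # made-up feature values
    generated_features = [1.0, 2.0, 3.0]  # same spread, mean shifted by 1
    print(frechet_distance_1d(real_features, generated_features))
```

Two identical sets of features give a distance of zero, and the distance grows as the means or spreads of the two sets diverge.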

Google also compared Imagen with other text-to-image models using “DrawBench,” a benchmark for text-to-image models. It tested Imagen against methods such as VQ-GAN+CLIP, Latent Diffusion Models, and DALL-E 2. After evaluating compositionality, cardinality, spatial relations, long-form text, rare words, and challenging prompts, the team said that “human raters strongly prefer Imagen over other methods, in both image-text alignment and image fidelity.”

Despite these impressive reports from the research team, you cannot test Imagen yourself because it is not accessible to the public. Google cites several reasons, including ethical challenges, potential risks of misuse, social biases, limitations of large language models, and the risk of encoding harmful stereotypes and representations. The team adds that, given these challenges, Imagen still struggles to generate images of people.

“Imagen exhibits serious limitations when generating images depicting people,” the team explains in a blog post. “Our human evaluations found Imagen obtains significantly higher preference rates when evaluated on images that do not portray people, indicating a degradation in image fidelity. Preliminary assessment also suggests Imagen encodes several social biases and stereotypes, including an overall bias towards generating images of people with lighter skin tones and a tendency for images portraying different professions to align with Western gender stereotypes. Finally, even when we focus generations away from people, our preliminary analysis indicates Imagen encodes a range of social and cultural biases when generating images of activities, events, and objects. We aim to make progress on several of these open challenges and limitations in future work.”

Sharron Bennet
