Elon Musk's xAI announces Grok-1.5 Vision, with multimodal capability

Published on April 13, 2024

by Devesh Beri

Key notes

Elon Musk’s xAI has announced Grok-1.5 Vision (Grok-1.5V).
Grok-1.5V is the company’s first multimodal model and will be available to early testers and existing Grok users soon.
Grok-1.5V can process text and visual information.

Last month, Elon Musk’s xAI launched the Grok-1.5 LLM days after Google launched Gemini 1.5. While xAI claimed that the model comes close to GPT-4 performance, it lacked multimodal capability. The company’s newly announced Grok-1.5 Vision removes that limitation, as it can process both text and visual information.

What’s Grok-1.5 Vision (Grok-1.5V) and when will it be available?

Grok-1.5V is xAI’s first-generation multimodal model that aims to connect the digital and physical worlds. “Grok outperforms its peers in our new RealWorldQA benchmark that measures real-world spatial understanding,” the company said in a blog post. Additionally, Grok-1.5V can “process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.”

For example, it can write code from a diagram, calculate calories, make up bedtime stories based on drawings, help you understand a meme, and more. xAI claims that Grok-1.5V outperforms rival models, including GPT-4V, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro, on the RealWorldQA benchmark.

Grok-1.5V isn’t available yet, but it’s coming soon to early testers and existing Grok users as a preview. xAI hasn’t specified a launch date, but it has promised to further advance “multimodal understanding” and “generation capabilities” and to bring improvements across modalities such as images, audio, and video.
