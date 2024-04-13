Elon Musk's xAI announces Grok-1.5 Vision, with multimodal capability



by Rahul 


Readers help support MSPoweruser. We may get a commission if you buy through our links. Read our disclosure page to find out how you can help MSPoweruser sustain the editorial team.

Key notes

  • Elon Musk’s xAI has announced Grok-1.5 Vision or Grok-1.5V.
  • Grok-1.5V is the company’s first multimodal model and will be available to early testers and existing Grok users soon.
  • Grok-1.5V can process text and visual information.

Last month, Elon Musk's xAI launched the Grok-1.5 LLM days after Google launched Gemini 1.5. While xAI claimed that the model's performance is close to GPT-4's, it lacked multimodal capability. The newly announced Grok-1.5 Vision doesn't have that limitation: it can process both text and visual information.

What’s Grok-1.5 Vision (Grok-1.5V) and when will it be available?

Grok-1.5V is xAI’s first-generation multimodal model that aims to connect the digital and physical worlds. “Grok outperforms its peers in our new RealWorldQA benchmark that measures real-world spatial understanding,” the company said in a blog post. Additionally, Grok-1.5V can “process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.”

Examples of what it can do include writing code from a diagram, calculating calories, creating bedtime stories based on drawings, explaining a meme, and more. xAI claims that Grok-1.5V outperforms rival models, including GPT-4V, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro, on the RealWorldQA benchmark.


Grok-1.5V isn't available yet, but it is coming soon as a preview to early testers and existing Grok users. xAI hasn't specified a launch date, but it has promised to further advance both "multimodal understanding" and "generation capabilities," bringing improvements to modalities such as images, audio, and video.


Rahul Shield

Tech Journalist

Rahul is a tech journalist with years of experience covering software, primarily Windows and Android. He also loves to share his opinions on diverse tech topics.