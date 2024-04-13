

Last month, Elon Musk’s xAI launched the Grok-1.5 LLM, not long after Google launched Gemini 1.5. While xAI claimed that the model comes close to GPT-4-level performance, it has no multimodal capability. The company’s newly announced Grok-1.5 Vision removes that limitation: it can process both text and visual information.

What’s Grok-1.5 Vision (Grok-1.5V) and when will it be available?

Grok-1.5V is xAI’s first-generation multimodal model that aims to connect the digital and physical worlds. “Grok outperforms its peers in our new RealWorldQA benchmark that measures real-world spatial understanding,” the company said in a blog post. Additionally, Grok-1.5V can “process a wide variety of visual information, including documents, diagrams, charts, screenshots, and photographs.”

For example, it can write code from a diagram, calculate calories, make up bedtime stories based on drawings, help you understand a meme, and more. xAI claims that Grok-1.5V performs better than rival models, including GPT-4V, Claude 3 Sonnet, Claude 3 Opus, and Gemini Pro, on the RealWorldQA benchmark.

Grok-1.5V isn’t available yet, but it’s coming soon to early testers and existing Grok users as a preview. While xAI hasn’t specified a launch date, it has promised to further advance “multimodal understanding” and “generation capabilities” and to bring improvements across modalities such as images, audio, and video.