Microsoft's new Magma AI lets you control software interfaces and robotic systems

Its code will be released for further development.

Key notes

  • Microsoft’s Magma AI combines visual, language, and spatial processing to perform both digital and physical tasks.
  • It scored 80.0 on the VQAv2 visual question-answering test, outperforming GPT-4V, and 87.4 on POPE, an object-hallucination benchmark; it also reports strong results in robotic manipulation.
  • Magma can autonomously plan and execute tasks but faces challenges in long-term decision-making.

Microsoft recently launched Magma, an AI model designed to handle both digital and physical tasks.

This new technology integrates visual, language, and spatial processing, enabling it to not only understand but also act upon its surroundings.

In benchmarks, Magma scored 80.0 on the VQAv2 visual question-answering test, outperforming GPT-4V’s 77.2. It also led with a score of 87.4 on POPE, a benchmark that checks vision-language models for object hallucination.

Magma is designed to “formulate plans and execute actions to achieve a described goal,” according to its developers.

“Magma creates new state-of-the-art results on UI navigation and robotic manipulation tasks, outperforming previous models that are tailored specifically to these tasks,” Microsoft’s researchers say.

This allows it to navigate software interfaces and control robotic systems. While the model shows impressive results, it still faces challenges with more complex, long-term decision-making. Microsoft plans to release its code on GitHub to encourage further development.

Magma’s multimodal capabilities and competitive performance in both UI navigation and robotics set it apart from previous AI models. “Magma bridges verbal, spatial, and temporal intelligence,” the team explains.

This move comes as the race to the top of the AI food chain heats up. OpenAI, the Microsoft-backed company behind ChatGPT, recently rolled out its Operator AI agent to ChatGPT Pro users.

Though it has yet to launch in Europe, Operator can complete tasks like filling out forms, booking services, and ordering groceries using OpenAI’s Computer-Using Agent (CUA) model.
