Microsoft's new Magma AI lets you control software interfaces and robotic systems
Its code will be released for further development.
Key notes
- Microsoft’s Magma AI combines visual, language, and spatial processing to perform both digital and physical tasks.
- It scored 80.0 on the VQAv2 visual test, outperforming GPT-4V, and 87.4 on the POPE object-hallucination benchmark, alongside strong robotic manipulation results.
- Magma can autonomously plan and execute tasks but faces challenges in long-term decision-making.
Microsoft recently launched Magma, an AI model designed to handle both digital and physical tasks.
This new technology integrates visual, language, and spatial processing, enabling it to not only understand but also act upon its surroundings.
In benchmarks, Magma scored 80.0 on the VQAv2 visual question-answering test, outperforming GPT-4V’s 77.2. It also performed strongly on robotic tasks and scored 87.4 on POPE, a benchmark that measures object hallucination.
Magma is designed to “formulate plans and execute actions to achieve a described goal,” according to its developers.
“Magma creates new state-of-the-art results on UI navigation and robotic manipulation tasks, outperforming previous models that are tailored specifically to these tasks,” Microsoft’s researchers say.
This allows it to navigate software interfaces and control robotic systems. While the model shows impressive results, it still faces challenges with more complex, long-term decision-making. Microsoft plans to release its code on GitHub to encourage further development.
Magma’s multimodal capabilities and competitive performance in both UI navigation and robotics set it apart from previous AI models. “Magma bridges verbal, spatial, and temporal intelligence,” the team explains.
The release comes as competition among AI developers heats up. OpenAI, the Microsoft-backed company behind ChatGPT, recently rolled out its Operator AI agent to ChatGPT Pro users.
Though it is yet to launch in Europe, ChatGPT’s Operator can complete tasks like filling out forms, booking services, and ordering groceries using a Computer-Using Agent (CUA).