2025-02-21

Its code will be released for further development.

Microsoft recently launched Magma, an AI model designed to handle both digital and physical tasks.

This new technology integrates visual, language, and spatial processing, enabling it to not only understand but also act upon its surroundings.

In benchmarks, Magma scored 80.0 on the VQAv2 visual question-answering test, outperforming GPT-4V’s 77.2. It also led with a score of 87.4 on POPE, a benchmark that measures object hallucination in vision-language models, and showed superior performance in robotic tasks.

Magma is designed to “formulate plans and execute actions to achieve a described goal,” according to its developers.

“Magma creates new state-of-the-art results on UI navigation and robotic manipulation tasks, outperforming previous models that are tailored specifically to these tasks,” Microsoft’s researchers say.

These capabilities allow Magma to navigate software interfaces and control robotic systems. While the model shows impressive results, it still struggles with more complex, long-term decision-making. Microsoft plans to release its code on GitHub to encourage further development.

Magma’s multimodal capabilities and competitive performance in both UI navigation and robotics set it apart from previous AI models. “Magma bridges verbal, spatial, and temporal intelligence,” the team explains.

The launch comes as the race to the top of the AI food chain heats up. OpenAI, the Microsoft-backed company behind ChatGPT, recently rolled out its Operator AI agent to ChatGPT Pro users.

Though it has yet to launch in Europe, Operator can complete tasks such as filling out forms, booking services, and ordering groceries using OpenAI’s Computer-Using Agent (CUA) model.