Microsoft will bring new Phi-4-multimodal to Copilot+ PCs

Both models are now available for developers

Key notes

  • Microsoft launched two new AI models, Phi-4-multimodal and Phi-4-mini, and plans to bring Phi-4-multimodal to Copilot+ PCs.
  • Phi-4-multimodal handles text, images, and speech, and Microsoft says it outperforms models like Google’s Gemini 2.0 Flash in certain benchmarks.
  • Copilot+ PCs use AI locally for faster, more private tasks.

Microsoft has recently added two new AI models to its Phi-4 family of small language models (SLMs): Phi-4-mini and Phi-4-multimodal. The Redmond company also says that it’ll integrate the latter into Copilot+ PCs.

The 5.6-billion-parameter Phi-4-multimodal can process text, images, and speech all at once. It’s designed to be efficient, handling tasks like speech recognition and visual understanding while using less energy than larger models.

“Copilot+ PCs will build upon Phi-4-multimodal’s capabilities, delivering the power of Microsoft’s advanced SLMs without the energy drain. This integration will enhance productivity, creativity, and education-focused experiences, becoming a standard part of our developer platform,” says Weizhu Chen, Microsoft’s VP for generative AI.

Copilot+ PCs use AI locally for some tasks, meaning the AI runs directly on the device instead of relying on the cloud. This helps with privacy and speed. For example, AI features in Microsoft apps like Word and Outlook, or even the controversial all-knowing Recall, can work without needing an internet connection.

The Redmond tech giant also says that the Phi-4-multimodal outperforms its competitors, including Google’s new Gemini 2.0 Flash that powers the Gemini chatbot, in certain cherry-picked benchmarks.

On the other hand, the Phi-4-mini, with 3.8 billion parameters, is designed for text-based tasks such as reasoning, math, coding, and instruction-following, with the ability to process sequences up to 128,000 tokens.
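To put the two parameter counts in on-device terms, here is a rough back-of-the-envelope sketch of how much storage the model weights alone would need. The parameter counts are the ones quoted above; the numeric precisions (16-bit, 8-bit, 4-bit) are illustrative assumptions, since Microsoft has not said here which format Copilot+ PCs will use, and the estimate ignores activations, caches, and runtime overhead.

```python
# Rough weight-storage estimate: parameters x bytes per parameter.
# Parameter counts come from the article; the precisions below are
# illustrative assumptions, not formats confirmed by Microsoft.
PARAMS = {
    "Phi-4-multimodal": 5.6e9,
    "Phi-4-mini": 3.8e9,
}
BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Approximate weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return params * bytes_per_param / 1e9


def estimate_table() -> dict:
    """Map each model to its estimated weight size at each precision."""
    return {
        name: {
            precision: round(weight_gb(count, size), 2)
            for precision, size in BYTES_PER_PARAM.items()
        }
        for name, count in PARAMS.items()
    }


if __name__ == "__main__":
    for name, sizes in estimate_table().items():
        print(name, sizes)
```

Under these assumptions, Phi-4-multimodal’s weights come to roughly 11.2 GB at 16-bit and about 2.8 GB at 4-bit, while Phi-4-mini’s come to roughly 7.6 GB and 1.9 GB.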

Both models are available to developers through platforms like Azure AI Foundry and Hugging Face.
