Microsoft announces Phi-3-vision, a new multimodal SLM for on-device AI scenarios


Key notes

  • Phi-3-vision is a 4.2B parameter model that supports general visual reasoning tasks and chart/graph/table reasoning

At Build 2024, Microsoft today expanded its Phi-3 family of small language models with the new Phi-3-vision. Phi-3-vision is a 4.2B parameter model that supports general visual reasoning tasks and chart/graph/table reasoning. The model can take both images and text as input and outputs text responses.
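For readers who want to try the image-plus-text workflow, here is a minimal sketch using the Hugging Face transformers library; the checkpoint name, the `<|image_1|>` placeholder convention, and the sample image URL are assumptions for illustration, not details confirmed in this announcement.

```python
# Sketch: send an image plus a text prompt to Phi-3-vision and read back text.
# Assumes the "microsoft/Phi-3-vision-128k-instruct" checkpoint on Hugging Face.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed checkpoint name
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load a chart image (placeholder URL) and ask a chart-reasoning question.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)
messages = [{"role": "user", "content": "<|image_1|>\nWhat trend does this chart show?"}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)

# Decode only the newly generated tokens, skipping the prompt.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```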

Microsoft today also announced the general availability of Phi-3-mini in Azure AI's Models-as-a-Service (MaaS) offering. Phi-3 models are gaining momentum since they are cost-effective and optimized for on-device, edge, offline inference, and latency-bound AI scenarios.
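A MaaS deployment is typically consumed over an OpenAI-style chat completions route. The sketch below shows the general shape of such a call; the endpoint URL, header, and payload fields are illustrative placeholders, so check your deployment's documentation for the exact contract.

```python
# Sketch: call a Phi-3-mini serverless (MaaS) deployment over HTTP.
import requests

ENDPOINT = "https://<your-phi-3-mini-deployment>.models.ai.azure.com/v1/chat/completions"  # placeholder
API_KEY = "<your-api-key>"  # placeholder

payload = {
    "messages": [
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Why do small language models suit edge devices?"},
    ],
    "max_tokens": 200,
    "temperature": 0.2,
}

response = requests.post(
    ENDPOINT,
    headers={"Authorization": f"Bearer {API_KEY}", "Content-Type": "application/json"},
    json=payload,
    timeout=30,
)
response.raise_for_status()
# The chat completions response carries the generated text under choices[0].
print(response.json()["choices"][0]["message"]["content"])
```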

In addition to the Phi-3 news, Microsoft announced new features across its APIs to enable multimodal experiences. Azure AI Speech now offers speech analytics and universal translation. Azure AI Search now comes with significantly increased storage and up to a 12x increase in vector index size at no additional cost, enabling large RAG workloads at scale.
