Microsoft announces Phi-3-vision, a new multimodal SLM for on-device AI scenarios


Key notes

  • Phi-3-vision is a 4.2B parameter model that supports general visual reasoning tasks and chart/graph/table reasoning

At Build 2024, Microsoft today expanded its Phi-3 family of small language models with the new Phi-3-vision. Phi-3-vision is a 4.2B-parameter model that supports general visual reasoning tasks as well as chart, graph, and table reasoning. The model takes both images and text as input and outputs text responses.
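
To illustrate the image-plus-text input and text output described above, here is a minimal sketch of running Phi-3-vision locally with Hugging Face transformers. The model ID, image URL, and prompt template are assumptions drawn from the published model card rather than details in this article, so treat this as an orientation example, not official usage guidance.

```python
# Minimal sketch: multimodal inference (image + text in, text out) with Phi-3-vision.
# Model ID, image URL, and prompt template are assumptions; see the model card.
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed Hugging Face Hub ID

model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load an example chart image (placeholder URL).
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

# The <|image_1|> placeholder ties the prompt to the attached image
# (prompt format assumed from the model card).
prompt = "<|user|>\n<|image_1|>\nWhat trend does this chart show?<|end|>\n<|assistant|>\n"

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)

# Drop the prompt tokens and decode only the newly generated answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```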

Microsoft today also announced the general availability of Phi-3-mini in Azure AI’s Models-as-a-Service (MaaS) offering. Phi-3 models are gaining momentum because they are cost-effective and optimized for on-device, edge, offline-inference, and latency-bound AI scenarios.
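
For the Models-as-a-Service route, a deployed serverless endpoint can be called over a chat-completions-style API. The sketch below uses the azure-ai-inference Python SDK; the endpoint URL and key are placeholders, and the specific deployment details are assumptions not taken from this article.

```python
# Minimal sketch: calling a Phi-3-mini serverless (MaaS) endpoint on Azure AI.
# Endpoint URL and API key are placeholders for your own deployment.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-phi-3-deployment>.<region>.models.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the Phi-3 model family in one sentence."),
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```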

In addition to the news about Phi-3 models, Microsoft announced new features across its APIs to enable multimodal experiences. Azure AI Speech now offers speech analytics and universal translation. Azure AI Search now comes with significantly increased storage and up to a 12X increase in vector index size at no additional cost, enabling large RAG workloads at scale.
