Microsoft announces Phi-3-vision, a new multimodal SLM for on-device AI scenarios


Key notes

  • Phi-3-vision is a 4.2B parameter model that supports general visual reasoning tasks and chart/graph/table reasoning

At Build 2024, Microsoft today expanded its Phi-3 family of small language models with the new Phi-3-vision. Phi-3-vision is a 4.2B-parameter model that supports general visual reasoning tasks as well as chart, graph, and table reasoning. The model takes both images and text as input and outputs text responses.
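
To illustrate the image-plus-text input and text output described above, here is a minimal sketch of running Phi-3-vision locally with Hugging Face transformers. The model ID, image URL, and prompt template are assumptions drawn from the published model card rather than details in this article, so treat this as an orientation example, not official usage guidance.

```python
# Minimal sketch: multimodal inference (image + text in, text out) with Phi-3-vision.
# Model ID, image URL, and prompt template are assumptions; see the model card.
from PIL import Image
import requests
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3-vision-128k-instruct"  # assumed Hugging Face Hub ID

model = AutoModelForCausalLM.from_pretrained(
    model_id, device_map="auto", torch_dtype="auto", trust_remote_code=True
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

# Load an example chart image (placeholder URL).
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)

# The <|image_1|> placeholder ties the prompt to the attached image
# (prompt format assumed from the model card).
prompt = "<|user|>\n<|image_1|>\nWhat trend does this chart show?<|end|>\n<|assistant|>\n"

inputs = processor(prompt, [image], return_tensors="pt").to(model.device)
output_ids = model.generate(**inputs, max_new_tokens=200)

# Drop the prompt tokens and decode only the newly generated answer.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```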

Microsoft today also announced the general availability of Phi-3-mini in Azure AI’s Models-as-a-Service (MaaS) offering. Phi-3 models are gaining momentum because they are cost-effective and optimized for on-device, edge, offline-inference, and latency-bound AI scenarios.
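
For the Models-as-a-Service route, a deployed serverless endpoint can be called over a chat-completions-style API. The sketch below uses the azure-ai-inference Python SDK; the endpoint URL and key are placeholders, and the specific deployment details are assumptions not taken from this article.

```python
# Minimal sketch: calling a Phi-3-mini serverless (MaaS) endpoint on Azure AI.
# Endpoint URL and API key are placeholders for your own deployment.
from azure.ai.inference import ChatCompletionsClient
from azure.ai.inference.models import SystemMessage, UserMessage
from azure.core.credentials import AzureKeyCredential

client = ChatCompletionsClient(
    endpoint="https://<your-phi-3-deployment>.<region>.models.ai.azure.com",
    credential=AzureKeyCredential("<your-api-key>"),
)

response = client.complete(
    messages=[
        SystemMessage(content="You are a helpful assistant."),
        UserMessage(content="Summarize the Phi-3 model family in one sentence."),
    ],
    max_tokens=128,
)
print(response.choices[0].message.content)
```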

In addition to the news about Phi-3 models, Microsoft announced new features across its APIs to enable multimodal experiences. Azure AI Speech now offers speech analytics and universal translation. Azure AI Search now comes with significantly increased storage and up to a 12X increase in vector index size at no additional cost, enabling large RAG workloads at scale.
