Benchmark suggests Phi-3-vision is on-par with Claude 3-haiku & Gemini 1.0 Pro, if not better

Microsoft launched Phi-3-vision a little while ago.

Home » News

2 min. read

Updated on September 4, 2024

by Rafly Gilang

updated on September 4, 2024

Share this article

Improve this guide

Readers help support MSpoweruser. We may get a commission if you buy through our links.

Key notes

Microsoft launched Phi-3-vision a little while ago.
Now, recent benchmark suggests it’s on par with other smaller models, if not better.
The small model itself is now available for preview.

Microsoft launched Phi-3-vision, the latest addition to the Phi-3 family, a little while ago during the Build 2024 conference. The new multimodal SLM for on-device AI scenarios has a vast parameter count of 4.2 billion, capable of giving support for general visual reasoning tasks and others.

How good is Phi-3-vision, you may ask? A recently published paper suggests that it’s on par with other small models like Claude 3-haiku & Gemini 1.0 Pro. It puts Phi-3-vision against some of the popular AI benchmarks, like MMMU (college-level knowledge).

As you can see, in some cases, like ScienceQA, MathVista, and ChartQA for example, Phi-3-vision still remains superior to its competitors in smaller models.

Phi-3-vision, with 4.2 billion parameters, handles visual reasoning and chart/graph/table analysis. It can process images and text to provide text responses, such as answering questions about charts or images. Unlike larger models like DALL-E or Stable Diffusion, Phi-3-vision doesn’t generate images but understands and analyzes them.

Phi-3-vision itself is now available for preview. Additionally, Phi-3-mini and Phi-3-medium models are now available through Azure AI’s MaaS offering.

Besides Phi-3-vision, Microsoft also launched the Phi Silica model. It will be shipped as a part of Windows to power generative AI apps, which are cost & energy-friendly, and “custom-built for the NPUs in Copilot+ PCs.”

“With full NPU offload of prompt processing, the first token latency is at 650 tokens/second – and only costs about 1.5 Watts of power while leaving your CPU and GPU free for other computations,” Microsoft adds.

Rafly Gilang

Tech Reporter

Rafly is a reporter with years of journalistic experience, ranging from technology, business, social, and culture. Currently reporting news on Microsoft-related products, tech, and AI on MSPowerUser. Got a tip? Send it to [email protected]

User forum

0 messages

Sort by:

Leave a Reply Cancel reply