Gemini 1.5 Pro/Advanced on par with OpenAI's GPT-4o model, latest benchmark score reveals


Published on May 29, 2024

by Rafly Gilang


Key notes

Google’s Gemini 1.5 Pro, which supports a context window of up to 1 million tokens, will soon have a 2M token version.
It performs on par with OpenAI’s GPT-4o in some categories, based on the Arena Elo system.
Gemini 1.5 Flash, available on Google AI Studio, competes well with Microsoft’s latest Phi-3 models.

Google’s Gemini 1.5 Pro arrived some time ago with a context window of up to 1 million tokens. More recently, the Mountain View tech giant announced at Google I/O 2024 that a 2M token version is coming soon for developers to try.

But how good is Gemini 1.5 Pro, really? Benchmark numbers are usually a good start, although they do not necessarily paint a complete picture. It turns out that Gemini 1.5 Pro, and even its “Advanced” tier, is on par with OpenAI’s latest GPT-4o in certain categories.

According to the overall leaderboard comparison from LMSYS Org, both Gemini-1.5-Pro-API-0514 and Gemini-Advanced-0514 come close to GPT-4o as measured by the Arena Elo system. The two models also score strongly in the Chinese-language category, and Gemini 1.5 Pro comes close in the “hard prompts” category as well.

The Arena Elo system measures the skill of large language models (LLMs) by having users vote on which of two anonymous, randomly paired models gives the better response, then updating the models’ ratings much like the Elo system in chess. LMSYS Org, the non-profit organization behind the leaderboard, focuses on comparing models side by side.
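For readers curious how a chess-style Elo update works, here is a minimal Python sketch. It is an illustration only: the function names, the K-factor of 32, and the 400-point scale are conventional chess values assumed here, and the leaderboard's actual rating computation may differ.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that model A beats model B under the standard Elo formula."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def update_elo(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return both models' new ratings after one head-to-head vote.

    score_a is 1.0 if A won the vote, 0.0 if B won, and 0.5 for a tie.
    k (assumed to be 32 here) controls how far one result moves a rating.
    """
    exp_a = expected_score(rating_a, rating_b)
    new_a = rating_a + k * (score_a - exp_a)
    # B's change mirrors A's, so the sum of the two ratings is preserved.
    new_b = rating_b - k * (score_a - exp_a)
    return new_a, new_b


if __name__ == "__main__":
    # Two hypothetical models start level at 1000; model A wins one vote.
    print(update_elo(1000.0, 1000.0, 1.0))
```

An upset win against a higher-rated model moves the ratings more than an expected win does, which is why a newer model can climb quickly if voters consistently prefer it.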

Gemini 1.5 Flash, which is now available to try on Google AI Studio and Vertex AI, also comes close. For a small, lightweight model, it holds its own against Microsoft’s latest additions to the Phi-3 family, Phi-3-vision and Phi Silica.

When OpenAI unveiled the new GPT-4o model and a ChatGPT desktop app a little while ago, expectations were high. The latest model makes the AI chatbot sound remarkably human-like when conversing with users.

Rafly Gilang

Tech Reporter

Rafly is a reporter with years of journalistic experience covering technology, business, society, and culture. He currently reports on Microsoft-related products, tech, and AI for MSPoweruser.
