OpenAI’s new GPT-4o model beats Gemini and Claude to set new benchmarks

Home » News

2 min. read

Updated on July 29, 2024

by Pradeep Viswav

updated on July 29, 2024

Share this article

Improve this guide

Readers help support MSpoweruser. We may get a commission if you buy through our links.

Today, OpenAI announced its latest flagship model, GPT-4o. The GPT-4o (o refers to ‘omni’) model is now available via API for developers. The new GPT-4o model is as smart as GPT-4 Turbo, but has improved vision capabilities, and is much more efficient.

OpenAI claims that this new model is 2x faster, 50% cheaper and comes with 5x rate limits. The GPT-4 Turbo will cost $14 for million tokens whereas the GPT-4o will just cost $7 for million tokens. And yes, the GPT-4o model will support up to 10 million tokens per minute. The GPT-4o model API will support text and vision for now, with audio and video support coming soon. Also, the model has 128K context and an October 2023 knowledge cutoff.

How does GPT-4o perform when compared to Gemini and Claude?

For the past few days, OpenAI was testing a version of GPT-4o model on the LMSys arena as im-also-a-good-gpt2-chatbot. As you can see from the chart above, GPT-4o is the best model in the world right now and it is available for free for all ChatGPT users.

The new GPT-4o model also sets record in several standard AI benchmarks. Check it out below.

Model	Prompt	MMLU	GPQA	MATH	HumanEval	MGSM	DROP (F1,3-shot)
OPENAI GPT4s
gpt-4o	chatgpt¹	`88.7`	`53.6`	`76.6`	90.2	90.5	83.4
gpt-4o	assistant²	87.2	49.9	`76.6`	`91.0`	89.9	83.7
gpt-4-turbo-2024-04-09	chatgpt	86.5	49.1	72.2	87.6	88.6	85.4
gpt-4-turbo-2024-04-09	assistant	86.7	49.3	73.4	88.2	89.6	`86.0`
gpt-4-1106(-vision)-preview	chatgpt	84.6	42.1	64.1	82.2	86.5	81.3
gpt-4-1106(-vision)-preview	assistant	84.7	42.5	64.3	83.7	87.1	83.2
gpt-4-0125-preview	chatgpt	84.8	39.7	64.2	88.2	83.7	83.4
gpt-4-0125-preview	assistant	85.4	41.4	64.5	86.6	85.1	81.5
REFERENCE-RERUN
Claude-3-Opus (rerun w/ api)	empty³	84.1	49.7	63.2	84.8	89.7	79.0
Claude-3-Opus (rerun w/ api)	lmsys⁴	84.2	50.7	63.8	82.9	89.2	77.1
Llama3 70b (rerun w/ api)	empty	80.2	41.3	52.8	70.1	82.6	81.4
REFERENCE-REPORT		(5-shot)
Claude-3-Opus (report⁵)	unknown	86.8	50.4	60.1	84.9	`90.7`	83.1
Gemini-Ultra-1.0 (report⁶)	unknown	83.7	n/a	53.2	74.4	79.0	82.4
Gemini-Pro-1.5 (report⁶)	unknown	81.9	n/a	58.5	71.9	88.7	78.9
Llama3 8b (report⁷)	unknown	68.4	34.2	30.0	62.2	n/a	58.4
Llama3 70b (report⁷)	unknown	82.0	39.5	50.4	81.7	n/a	79.7
Llama3 400b (still training, report⁷)	unknown	86.1	48.0	57.8	84.1	n/a	83.5

Developers can try out GPT-4o model at OpenAI Playground.

OpenAI is asking the public to help identify tasks where GPT-4 Turbo still outperforms GPT-4o, so they can continue to improve the model.

Pradeep Viswav

Software and Services Expert

Pradeep is a Computer Science and Engineering Graduate. He was also a Microsoft Student Partner. He is currently working in a leading IT company.

User forum

0 messages

Sort by:

Leave a Reply Cancel reply