OpenAI’s new GPT-4o model beats Gemini and Claude to set new benchmarks



Today, OpenAI announced its latest flagship model, GPT-4o (the "o" stands for ‘omni’), which is now available to developers via the API. According to OpenAI, GPT-4o is as smart as GPT-4 Turbo but has improved vision capabilities and is much more efficient.

OpenAI claims that the new model is 2x faster and 50% cheaper than GPT-4 Turbo, and that it comes with 5x higher rate limits. GPT-4 Turbo costs $14 per million tokens, whereas GPT-4o costs just $7 per million tokens. GPT-4o also supports up to 10 million tokens per minute. For now, the GPT-4o API supports text and vision, with audio and video support coming soon. The model has a 128K-token context window and an October 2023 knowledge cutoff.
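To put those figures in context, here is a minimal sketch of the arithmetic. It uses only the per-million-token prices and the tokens-per-minute limit quoted above. The function names and the 250-million-token workload are illustrative assumptions, not part of any OpenAI API or pricing tool.

```python
# Prices as quoted in this article, in US dollars per one million tokens.
PRICE_PER_MILLION = {"gpt-4-turbo": 14.0, "gpt-4o": 7.0}

# Rate limit as quoted in this article: up to 10 million tokens per minute.
GPT_4O_TOKENS_PER_MINUTE = 10_000_000


def estimate_cost(model: str, tokens: int) -> float:
    """Estimated cost in dollars for processing `tokens` tokens (hypothetical helper)."""
    return PRICE_PER_MILLION[model] * tokens / 1_000_000


def minimum_minutes(tokens: int, tokens_per_minute: int = GPT_4O_TOKENS_PER_MINUTE) -> float:
    """Lower bound on processing time if the rate limit is the only constraint."""
    return tokens / tokens_per_minute


if __name__ == "__main__":
    workload = 250_000_000  # hypothetical monthly token volume
    for model in PRICE_PER_MILLION:
        print(f"{model}: ${estimate_cost(model, workload):,.2f}")
    print(f"Minimum time at the quoted limit: {minimum_minutes(workload):.0f} minutes")
```

At the quoted prices, the cost of a given workload on GPT-4o is exactly half of the GPT-4 Turbo cost, which is where the "50% cheaper" figure comes from.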

How does GPT-4o perform when compared to Gemini and Claude?

For the past few days, OpenAI had been testing a version of the GPT-4o model on the LMSys arena under the name im-also-a-good-gpt2-chatbot. As the chart above shows, GPT-4o currently ranks as the best model in that arena, and it is available for free to all ChatGPT users.

The new GPT-4o model also sets new records on several standard AI benchmarks. The results are shown below.

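| Model | Prompt | MMLU | GPQA | MATH | HumanEval | MGSM | DROP (F1, 3-shot) |
|---|---|---|---|---|---|---|---|
| OPENAI GPT4s | | | | | | | |
| gpt-4o | chatgpt [1] | 88.7 | 53.6 | 76.6 | 90.2 | 90.5 | 83.4 |
| gpt-4o | assistant [2] | 87.2 | 49.9 | 76.6 | 91.0 | 89.9 | 83.7 |
| gpt-4-turbo-2024-04-09 | chatgpt | 86.5 | 49.1 | 72.2 | 87.6 | 88.6 | 85.4 |
| gpt-4-turbo-2024-04-09 | assistant | 86.7 | 49.3 | 73.4 | 88.2 | 89.6 | 86.0 |
| gpt-4-1106(-vision)-preview | chatgpt | 84.6 | 42.1 | 64.1 | 82.2 | 86.5 | 81.3 |
| gpt-4-1106(-vision)-preview | assistant | 84.7 | 42.5 | 64.3 | 83.7 | 87.1 | 83.2 |
| gpt-4-0125-preview | chatgpt | 84.8 | 39.7 | 64.2 | 88.2 | 83.7 | 83.4 |
| gpt-4-0125-preview | assistant | 85.4 | 41.4 | 64.5 | 86.6 | 85.1 | 81.5 |
| REFERENCE-RERUN | | | | | | | |
| Claude-3-Opus (rerun w/ api) | empty [3] | 84.1 | 49.7 | 63.2 | 84.8 | 89.7 | 79.0 |
| Claude-3-Opus (rerun w/ api) | lmsys [4] | 84.2 | 50.7 | 63.8 | 82.9 | 89.2 | 77.1 |
| Llama3 70b (rerun w/ api) | empty | 80.2 | 41.3 | 52.8 | 70.1 | 82.6 | 81.4 |
| REFERENCE-REPORT | | (5-shot) | | | | | |
| Claude-3-Opus (report [5]) | unknown | 86.8 | 50.4 | 60.1 | 84.9 | 90.7 | 83.1 |
| Gemini-Ultra-1.0 (report [6]) | unknown | 83.7 | n/a | 53.2 | 74.4 | 79.0 | 82.4 |
| Gemini-Pro-1.5 (report [6]) | unknown | 81.9 | n/a | 58.5 | 71.9 | 88.7 | 78.9 |
| Llama3 8b (report [7]) | unknown | 68.4 | 34.2 | 30.0 | 62.2 | n/a | 58.4 |
| Llama3 70b (report [7]) | unknown | 82.0 | 39.5 | 50.4 | 81.7 | n/a | 79.7 |
| Llama3 400b (still training, report [7]) | unknown | 86.1 | 48.0 | 57.8 | 84.1 | n/a | 83.5 |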
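One way to read the table is to look up the highest score in each benchmark column. The snippet below is a rough sketch of that lookup, with the scores transcribed from the table above. The `leaders` helper is a hypothetical name, and the comparison is only indicative, since the rows mix different prompts, reruns, and self-reported results.

```python
BENCHMARKS = ["MMLU", "GPQA", "MATH", "HumanEval", "MGSM", "DROP"]

# (model, prompt) -> scores in BENCHMARKS order; None marks an "n/a" cell.
SCORES = {
    ("gpt-4o", "chatgpt"): [88.7, 53.6, 76.6, 90.2, 90.5, 83.4],
    ("gpt-4o", "assistant"): [87.2, 49.9, 76.6, 91.0, 89.9, 83.7],
    ("gpt-4-turbo-2024-04-09", "chatgpt"): [86.5, 49.1, 72.2, 87.6, 88.6, 85.4],
    ("gpt-4-turbo-2024-04-09", "assistant"): [86.7, 49.3, 73.4, 88.2, 89.6, 86.0],
    ("gpt-4-1106(-vision)-preview", "chatgpt"): [84.6, 42.1, 64.1, 82.2, 86.5, 81.3],
    ("gpt-4-1106(-vision)-preview", "assistant"): [84.7, 42.5, 64.3, 83.7, 87.1, 83.2],
    ("gpt-4-0125-preview", "chatgpt"): [84.8, 39.7, 64.2, 88.2, 83.7, 83.4],
    ("gpt-4-0125-preview", "assistant"): [85.4, 41.4, 64.5, 86.6, 85.1, 81.5],
    ("Claude-3-Opus (rerun w/ api)", "empty"): [84.1, 49.7, 63.2, 84.8, 89.7, 79.0],
    ("Claude-3-Opus (rerun w/ api)", "lmsys"): [84.2, 50.7, 63.8, 82.9, 89.2, 77.1],
    ("Llama3 70b (rerun w/ api)", "empty"): [80.2, 41.3, 52.8, 70.1, 82.6, 81.4],
    ("Claude-3-Opus (report)", "unknown"): [86.8, 50.4, 60.1, 84.9, 90.7, 83.1],
    ("Gemini-Ultra-1.0 (report)", "unknown"): [83.7, None, 53.2, 74.4, 79.0, 82.4],
    ("Gemini-Pro-1.5 (report)", "unknown"): [81.9, None, 58.5, 71.9, 88.7, 78.9],
    ("Llama3 8b (report)", "unknown"): [68.4, 34.2, 30.0, 62.2, None, 58.4],
    ("Llama3 70b (report)", "unknown"): [82.0, 39.5, 50.4, 81.7, None, 79.7],
    ("Llama3 400b (still training, report)", "unknown"): [86.1, 48.0, 57.8, 84.1, None, 83.5],
}


def leaders(scores=SCORES):
    """Map each benchmark to the (model, prompt, score) row with the highest score.

    Ties go to the row listed first, because max() keeps the first maximum it sees.
    """
    result = {}
    for i, name in enumerate(BENCHMARKS):
        candidates = [
            (values[i], model, prompt)
            for (model, prompt), values in scores.items()
            if values[i] is not None  # skip "n/a" cells
        ]
        best_score, model, prompt = max(candidates, key=lambda c: c[0])
        result[name] = (model, prompt, best_score)
    return result


if __name__ == "__main__":
    for benchmark, (model, prompt, score) in leaders().items():
        print(f"{benchmark}: {model} [{prompt}] {score}")
```

On these numbers, a gpt-4o row holds the top score on MMLU, GPQA, MATH, and HumanEval, while the reported Claude-3-Opus result is highest on MGSM and gpt-4-turbo-2024-04-09 is highest on DROP.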

Developers can try out the GPT-4o model in the OpenAI Playground.

OpenAI is asking the public to help identify tasks where GPT-4 Turbo still outperforms GPT-4o, so that it can continue to improve the model.
