Claude 3 Opus beats OpenAI's GPT-4 in important chatbot ranking

Home » News

2 min. read

Published on March 28, 2024

by Rahul

published on March 28, 2024

Readers help support MSpoweruser. We may get a commission if you buy through our links.

Key notes

Claude 3 Opus has beaten OpenAI’s GPT-4 to become number one in Arena ranking.
Claude 3 Opus has an Elo Score of 1253, slightly more than GPT-4.
The results are based on how satisfied users were with the outputs of several AI models.

Anthropic announced the Claude 3 model family earlier this month, claiming it can outclass OpenAI’s GPT-4. The company showed various performance metrics of the model and compared them with those of rival chatbots to draw that conclusion. Now, the Claude 3 supremacy also reflects on the Arena leaderboard.

Claude 3 Opus beats GPT-4 to become the number one

Claude 3 Opus has topped the LYMSYS Chatbot Arena ranking to push the GPT-4 model to the second position. The Claude 3 Opus gained an Elo score of 1253, slightly more than 1251 of GPT-4. It’s the same score that judges how skillful chess players are. But in this case, the benchmark scores are judging various AI models, not chess players.

[Arena Update]

70K+ new Arena votes?? are in!

Claude-3 Haiku has impressed all, even reaching GPT-4 level by our user preference! Its speed, capabilities & context length are unmatched now in the market?

Congrats @AnthropicAI on the incredible Claude-3 launch!

More exciting… pic.twitter.com/p1Guuf0B3K
— lmsys.org (@lmsysorg) March 26, 2024

However, the LYMSYS Chatbot Arena isn’t perfect. The benchmarking results that it shows are based on people’s voting. As such, the scores were updated after 70 thousand new votes. So, in theory, a better score should indicate that the overall output of the AI model was better. But a lot of the time, how good the output is depends on who’s viewing it. Users also complain that GPT-4 doesn’t load properly in Chatbot Arena (via Tom’sguide). Despite that, OpenAI held the first position all these years until it was ousted by the Claude 3 Opus a few hours ago.

While an updated Arena ranking will likely generate more interest in Anthropic’s AI models, OpenAI has plans to launch GPT-5 this summer, which is said to be “materially better”. If that turns out to be the case, OpenAI is likely to regain its top position on the Arena leaderboard.

Claude 3 Opus beats GPT-4 to become the number one

Leave a Reply