Claude 3 Opus beats OpenAI's GPT-4 in important chatbot ranking



Key notes

  • Claude 3 Opus has beaten OpenAI’s GPT-4 to become number one in the Chatbot Arena ranking.
  • Claude 3 Opus has an Elo score of 1253, slightly more than GPT-4’s 1251.
  • The results are based on how satisfied users were with the outputs of several AI models.

Anthropic announced the Claude 3 model family earlier this month, claiming it can outclass OpenAI’s GPT-4. The company supported that claim with various performance metrics, comparing the model against rival chatbots. Now, Claude 3’s lead is also reflected on the Arena leaderboard.

Claude 3 Opus beats GPT-4 to become number one

Claude 3 Opus has topped the LMSYS Chatbot Arena ranking, pushing GPT-4 into second position. Claude 3 Opus earned an Elo score of 1253, slightly above GPT-4’s 1251. Elo is the same rating system used to rank chess players by skill, but here it rates AI models instead.
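To put the two-point gap in perspective, here is a minimal sketch using the standard Elo expected-score formula from chess. It assumes the Arena scores can be read on the usual 400-point Elo scale; the leaderboard's exact rating procedure may differ, and the function name `expected_score` is only illustrative.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Standard Elo expected score of A against B (win = 1, loss = 0)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Elo scores as reported in the article.
claude_3_opus = 1253
gpt_4 = 1251

# Assumption: the scores follow the conventional 400-point Elo scale.
p = expected_score(claude_3_opus, gpt_4)
print(f"Expected preference rate for Claude 3 Opus: {p:.3f}")  # prints 0.503
```

Under that assumption, a two-point lead corresponds to Claude 3 Opus being preferred in only about 50.3% of head-to-head votes, which is close to a coin flip.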

However, the LMSYS Chatbot Arena isn’t perfect. Its results are based on people’s votes, and the latest scores were published after 70 thousand new votes. In theory, a better score should indicate that the model’s overall output was better, but how good an output is often depends on who’s viewing it. Users also complain that GPT-4 doesn’t load properly in Chatbot Arena (via Tom’s Guide). Despite that, OpenAI had long held the first position until it was ousted by Claude 3 Opus a few hours ago.

While the updated Arena ranking will likely generate more interest in Anthropic’s AI models, OpenAI plans to launch GPT-5 this summer, which is said to be “materially better”. If that turns out to be the case, OpenAI is likely to regain the top position on the Arena leaderboard.

