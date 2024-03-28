

Anthropic announced the Claude 3 model family earlier this month, claiming it can outclass OpenAI’s GPT-4. To back that claim, the company published a range of performance metrics and compared them with those of rival chatbots. Now, Claude 3’s edge is reflected on the Chatbot Arena leaderboard as well.

Claude 3 Opus beats GPT-4 to take the number one spot

Claude 3 Opus has topped the LMSYS Chatbot Arena leaderboard, pushing GPT-4 down to second place. Claude 3 Opus earned an Elo score of 1253, just ahead of GPT-4’s 1251. Elo is the same rating system used to rank chess players; here, it ranks AI models instead.
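To put a two-point gap in perspective, the standard Elo formula converts a rating difference into an expected win rate. The sketch below assumes the classic chess convention (a 400-point logistic scale); the Arena’s exact scaling may differ, so treat the result as illustrative only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score (win probability) of A against B under the classic
    Elo model, using the conventional 400-point logistic scale."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings as reported on the leaderboard.
claude_3_opus = 1253
gpt_4 = 1251

p = elo_expected_score(claude_3_opus, gpt_4)
print(f"Expected win rate for Claude 3 Opus vs GPT-4: {p:.1%}")
# prints "Expected win rate for Claude 3 Opus vs GPT-4: 50.3%"
```

Under these assumptions, a 1253 vs 1251 split implies Claude 3 Opus would be preferred in only about 50.3% of head-to-head votes, which is practically a coin flip.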

However, the LMSYS Chatbot Arena isn’t perfect. Its rankings are based on user votes, and the latest scores were updated after 70,000 new votes were cast. In theory, a higher score should indicate that a model’s overall output is better, but how good an answer looks often depends on who is judging it. Some users have also complained that GPT-4 doesn’t load properly in Chatbot Arena (via Tom’s Guide). Even so, OpenAI had held the top position for a long time until Claude 3 Opus ousted it a few hours ago.
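For readers curious how individual votes can move a leaderboard, here is a minimal sketch of the classic online Elo update applied to pairwise votes. It is an assumption-laden illustration, not LMSYS’s pipeline: the team has described fitting a Bradley–Terry style model over all votes rather than updating one vote at a time. The step size `k`, the starting rating, and the vote log are all hypothetical.

```python
from collections import defaultdict


def expected_score(rating_a: float, rating_b: float) -> float:
    """Classic Elo expected score of A against B (400-point scale)."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def run_arena(votes, k: float = 4.0, initial: float = 1000.0) -> dict:
    """Apply online Elo updates for a sequence of pairwise votes.

    votes: iterable of (model_a, model_b, outcome_a), where outcome_a is
           1.0 if the voter preferred A, 0.0 if B, and 0.5 for a tie.
    k: hypothetical step size controlling how far one vote moves ratings.
    initial: hypothetical starting rating for models not seen before.
    """
    ratings = defaultdict(lambda: initial)
    for model_a, model_b, outcome_a in votes:
        e_a = expected_score(ratings[model_a], ratings[model_b])
        delta = k * (outcome_a - e_a)
        # Zero-sum update: whatever A gains, B loses.
        ratings[model_a] += delta
        ratings[model_b] -= delta
    return dict(ratings)


# Hypothetical vote log between two placeholder models.
votes = [
    ("model-x", "model-y", 1.0),
    ("model-y", "model-x", 0.5),
    ("model-x", "model-y", 0.0),
    ("model-x", "model-y", 1.0),
]
for name, rating in sorted(run_arena(votes).items(), key=lambda kv: -kv[1]):
    print(f"{name}: {rating:.1f}")
```

One drawback of this one-vote-at-a-time scheme is that the final ratings depend on the order in which votes arrive, which is one reason a batch fit over all votes is often preferred for a public leaderboard.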

While the updated Arena ranking will likely generate more interest in Anthropic’s AI models, OpenAI reportedly plans to launch GPT-5 this summer, and the new model is said to be “materially better.” If that turns out to be the case, OpenAI could well regain the top position on the Arena leaderboard.