Meta's upcoming Llama-3 400B model could beat GPT-4 Turbo and Claude 3 Opus

It doesn't beat them yet, but it has the potential to




Key notes

  • Meta unveils Llama-3, its most powerful model yet with 400B parameters
  • Llama-3 shows potential for further improvement even though it is still in training
  • Recent numbers suggest that it’s close to Claude 3 Opus and GPT-4 Turbo in benchmarks

Meta is set to launch its most powerful AI model yet, Llama-3 with 400B parameters. According to Thursday's announcement, the open-source model will soon power the Meta AI assistant that's coming to WhatsApp and Instagram.

But the truth is, there are plenty of powerful AI models on the market at the moment. OpenAI's GPT-4 Turbo with a 128k context window has been around for quite some time, and Anthropic's Claude 3 Opus is now available on Amazon Bedrock.

So, how do these models compare to one another? Here's how they stack up across several benchmarks, based on publicly available figures and Meta's announcement.

| Benchmark | Llama 3 400B | Claude 3 Opus | GPT-4 Turbo | Gemini Ultra 1.0 | Gemini Pro 1.5 |
|-----------|--------------|---------------|-------------|------------------|----------------|
| MMLU      | 86.1         | 86.8          | 86.5        | 83.7             | 81.9           |
| GPQA      | 48           | 50.4          | 49.1        | –                | –              |
| HumanEval | 84.1         | 84.9          | 87.6        | 74.4             | 71.9           |
| MATH      | 57.8         | 60.1          | 72.2        | 53.2             | 58.5           |

As you can see, Llama-3 400B actually does fall slightly short in these benchmarks, scoring 86.1 in MMLU, 48 in GPQA, 84.1 in HumanEval, and 57.8 in MATH. 

But, given that it's still in the training phase, there's a good chance of sizeable improvements by the time it's fully released. And for an open-source model, those numbers are already impressive.

MMLU tests how well models understand a wide range of subjects without being directly trained on them. GPQA, on the other hand, measures how models handle graduate-level questions in biology, physics, and chemistry, while HumanEval focuses on how well they write code.