Codestral, Mistral AI's new code model, is better than Llama 3 70B & Code Llama 70B

It may be the best coding model on the market.

Key notes

  • Mistral AI launches Codestral, a code model with 22B parameters and 32k context length.
  • Codestral outperforms Meta’s Code Llama 70B in various benchmarks.
  • It’s available as an open-weight model via Hugging Face or through Mistral’s Le Chat.

Mistral AI has been one of the most exciting AI startups in recent months, and the Microsoft-backed company has just launched yet another model. Called Codestral, it is the company’s first code model, with 22B parameters and a 32k context length, and it is fluent in over 80 programming languages.

From the look of it, Codestral appears to be more proficient than existing offerings such as Meta’s Code Llama, which debuted in August last year and gained its 70B version earlier this year. A side-by-side benchmark comparison of several AI coding models tells this story.

Codestral, which is now available as an open-weight model via Hugging Face or through Mistral’s ChatGPT-style Le Chat, scores higher on HumanEval, a benchmark that measures the functional correctness of generated code. It beats Code Llama 70B, DeepSeek Coder 33B, and Llama 3 70B in Python, C++, Bash, Java, and PHP.
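To make "functional correctness" concrete, here is a minimal sketch of how a HumanEval-style check might work: the generated code is executed and then run against a set of assertions, and it only counts as a pass if none of them fail. The `add` problem, the sample completions, and the helper names `passes_tests` and `pass_rate` are made up for illustration; they are not taken from the benchmark or from Mistral's evaluation setup.

```python
def passes_tests(candidate_source: str, test_source: str, entry_point: str) -> bool:
    """Return True if the generated code passes every assertion in the test.

    NOTE: exec() on untrusted model output is unsafe; a real harness would
    run this inside a sandbox with a timeout. This is only a sketch.
    """
    namespace: dict = {}
    try:
        exec(candidate_source, namespace)  # defines the generated function
        exec(test_source, namespace)       # defines check(candidate)
        namespace["check"](namespace[entry_point])
        return True
    except Exception:
        # Any syntax error, runtime error, or failed assertion is a failure.
        return False


def pass_rate(samples: list[str], test_source: str, entry_point: str) -> float:
    """Fraction of generated samples that pass the tests."""
    results = [passes_tests(s, test_source, entry_point) for s in samples]
    return sum(results) / len(results)


# Hypothetical problem and completions, for illustration only.
TEST = """
def check(candidate):
    assert candidate(2, 3) == 5
    assert candidate(-1, 1) == 0
"""

GOOD = "def add(a, b):\n    return a + b\n"
BAD = "def add(a, b):\n    return a - b\n"

if __name__ == "__main__":
    print(passes_tests(GOOD, TEST, "add"))
    print(passes_tests(BAD, TEST, "add"))
    print(pass_rate([GOOD, BAD], TEST, "add"))
```

A benchmark score is then an aggregate of such pass/fail results over many problems, which is why a higher number means the model more often produces code that actually works rather than code that merely looks plausible.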

CruxEval-O and RepoBench, the other benchmarks used in the comparison, also suggest that Codestral may be the best code AI model for now. CruxEval-O tests how well an LLM reasons about code by asking it to predict a program’s output, while RepoBench evaluates repository-level code auto-completion.

Or at least it may be until OpenAI’s Code Interpreter launches after sitting in beta testing for so long.

“Codestral is trained on a diverse dataset of 80+ programming languages, including the most popular ones, such as Python, Java, C, C++, JavaScript, and Bash. It also performs well on more specific ones like Swift and Fortran,” says the French company.