How does Apple's OpenELM open-source model compare to Microsoft's Phi-3, parameters-wise?

Coincidence?


Key notes

  • Apple released OpenELM on HuggingFace with eight variants.
  • The variants come in four parameter sizes: 270 million, 450 million, 1.1 billion, and 3 billion.
  • Microsoft’s Phi-3 model, on the other hand, includes versions with 3.8 billion, 7 billion, and 14 billion parameters.

Shortly after Microsoft launched the Phi-3 family, a set of small, open-source models designed for lighter use, Apple joined the train. The iPhone maker has (quietly) launched OpenELM, its latest open-source AI model. 

OpenELM, short for Open-source Efficient Language Models, comes in eight variants: four pre-trained and four instruction-tuned. Apple’s researchers said the model uses a layer-wise scaling strategy to efficiently distribute parameters within each layer of the transformer, and you can use these models on HuggingFace.
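
If you want to try one of the variants yourself, here is a minimal sketch of loading the smallest instruction-tuned checkpoint with HuggingFace’s transformers library. The model id and the reuse of a Llama tokenizer are assumptions based on how the checkpoints are commonly published, so check the model cards on HuggingFace before running this.

```python
# Minimal sketch: load an OpenELM variant from HuggingFace and generate text.
# The model id "apple/OpenELM-270M-Instruct" and the Llama 2 tokenizer are
# assumptions; verify against the official model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "apple/OpenELM-270M-Instruct"  # assumed HuggingFace model id

# OpenELM ships custom modeling code, so trust_remote_code must be enabled.
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# The checkpoints reportedly reuse an existing Llama tokenizer rather than shipping their own.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-hf")

inputs = tokenizer("Once upon a time there was", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```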

“For example, with a parameter budget of approximately one billion parameters, OpenELM exhibits a 2.36% improvement in accuracy compared to OLMo while requiring 2× fewer pre-training tokens,” the documentation reads.

As for sizes, the variants come in four parameter counts: 270 million, 450 million, 1.1 billion, and 3 billion. And while parameter count isn’t always the best yardstick, it’s usually the first point of comparison between AI models.

Frankly, OpenELM isn’t as impressive (parameters-wise) as other open-source models: Llama 3, which powers Meta AI, tops out at 70 billion parameters, and Microsoft-backed Mistral AI launched its Mixtral 8x22B model with 176 billion parameters.

Phi-3-mini, the smallest version of Microsoft’s Phi-3 family, has 3.8 billion parameters and was trained for a week on Nvidia’s H100 GPUs. By comparison, Phi-3-small has 7 billion parameters, and Phi-3-medium has 14 billion.