Apple's new LLM: MM1 could reduce the need for multiple prompts to get the desired result





Key notes

  • Apple’s MM1 is a new AI model that trains on text and image data, potentially powering Siri 2.0.
  • MM1 uses a multimodal approach to achieve better performance and reduce the need for multiple prompts.
  • MM1’s unique architecture and MoE model allow it to run on devices like iPhones.

Apple has been relatively quiet about its work on large language models (LLMs), but a new research paper suggests the company is catching up quickly. MM1 is a new method for training AI models on combined text and image data, which could speed up training and reduce the need for multiple prompts to get the desired result. The paper comes just days after Apple acquired DarwinAI.

What is MM1?

MM1 is a family of AI models, with the largest one reaching 30 billion parameters (smaller than some competitors but still powerful). Parameters are the numerical values a model learns during training to represent patterns in its data. A higher parameter count generally indicates a more complex model that can handle a wider range of tasks and produce more nuanced outputs.
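For a sense of what a parameter count measures, here is a minimal PyTorch sketch that tallies the learnable weights of a toy network. The layer sizes are arbitrary, chosen purely for illustration:

```python
import torch.nn as nn

# A toy two-layer network; the sizes are arbitrary, for illustration only.
model = nn.Sequential(
    nn.Linear(4096, 11008),  # weight: 4096 x 11008, plus 11008 biases
    nn.ReLU(),
    nn.Linear(11008, 4096),  # weight: 11008 x 4096, plus 4096 biases
)

# A "parameter" is simply one learnable number; count them all.
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # ~90 million; MM1's largest model has ~30 billion
```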

It focuses on multimodal learning, meaning it can process and understand both text and images. This could be a big leap forward for Siri, allowing it to better understand your requests and respond with more relevant information. Last month, Apple also introduced an AI image editing tool.

The researchers behind MM1 argue that combining different types of training data leads to better performance. MM1 uses a mix of image captions, text-only data, and visual question answering to train the model. This allows MM1 to perform tasks like image captioning, visual question answering, and natural language understanding.
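The paper doesn't publish training code, but the idea of mixing data sources can be sketched with a simple weighted sampler. Everything below, including the source names and the mixing ratios, is illustrative rather than Apple's actual recipe:

```python
import itertools
import random

# Illustrative mixing ratios over the data types the article mentions;
# these weights are assumptions, not MM1's actual configuration.
SOURCES = {
    "image_captions": 0.45,  # (image, caption) pairs
    "visual_qa":      0.35,  # (image, question, answer) triples
    "text_only":      0.20,  # plain text, preserves language ability
}

def sample_batch(datasets, batch_size=8):
    """Draw each example from a source chosen by its mixing weight."""
    names, weights = zip(*SOURCES.items())
    picks = random.choices(names, weights=weights, k=batch_size)
    return [next(datasets[name]) for name in picks]

# Usage with dummy infinite iterators standing in for real data loaders:
datasets = {name: itertools.cycle([f"<{name} example>"]) for name in SOURCES}
print(sample_batch(datasets))
```

Mixing in text-only data alongside image-grounded data is what lets a single model handle captioning, visual question answering, and pure language tasks.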

MM1 uses a unique architecture with higher image resolution encoders and a different approach to pre-training and labeling data. It also uses a mixture-of-experts (MoE) model to scale up while keeping processing requirements low, which means it could potentially run on devices like iPhones and laptops.
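A mixture-of-experts layer keeps compute low by routing each token to a few small "expert" networks instead of one monolithic one, so only a fraction of the parameters run per token. Here is a minimal PyTorch sketch; the layer sizes, eight experts, and top-2 routing are illustrative assumptions, not MM1's actual design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-2 mixture-of-experts layer (illustrative, not MM1's design)."""
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)        # (tokens, experts)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # keep the best 2 experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, k, None] * expert(x[mask])
        return out  # only 2 of 8 experts run per token, so compute stays low

layer = MoELayer()
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

Because total parameters grow with the number of experts while per-token compute stays roughly constant, this is the kind of trick that makes large models plausible on constrained hardware like phones.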

The research paper doesn’t explicitly mention Siri, but the focus on efficiency, minimal prompting, and multimodal capabilities hints at Apple’s direction for Siri’s future. Earlier, a leaker suggested Apple could offer a smarter Siri through GenAI subscription offerings.

With Apple reportedly working to bring other LLMs, such as Google’s Gemini, to the iPhone, the company appears to be taking a multi-pronged approach to AI advancements.

More here.