Apple's new LLM: MM1 could reduce the need for multiple prompts to get the desired result





Key notes

  • Apple’s MM1 is a new AI model that trains on text and image data, potentially powering Siri 2.0.
  • MM1 uses a multimodal approach to achieve better performance and reduce the need for multiple prompts.
  • MM1’s unique architecture and MoE model allow it to run on devices like iPhones.

Apple has been relatively quiet about its work on large language models (LLMs), but a new research paper suggests the company is catching up quickly. MM1 is a new method for training AI models on combined text and image data, which could speed up training and reduce the need for multiple prompts to get the desired result. The paper comes just days after Apple acquired DarwinAI.

What is MM1?

MM1 is a family of AI models, with the largest one reaching 30 billion parameters (smaller than some competitors but still powerful). Parameters are the numerical values a model learns during training to represent patterns in its data. A higher parameter count generally indicates a more complex model that can handle a wider range of tasks and produce more nuanced outputs.
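For a sense of what a parameter count measures, here is a minimal PyTorch sketch that tallies the learnable weights of a toy network. The layer sizes are arbitrary, chosen purely for illustration:

```python
import torch.nn as nn

# A toy two-layer network; the sizes are arbitrary, for illustration only.
model = nn.Sequential(
    nn.Linear(4096, 11008),  # weight: 4096 x 11008, plus 11008 biases
    nn.ReLU(),
    nn.Linear(11008, 4096),  # weight: 11008 x 4096, plus 4096 biases
)

# A "parameter" is simply one learnable number; count them all.
total = sum(p.numel() for p in model.parameters())
print(f"{total:,} parameters")  # ~90 million; MM1's largest model has ~30 billion
```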

It focuses on multimodal learning, meaning it can process and understand both text and images. This could be a big leap forward for Siri, allowing it to better understand your requests and respond with more relevant information. Last month, Apple also introduced an AI image editing tool.

The researchers behind MM1 argue that combining different types of training data leads to better performance. MM1 uses a mix of image captions, text-only data, and visual question answering to train the model. This allows MM1 to perform tasks like image captioning, visual question answering, and natural language understanding.
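The paper doesn't publish training code, but the idea of mixing data sources can be sketched with a simple weighted sampler. Everything below, including the source names and the mixing ratios, is illustrative rather than Apple's actual recipe:

```python
import itertools
import random

# Illustrative mixing ratios over the data types the article mentions;
# these weights are assumptions, not MM1's actual configuration.
SOURCES = {
    "image_captions": 0.45,  # (image, caption) pairs
    "visual_qa":      0.35,  # (image, question, answer) triples
    "text_only":      0.20,  # plain text, preserves language ability
}

def sample_batch(datasets, batch_size=8):
    """Draw each example from a source chosen by its mixing weight."""
    names, weights = zip(*SOURCES.items())
    picks = random.choices(names, weights=weights, k=batch_size)
    return [next(datasets[name]) for name in picks]

# Usage with dummy infinite iterators standing in for real data loaders:
datasets = {name: itertools.cycle([f"<{name} example>"]) for name in SOURCES}
print(sample_batch(datasets))
```

Mixing in text-only data alongside image-grounded data is what lets a single model handle captioning, visual question answering, and pure language tasks.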

MM1 uses a unique architecture with higher image resolution encoders and a different approach to pre-training and labeling data. It also uses a mixture-of-experts (MoE) model to scale up while keeping processing requirements low, which means it could potentially run on devices like iPhones and laptops.
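A mixture-of-experts layer keeps compute low by routing each token to a few small "expert" networks instead of one monolithic one, so only a fraction of the parameters run per token. Here is a minimal PyTorch sketch; the layer sizes, eight experts, and top-2 routing are illustrative assumptions, not MM1's actual design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MoELayer(nn.Module):
    """Minimal top-2 mixture-of-experts layer (illustrative, not MM1's design)."""
    def __init__(self, dim=512, num_experts=8, top_k=2):
        super().__init__()
        self.router = nn.Linear(dim, num_experts)  # scores each expert per token
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
            for _ in range(num_experts)
        )
        self.top_k = top_k

    def forward(self, x):  # x: (tokens, dim)
        weights = F.softmax(self.router(x), dim=-1)        # (tokens, experts)
        top_w, top_idx = weights.topk(self.top_k, dim=-1)  # keep the best 2 experts
        out = torch.zeros_like(x)
        for k in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = top_idx[:, k] == e                  # tokens routed to expert e
                if mask.any():
                    out[mask] += top_w[mask, k, None] * expert(x[mask])
        return out  # only 2 of 8 experts run per token, so compute stays low

layer = MoELayer()
print(layer(torch.randn(4, 512)).shape)  # torch.Size([4, 512])
```

Because total parameters grow with the number of experts while per-token compute stays roughly constant, this is the kind of trick that makes large models plausible on constrained hardware like phones.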

The research paper doesn’t explicitly mention Siri, but the focus on efficiency, minimal prompting, and multimodal capabilities hints at Apple’s direction for Siri’s future. Earlier, a leaker suggested Apple could offer a smarter Siri through GenAI subscription offerings.

With Apple reportedly working to bring other LLMs, such as Google’s Gemini, to the iPhone, the company appears to be taking a multi-pronged approach to AI advancements.

More here.