While others train their AI models on news posts, Mark Zuckerberg trains on users' private data


Devesh Beri

Tech Journalist


Updated on October 31, 2025

While presenting Meta Platforms Inc.’s latest financial results, Mark Zuckerberg unveiled plans to harness the extensive user data from Facebook and Instagram to develop more powerful artificial intelligence. The ambition to create “general intelligence” software systems is not new, but Zuckerberg’s approach of using private user data for AI training sets a distinctive course in the field.

Zuckerberg highlighted the unparalleled wealth of data available on Facebook and Instagram, consisting of hundreds of billions of images, tens of billions of videos, and a vast number of text posts. This reservoir of user-generated content provides a unique opportunity to enhance AI capabilities, particularly in conversational agents and chatbots, though at a potential cost to user privacy.

This comes after OpenAI and Microsoft faced a lawsuit from The New York Times alleging that the companies trained their models on the newspaper’s content without permission.

There are valid concerns regarding privacy, ethics, and content moderation. Exploiting user data for AI development raises questions about protecting the privacy rights of Facebook’s 3 billion users and Instagram’s 1.5 billion users. Addressing bias and toxicity within the data could also prove difficult, echoing the content moderation problems the platforms have faced in the past.

Zuckerberg’s commitment to building “general intelligence” software systems has been met with caution, especially considering the potential reputational damage Meta could face for utilizing private user data. The complexity of handling such vast amounts of data and the necessity of compliance with global data protection laws add another challenge to this ambitious endeavor.

In Zuckerberg’s words: “The next key part of our playbook is learning from unique data and feedback loops in our products… On Facebook and Instagram, there are hundreds of billions of publicly shared images and tens of billions of public videos, which we estimate is greater than the Common Crawl dataset and people share large numbers of public text posts in comments across our services as well.”

This approach could lead to real advances in AI, but it calls for careful consideration of its ethical implications and privacy concerns.

Navigating these challenges will be essential to prevent negative consequences for users and to maintain the public’s trust.

