Caught red-handed: Google's hypocrisy on AI training exposed


Key notes

  • YouTube CEO accuses OpenAI of potentially violating terms by training AI with YouTube videos.
  • OpenAI stays silent on specific data sources for their AI video generator, Sora.
  • Google claims they respect creator contracts and only use publicly available data with permission for their AI, Gemini.

YouTube CEO Neal Mohan has slammed OpenAI, accusing the company of potentially violating YouTube's terms of service by using its videos to train Sora, OpenAI's AI video generator, which is a few months away from release. While Mohan admits he has no concrete proof, he emphasizes that such use would clearly breach YouTube's rules.

The accusation comes amid a growing debate about the ethical sourcing of data for training AI models. OpenAI has remained tight-lipped about Sora's specific training data sources, even as companies race to gather as much content as possible to fuel their AI advancements. Both Google and OpenAI are currently at the top of their game in the field of AI.

Mohan explained:

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”

Mohan assures the public that Google, when training its own AI model Gemini, adheres to individual contracts with creators before using any YouTube videos. It’s quite hypocritical, isn’t it? This raises questions about Google’s stance on data usage: it is protective of creators when it comes to competitors but employs similar tactics for its own benefit.

Barry Schwartz summed it up well:

This is how Google trains its LLMs like Gemini. It collects data from websites, articles, books, and other content. Complex algorithms analyze the data to improve language understanding. This helps AI models perform tasks such as translating languages more accurately, generating creative text, and answering questions.

It remains to be seen whether OpenAI was indeed scraping YouTube content, but the situation exposes a potential double standard within Google’s ecosystem.

More here.