Caught red-handed: Google's hypocrisy on AI training exposed


Key notes

  • YouTube CEO accuses OpenAI of potentially violating terms by training AI with YouTube videos.
  • OpenAI stays silent on specific data sources for their AI video generator, Sora.
  • Google claims they respect creator contracts and only use publicly available data with permission for their AI, Gemini.

YouTube CEO Neal Mohan has slammed OpenAI, accusing the company of potentially violating YouTube's terms of service by using its videos to train Sora, OpenAI's AI video generator, which is a few months away from release. While Mohan admits he has no concrete proof, he emphasizes that such use would clearly breach YouTube's rules.

The accusation comes amid a growing debate about the ethical sourcing of data for training AI models. OpenAI has remained tight-lipped about Sora's specific training data sources, even as companies race to gather as much content as possible to fuel their AI advancements. Both Google and OpenAI are currently at the top of their game in the field of AI.

Mohan explained:

“From a creator’s perspective, when a creator uploads their hard work to our platform, they have certain expectations. One of those expectations is that the terms of service is going to be abided by. It does not allow for things like transcripts or video bits to be downloaded, and that is a clear violation of our terms of service. Those are the rules of the road in terms of content on our platform.”

Mohan assures the public that Google, when training its own AI model Gemini, adheres to individual contracts with creators before using any YouTube videos. It’s quite hypocritical, isn’t it? This raises questions about Google’s stance on data usage: it is protective of creators when it comes to competitors but employs similar tactics for its own benefit.

Barry Schwartz summed it up well:

This is how Google trains its LLMs like Gemini. It collects data from websites, articles, books, and other content. Complex algorithms analyze the data to improve language understanding. This helps AI models perform tasks such as translating languages more accurately, generating creative text, and answering questions.

It remains to be seen whether OpenAI was indeed scraping YouTube content, but the situation exposes a potential double standard within Google’s ecosystem.

More here.