OpenAI finds human reviewers aided by GPT-4-based CriticGPT outperform unassisted counterparts

OpenAI is open for criticism


Key notes

  • OpenAI’s new CriticGPT, based on GPT-4, critiques ChatGPT’s code to assist human trainers.
  • Trainers assisted by CriticGPT outperform unassisted reviewers 60% of the time.
  • CriticGPT critiques are preferred 63% of the time over ChatGPT’s due to fewer nitpicks and hallucinations.

Not long after releasing the ChatGPT desktop app on macOS, OpenAI has introduced yet another model. Called CriticGPT and based on GPT-4, it identifies and critiques errors in the popular AI chatbot’s code outputs to help human trainers during feedback.

The Microsoft-backed company explains that CriticGPT-assisted human trainers outperformed their unassisted counterparts 60% of the time. Even though it hallucinates fewer issues, CriticGPT still needs some criticism of its own, especially when handling complex tasks and dispersed errors.

An AI sure does know how to automate itself, but human reviewers are still needed. That’s why even Google explicitly says that it uses human reviewers to review how AI is used in the browsing history section of Chrome.

Similar to how ChatGPT is trained, CriticGPT learns through human feedback, focusing on spotting errors deliberately inserted into code generated by ChatGPT. AI trainers then evaluated CriticGPT’s ability to find both these intentional errors and naturally occurring bugs previously caught by other trainers.

The results showed that CriticGPT’s critiques were preferred over ChatGPT’s in 63% of cases for naturally occurring bugs, as it generated fewer unhelpful nitpicks and hallucinations.

“In our research on CriticGPT, we found that applying RLHF to GPT-4 has promise to help humans produce better RLHF data for GPT-4. We are planning to scale this work further and put it into practice,” OpenAI promises.