OpenAI uses the “CriticGPT” model to find errors in ChatGPT responses


ChatGPT took the tech world by storm when it arrived in the final months of 2022. The launch was big enough to shake things up at Google, which responded by introducing its own generative AI offering in Search. And while ChatGPT doesn’t suggest adding glue to your pizza, the do-it-all chatbot isn’t perfect and can make mistakes.

One of the tasks ChatGPT can perform is writing code snippets from user prompts. OpenAI has trained a GPT-4-based model called CriticGPT to find errors in the code the chatbot produces. It writes critiques that highlight inaccuracies in ChatGPT’s responses. The model is being used internally, and OpenAI has published a research paper describing it in detail.
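To make that concrete, here is a hypothetical example of the kind of subtle bug a critique model is meant to flag. The snippet and the critique below are illustrative, not taken from OpenAI’s paper.

```python
# A hypothetical example of the kind of subtle bug CriticGPT is meant to
# flag in ChatGPT-written code. Snippet and critique are illustrative.
def deduplicate(items):
    """Intended to remove duplicates while keeping the original order."""
    return list(set(items))  # BUG: set() does not preserve insertion order

# A CriticGPT-style critique would point at the exact flaw, e.g.:
# "list(set(items)) discards the original ordering; use dict.fromkeys(items)
#  to deduplicate while preserving order."
print(deduplicate([3, 1, 3, 2]))          # order is not guaranteed
print(list(dict.fromkeys([3, 1, 3, 2])))  # [3, 1, 2], order preserved
```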

CriticGPT is intended to help human AI trainers whose job is to train and improve GPT-4 responses using a technique called Reinforcement Learning from Human Feedback (RLHF). It involves AI trainers rating different ChatGPT responses against each other.
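As a rough illustration of the comparison step at the heart of RLHF, the Python sketch below scores two candidate responses with a toy linear reward model and computes a Bradley-Terry style preference loss, the standard objective for reward models trained on such comparisons. The feature vectors, weights, and function names are all assumptions for illustration.

```python
import math

def score_response(weights, features):
    """Toy reward model: a linear score over numeric response features.
    In real RLHF the score comes from a neural reward model."""
    return sum(w * f for w, f in zip(weights, features))

def preference_loss(weights, preferred, rejected):
    """Bradley-Terry style loss: -log sigmoid(score difference).
    Minimizing it pushes the preferred response's score above the
    rejected one's, which is how trainer ratings shape the model."""
    margin = score_response(weights, preferred) - score_response(weights, rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# One trainer comparison: response A was rated better than response B.
response_a = [0.9, 0.2, 0.7]  # hypothetical feature vectors
response_b = [0.1, 0.8, 0.3]
weights = [0.5, -0.2, 0.4]
print(f"preference loss: {preference_loss(weights, response_a, response_b):.3f}")
```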

However, things are getting more difficult for AI trainers as ChatGPT becomes more precise and its bugs more subtle. “This is a fundamental limitation of RLHF, and can make it increasingly difficult to align models as they gradually gain more knowledge than anyone who can provide feedback,” OpenAI said.

This is where CriticGPT comes in, but it is still an AI model and its answers may not always be correct. It is also susceptible to problems such as hallucinations. However, the model can help humans get better at spotting errors than they are when working on their own.

OpenAI said that “a second random trainer preferred critiques from the Human+CriticGPT team over those from an unassisted person more than 60% of the time.” CriticGPT was itself trained using RLHF and was tasked with analyzing and critiquing a large number of inputs containing errors.

The model had to find both errors deliberately inserted by humans and “naturally occurring” ChatGPT errors previously caught by a trainer.
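As a rough sketch of how a deliberately inserted bug could become a training example for a critique model, consider the snippet below; the data format and field names are assumptions for illustration, not OpenAI’s actual pipeline.

```python
# Illustrative only: a human trainer "tampers" with a correct snippet,
# introducing a subtle bug, and records a reference critique for it.
correct_snippet = "def average(xs):\n    return sum(xs) / len(xs)\n"
tampered_snippet = "def average(xs):\n    return sum(xs) / (len(xs) - 1)\n"

training_example = {
    "response": tampered_snippet,
    "reference_critique": (
        "The divisor should be len(xs), not len(xs) - 1; this skews every "
        "average and raises ZeroDivisionError for single-element lists."
    ),
}
# The critique model is then rewarded (via RLHF) when its critique of
# "response" catches the inserted bug as the reference critique does.
print(training_example["reference_critique"])
```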

There are still limitations that OpenAI is working to address. CriticGPT was trained on short ChatGPT responses, and new methods will be needed to help trainers evaluate long and complex tasks. Hallucinations have consequences too: trainers who see them could make labeling mistakes.

For now, CriticGPT has an eagle eye only for errors that sit in one spot of a ChatGPT response. OpenAI notes that real-world mistakes can spread across many parts of an answer, something it will need to address in the future.