OpenAI has introduced a new model called CriticGPT, which is based on GPT-4. Unlike other consumer-facing models, CriticGPT is designed to critique ChatGPT responses, helping human trainers identify errors during reinforcement learning from human feedback (RLHF).
The model aims to help OpenAI's human trainers detect errors in ChatGPT's code output. OpenAI claims that code reviewed with CriticGPT's help outperforms unreviewed code 60% of the time.
The company is currently integrating CriticGPT-like models into the RLHF labeling process to help AI trainers evaluate the results of advanced AI systems.
According to OpenAI, models like CriticGPT can improve the accuracy of ChatGPT. The new model can also catch errors that humans might miss, which matters more as models grow more capable and their mistakes become subtler.
To train CriticGPT, human trainers manually edited code generated by ChatGPT, deliberately inserting errors and writing sample critiques of those errors. This taught the model to identify both common and less common bugs.
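The data-collection step described above can be pictured as bundling each deliberately tampered code sample with a reference critique. The sketch below is purely illustrative; the class and function names are hypothetical and do not come from OpenAI.

```python
# Hypothetical sketch of the tampering step: a trainer takes model-generated
# code, inserts a deliberate bug, and records a reference critique pointing
# at that bug. All names here are illustrative assumptions.

from dataclasses import dataclass


@dataclass
class TamperedExample:
    original_code: str       # code as ChatGPT produced it
    tampered_code: str       # the same code with a bug inserted by the trainer
    reference_critique: str  # the critique the critic model should learn to produce


def make_tampered_example(original: str, buggy: str, critique: str) -> TamperedExample:
    """Bundle one training example; the critic is rewarded for finding the inserted bug."""
    return TamperedExample(original, buggy, critique)


example = make_tampered_example(
    original="def mean(xs):\n    return sum(xs) / len(xs)",
    buggy="def mean(xs):\n    return sum(xs) / (len(xs) - 1)",
    critique="The divisor should be len(xs), not len(xs) - 1; this miscomputes the mean.",
)
print(example.reference_critique)
```

In this framing, the inserted bug gives a known ground truth, so a critique can be scored on whether it flags the tampered span.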
Just like human suggestions, CriticGPT suggestions are not always correct. However, combining human input with CriticGPT feedback is said to outperform unassisted human trainers and help trainers write comprehensive critiques while producing fewer errors.
OpenAI also noted limitations: real-world errors can be spread across many parts of a response rather than confined to one spot, and CriticGPT cannot reliably evaluate extremely long or complex tasks and responses.
Overall, this new AI model is expected to help human trainers produce better RLHF data for GPT-4, and OpenAI plans to expand this work further.