OpenAI proposes a second neural network to detect errors in ChatGPT code

CriticGPT points out a bug in ChatGPT-generated code, June 2024

CriticGPT is a neural network-based artificial intelligence model that criticizes the code created by ChatGPT and points out errors in the code.

OpenAI

The problem of hallucinations – artificial intelligence (AI) models that assert falsehoods under a guise of authority – has led some academics to conclude that generative AI simply cannot detect and correct its errors.

In a paper published last October, researchers at Google’s DeepMind argued that “LLMs are not yet capable of self-correcting their reasoning.”

Also: If AI is so amazing, why does ChatGPT fail at this simple image editing task?

However, ChatGPT creator OpenAI disagrees with that claim, and last week the company introduced a version of GPT-4, called CriticGPT, which it claims can help find and fix bugs to improve the model’s overall accuracy.

The results are encouraging for human teams cleaning up code with the help of AI. However, they also suggest that there is no way to avoid the hallucinations introduced by the bots doing the reviewing.

Also: Generative AI can’t find its own errors. Do we need better guidance?

The scenario CriticGPT targets is writing programming code: the researchers propose CriticGPT as a second neural network that catches instances where ChatGPT makes mistakes in the code it generates.

They focus on writing code because, as they say, computer code is “crisp” – it has clear right and wrong answers. Additionally, OpenAI as an organization hopes to use generative AI as “an alignment research assistant,” to automate some of the work of setting guardrails for the emerging technology. Writing code is already a big use of generative AI, so it’s a worthy goal to aim for.

In the paper posted to the arXiv preprint server, “LLM critics help detect LLM errors,” lead author Nat McAleese of OpenAI and colleagues describe what they call “the first demonstration of a simple, scalable supervision method that helps humans more comprehensively detect problems in real-world RLHF data.”

RLHF (reinforcement learning from human feedback) refers to the well-known practice of subjecting a chatbot’s responses to human feedback in order to make its output more acceptable. It is one of the ways OpenAI and others have put guardrails in place to try to prevent unwanted behavior.
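
For readers unfamiliar with the mechanics, the heart of RLHF is a reward model trained on human preference comparisons, which the chatbot is then optimized against. The following is a minimal, illustrative Python sketch of that preference-scoring step only; the class and function names are assumptions, not OpenAI’s code, and a toy heuristic stands in for a learned reward model.

```python
# Minimal sketch of the preference-modeling step behind RLHF (illustrative only,
# not OpenAI's implementation). A reward model scores responses, and human
# comparisons push the preferred response's score above the rejected one.
import math
from dataclasses import dataclass

@dataclass
class PreferencePair:
    prompt: str
    preferred: str   # response the human rater liked better
    rejected: str    # response the human rater liked less

def score_response(prompt: str, response: str) -> float:
    """Stand-in for a learned reward model; here a trivial toy heuristic."""
    return float(len(response.split()))

def preference_loss(pair: PreferencePair) -> float:
    """Bradley-Terry style loss: small when the preferred response scores higher."""
    margin = score_response(pair.prompt, pair.preferred) - score_response(pair.prompt, pair.rejected)
    return -math.log(1.0 / (1.0 + math.exp(-margin)))  # -log(sigmoid(margin))

pair = PreferencePair("Explain RLHF.", "A clear, detailed, helpful answer...", "No.")
print(f"toy preference loss: {preference_loss(pair):.3f}")
```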

In this case, CriticGPT is subjected to feedback from hired human programmers who review the critiques of programming code that CriticGPT generates. The humans rate the critiques for relevance, specificity, comprehensiveness, and more. CriticGPT is trained to refine its critiques based on that human feedback so that they score higher.
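
To make that feedback loop concrete, here is a hedged sketch of how a contractor’s ratings might be folded into a single training signal. The rating axes come from the description above; the 1–7 scale and the weights are assumptions for illustration, not details from the paper.

```python
# Illustrative sketch of turning contractor ratings of a critique into one
# training signal. The axes (relevance, specificity, comprehensiveness, overall)
# follow the article; the 1-7 scale and the weights are assumptions.
from dataclasses import dataclass

@dataclass
class CritiqueRating:
    relevance: int          # 1-7, assumed scale
    specificity: int
    comprehensiveness: int
    overall: int            # rater's overall judgment

def training_score(r: CritiqueRating, weights=(0.2, 0.2, 0.3, 0.3)) -> float:
    """Weighted aggregate that CriticGPT would be trained to push higher."""
    axes = (r.relevance, r.specificity, r.comprehensiveness, r.overall)
    return sum(w * a for w, a in zip(weights, axes))

rating = CritiqueRating(relevance=6, specificity=5, comprehensiveness=4, overall=5)
print(f"aggregate rating: {training_score(rating):.2f}")  # 0.2*6 + 0.2*5 + 0.3*4 + 0.3*5 = 4.90
```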

Also: Is AI lying to us? These researchers built a kind of lie detector to find out

However, McAleese and his team went a step further: they had human contractors deliberately insert bugs into the code that CriticGPT reviewed. The researchers wanted the contractors to explain the bugs they had introduced, and for CriticGPT to digest those explanations and learn to associate each bug with its explanation.

The hope was that CriticGPT would improve as the bug descriptions it produces came closer to what the human contractors had written about the known bugs.
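
To picture what such a “tampering” example might look like, here is a hypothetical data shape for one training record; the field names and the toy bug are illustrative assumptions, not taken from the paper.

```python
# Hypothetical shape of a "tampering" training example: a contractor inserts a
# subtle bug into working code and writes a reference explanation, and the
# critique CriticGPT produces is compared against that explanation.
from dataclasses import dataclass

@dataclass
class TamperedExample:
    original_code: str        # code before the contractor's edit
    tampered_code: str        # code with the deliberately inserted bug
    bug_explanation: str      # contractor's description of the inserted bug
    model_critique: str = ""  # CriticGPT's attempt, filled in during training

example = TamperedExample(
    original_code="def area(r):\n    return 3.14159 * r * r\n",
    tampered_code="def area(r):\n    return 3.14159 * r\n",  # radius no longer squared
    bug_explanation="The radius is no longer squared, so the result is not an area.",
)
print(example.bug_explanation)
```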

The result of the training, McAleese and his team write, is that the LLM critics find more bugs than human code reviewers do. CriticGPT “greatly improves the rate at which inserted bugs are detected, as both LLM critics (ChatGPT and CriticGPT) detect many more bugs than human annotators.”

They note that even the human contractors prefer the code analyses the machine generates over what their human colleagues write.

“Reviews written by CriticGPT are substantially preferred by contractors over reviews generated by ChatGPT and over human-written reviews from our contractor team based on overall rating.”

The AI model helps human contractors enrich their bug reviews, a type of human-augmenting AI output that should please everyone: “Human+CriticGPT teams write substantially more comprehensive reviews than humans alone, and CriticGPT improves comprehensiveness over ChatGPT on both human-injected and detected bugs.”

As the authors write in a companion blog post, “CriticGPT’s suggestions aren’t always correct, but we found that they can help trainers catch many more problems with model-written answers than they could without the help of AI.”

Also: Can AI code? Only in small steps

But there’s a problem. Just as ChatGPT and various AI models can “hallucinate” incorrect statements, it turns out that CriticGPT can also claim to identify errors that don’t exist.

“However, we found that the rate of nitpicking and hallucinations is much higher in models than in humans, although CriticGPT can substantially reduce this rate relative to ChatGPT,” they write.

CriticGPT hallucinating a bug in human code.

OpenAI

That’s a dilemma: the better the AI model is at detecting errors, the more it appears to hallucinate errors: “Unfortunately, it is not obvious what the right balance is between hallucinations and error detection for a general RLHF system that uses feedback to improve model performance.”

And it’s not easy to find the middle ground, they note, because “an ideal experiment would run completely separate critique-enhanced RLHF data collection cycles for each precision/recall point; but this is prohibitively expensive.”

To find that middle ground, McAleese and his team came up with a compromise: a forced-sampling beam search that attempts to extract the most valuable critiques from CriticGPT while minimizing the number of spurious ones.
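
The trade-off such a search navigates can be sketched in a few lines: generate several candidate critiques, then pick the one that maximizes a reward-model score minus a penalty for each problem the critique claims to find. The scoring and penalty below are assumptions for illustration, not the paper’s actual procedure.

```python
# Illustrative sketch of the precision/recall knob in selecting critiques:
# a higher penalty per claimed bug suppresses nitpicks and hallucinations at
# the cost of missing some real bugs. Names and numbers are assumptions.
from dataclasses import dataclass
from typing import List

@dataclass
class Critique:
    text: str
    claimed_bugs: int      # number of distinct problems the critique asserts
    reward_score: float    # stand-in for a learned reward model's score

def select_critique(candidates: List[Critique], penalty: float) -> Critique:
    """Pick the candidate with the best score after penalizing each claimed bug."""
    return max(candidates, key=lambda c: c.reward_score - penalty * c.claimed_bugs)

candidates = [
    Critique("Flags one real off-by-one error.", claimed_bugs=1, reward_score=2.0),
    Critique("Flags the same error plus two dubious nitpicks.", claimed_bugs=3, reward_score=2.6),
]
print(select_critique(candidates, penalty=0.5).text)  # conservative: the terse critique wins
print(select_critique(candidates, penalty=0.1).text)  # permissive: the longer critique wins
```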

Among the potential problems with OpenAI’s approach is that CriticGPT’s training relies on humans deliberately inserting errors. That approach, McAleese and his team write, differs from the natural distribution of errors that LLMs make.

“Training models to insert subtle problems into the distribution (rather than paying humans to insert errors) may be able to mitigate this concern, but we leave those directions for future work.”

Also: From AI trainers to ethicists: AI may make some jobs obsolete, but create new ones

The problem, then, will always be how to bootstrap the automation without human help in the first place.

Another problem (which the authors don’t mention) is that, as with everything related to OpenAI, neither the new CriticGPT model nor its training data is publicly available: everything is closed, with no source code to examine and no datasets for others to download. That closure means there is little to no chance for outside ethics or security experts to scrutinize the corrections made by the CriticGPT model.

With no oversight from anywhere outside of OpenAI, as the saying goes, who will watch the watchmen?