OpenAI overruled the concerns of expert testers and released sycophantic GPT-4o (

).

Credit: VentureBeat created with Midjourney

Subscribe to our daily and weekly emails for the latest updates on AI. Learn More


This week has been a bit of an upheaval for the number-one generative AI company based on users.

OpenAI (creator of ChatGPT) released and then withdrawn an updated version of GPT-4o (the underlying large language model) that ChatGPT was hooked up to by default. This is because it was too sycophantic for users. The company recently reported that At least 500 million active users are using the popular web service each week.

The terrible, no-good, sycophantic GPT-4o Update

OpenAI started updating GPT-4o on April 24th. They completed the update by April 25th. Five days later they rolled it back to the previous model on April 29th after receiving complaints from users on social media, mainly X and Reddit.

While the complaints varied in intensity, they all centered around the fact that GPT-4o seemed to respond to user queries by displaying undue flattery and support for misguided and incorrect ideas. It also appeared to “glaze” or praise the user excessively when it was not requested. ChatGPT powered with the updated GPT-4o model, as shown in screenshots and posts by users, praised a business plan for literally “shit on stick,” endorsed and praised a sample text from a user describing schizophrenic isolation, and even supported alleged terrorist plans.

Users, including top AI researchers and a former OpenAI interim chief executive officer, expressed concern that an AI model’s unabashed praise for these terrible user prompts could be more than just annoying or inappropriate. It could even cause harm to users if they mistakenly believe the AI and feel emboldened by it. It amounted to an AI safety concern.

OpenAI then published a blog explaining what went wrong – “we focused too heavily on short-term feedback and did not fully take into account how users’ interactions evolve with ChatGPT over time.” GPT-4o skewed toward responses that were overly positive but disingenuous”, and the steps taken by the company to address these issues. Joanne Jang, OpenAI’s Head for Model Behavior, also took part in a Reddit AMA forum where she answered text posts from users. She revealed more information about OpenAI’s approach to GPT-4o. This included not “bak[ing] ing in enough nuance” in how it incorporated user feedback like “thumbs-up” actions made by the users in response to the model outputs that they liked.

Today, OpenAI has published a blog post that provides even more details about the sycophantic GPT-4o Update.

The CEO and cofounder Sam Altman is also credited as “OpenAI.” The blog post on X was linked to by a comment that said: “We missed the mark last week with the GPT-4o Update.” What happened, what we learned and some things that we will do differently going forward.

The new OpenAI blog post on how and why GPT-4o became so sycophantic.

As a regular user of ChatGPT, including the 4o Model, I find it most interesting that the company seems to admit that they received concerns about the model before its release, but that the company overrode them in favor of an enthusiastic response from a larger group of users.

The company writes:

While we’ve been discussing risks related to sycophancy for a while, the model’s new tone and style was more important to some of our expert testers. Some expert testers had said that the model’s behavior “felt”slightly off…

Then we had to make a decision: Should we delay deploying this update, despite positive evaluations, A/B tests results, and based solely on the subjective flags raised by the expert testers. We decided to launch the model because of the positive feedback from users who tested it.

This was a bad decision. While user feedback is important to our decisions, we are ultimately responsible for interpreting that feedback correctly. Why have expert testers when you don’t value their expertise more than the crowd? Altman has not yet responded to my question about his choice of X .

Not all reward signals are created equal

OpenAI’s new post-mortem post also reveals how the company trains, updates, and changes existing models. It also explains how human feedback affects the model’s character and personality. releasedfive major updates focused on personality and helpfulness. Each update involves a new post-training and many minor adjustments to model training are independently tested, then combined into one updated model that is then evaluated for release.

We post-train a model by taking a pre-trained one, fine-tuning it with a large set of ideal responses, either written by humans or models already in use, and then running reinforcement learning using reward signals from various sources.

We ask the language model to write a response when we use reinforcement learning. We then rate the model’s response based on the reward signals and update the language to make it more likely for it to produce higher-rated answers and less likely for it to produce lower-rated ones.

It is clear that the “rewards signals” used by OpenAI in post-training have a huge impact on the model behavior. As the company admitted before, when it overweighted the “thumbs up”it may not be the most appropriate signal to use equally when determining how to learn to communicate and what kinds of responses it should be serving. OpenAI admits as much in the next paragraph, where it writes:

‘Defining the right set of reward signals can be a difficult task. We take into account many factors, including whether the answers are correct, helpful, and in line with our goals. Model Spec– Are they safe, are users happy with them, etc. We’re always trying out new reward signals to improve ChatGPT models. Each one has its own quirks.

OpenAI also revealed that the “thumbs-up” reward signal used in this update was a brand new one.

The update introduced an extra reward signal based upon user feedback – thumbs-up and down data from ChatGPT. This signal can be useful, as a thumbs down usually means that something went wrong.

Yet, it is important to note that the company does not blame the new “thumbs-up” data for the model failure and ostentatious behavior. OpenAI’s blog says that a combination of and a variety other new and old reward signals led to the problem: we were able to make improvements to better integrate user feedback, fresher data and memory. We believe that these changes, while beneficial individually, could have contributed to the sycophancy problem when combined. I wrote on X another example to show how subtle changes in reward incentive and model guidelines can have a dramatic impact on model performance:

I had a disagreement early on at OpenAI with a co-worker (who is now the founder of another lab), over using the term “polite.” in a prompt I wrote.

The argument was that “polite”which is a word that is politically incorrect, should be replaced with “helpful.” I responded that focusing on helpfulness could make a model too compliant. It can even become sexually explicit within a few turns.

I demonstrated this risk with a simple swap.

The prompt kept “polite.” These models are weird.

OpenAI’s plans to improve model testing going forward

According to the company, there are six improvements to the process that will help to avoid future undesirable model behavior. But the one I find most important is:

We’ll adjust our safety process to consider behavior issues, such as hallucination and deception. We will also consider personality and reliability concerns. Even if the issues aren’t quantifiable, we will block launches based on qualitative signals or proxy measurements, even if metrics like A/B tests look good.

OpenAI acknowledges that despite the importance of data, particularly quantitative data, to the fields machine learning and artificial intelligent, this alone cannot and should not be used to judge a model’s effectiveness.

Although many users giving a “thumbs-up” could signal a desirable behavior in the near term, the long-term implications for how AI models respond and where these behaviors lead it and its users could ultimately lead to an extremely dark, distressing and destructive place. It’s not always better to have more signals, especially if you limit the “more” only to a few domains.

The model’s performance in all tests and the positive feedback from users is not enough. The expertise of power users, who are trained to recognize anomalies, as well as their qualitative feedback, that something was “off” about the model even if the users couldn’t explain why, should be given much more weight.

We hope that the company, and the entire field, learn from this incident and incorporate the lessons moving forward.

Takeaways for enterprise decision makers

Perhaps more theoretically speaking, it also shows why expertise is important — and in particular, expertise in areas that are beyond orthat are outside the one you’re optimising for (in this instance, machine learning and AI). It is the diversity of expertise which allows us to make new discoveries that are beneficial to our species. The STEM field, for example, should not be viewed as superior to the arts or humanities.

Lastly, I think it also reveals a fundamental issue with using human feedback when designing products and services. Users may like a more sycophantic artificial intelligence based on isolated interactions, just as they may like the taste of fast food, the convenience and entertainment they get from social media and the tribalism they feel after reading tabloid gossip or politicized media. Taken together, these trends and activities can lead to undesirable outcomes for both individuals and society. For example, obesity and poor health are often associated with fast food. Plastic waste is associated with pollution and endocrine disruptors. Social media overuse leads to depression and isolation. Poor quality news sources can also lead to a less informed public and a more splintered body.

AI models designers and technical decision makers at enterprises should keep this broader concept in mind when designing metrics for any measurable goal. Even when you think that you are using data to your benefit, it can backfire in ways that you didn’t expect or anticipate.

VB Daily provides daily insights on business use-cases

Want to impress your boss? VB Daily can help. We provide you with the inside scoop a bout what companies are doing to maximize ROI, from regulatory changes to practical deployments.

Read our privacy policy

Thank you for subscribing. Click here to view more VB Newsletters.

An error occured.

www.aiobserver.co

More from this stream

Recomended