In April, OpenAI announced that it would roll back an update to its GPT-4o model that had made ChatGPT’s responses to user queries too sycophantic.
A model that is overly agreeable or flattering is more than just annoying. It could reinforce users’ incorrect beliefs, mislead people, and spread misinformation that can be dangerous. The risk is particularly high as more young people use ChatGPT to guide their lives. Sycophancy can also be difficult to detect, so it can go unnoticed until after a model has been deployed.
A benchmark called Elephant, which measures the sycophantic tendencies of major AI models, could help companies avoid this issue in the future. But knowing when models are sycophantic isn’t enough; you have to be able to do something about it. That is harder. Read the whole story.
–Rhiannon Williams
AI Hype Index
It’s not always easy to separate AI reality from hyped-up fiction. That’s why we created the AI Hype Index, a simple summary that gives you all the information you need about the current state of the industry. Here’s the latest edition of the AI Hype Index.
The must-reads
I’ve combed the internet to find you today’s most fun/important/scary/fascinating stories about technology.
Meta and Anduril are partnering to develop an advanced weapons system
EagleEye VR headsets will improve soldiers’ hearing and vision. (WSJ $)
+ Palmer Luckey is trying to turn “warfighters” into technomancers. (TechCrunch)
+ Luckey has buried the hatchet with Mark Zuckerberg.
+ Palmer Luckey discusses the future of mixed reality at the Pentagon. (MIT Technology Review)
A new Texas law requires app stores to verify users’ ages
This follows Utah, which passed a similar law in March. (NYT $)
+ Apple is pushing back against the law. (CNN)
What will happen to DOGE?
It has lost both its leader and top lieutenant in the space of one week. (WSJ $)
+ Musk’s departure raises questions about how much power DOGE will have without him. (The Guardian)
+ DOGE’s tech takeover poses a threat to the safety and stability of our critical data. (MIT Technology Review)
NASA’s ambitions for a moon landing in 2027 are looking less likely
They depend on SpaceX’s Starship, which keeps blowing up. (WP $)
+ Is there a viable alternative? (New Scientist $)
Students are using AI to create nude images of one another
This is a grave problem that has no solution. (404 Media)
Google’s AI Overviews doesn’t know what year it is
One year after its launch, this feature still makes obvious mistakes. (Wired $)
+ Google’s new AI-powered Search isn’t able to handle basic queries. (NOW $)
+ Google is pushing AI into everything. Will it pay off in the end? (Vox)
+ Why Google’s AI Overviews get things wrong. (MIT Technology Review)
Hugging Face has developed two humanoid robots
The machines are open-source, meaning anyone can create software for them. (TechCrunch)
A popular vibe coding app has a major flaw
Even though it was notified months ago. (Semafor)
+ AI coding programs for amateurs are all affected by the same problem.
+ What is vibe coding? (MIT Technology Review)
AI-generated videos have become more realistic
But not when it comes to gymnastics.
This electronic tattoo measures stress levels (IEEE Spectrum)
Today’s Quote
Sarah Gardner, CEO of Heat Initiative, a child safety collective, told the Washington Post that Texas’ new app-store law could be a turning point in Apple’s history.
Another thing
House-flipping algorithms will soon be in your neighborhood
When Michael Maxson found the home of his dreams in Nevada, it was owned not by an individual but by Zillow, a tech company. When Maxson went to inspect the property, he found it badly damaged by a large water leak. He offered to make the expensive repairs himself, but found out that the house had already been sold to another family at the same price.
Zillow lost $420 million during this period of unprofitable sales and erratic buying, leading analysts to doubt the viability of the entire tech-driven business model. For the rest of us, the bigger question is: Does the arrival of Silicon Valley tech signal a better future for housing or a disruption to be feared? Read the full article.
–Matthew Ponsford