In the first clinical trial of its kind, a therapy bot powered by generative AI was as effective as a human therapist for participants with depression, anxiety, or a risk of developing eating disorders. Even so, the results don't give a green light to the dozens of companies hyping such technologies while operating in a regulatory gray area.
The tool, called Therabot, was built by a team of psychiatric researchers and psychologists at Dartmouth College's Geisel School of Medicine, and the results, published in the New England Journal of Medicine on March 27, were impressive.

Many tech companies have developed AI tools for therapy, promising that people can talk with a bot far more frequently and cheaply than they can with a trained therapist, and that this approach is safe and effective.
Many psychiatrists and psychologists share that vision, noting that fewer than half of people with mental disorders receive therapy, and those who do might get only 45 minutes per week. Researchers have tried to build technology that would let more people access therapy, but they have been held back by two things.
One is that a therapy bot that says the wrong thing could cause real harm. That is why many researchers have built bots with explicit programming: the software pulls from a finite bank of approved responses (as was the case with Eliza, an early mock-psychotherapist program). But this makes the bots less engaging to chat with, and people lose interest. The second issue is that the hallmarks of good therapeutic relationships, shared goals and collaboration, are hard to replicate in software.
In 2019, as early large language models like OpenAI's GPT were taking shape, researchers at Dartmouth thought generative AI might help overcome these hurdles. They set about building an AI model trained to give evidence-based responses. They first tried building it from mental-health discussions pulled from internet forums, and then turned to thousands of hours of transcripts of real sessions with psychotherapists.
"We got a bunch of 'hmm-hmms' and 'go ons,' and then a statement like 'Your relationship with your mother is the root of your problems,'" said Michael Heinz, a research psychiatrist at Dartmouth College and Dartmouth Health and the study's first author, in an interview. "Really tropes of what psychotherapy would be, rather than what we would actually want."
Dissatisfied with the results, they set to work assembling their own custom data set based on evidence-based practices. Many AI therapy bots on the market, in contrast, may be just slight variations of foundation models like Meta's Llama, trained mostly on internet conversations. That poses a problem, especially for topics like disordered eating.
If you tell such a bot you want to lose weight, Heinz says, it will readily support you, even if your weight is low to begin with. A human therapist wouldn't do that.
To test the bot, the researchers ran an eight-week trial with 210 participants who had symptoms of depression or generalized anxiety disorder, or who were at high risk for eating disorders. About half had access to Therabot; a control group did not. Participants responded to prompts from the AI and initiated conversations themselves, averaging about 10 messages per day.
Participants with depression experienced a 51% reduction in symptoms, the best result in the study. Those in the anxiety group saw a 31% reduction, and those at risk for eating disorders saw a 19% reduction in concerns about weight and body image. These measurements are based on self-reporting through surveys, a method that's not perfect but remains one of the best tools researchers have. Heinz says the results are comparable to what's found in randomized controlled trials of psychotherapy with 16 hours of human-provided treatment, but Therabot accomplished it in about half the time. "I have been working in digital therapy for a long time, and I've never seen engagement sustained at this level," says Heinz.
Jean-Christophe Belisle-Pipon, an assistant professor at Simon Fraser University who has written about AI therapy bots but was not involved in this research, says the results are impressive, but he notes that, as with any clinical trial, they don't necessarily reflect how the treatment would perform in the real world.
"We remain far from a green light" for widespread clinical deployment, he wrote in an email.
One issue wider deployment would raise is supervision. During the trial, Heinz says, he personally monitored all the messages from participants (who consented to the arrangement at the start of the trial) to watch for any problematic responses from the bot. If therapy bots required that level of oversight, they wouldn't be able to reach nearly as many people.
Heinz was asked whether he thinks the results validate the burgeoning industry of AI therapy sites.
He says “quite the opposite”cautioning that many don’t appear train their models using evidence-based practices such as cognitive behavioral therapy and they don’t likely employ a trained team of researchers to monitor interactions. “I’m concerned about the industry, and how quickly we’re moving forward without really evaluating this,” Heinz adds. Heinz says that when AI sites advertise their services in a clinically-validated context, they fall under the Food and Drug Administration’s regulatory jurisdiction. The FDA has not yet targeted many of these sites. Heinz says that if it did, “my suspicions are that almost none–probably none–of them-that operate in this space–would be able to get a claim approval”–that’s, a ruling supporting their claims about the services provided.
Belisle-Pipon says that if digital therapies like this aren't integrated into insurance and health-care systems, their reach will be sharply limited. The people who could benefit most might instead seek emotional bonds and therapy from AI that isn't designed for those purposes (indeed, new research from OpenAI suggests that interactions with its AI models have a real impact on emotional well-being).
Many people, he wrote, are likely to keep using affordable, nontherapeutic bots such as ChatGPT and Character.AI for everyday needs, from generating recipes to managing their mental health.