
Anthropic’s Claude AI can now end ‘distressing conversations’

Anthropic’s latest feature for two of its Claude AI models could spell the end of the AI jailbreaking scene. Anthropic says the feature will only be used in “rare, extreme cases of persistently harmful or abusive user interactions.”

To clarify, Anthropic said the two models, Claude Opus 4 and Claude Opus 4.1, can exit harmful conversations such as “requests from users for sexual content involving minors and attempts to solicit information that would enable large-scale violence or acts of terror.” According to Anthropic, the models will only end a conversation “as a last resort when multiple attempts at redirection have failed and hope of a productive interaction has been exhausted.” Anthropic claims that most users will never see Claude cut a conversation short, even when discussing highly controversial topics; the feature is reserved for “extreme edge cases.”

Anthropic’s example of Claude terminating a conversation. (Anthropic)

When Claude terminates a chat, the user cannot send new messages in it, but they can immediately start a new conversation. Anthropic said that ending a conversation won’t affect any other chats, and users can edit or retry earlier messages to steer the conversation in a new direction.

This move is part of Anthropic’s research program that studies the idea of AI welfare. Anthropomorphizing AI remains a hot topic, but the company believes that the ability to exit a “potentially distressing interaction” is a low-cost way to manage risks to AI welfare. Anthropic continues to experiment with the feature and encourages users to provide feedback if they encounter such a scenario.
