Anthropic
Featured
NVIDIA Introduces ProRL: Long-Horizon Reinforcement Learning Boosts Reasoning and Generalization
Recent advances in reasoning-focused language models have marked a major change in AI by scaling test-time computation. Reinforcement learning (RL) is crucial in developing reasoning capabilities and mitigating reward hacking pitfalls. However, a fundamental debate remains: whether RL provides new reasoning capabilities from a base model or just helps...