OpenAI

Worldcoin Crackdown in Kenya Marks a Turning Point for Digital Rights

AI Observer
News

OpenAI’s Deep Research is more accurate than you in fact-finding, but...

AI Observer
News

OpenAI releases new simulated reasoning models with full access to tools

AI Observer
News

xAI adds a memory feature to Grok

AI Observer
News

Claude has just acquired superpowers. Anthropic AI can now search through...

AI Observer
News

OpenAI names new nonprofit ‘advisors’

AI Observer
News

ChatGPT 4.1 Early Benchmarks compared to Google Gemini

AI Observer
News

OpenAI launches its flagship AI model, GPT-4.1

AI Observer
News

OpenAI plans to phase out GPT-4.5 from its API

AI Observer
News

OpenAI’s new GPT-4.1 AI models focus on coding

AI Observer
News

Netflix is testing out a new OpenAI-powered search

AI Observer

Featured

Healthcare and Biotechnology

OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and...

AI Observer
Education

RL^V: Unifying Reasoning and Verification in Language Models through Value-Free Reinforcement...

AI Observer
News

Implementing an LLM Agent with Tool Access Using MCP-Use

AI Observer
News

A Step-by-Step Guide to Deploy a Fully Integrated Firecrawl-Powered MCP Server...

AI Observer

OpenAI Releases HealthBench: An Open-Source Benchmark for Measuring the Performance and...

OpenAI has released HealthBench, an open-source evaluation framework designed to measure the performance and safety of large language models (LLMs) in realistic healthcare scenarios. Developed in collaboration with 262 physicians across 60 countries and 26 medical specialties, HealthBench addresses the limitations of existing benchmarks by focusing on real-world applicability,...