Baidu unveils proprietary ERNIE 5 beating GPT-5 performance on charts, document understanding and more

    Just hours after OpenAI unveiled enhancements to its premier foundation model, highlighting reduced token consumption and a more engaging, customizable user experience, China’s leading search engine Baidu responded with a comprehensive suite of AI launches and strategic moves aimed at global expansion.

    The company’s objective is clear: to establish itself as a formidable player in the fiercely competitive enterprise AI arena worldwide.

    Introducing ERNIE 5.0: Baidu’s Next-Generation Multimodal AI

    Unveiled at Baidu World 2025, ERNIE 5.0 is Baidu’s proprietary, natively multimodal foundation model engineered to seamlessly process and generate content across diverse data types including text, images, audio, and video. This integrated approach contrasts with many models that rely on separate modality-specific encoders followed by fusion.

    Unlike Baidu’s recently launched open-source model, which is distributed under the permissive Apache 2.0 license, ERNIE 5.0 remains a closed-source offering accessible exclusively through Baidu’s official platform. Users must manually select it from the model dropdown on Baidu’s AI service portal.

    Alongside ERNIE 5.0, Baidu rolled out significant upgrades to its digital human technology, no-code AI development tools, and versatile AI agents, all designed to extend its AI ecosystem beyond China’s borders.

    Additionally, Baidu introduced ERNIE 5.0 Preview 1022, a variant fine-tuned for text-heavy applications, complementing the general-purpose multimodal preview model.

    CEO Robin Li emphasized the transformative potential of ERNIE 5.0, stating, “Embedding AI as a native capability turns intelligence from a cost center into a productivity engine.”

    Performance Highlights: How ERNIE 5.0 Compares to Leading Western Models

    Benchmark data presented at Baidu World 2025 suggests ERNIE 5.0 rivals, and in some cases surpasses, top-tier Western foundation models such as OpenAI’s GPT-5-High and Google’s Gemini 2.5 Pro across a variety of complex tasks.

    Specifically, ERNIE 5.0 excelled in multimodal reasoning, document comprehension, and image-based question answering, while also demonstrating robust capabilities in language understanding and code execution.

    Its native multimodal architecture enables joint input-output processing across different data types, a technical edge over models that perform modality fusion only after separate encoding.

    On visual benchmarks like OCRBench, DocVQA, and ChartQA, which assess document recognition, understanding, and structured data reasoning, ERNIE 5.0 outperformed both GPT-5-High and Gemini 2.5 Pro. These strengths are particularly relevant for enterprise use cases such as automated document workflows and financial data analysis.

    In image generation, ERNIE 5.0 matched or exceeded Google’s Veo3 model in semantic alignment and image fidelity, benefiting from its integrated multimodal design that enhances contextual awareness.

    For audio and speech tasks, ERNIE 5.0 showed competitive results on benchmarks like MM-AU and TUT2017, including spoken language question answering, indicating a broad multimodal capability footprint.

    In language-centric tasks (instruction following, factual Q&A, and mathematical reasoning), the model delivered strong performance. The Preview 1022 variant, optimized for text, demonstrated even higher proficiency, particularly excelling in Chinese-language tasks and narrowing the gap with leading English-language models.

    While Baidu has not publicly disclosed exhaustive benchmark scores, the company positions ERNIE 5.0 as a flagship model capable of general-purpose reasoning on par with the largest closed-source systems.

    Based on these self-reported results, Baidu’s clearest advantages are structured document understanding, visual chart reasoning, and multimodal integration within a unified modeling framework. Independent validation is still awaited, but if the results hold up, these capabilities make ERNIE 5.0 a compelling contender in the global multimodal AI landscape.

    Pricing Strategy Tailored for Enterprise Clients

    ERNIE 5.0 is marketed at the premium tier within Baidu’s AI model lineup. The company has published API pricing on its Qianfan platform, aligning costs with other leading Chinese AI providers such as Alibaba.

    Model | Input cost (per 1,000 tokens) | Output cost (per 1,000 tokens)
    ERNIE 5.0 | $0.00085 (¥0.006) | $0.0034 (¥0.024)
    ERNIE 4.5 Turbo (example) | $0.00011 (¥0.0008) | $0.00045 (¥0.0032)
    Qwen3 (Coder example) | $0.00085 (¥0.006) | $0.0034 (¥0.024)

    This pricing differential highlights Baidu’s strategy to distinguish between high-throughput, cost-efficient models and advanced, multimodal systems designed for complex enterprise workloads.
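
    The size of that differential can be checked with a little arithmetic. The sketch below takes the per-1,000-token USD prices from the table above, converts them to per-million-token rates, and computes how many times more ERNIE 5.0 costs than ERNIE 4.5 Turbo. The dictionary and function names are illustrative only and are not part of any Baidu SDK.

```python
# Per-1,000-token USD prices copied from the table above.
PRICES_PER_1K = {
    "ERNIE 5.0": {"input": 0.00085, "output": 0.0034},
    "ERNIE 4.5 Turbo": {"input": 0.00011, "output": 0.00045},
}


def per_million(price_per_1k: float) -> float:
    """Convert a per-1,000-token price to a per-1,000,000-token price."""
    return round(price_per_1k * 1000, 4)


def price_ratio(model_a: str, model_b: str, direction: str) -> float:
    """How many times more model_a costs than model_b for 'input' or 'output'."""
    return PRICES_PER_1K[model_a][direction] / PRICES_PER_1K[model_b][direction]


if __name__ == "__main__":
    for model, price in PRICES_PER_1K.items():
        print(
            f"{model}: ${per_million(price['input']):.2f} input / "
            f"${per_million(price['output']):.2f} output per 1M tokens"
        )
    for direction in ("input", "output"):
        ratio = price_ratio("ERNIE 5.0", "ERNIE 4.5 Turbo", direction)
        print(f"ERNIE 5.0 vs ERNIE 4.5 Turbo ({direction}): {ratio:.1f}x")
```

    On the listed prices, ERNIE 5.0 costs roughly 7.5 to 7.7 times as much per token as ERNIE 4.5 Turbo.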

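    When compared to U.S.-based alternatives, ERNIE 5.0 sits toward the lower end of the range. Per million tokens, it undercuts every U.S. model listed below on both input and output, while costing several times more than ERNIE 4.5 Turbo: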

    Model | Input cost (per 1M tokens) | Output cost (per 1M tokens)
    GPT-5.1 | $1.25 | $10.00
    ERNIE 5.0 | $0.85 | $3.40
    ERNIE 4.5 Turbo (example) | $0.11 | $0.45
    Claude Opus 4.1 | $15.00 | $75.00
    Gemini 2.5 Pro | $1.25 (≤200k tokens) / $2.50 (>200k tokens) | $10.00 (≤200k tokens) / $15.00 (>200k tokens)
    Grok 4 (grok-4-0709) | $3.00 | $15.00
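
    To see what these list prices mean in practice, the sketch below estimates the bill for one workload across the models in the table. The workload size (2 million input tokens and 500,000 output tokens) is a made-up example, Gemini 2.5 Pro is priced at its lower tier, and the function name is hypothetical. Real invoices may differ because of caching discounts, tiering, and exchange rates.

```python
# Per-1,000,000-token USD prices (input, output) copied from the table above.
# Assumption: Gemini 2.5 Pro is priced at its lower tier (prompts of 200k tokens or fewer).
PRICES_PER_1M = {
    "GPT-5.1": (1.25, 10.00),
    "ERNIE 5.0": (0.85, 3.40),
    "ERNIE 4.5 Turbo": (0.11, 0.45),
    "Claude Opus 4.1": (15.00, 75.00),
    "Gemini 2.5 Pro": (1.25, 10.00),
    "Grok 4": (3.00, 15.00),
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a workload at the listed per-million-token prices."""
    input_price, output_price = PRICES_PER_1M[model]
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000


if __name__ == "__main__":
    # Hypothetical workload: 2M input tokens, 0.5M output tokens.
    input_tokens, output_tokens = 2_000_000, 500_000
    ranked = sorted(
        PRICES_PER_1M,
        key=lambda name: workload_cost(name, input_tokens, output_tokens),
    )
    for name in ranked:
        cost = workload_cost(name, input_tokens, output_tokens)
        print(f"{name:<16} ${cost:,.2f}")
```

    Under this hypothetical workload, ERNIE 5.0 comes to $3.40, compared with $7.50 for GPT-5.1 and $67.50 for Claude Opus 4.1.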

    Expanding Horizons: Baidu’s Global AI Ecosystem

    Coinciding with ERNIE 5.0’s debut, Baidu is accelerating its international footprint through several key product launches and platform enhancements:

    • GenFlow 3.0: Boasting over 20 million users, this general-purpose AI agent now features improved memory capabilities and enhanced multimodal task management.
    • Famou: A self-adaptive AI agent designed to autonomously tackle complex challenges, now available commercially by invitation.
    • MeDo: The global iteration of Baidu’s no-code AI builder Miaoda, accessible worldwide via web platforms.
    • Oreate: A productivity suite supporting documents, presentations, images, videos, and podcasts, with a user base exceeding 1.2 million globally.

    Baidu’s digital human technology, already deployed in markets like Brazil, is integral to this global push. During China’s recent “Double 11” shopping festival, 83% of livestream hosts utilized Baidu’s digital human solutions, contributing to a 91% surge in gross merchandise volume (GMV).

    Moreover, Baidu’s autonomous ride-hailing service, Apollo Go, has surpassed 17 million rides across 22 cities, establishing itself as the world’s largest robotaxi network.

    Open-Source Multimodal Model: Driving Industry Innovation

    Just days before ERNIE 5.0’s launch, Baidu released an open-source multimodal vision-language model under the Apache 2.0 license, designed to democratize access to advanced AI capabilities.

    This model activates only 3 billion parameters during inference out of a total 28 billion, leveraging a Mixture-of-Experts (MoE) architecture to optimize efficiency.

    Notable features include:

    • “Thinking with Images” technology enabling dynamic zoom and detailed visual analysis.
    • Comprehensive support for chart interpretation, document understanding, visual grounding, and temporal reasoning in video content.
    • Capability to run on a single 80GB GPU, making it accessible to mid-sized enterprises and research groups.
    • Full compatibility with popular AI frameworks such as Transformers, vLLM, and Baidu’s FastDeploy toolkit.
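
    The parameter counts and the single-GPU claim can be sanity-checked with back-of-the-envelope arithmetic. The sketch below assumes that all 28 billion parameters must be resident in GPU memory even though only 3 billion are active per token, and it tries a few common weight precisions. The article does not state which precision Baidu uses, so the byte sizes are assumptions, and the estimate ignores activations and the KV cache.

```python
TOTAL_PARAMS = 28e9    # total parameters, per the article
ACTIVE_PARAMS = 3e9    # parameters activated during inference, per the article
GPU_MEMORY_GB = 80     # single-GPU capacity cited in the article

# Assumed bytes per parameter for common precisions (not stated in the article).
BYTES_PER_PARAM = {"fp32": 4, "bf16": 2, "int8": 1}


def weight_memory_gb(num_params: float, bytes_per_param: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * bytes_per_param / 1e9


def active_fraction() -> float:
    """Share of the total parameters that is active for each token."""
    return ACTIVE_PARAMS / TOTAL_PARAMS


if __name__ == "__main__":
    print(f"Active fraction: {active_fraction():.1%}")
    for precision, size in BYTES_PER_PARAM.items():
        needed = weight_memory_gb(TOTAL_PARAMS, size)
        verdict = "fits" if needed < GPU_MEMORY_GB else "does not fit"
        print(f"{precision}: {needed:.0f} GB of weights, {verdict} in {GPU_MEMORY_GB} GB")
```

    Under these assumptions, 16-bit weights take about 56 GB, which leaves headroom on an 80GB card, while 32-bit weights (about 112 GB) would not fit. Roughly 11% of the parameters are active for any given token.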

    This open-source release intensifies competition with proprietary models by offering a high-performance foundation model free from restrictive licensing, a rarity in this class of AI systems.

    Community Insights and Baidu’s Developer Engagement

    Following ERNIE 5.0’s rollout, AI researcher and evaluator Lisan al Gaib (@scaling01) shared initial enthusiasm for the model’s benchmark achievements but noted a recurring issue: the model persistently triggered tool invocations during SVG generation tasks despite explicit instructions to avoid doing so.

    “ERNIE 5.0’s benchmarks looked impressive until I tested it… unfortunately, it seems to have reinforcement learning glitches or serious problems with its chat system prompt,” Lisan commented.

    Baidu’s developer support team responded promptly, acknowledging the bug and advising temporary workarounds while a fix is underway:

    “Thank you for the feedback! This is a known issue triggered by certain syntax. We are actively working on a solution. Meanwhile, rephrasing your prompts may help avoid the problem.”

    This swift response underscores Baidu’s growing commitment to developer relations, especially as it seeks to attract international users through both proprietary and open-source AI offerings.

    Looking Ahead: Baidu’s Ambitions in the Foundation Model Landscape

    ERNIE 5.0 represents a significant leap forward in Baidu’s quest to compete on the global stage of foundation models. By delivering performance that rivals leading systems from OpenAI and Google, coupled with a dual strategy of premium APIs and open-source releases, Baidu aims to become a trusted provider of AI infrastructure worldwide.

    As enterprise customers increasingly demand multimodal capabilities, flexible licensing, and efficient deployment, Baidu’s bifurcated approach may broaden its appeal across diverse user segments, from large corporations to independent developers.

    While independent validation of Baidu’s performance claims is pending, ERNIE 5.0 and its expanding ecosystem position the company strongly amid rising AI complexity, escalating costs, and compute resource constraints shaping the future of AI adoption.
