Throughout 2025, the leading edge of open-weight language models has been shaped less by Silicon Valley or New York and more by innovation hubs in Beijing and Hangzhou.
Chinese tech giants such as Alibaba, along with other prominent research institutions, have accelerated the development of large-scale, open Mixture-of-Experts (MoE) models, frequently releasing them under permissive licenses and achieving top-tier benchmark results. Although OpenAI introduced its own open-weight, general-purpose large language model (LLM) this summer, its adoption has been comparatively modest.
In response, a small American startup is making a significant push to reclaim leadership in this space.
Today marks the launch of Trinity Mini and Trinity Nano Preview, the inaugural models in Arcee AI’s new “Trinity” series: an open-weight MoE model family trained entirely on U.S. soil. These models are accessible for direct interaction via a chatbot on Arcee’s website, and developers can freely download, modify, and fine-tune both models under an enterprise-friendly Apache 2.0 license.
While these models are smaller than the largest global counterparts, their release is a notable effort by a U.S.-based company to build comprehensive open-weight models from the ground up, leveraging American infrastructure and a carefully curated domestic dataset.
“I’m filled with immense pride for my team’s achievement, though utterly exhausted,” shared Lucas Atkins, Arcee’s Chief Technology Officer, reflecting on the launch. “Especially with Mini.”
Meanwhile, a third model, Trinity Large (a 420 billion parameter MoE model with 13 billion active parameters per token), is currently in training and slated for release in January 2026.
Atkins emphasized the vision behind Trinity: “We aim to fill a gap by delivering a serious open-weight model family, fully trained in the U.S., that businesses and developers can truly own.”
From Compact Beginnings to Ambitious Scale
Arcee AI’s Trinity initiative represents a strategic evolution from its previous focus on smaller, enterprise-oriented models. The company, which has secured $29.5 million in funding (including a 2024 round led by Emergence Capital), previously offered a compact instruct-tuned model launched in mid-2025 and a 70 billion parameter instruction-following model designed for secure in-VPC enterprise deployment.
These earlier models primarily addressed regulatory compliance and cost challenges associated with proprietary LLMs in corporate environments.
With Trinity, Arcee is setting its sights on full-stack pretraining of open-weight foundational models, engineered for extended context understanding, synthetic data integration, and future compatibility with live retraining systems.
Initially conceived as precursors to Trinity Large, Mini and Nano evolved from early sparse-modeling experiments into production-ready models.
Innovative Architecture and Technical Features
Trinity Mini is a 26 billion parameter model activating 3 billion parameters per token, optimized for high-throughput reasoning, function invocation, and tool integration. Trinity Nano Preview, a more experimental 6 billion parameter model with approximately 800 million active non-embedding parameters, focuses on chat applications with a distinct personality but somewhat reduced reasoning robustness.
Both models employ Arcee’s proprietary Attention-First Mixture-of-Experts (AFMoE) architecture, which innovatively combines global sparsity, local and global attention mechanisms, and gated attention techniques.
Drawing inspiration from recent breakthroughs by DeepSeek and Qwen, AFMoE diverges from conventional MoE designs by tightly coupling sparse expert routing with an enhanced attention framework. This includes grouped-query attention, gated attention, and a hybrid local/global attention pattern that significantly improves long-context reasoning.
To illustrate, traditional MoE models function like a call center with 128 specialized agents (“experts”), where only a select few are consulted per query to conserve resources. AFMoE refines this by employing sigmoid routing, a smooth, volume-dial-like mechanism that lets the model blend multiple expert opinions fluidly rather than making binary expert selections.
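A minimal sketch of the idea in Python (the expert count, logits, and top-k normalization here are illustrative assumptions, not Arcee's actual implementation):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_route(logits, k):
    """Score each expert independently with a sigmoid 'volume dial',
    keep the top-k, and normalize those scores into blend weights.
    Unlike a softmax router, experts do not compete for a single
    probability mass, which smooths how opinions are blended."""
    scores = [sigmoid(x) for x in logits]
    top = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)[:k]
    total = sum(scores[i] for i in top)
    return {i: scores[i] / total for i in top}

# Eight illustrative expert logits for one token; consult three experts
weights = sigmoid_route([2.0, -1.0, 0.5, 3.0, 0.0, -2.0, 1.0, 0.2], k=3)
print(weights)  # experts 3, 0, and 6 are chosen, weighted by sigmoid score
```

Because each expert is scored on its own absolute scale, raising one expert's score does not automatically lower another's, which is part of what makes the blending "smooth" rather than winner-take-all.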
The “attention-first” aspect prioritizes how the model allocates focus across different parts of a conversation, akin to how a reader might remember certain passages of a novel more vividly based on their relevance or emotional weight. AFMoE balances local attention (recent inputs) with global attention (key earlier points) to maintain coherence over extended dialogues.
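The two patterns can be sketched as attention masks (the window size and the way layers mix the two patterns are illustrative assumptions, not Arcee's published configuration):

```python
def global_mask(n):
    # Causal "global" attention: token i may look back at every token j <= i,
    # preserving key earlier points across the whole context.
    return [[j <= i for j in range(n)] for i in range(n)]

def local_mask(n, window):
    # Sliding-window "local" attention: token i sees only the most recent
    # `window` tokens, which is cheap and keeps recent inputs sharp.
    return [[i - window < j <= i for j in range(n)] for i in range(n)]

# A hybrid model interleaves layers using each pattern. With 6 tokens and a
# window of 2, the last token's local view covers only positions 4 and 5:
print(local_mask(6, 2)[5])  # [False, False, False, False, True, True]
print(global_mask(6)[5])    # [True, True, True, True, True, True]
```

Local layers keep per-token cost roughly constant as the conversation grows, while the occasional global layer lets important early context still reach the present.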
Gated attention acts as a dynamic volume control on each attention output, enabling the model to emphasize or suppress information selectively, much like moderating voices in a group discussion.
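In code, that volume control reduces to an elementwise sigmoid gate applied to the attention output (a minimal sketch; in a real model the gate logits come from a learned projection, which is assumed here):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def gated_attention_output(attn_out, gate_logits):
    # Each channel of the attention output is scaled by a gate in (0, 1):
    # a large positive logit passes the signal through, a large negative
    # one mutes it, like turning individual voices up or down.
    return [a * sigmoid(g) for a, g in zip(attn_out, gate_logits)]

out = gated_attention_output([1.0, -2.0, 0.5], [10.0, -10.0, 0.0])
print(out)  # roughly [1.0, ~0.0, 0.25]
```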
These innovations collectively enhance training stability and computational efficiency, enabling the model to handle longer conversations, reason more effectively, and operate faster without requiring exorbitant computing power.
Unlike many MoE implementations, AFMoE prioritizes depth stability and training efficiency through techniques such as sigmoid-based routing without auxiliary loss and depth-scaled normalization, facilitating scaling without divergence.
Performance and Functional Strengths
Trinity Mini features 128 experts with 8 active per token plus one always-on shared expert, supporting context windows up to 131,072 tokens depending on the deployment provider.
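To put those numbers in perspective, a back-of-envelope sketch of the routing sparsity (only the expert counts above come from the announcement; per-expert parameter sizes are not public, and counting the shared expert in the denominator is an assumption, so this shows an expert ratio rather than exact parameter math):

```python
# Reported Trinity Mini expert layout
TOTAL_ROUTED = 128   # routed experts per MoE layer
ACTIVE_ROUTED = 8    # routed experts consulted per token
SHARED = 1           # always-on shared expert (assumed counted in totals)

active = ACTIVE_ROUTED + SHARED
total = TOTAL_ROUTED + SHARED
print(f"{active}/{total} experts active per token, about {active / total:.1%}")
```

That single-digit activation percentage is what lets a 26 billion parameter model run with only 3 billion parameters touched per token.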
Benchmark results demonstrate Trinity Mini’s competitive edge against larger models in reasoning tasks, surpassing gpt-oss on several key evaluations:
- MMLU (zero-shot): 84.95
- Math-500: 92.10
- GPQA-Diamond: 58.55
- BFCL V3: 59.67
Latency and throughput metrics from providers like Together and Clarifai reveal over 200 tokens per second throughput with end-to-end latency under three seconds, making Trinity Mini suitable for interactive applications and agent workflows.
Trinity Nano, while smaller and less robust on edge cases, validates the feasibility of sparse MoE architectures with fewer than 1 billion active parameters per token.
Availability, Pricing, and Ecosystem Support
Both Trinity Mini and Nano are distributed under the permissive Apache 2.0 license, enabling unrestricted commercial and research use. Trinity Mini is accessible through multiple platforms, including API endpoints and direct downloads.
API pricing for Trinity Mini is structured as follows:
- $0.045 per million input tokens
- $0.15 per million output tokens
- A limited-time free tier is available via OpenRouter
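Under those published rates, a quick cost estimate is simple arithmetic (rates in USD per million tokens; pricing may vary by provider and change over time, and the example workload is hypothetical):

```python
INPUT_PER_M = 0.045   # USD per million input tokens
OUTPUT_PER_M = 0.15   # USD per million output tokens

def estimate_cost(input_tokens, output_tokens):
    # Linear per-token pricing: scale each count to millions, multiply by rate
    return (input_tokens / 1_000_000) * INPUT_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PER_M

# e.g. an agent run consuming 2M input tokens and emitting 500K output tokens
print(f"${estimate_cost(2_000_000, 500_000):.3f}")  # $0.165
```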
The models are already integrated into applications such as Benchable.ai, Open WebUI, and SillyTavern, and enjoy support within popular frameworks like Hugging Face Transformers, vLLM, LM Studio, and llama.cpp.
Data Integrity and Curation: The Role of DatologyAI
A cornerstone of Arcee’s strategy is meticulous control over training data, contrasting with many open models that rely on web-scraped or legally ambiguous datasets. This is where DatologyAI, a data curation startup co-founded by former Meta and DeepMind researcher Ari Morcos, plays a pivotal role.
DatologyAI’s platform automates data filtering, deduplication, and quality enhancement across multiple data types, ensuring Arcee’s training corpus is free from noisy, biased, or copyright-sensitive content.
For Trinity, DatologyAI assembled a 10 trillion token training curriculum divided into three phases: 7 trillion tokens of general data, 1.8 trillion tokens of high-quality text, and 1.2 trillion tokens focused on STEM subjects, including mathematics and programming.
This partnership, which also supported Arcee’s AFM-4.5B model, scaled significantly in both volume and complexity for Trinity. Arcee credits DatologyAI’s filtering and data-ranking tools with enabling clean scaling and improved performance on tasks such as math, question answering, and agent tool use.
DatologyAI also contributes synthetic data generation, producing 10 trillion synthetic tokens that, paired with 10 trillion curated web tokens, form the 20 trillion token dataset for Trinity Large’s training.
Robust U.S. Infrastructure: Prime Intellect’s Contribution
Arcee’s capacity to conduct full-scale training domestically is bolstered by its infrastructure partner, Prime Intellect. Founded in early 2024, Prime Intellect aims to democratize AI compute access through a decentralized GPU marketplace and training stack.
While Prime Intellect gained attention for distributed training of INTELLECT-1 (a 10 billion parameter model trained across contributors in five countries), its more recent work, including the 106 billion parameter INTELLECT-3, acknowledges that centralized infrastructure remains more efficient for models exceeding 100 billion parameters.
For Trinity Mini and Nano, Prime Intellect provided the orchestration stack, a customized TorchTitan runtime, and a physical compute environment featuring 512 H200 GPUs operating in a bf16 pipeline with high-efficiency hybrid sharded data parallelism (HSDP). It also hosts the 2048 B300 GPU cluster currently training Trinity Large.
This collaboration highlights the distinction between branding and execution: while Prime Intellect’s long-term vision is decentralized compute, its immediate value to Arcee lies in delivering efficient, transparent training infrastructure under U.S. jurisdiction with clear provenance and security controls.
Championing Model Sovereignty in Enterprise AI
Arcee’s commitment to full pretraining reflects a broader conviction that the future of enterprise AI hinges on owning the entire training pipeline, not merely fine-tuning existing models. As AI systems grow more autonomous and adaptive, compliance and control over training objectives will become as critical as raw performance.
“As applications become more sophisticated, the line between ‘model’ and ‘product’ blurs,” Atkins noted in Arcee’s Trinity manifesto. “Building such software requires control over both the model weights and the training process, not just the instruction layer.”
This philosophy distinguishes Trinity from other open-weight initiatives. Rather than modifying third-party base models, Arcee has developed its own, from data curation to deployment infrastructure and optimization algorithms, in partnership with organizations aligned with its vision of openness and sovereignty.
Anticipating the Launch of Trinity Large
Training is underway for Trinity Large, a 420 billion parameter MoE model scaling the AFMoE architecture with an expanded expert set.
The training dataset comprises 20 trillion tokens, evenly split between synthetic data from DatologyAI and carefully curated web data.
Scheduled for release in January 2026, Trinity Large will be accompanied by a comprehensive technical report.
If successful, Trinity Large will stand among the few fully open-weight, U.S.-trained frontier-scale models, positioning Arcee as a formidable contender in the open AI ecosystem at a time when most American LLM projects remain closed or rely on foreign foundations.
Renewed Commitment to U.S.-Based Open Source AI
In an environment where the most advanced open-weight models are increasingly dominated by Chinese research institutions, Arcee’s Trinity launch represents a rare and strategic effort to reclaim leadership in transparent, U.S.-controlled model development.
Supported by specialized partners in data and infrastructure and built from the ground up for long-term adaptability, Trinity exemplifies how smaller, lesser-known companies can still drive innovation and push boundaries in an industry trending toward commoditization and productization.
While the ultimate test will be whether Trinity Large can rival better-funded competitors, the early success of Mini and Nano, combined with a robust architectural foundation, suggests Arcee’s core thesis is gaining traction: that model sovereignty, not just scale, will define the next chapter of AI advancement.
