Latvian language technology company Tilde has unveiled TildeOpen LLM, an open-source foundational large language model specifically designed to support European languages, with a particular emphasis on lesser-known and regional tongues. This initiative marks a significant advancement toward promoting linguistic fairness and enhancing digital autonomy across the European Union.
Technical Foundations: Model Design, Training Process, and Oversight
- Released publicly on September 3, 2025, TildeOpen is accessible at no cost through the Hugging Face platform.
- The model is a 30-billion-parameter dense decoder-only transformer and is distributed under the permissive CC-BY-4.0 license. It supports a wide array of languages, including Latvian, Lithuanian, Ukrainian, Turkish, and several others.
- Training was conducted on two EU supercomputers, LUMI in Finland and JUPITER in Germany, using an allocation of 2 million GPU hours granted through the European Commission’s Large AI Grand Challenge.
- Using training scripts inspired by EleutherAI's GPT-NeoX, the model went through approximately 450,000 update steps and processed around 2 trillion tokens. Training followed a three-phase sampling strategy: an initial phase that samples languages uniformly, a second phase that follows the natural data distribution and so favors languages with abundant data, and a final uniform phase to rebalance representation.
- Key architectural settings include 60 transformer layers, an embedding dimension of 6144, 48 attention heads, an 8192-token context window, SwiGLU activation functions, rotary position embeddings (RoPE), and RMSNorm layers.
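The figures in the bullets above imply a few derived quantities, and the three-phase sampling schedule can be illustrated with a small function. The following is a minimal sketch, not Tilde's training code: the per-language token counts are invented toy values, the function name `sampling_weights` is hypothetical, and the length of each phase is left out because the source does not state it.

```python
from typing import Dict

# Figures quoted in the bullets above (all approximate in the source).
TOTAL_TOKENS = 2_000_000_000_000   # about 2 trillion tokens
UPDATE_STEPS = 450_000             # about 450,000 update steps
CONTEXT_LEN = 8192                 # tokens per sequence
EMBED_DIM = 6144
NUM_HEADS = 48

# Derived quantities: pure arithmetic on the numbers above.
head_dim = EMBED_DIM // NUM_HEADS                    # dimension of each attention head
tokens_per_step = TOTAL_TOKENS / UPDATE_STEPS        # average tokens consumed per update
sequences_per_step = tokens_per_step / CONTEXT_LEN   # full-length sequences per update


def sampling_weights(token_counts: Dict[str, int], phase: str) -> Dict[str, float]:
    """Return per-language sampling probabilities for one training phase.

    "uniform": every language is equally likely to be sampled.
    "natural": each language is sampled in proportion to its data volume.
    """
    if not token_counts:
        raise ValueError("token_counts must not be empty")
    if phase == "uniform":
        share = 1.0 / len(token_counts)
        return {lang: share for lang in token_counts}
    if phase == "natural":
        total = sum(token_counts.values())
        return {lang: count / total for lang, count in token_counts.items()}
    raise ValueError(f"unknown phase: {phase!r}")


# Toy corpus sizes (hypothetical, for illustration only).
toy_counts = {"en": 900, "lv": 60, "lt": 40}

# The schedule described in the text: uniform, then natural, then uniform.
schedule = ["uniform", "natural", "uniform"]

if __name__ == "__main__":
    print(f"head dimension: {head_dim}")
    print(f"tokens per step: {tokens_per_step:,.0f}")
    print(f"sequences per step: {sequences_per_step:.1f}")
    for phase in schedule:
        print(phase, sampling_weights(toy_counts, phase))
```

With the toy counts, the uniform phases give each language one third of the samples, while the natural phase gives the largest language 90 percent, which is the imbalance the opening and closing uniform phases are meant to offset.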
Championing Linguistic Diversity and Data Sovereignty
- Most prevailing language models disproportionately favor English and other dominant languages, resulting in subpar performance (grammatical errors, unnatural phrasing, and hallucinations) when processing Baltic, Slavic, and other smaller European languages.
- TildeOpen addresses these challenges with an “equitable tokenizer”, designed so that comparable text is split into a similar number of tokens regardless of language. This reduces token counts and improves inference efficiency, particularly for underrepresented languages.
- Importantly, organizations have the option to self-host the model within local data centers or secure, EU-compliant cloud environments. This capability ensures strict compliance with GDPR and other data protection regulations, mitigating concerns related to reliance on US- or Asia-based AI services.
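One common way to check the tokenizer claim above is to compare tokens per word (often called fertility) across languages: the closer the values, the more evenly the tokenizer treats them. The sketch below is illustrative only. `toy_tokenize` is a hypothetical stand-in that chops words into fixed-size character chunks, not TildeOpen's tokenizer, and the sample strings are invented; in practice any real tokenizer's encode function could be passed in instead.

```python
from typing import Callable, List


def fertility(tokenize: Callable[[str], List[str]], text: str) -> float:
    """Average number of tokens per whitespace-separated word."""
    words = text.split()
    if not words:
        raise ValueError("text must contain at least one word")
    return len(tokenize(text)) / len(words)


def toy_tokenize(text: str, chunk: int = 4) -> List[str]:
    """Hypothetical tokenizer: split each word into chunks of up to `chunk` characters."""
    tokens: List[str] = []
    for word in text.split():
        tokens.extend(word[i:i + chunk] for i in range(0, len(word), chunk))
    return tokens


# Invented sample texts keyed by language code.
samples = {
    "en": "the cat sat",
    "lv": "valodu modelis",
}

if __name__ == "__main__":
    for lang, text in samples.items():
        print(lang, fertility(toy_tokenize, text))
```

A tokenizer that yields a much higher fertility for one language makes the same content cost more tokens in that language, which raises inference cost and uses up the context window faster.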
Future Outlook: Building a Pan-European AI Ecosystem
- TildeOpen serves as a foundational “base” model, paving the way for future iterations tailored to specific applications, such as instruction-tuned translation systems and domain-specific AI tools.
- This release symbolizes a strategic milestone for Latvia, positioning the country as a technology exporter and a key contributor to the expansion of European AI infrastructure while safeguarding linguistic plurality.
- From a research perspective, TildeOpen contributes to ongoing investigations into multilingual model performance. Despite advances, even leading open-source LLMs exhibit occasional hallucinations and lexical inaccuracies in Baltic languages, underscoring the necessity for localized AI development.
Conclusion
TildeOpen LLM aims to reshape the European AI landscape by pairing regulatory compliance with hands-on technical stewardship. It offers an open, self-hostable model built for linguistic inclusivity, with a focus on practical value rather than hype.
Frequently Asked Questions
Q1: What exactly is TildeOpen LLM?
TildeOpen is a 30-billion-parameter multilingual large language model trained on European supercomputing infrastructure, optimized to support a broad spectrum of European languages, especially those that are underrepresented.
Q2: How does TildeOpen differ from other large language models?
Unlike many global models that prioritize English, TildeOpen employs an equitable tokenizer and balanced training methodology to ensure fair and accurate language representation across smaller European languages.
Q3: Is it possible to self-host TildeOpen?
Yes. Released under the CC-BY-4.0 license, TildeOpen can be deployed on-premises or within EU-compliant cloud environments, facilitating compliance with GDPR and data sovereignty requirements.
Q4: What are the primary applications for TildeOpen?
The model is well-suited for government services, translation, education, AI-powered assistants, speech recognition technologies, and multilingual customer support: essentially any domain requiring precise European language processing.

