Meta returns to open source AI with Omnilingual ASR models that can transcribe 1,600+ languages natively

Meta has unveiled a groundbreaking speech recognition system supporting over 1,600 languages, vastly surpassing OpenAI’s Whisper model, which accommodates only 99.

This innovative architecture empowers developers to extend language support even further. Utilizing a technique known as zero-shot in-context learning, users can input a handful of paired audio and text samples in a previously unsupported language during inference. This enables the model to transcribe additional speech in that language without requiring any retraining.

Consequently, the system’s potential coverage expands to more than 5,400 languages, encompassing nearly every spoken language with a documented writing system.

Unlike traditional static models, this approach offers a dynamic, adaptable framework that communities can customize themselves. While the 1,600 languages represent the officially trained set, the broader figure highlights Omnilingual ASR’s ability to generalize on demand, making it the most scalable and flexible speech recognition platform available today.

Best of all, Meta has open-sourced this technology under the permissive Apache 2.0 license, unlike previous releases issued under quasi-open licenses that limited enterprise use. This means researchers, developers, and businesses can freely adopt and implement Omnilingual ASR immediately, including in commercial and large-scale applications.

Launched on November 10, Meta’s Omnilingual ASR suite comprises a collection of speech recognition models, a 7-billion parameter multilingual audio representation model, and an extensive speech dataset covering over 350 underrepresented languages. All components are openly accessible under liberal licenses, enabling out-of-the-box speech-to-text transcription.

Advanced Speech-to-Text Capabilities

At its foundation, Omnilingual ASR is designed to convert spoken language into written text, supporting a wide range of applications such as voice assistants, transcription services, subtitle generation, oral history digitization, and accessibility tools for low-resource languages.

Unlike earlier automatic speech recognition (ASR) systems that demanded vast amounts of labeled training data, Omnilingual ASR introduces a zero-shot variant capable of transcribing languages it has never encountered before. By providing just a few paired audio-text examples, users can enable transcription in new languages without the need for extensive datasets or retraining.
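Conceptually, a zero-shot request bundles a few paired examples with the target utterance. The sketch below illustrates only the shape of that input; the field names and file paths are hypothetical, not the library's actual API:

```python
# Illustrative payload for zero-shot in-context transcription:
# a handful of paired audio/transcript examples condition the model
# at inference time, then it transcribes a new utterance in the same
# language. No retraining is involved.
context_pairs = [
    {"audio": "sample1.wav", "text": "example transcript one"},
    {"audio": "sample2.wav", "text": "example transcript two"},
    {"audio": "sample3.wav", "text": "example transcript three"},
]

request = {
    "context": context_pairs,          # in-context examples
    "target_audio": "new_utterance.wav",  # utterance to transcribe
}

print(f"{len(request['context'])} in-context examples, 1 target utterance")
```

The key point is that the "training data" for a new language travels with the request itself, which is what lets communities extend coverage without touching the model weights.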

Comprehensive Model Architecture

The Omnilingual ASR ecosystem includes several model types trained on over 4.3 million hours of audio spanning more than 1,600 languages:

  • wav2vec 2.0 models for self-supervised speech representation learning, ranging from 300 million to 7 billion parameters
  • CTC-based ASR models optimized for efficient supervised transcription
  • LLM-ASR models combining speech encoders with Transformer-based text decoders to achieve state-of-the-art transcription accuracy
  • LLM-ZeroShot ASR models that enable on-the-fly adaptation to previously unseen languages during inference

All models employ an encoder-decoder framework: raw audio is first transformed into a language-neutral representation, which is then decoded into written text.

Significance of Scale and Coverage

While models like Whisper have advanced speech recognition for widely spoken languages, they fall short in addressing the vast linguistic diversity worldwide. Whisper supports 99 languages, whereas Meta’s Omnilingual ASR:

  • Officially supports over 1,600 languages
  • Can generalize to more than 5,400 languages through zero-shot in-context learning
  • Achieves character error rates below 10% in 78% of supported languages
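Character error rate, the metric behind these figures, is the character-level edit distance between a hypothesis and its reference, divided by the reference length. A minimal implementation:

```python
def levenshtein(a, b):
    """Edit distance over characters via classic dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(hypothesis, reference):
    """Character error rate: edits needed per reference character."""
    return levenshtein(hypothesis, reference) / len(reference)

print(cer("helo word", "hello world"))  # 2 edits / 11 chars ≈ 0.18
```

A CER below 10% thus means fewer than one character in ten needs correction, a common threshold for usable transcription output.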

Notably, the system covers over 500 languages that have never before been included in any ASR model, opening new avenues for communities whose languages have traditionally been excluded from digital technologies.

Context: Meta’s Strategic AI Renewal

The launch of Omnilingual ASR comes at a critical juncture in Meta’s AI evolution, following a turbulent year marked by leadership shifts and mixed success with prior models like Llama 4. Despite Llama 4’s limited enterprise uptake compared to competitors, Meta has pivoted strategically.

With the appointment of Alexandr Wang, former CEO of Scale AI, Meta has embarked on a renewed focus on foundational AI technologies. Omnilingual ASR exemplifies this shift by reestablishing Meta’s leadership in multilingual AI through an open, extensible platform that minimizes barriers to adoption.

Released under a fully permissive license with transparent data sourcing and reproducible training methods, Omnilingual ASR aligns with Meta’s 2025 vision emphasizing “personal superintelligence” and robust AI infrastructure, while deprioritizing the metaverse. This move also coincides with Meta’s resumption of public training data use in Europe, signaling a commitment to global AI competitiveness despite regulatory challenges.

Collaborative Dataset Development

To build this extensive language coverage, Meta collaborated with academic institutions and community organizations across Africa, Asia, and beyond to assemble the Omnilingual ASR Corpus, a 3,350-hour dataset spanning 348 low-resource languages. Contributors were local speakers who were compensated for their recordings, and data collection was conducted in partnership with groups such as:

  • African Next Voices: A Gates Foundation-backed consortium including Maseno University (Kenya), University of Pretoria, and Data Science Nigeria
  • Mozilla Foundation’s Common Voice, supported by the Open Multilingual Speech Fund
  • Lanfrica / NaijaVoices, which contributed data for 11 African languages including Igala, Serer, and Urhobo

The dataset emphasizes natural, unscripted speech with culturally relevant, open-ended prompts like “Is it better to have a few close friends or many casual acquaintances? Why?” Transcriptions adhere to established writing systems, with rigorous quality control at every stage.

Performance Metrics and Hardware Requirements

The largest model, omniASR_LLM_7B, demands approximately 17GB of GPU memory for inference, so deploying it requires high-end hardware. Smaller variants (300 million to 1 billion parameters) can operate on less powerful devices and provide real-time transcription.
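The ~17GB figure is consistent with simple back-of-envelope arithmetic: 7 billion parameters at 16-bit precision account for roughly 14GB, with the remainder going to activations and decoding buffers (the exact split below is an assumption for illustration, not a published breakdown):

```python
params = 7e9              # omniASR_LLM_7B parameter count
bytes_per_param = 2       # fp16/bf16 weights

weights_gb = params * bytes_per_param / 1e9
print(f"weights: {weights_gb:.0f} GB")               # 14 GB

# Remainder of the reported ~17GB: activations, decoding buffers, etc.
overhead_gb = 17 - weights_gb
print(f"activations/buffers: ~{overhead_gb:.0f} GB")  # ~3 GB
```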

Benchmark results demonstrate robust performance even in challenging conditions:

  • Character error rates below 10% in 95% of high- and mid-resource languages
  • Character error rates below 10% in 36% of low-resource languages
  • Strong resilience to noisy environments and unfamiliar domains, especially after fine-tuning

The zero-shot model, omniASR_LLM_7B_ZS, enables transcription of new languages with minimal setup by accepting a few example audio-text pairs and generating transcriptions for subsequent utterances.

Open-Source Accessibility and Developer Support

All models and datasets are distributed under permissive licenses:

  • Apache 2.0 for models and code
  • CC-BY 4.0 for the dataset

Installation is available from PyPI via pip:

pip install omnilingual-asr

Meta also offers:

  • Integration with HuggingFace datasets
  • Pre-configured inference pipelines
  • Language-code conditioning to enhance transcription accuracy

Developers can easily retrieve the list of supported languages through the API:

from omnilingual_asr.models.wav2vec2_llama.lang_ids import supported_langs
print(len(supported_langs))  # number of officially supported languages
print(supported_langs)       # full list of language codes

Transforming Language Inclusion in ASR

Omnilingual ASR redefines speech recognition from a fixed language list to a scalable, community-driven framework. This approach facilitates:

  • Inclusion of underrepresented and endangered languages through community contributions
  • Enhanced digital access for oral traditions and minority language speakers
  • Expanded research opportunities in linguistically diverse environments

Meta underscores ethical engagement by promoting open-source collaboration and partnerships with native-speaking communities. As the Omnilingual ASR research paper notes, “No model can anticipate and include all of the world’s languages in advance, but Omnilingual ASR enables communities to extend recognition with their own data.”

Getting Started with Omnilingual ASR

All tools and resources are publicly accessible:

  • Code and Models
  • Dataset
  • Documentation and Tutorials

Implications for Enterprise Applications

For businesses operating in multilingual or international markets, Omnilingual ASR dramatically lowers the barriers to deploying speech-to-text solutions across diverse languages. Instead of relying on commercial ASR services limited to a handful of high-resource languages, enterprises can integrate an open-source system supporting over 1,600 languages out of the box, with the ability to extend to thousands more via zero-shot learning.

This flexibility is particularly advantageous for sectors such as voice-based customer support, transcription services, accessibility technologies, education, and civic engagement, where local language coverage is often a competitive edge or regulatory requirement. The Apache 2.0 license permits enterprises to fine-tune, deploy, and embed the models within proprietary systems without restrictive conditions.

Ultimately, Omnilingual ASR signals a paradigm shift in the ASR landscape: from centralized, cloud-dependent platforms to open, community-extendable infrastructure. By making multilingual speech recognition more accessible, customizable, and cost-effective, it paves the way for a new generation of speech applications centered on linguistic diversity and inclusion rather than limitation.
