Baichuan AI Launches Open-Source Full-Modal Model Omni-1.5

On January 26, Baichuan AI announced the official launch of Baichuan-Omni 1.5, an open-source full-modal model. The model not only supports full-modal understanding of text, image, and audio inputs, but also provides dual-modal generation of text and audio.

According to the company, Baichuan-Omni 1.5 delivers superior performance in areas such as visual understanding, speech, and multi-modal streaming processing, and holds a particularly strong lead in medical multi-modal applications.

Baichuan-Omni 1.5 can perform a wide range of interactive operations at both the input and output ends, and offers powerful multi-modal reasoning and cross-modal transfer capabilities.

On the audio side, it is an end-to-end solution that supports multilingual conversations and provides automatic speech recognition (ASR) and text-to-speech (TTS) conversion.

According to reports, by optimizing key factors such as its encoders and training data, Baichuan-Omni 1.5 significantly outperforms GPT-4o mini in video comprehension.

On the input side, Baichuan-Omni 1.5 supports a variety of modalities by converting each of them, via dedicated Encoders/Tokenizers, into representations the large language model can process, as the sketch below illustrates.
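To make the pattern concrete, here is a minimal, self-contained Python sketch of that input path. It illustrates the general encoder/tokenizer design described above, not Baichuan's released code; every class, function, and dimension below is an assumption made for the example.

```python
# Minimal sketch (not Baichuan's actual code) of the input path: each
# modality is run through its own encoder/tokenizer, and the resulting
# embeddings are concatenated into one sequence for the LLM backbone.
# All names and shapes here are illustrative assumptions.

from dataclasses import dataclass
from typing import List

@dataclass
class Segment:
    modality: str                   # "text", "image", or "audio"
    embeddings: List[List[float]]   # token embeddings, [n_tokens][d_model]

D_MODEL = 8  # toy embedding width; real models use thousands of dims

def encode_text(text: str) -> Segment:
    # Stand-in for a text tokenizer + embedding table: one vector per character.
    return Segment("text", [[float(ord(c))] * D_MODEL for c in text[:4]])

def encode_image(pixels: List[float]) -> Segment:
    # Stand-in for a vision encoder that emits a few "visual tokens".
    mean = sum(pixels) / len(pixels)
    return Segment("image", [[mean] * D_MODEL for _ in range(2)])

def encode_audio(samples: List[float]) -> Segment:
    # Stand-in for an audio tokenizer producing per-frame embeddings.
    return Segment("audio", [[s] * D_MODEL for s in samples[:3]])

def build_llm_input(segments: List[Segment]) -> List[List[float]]:
    # Concatenate all modality segments into a single embedding sequence,
    # which is what the language model backbone actually attends over.
    sequence: List[List[float]] = []
    for seg in segments:
        sequence.extend(seg.embeddings)
    return sequence

seq = build_llm_input([
    encode_image([0.1, 0.9, 0.5]),
    encode_text("What is shown here?"),
    encode_audio([0.2, -0.1, 0.4, 0.0]),
])
print(f"LLM input sequence: {len(seq)} tokens x {D_MODEL} dims")
```

The key design point is that, after encoding, the backbone sees only one uniform token sequence regardless of which modalities produced it.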

On the output side, Baichuan-Omni 1.5 adopts an interleaved text-audio design, generating text and audio simultaneously through a Text Tokenizer and an Audio Decoder; a sketch of this decoding loop follows.
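The sketch below shows one common way such interleaved decoding can work, again as an assumption-laden illustration rather than the actual implementation: the backbone emits a single mixed stream of tokens, text tokens are detokenized immediately, and audio tokens are buffered for an audio decoder.

```python
# Minimal sketch (assumptions, not the released implementation) of
# interleaved text-audio decoding: the model emits one mixed token
# stream; text tokens are detokenized to characters while audio tokens
# are routed to an audio decoder that turns them into waveform samples.

from typing import Iterator, List, Tuple

def mixed_token_stream() -> Iterator[Tuple[str, int]]:
    # Hypothetical model output: ("text", id) and ("audio", id) entries.
    yield ("text", 72)   # 'H'
    yield ("audio", 3)
    yield ("text", 105)  # 'i'
    yield ("audio", 7)
    yield ("audio", 1)

def detokenize_text(token_id: int) -> str:
    # Stand-in for the Text Tokenizer's decode step.
    return chr(token_id)

def decode_audio(token_ids: List[int]) -> List[float]:
    # Stand-in for the Audio Decoder: map discrete codes to samples.
    return [tid / 10.0 for tid in token_ids]

text_out: List[str] = []
audio_codes: List[int] = []
for kind, tok in mixed_token_stream():
    if kind == "text":
        text_out.append(detokenize_text(tok))  # stream text immediately
    else:
        audio_codes.append(tok)                # buffer codes for the decoder

print("text:", "".join(text_out))              # -> "Hi"
print("audio samples:", decode_audio(audio_codes))
```

Interleaving the two token types in one stream is what lets text and speech come out of a single decoding pass instead of requiring a separate TTS stage after the text is complete.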

Baichuan AI built a massive dataset containing 340 million high-quality pieces of image, video, and text data along with nearly 1 million hours of audio, and used 17 million full-modal samples in the SFT (supervised fine-tuning) phase.
