Content
Hugging Face (Twitter) RT @Alibaba_Qwen:

🚀 Introducing Qwen3-Omni: the first natively end-to-end omni-modal AI unifying text, image, audio & video in one model, with no modality trade-offs!

🏆 SOTA on 22/36 audio & AV benchmarks
🌍 119 languages for text / 19 for speech input / 10 for speech output
⚡ 211ms latency | 🎧 30-min audio understanding
🎨 Fully customizable via system prompts
🔗 Built-in tool calling
🎤 Open-source Captioner model (low-hallucination!)

🌟 What’s Open-Sourced?
We’ve open-sourced Qwen3-Omni-30B-A3B-Instruct, Qwen3-Omni-30B-A3B-Thinking, and Qwen3-Omni-30B-A3B-Captioner to empower developers to explore a variety of applications, from instruction-following to creative tasks.

Try it now 👇
💬 Qwen Chat: https://chat.qwen.ai/?models=qwen3-omni-flash
💻 GitHub: github.com/QwenLM/Qwen3-Omni
🤗 HF Models: https://huggingface.co/collections/Qwen/qwen3-omni-68d100a86cd0906843ceccbe
🤖 MS Models: https://modelscope.cn/collections/Qwen3-Omni-867aef131e7d4f
🎬 Demo: https://huggingface.co/spaces/Qwen/Qwen3-Omni-Demo