Hugging Face (Twitter) RT @Alibaba_Qwen: Qwen3-ASR and Qwen3-ForcedAligner are now open source: production-ready speech models designed for messy, real-world audio, with competitive performance and strong robustness.

● 52 languages & dialects with auto language ID (30 languages + 22 dialects/accents)
● Robust in noisy and complex settings (yes, singing and songs too)
● Long audio support: up to 20 minutes per pass
● Word/phrase-level timestamps: high-precision alignment for 11 languages via Qwen3-ForcedAligner, stronger than MFA/CTC/CIF-style aligners

Also included: a full open-source inference & finetuning stack with vLLM batch, streaming, and async serving.

GitHub: github.com/QwenLM/Qwen3-ASR
Hugging Face: https://huggingface.co/collections/Qwen/qwen3-asr
ModelScope: https://modelscope.cn/collections/Qwen/Qwen3-ASR
Hugging Face Demo: https://huggingface.co/spaces/Qwen/Qwen3-...