⚡️ Qwen3-VL: technical report released for the new VLM lineup
The tech report for Qwen3-VL, a family of multimodal models that work with images and text, has been published.
In brief:
- Three models reached 1M+ downloads in a month.
- Qwen3-VL-8B: more than 2M downloads.
- The lineup builds on the ideas of Qwen2.5-VL (2,800+ citations).
What the report covers:
- The architecture of the vision-language model.
- The training process: pretraining + post-training.
- Data sources and filtering methods.
- Comparisons with other VLMs and key metrics.
🔗 PDF: https://arxiv.org/pdf/2511.21631
🔗 Video: https://www.youtube.com/watch?v=clwFmuJX_wQ
@ai_machinelearning_big_data
#Qwen #Qwen3 #QwenVL #Qwen3VL #LLM #AIModel
🌟 AI Sunday Wonders: Meet TinyLlama, the 550MB AI Model Trained on 3 Trillion Tokens
Hello, everyone! Smaller AI models are gaining popularity because they run efficiently on edge devices with limited memory and processing power. Enter TinyLlama, a project led by a research assistant at the Singapore University of Technology and Design.
Despite its tiny 550MB footprint, TinyLlama is being pre-trained on a massive three trillion tokens. The compact model looks promising for applications such as real-time machine translation without an internet connection.
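A quick sanity check on the 550MB figure: it lines up with the model's 1.1 billion parameters if each weight is stored in 4 bits. The 4-bit storage is an assumption here, not something this post states, and the helper name `weight_size_mb` is made up for illustration. A minimal sketch:

```python
def weight_size_mb(num_params: float, bits_per_param: int) -> float:
    """Approximate weight storage in megabytes (1 MB = 10**6 bytes)."""
    bytes_total = num_params * bits_per_param / 8
    return bytes_total / 1e6


PARAMS = 1.1e9  # parameter count quoted in the post

# Assumed precisions for comparison; only the 4-bit case matches 550MB.
for bits in (16, 8, 4):
    print(f"{bits}-bit weights: {weight_size_mb(PARAMS, bits):.0f} MB")
```

This counts weights only. Activations, the KV cache, and runtime overhead come on top, so actual memory use on a device would be higher.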
The project aims to finish training this 1.1-billion-parameter Llama model in just 90 days using 16 A100-40G GPUs. You can track its progress and loss metrics in real time.
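To get a feel for what that schedule demands, here is a rough estimate of the throughput needed to process three trillion tokens in 90 days on 16 GPUs. It assumes training runs around the clock with no downtime or restarts, which is a simplification; the function name is illustrative.

```python
TOTAL_TOKENS = 3e12  # three trillion tokens, from the post
DAYS = 90            # training budget, from the post
NUM_GPUS = 16        # A100-40G GPUs, from the post


def required_tokens_per_second(total_tokens: float, days: int, num_gpus: int):
    """Return (cluster-wide, per-GPU) tokens/s needed to hit the deadline.

    Assumes continuous training with zero idle time.
    """
    seconds = days * 24 * 60 * 60
    cluster = total_tokens / seconds
    return cluster, cluster / num_gpus


cluster_tps, per_gpu_tps = required_tokens_per_second(TOTAL_TOKENS, DAYS, NUM_GPUS)
print(f"cluster: {cluster_tps:,.0f} tokens/s")
print(f"per GPU: {per_gpu_tps:,.0f} tokens/s")
```

Under these assumptions each GPU has to sustain roughly 24 thousand tokens per second for the full 90 days.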
TinyLlama shares the same architecture and tokenizer as Meta's Llama 2, making it compatible with open-source projects built on Llama.
TinyLlama joins the league of smaller language models like Pythia-1b and MPT-1b, offering developers efficient options for creating cutting-edge AI applications.
#TinyLlama #AIModel #AIResearch #MachineLearning #AIInnovation #TinyButMighty