

Post #1544

@huggingface

Hugging Face

Views: 26
Published: 21 Oct 2025, 03:56

Post content

Hugging Face (Twitter)

RT @eliebakouch: DeepSeek-OCR has some weird architectural choices for the LLM decoder: DeepSeek3B-MoE-A570M
-> uses MHA, no MLA (not even GQA?)
-> 2 shared experts (like DeepSeek V2, but V3 only has 1)
-> quite low sparsity, activation ratio is 12.5%. For V3 it’s 3.52%, for V2 it’s 5%
-> not very deep, 12 layers
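
The three activation ratios quoted in the post can be reproduced with a small calculation, sketched here in Python. The post does not say how the ratio is defined or give expert counts, so both are assumptions: the ratio is taken as (active routed experts + shared experts) / total routed experts, and the per-model expert counts are recalled from the publicly released configurations and should be checked against the model cards. `MoEConfig` and `ratios_percent` are illustrative names, not part of any library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MoEConfig:
    """Hypothetical container for the expert counts of one MoE decoder."""
    name: str
    routed_total: int   # total routed experts per MoE layer (assumed value)
    routed_active: int  # routed experts selected per token (assumed value)
    shared: int         # always-on shared experts (assumed value)

    def activation_ratio(self) -> float:
        # Assumed definition: experts used per token over total routed experts.
        return (self.routed_active + self.shared) / self.routed_total


# Expert counts are assumptions, not taken from the post.
CONFIGS = [
    MoEConfig("DeepSeek-OCR decoder", routed_total=64, routed_active=6, shared=2),
    MoEConfig("DeepSeek V2", routed_total=160, routed_active=6, shared=2),
    MoEConfig("DeepSeek V3", routed_total=256, routed_active=8, shared=1),
]


def ratios_percent() -> dict:
    """Return the activation ratio of each config as a percentage, 2 decimals."""
    return {c.name: round(100 * c.activation_ratio(), 2) for c in CONFIGS}


if __name__ == "__main__":
    for name, pct in ratios_percent().items():
        print(f"{name}: {pct}%")
```

Under these assumptions the results match the 12.5%, 5% and 3.52% figures in the post, which suggests the ratio counts experts rather than parameters (570M active out of roughly 3B parameters would give a different number).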