Post #1544

@huggingface

Hugging Face

Views: 26

Published: 21 Oct 2025, 03:56

Post content

Hugging Face (Twitter) RT @eliebakouch: DeepSeek-OCR has some weird architectural choices for the LLM decoder: DeepSeek3B-MoE-A570M
-> uses MHA, no MLA (not even GQA?)
-> 2 shared experts (like DeepSeek V2, but V3 only has 1)
-> quite low sparsity, activation ratio is 12.5%. For V3 it’s 3.52%, for V2 it’s 5%
-> not very deep, 12 layers
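
The three activation ratios quoted above can be reproduced with a small sketch. It assumes the ratio is (active routed experts + shared experts) / total routed experts, and it assumes expert counts that are not stated in the post: 6 of 64 routed experts for DeepSeek-OCR's decoder, 6 of 160 for DeepSeek V2, and 8 of 256 for DeepSeek V3. The function name `activation_ratio` and the `configs` table are illustrative only.

```python
def activation_ratio(active_routed: int, shared: int, total_routed: int) -> float:
    """Fraction of experts used per token, under the assumed definition
    (active routed + shared) / total routed."""
    return (active_routed + shared) / total_routed


# Assumed expert counts (not given in the post): (active routed, shared, total routed)
configs = {
    "DeepSeek-OCR decoder": (6, 2, 64),
    "DeepSeek V2": (6, 2, 160),
    "DeepSeek V3": (8, 1, 256),
}

for name, (active, shared, total) in configs.items():
    print(f"{name}: {activation_ratio(active, shared, total):.2%}")
# DeepSeek-OCR decoder: 12.50%
# DeepSeek V2: 5.00%
# DeepSeek V3: 3.52%
```

Under these assumptions the numbers match the post: 8/64 = 12.5%, 8/160 = 5%, and 9/256 ≈ 3.52%.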