Hugging Face (Twitter) RT @eliebakouch: Wow, this looks like a very solid open model by Xiaomi, competing with K2/DSV3.2 on benchmarks with fewer parameters. It's MIT licensed, with a very good tech report and base/thinking versions available. It uses the same sliding window attention arch as gpt-oss (sink with SWA size = 128) but with far fewer global attention layers, plus multi-token prediction for speculative decoding support and a new post-training distillation method. Really seems like a beast at inference, with day 0 @sgl_project support! Really exciting. https://twitter.com/XiaomiMiMo/status/2000929154670157939#m
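
For readers unfamiliar with "sink with SWA size = 128", here is a minimal sketch of the idea, not Xiaomi's or gpt-oss's actual code. It assumes the window counts the current token plus the 127 before it, and that the sink is one extra logit that joins the softmax normalisation but carries no value. The names `sliding_window_mask`, `softmax_with_sink`, and `sink_logit` are hypothetical.

```python
import math


def sliding_window_mask(seq_len, window=128):
    """mask[i][j] is True when query position i may attend to key position j.

    Causal sliding window (assumed convention): position i sees itself and
    the previous window - 1 positions, and nothing in the future.
    """
    return [[(j <= i) and (i - j < window) for j in range(seq_len)]
            for i in range(seq_len)]


def softmax_with_sink(scores, allowed, sink_logit=0.0):
    """Softmax over the allowed scores plus one extra sink logit.

    The sink takes part in the normalisation but has no value vector, so the
    returned weights over real tokens can sum to less than 1.
    """
    visible = [s for s, ok in zip(scores, allowed) if ok]
    peak = max(visible + [sink_logit])  # subtract the max for numerical stability
    exps = [math.exp(s - peak) if ok else 0.0 for s, ok in zip(scores, allowed)]
    denom = sum(exps) + math.exp(sink_logit - peak)
    return [e / denom for e in exps]


# Tiny example with window=3 so the pattern is easy to read.
mask = sliding_window_mask(6, window=3)
weights = softmax_with_sink([0.0] * 6, mask[5], sink_logit=0.0)
```

With equal scores, the last position spreads its attention over the three tokens in its window and the sink absorbs the remaining share. A global attention layer would simply use a plain causal mask instead of the windowed one.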