Post #6327

@aiposted

AI Post — Artificial Intelligence

Views: 4,980
Published: 17 Mar 2026, 18:18
Post content

🔥 Kimi AI introduces Attention Residuals, a new way to rethink how neural networks use past layers

Researchers at Moonshot AI just proposed a new architecture tweak that could make large AI models more efficient and smarter about how they use information from earlier layers. Instead of the traditional residual connections used in deep networks, they introduce Attention Residuals, a system where each layer can selectively attend to representations from previous layers.

Here’s what’s new:

Attention over past layers:
• Traditional residuals simply add outputs from earlier layers in a fixed way.
• Attention Residuals let the model dynamically choose which earlier layers matter for a given input.

Solves depth dilution:
• In very deep models, useful information from earlier layers can get diluted.
• Attention-based retrieval allows the network to pull specific past representations when needed.

Block AttnRes for scale:
• Layers are grouped into compressed blocks so cross-layer attention remains computationally practical.

Efficient in practice:
• Reported 1.25× compute advantage.
• <2% extra inference latency, meaning almost no slowdown.

Tested on the Kimi Linear model:
• Evaluated on a 48B-parameter architecture (3B activated parameters).
• Shows consistent downstream performance improvements.

Source. @aipost🏴
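
To make the contrast in the post concrete, here is a minimal pure-Python sketch of fixed versus attention-based aggregation over earlier layers. It is a toy illustration under stated assumptions, not Moonshot AI's implementation: the function names, the scaled dot-product scoring, and the use of a plain average to compress each block are all hypothetical choices made for this example.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def standard_residual(history):
    """Fixed aggregation: every earlier representation is added with weight 1."""
    dim = len(history[0])
    return [sum(h[i] for h in history) for i in range(dim)]

def attention_residual(query, history):
    """Input-dependent aggregation: softmax-weighted mix of earlier representations.

    Assumption: scores are scaled dot products between a per-layer query
    vector and each earlier representation (hypothetical scoring rule).
    """
    scale = math.sqrt(len(query))
    weights = softmax([dot(query, h) / scale for h in history])
    dim = len(query)
    mixed = [sum(w * h[i] for w, h in zip(weights, history)) for i in range(dim)]
    return mixed, weights

def block_summaries(history, block_size):
    """Compress the layer history into one vector per block of layers.

    Assumption: each block is summarized by a plain average, so attention
    runs over len(history) / block_size entries instead of every layer.
    """
    dim = len(history[0])
    blocks = [history[i:i + block_size] for i in range(0, len(history), block_size)]
    return [[sum(h[i] for h in b) / len(b) for i in range(dim)] for b in blocks]

if __name__ == "__main__":
    # Four earlier "layer outputs" with two features each (made-up numbers).
    history = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 2.0]]
    query = [0.0, 3.0]

    print("fixed sum:", standard_residual(history))
    mixed, weights = attention_residual(query, history)
    print("attention weights:", weights)
    print("attention mix:", mixed)

    # Block variant: attend over 2 block summaries instead of 4 layers.
    blocks = block_summaries(history, block_size=2)
    print("block mix:", attention_residual(query, blocks)[0])
```

In this toy setup the fixed sum treats all four earlier outputs equally, while the attention variant puts most of its weight on the entry that best matches the query. The block variant shows why grouping helps at scale: the number of entries attended over shrinks by the block size.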