Post content
A transformer's attention could be 99% sparser without losing its smarts. New research from MPI-IS, Oxford, and ETH Zürich shows it can. A simple post-training method strips away redundant connections, revealing a cleaner, more interpretable circuit. This suggests that much of the computation we rely on is just noise.