Post #535

@MachineLearningResearch

AML

Views: 34

Posted Dec 8, 2025, 10:55 AM

A transformer's attention could be 99% sparser without losing its smarts. New research from MPI-IS, Oxford, and ETH Zürich shows it can. A simple post-training method strips away redundant connections, revealing a cleaner, more interpretable circuit. This suggests much of the computation we rely on is just noise.