Hugging Face (Twitter)
RT @RedHat_AI: We’re open-sourcing a set of high-quality speculator models for Llamas, Qwens, and gpt-oss on Hugging Face. In real workloads, you can expect 1.5x to 2.5x speedups, and sometimes more than 4x. Here’s how this fits into the bigger story for speculative decoding. A thread 🧵: