Post #1742

@huggingface

Hugging Face

Views: 16

Published: 19 Nov 2025, 23:13

Post content

Hugging Face (Twitter) RT @RedHat_AI: We’re open-sourcing a set of high-quality speculator models for Llamas, Qwens, and gpt-oss on Hugging Face. In real workloads, you can expect 1.5x to 2.5x speedups and sometimes more than 4x. Here’s how this fits into the bigger story for speculative decoding. A thread 🧵: