#python DFlash is a lightweight block diffusion model that serves as the drafter in speculative decoding for large language models such as Qwen3.5 and Llama. It generates a block of draft tokens in parallel, which the target model then verifies, for a reported speedup of over 6x with no loss in output quality, up to 2.5x better than top existing methods. It integrates with vLLM, SGLang, Transformers, or MLX through simple install commands, and ready-made models are available on Hugging Face. In practice this means faster generation on your own hardware, with reported throughput of about 430 tokens/second and GPU utilization above 90% on tasks such as math and coding. https://github.com/z-lab/dflash
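
To illustrate why drafting a block of tokens and verifying them can speed up generation without changing the output, here is a minimal toy sketch of greedy speculative decoding. It is not DFlash's code or API: `toy_target`, `toy_drafter`, and `speculative_decode` are hypothetical stand-ins invented for this example, and the "models" are simple arithmetic rules. A real system would verify the whole block in a single batched forward pass of the target model; the loop below only mimics that step.

```python
from typing import List, Tuple


def toy_target(ctx: List[int]) -> int:
    """Hypothetical 'large model': a deterministic greedy next-token rule."""
    return (ctx[-1] * 3 + len(ctx)) % 10


def toy_drafter(ctx: List[int], k: int) -> List[int]:
    """Hypothetical 'draft model': proposes k tokens as one block.

    It mostly agrees with the target but is deliberately wrong whenever
    the context length is a multiple of 4.
    """
    cur, block = list(ctx), []
    for _ in range(k):
        tok = toy_target(cur)
        if len(cur) % 4 == 0:
            tok = (tok + 1) % 10  # injected drafting error
        block.append(tok)
        cur.append(tok)
    return block


def greedy_decode(prompt: List[int], n_new: int) -> List[int]:
    """Baseline: one target call per generated token."""
    out = list(prompt)
    for _ in range(n_new):
        out.append(toy_target(out))
    return out


def speculative_decode(prompt: List[int], n_new: int, k: int) -> Tuple[List[int], int]:
    """Draft k tokens, keep the longest prefix the target agrees with.

    Returns the generated sequence and the number of verification rounds
    (each round stands in for one batched target forward pass).
    """
    out, rounds = list(prompt), 0
    while len(out) - len(prompt) < n_new:
        draft = toy_drafter(out, k)
        rounds += 1
        accepted: List[int] = []
        for tok in draft:
            expected = toy_target(out + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                accepted.append(expected)  # replace first mismatch, drop the rest
                break
        else:
            accepted.append(toy_target(out + accepted))  # bonus token if all accepted
        out.extend(accepted)
    return out[: len(prompt) + n_new], rounds


prompt = [1, 2]
spec_out, rounds = speculative_decode(prompt, n_new=12, k=4)
base_out = greedy_decode(prompt, n_new=12)
print(spec_out == base_out, rounds)
```

Because every draft token is checked against the target's own choice, the output matches plain greedy decoding exactly, while the number of target rounds is smaller than the number of tokens generated. The actual speedup of any real drafter depends on how often its drafts are accepted and on how cheap drafting is relative to the target model.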