Post #15623

@githubtrending

GitHub Trends

Views: 230

Posted Apr 16, 2026, 12:00 PM


#python DFlash is a lightweight block diffusion model that accelerates large language models such as Qwen3.5 and Llama through speculative decoding. It drafts tokens in parallel, and the project reports over 6x faster inference with no quality loss, up to 2.5x the speedup of top existing methods. It integrates with vLLM, SGLang, Transformers, or MLX through simple installs and commands, and ready-made models are available on Hugging Face. The payoff is faster generation on your own hardware: reported throughput reaches ~430 tokens/second with GPU utilization above 90% on tasks such as math or coding. https://github.com/z-lab/dflash
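
For readers new to the idea, here is a minimal sketch of the draft-and-verify loop behind speculative decoding under greedy decoding. It is a toy illustration, not DFlash's code: `target_next`, `draft_block`, and `speculative_decode` are hypothetical names, the "models" are simple arithmetic rules, and the drafter here loops token by token where a block diffusion drafter would produce the whole block in parallel.

```python
from typing import List, Tuple

Token = int


def target_next(ctx: List[Token]) -> Token:
    """Toy stand-in for the large model: a deterministic greedy next-token rule."""
    return (ctx[-1] * 3 + len(ctx)) % 11


def draft_block(ctx: List[Token], k: int) -> List[Token]:
    """Toy stand-in for the drafter: proposes k tokens for the next block.

    It imitates the target but is deliberately wrong whenever the context
    length is a multiple of 4, so that some drafts get rejected.
    """
    out: List[Token] = []
    cur = list(ctx)
    for _ in range(k):
        tok = target_next(cur)
        if len(cur) % 4 == 0:
            tok = (tok + 1) % 11  # inject a draft error
        out.append(tok)
        cur.append(tok)
    return out


def greedy_decode(prompt: List[Token], n: int) -> List[Token]:
    """Baseline: one target call per generated token."""
    seq = list(prompt)
    for _ in range(n):
        seq.append(target_next(seq))
    return seq


def speculative_decode(prompt: List[Token], n: int, k: int = 4) -> Tuple[List[Token], int]:
    """Generate n tokens; return the sequence and the number of verify passes."""
    seq = list(prompt)
    goal = len(prompt) + n
    passes = 0
    while len(seq) < goal:
        draft = draft_block(seq, k)
        # In a real system the target scores all k positions in ONE forward
        # pass; here that pass is simulated by calling target_next per position.
        passes += 1
        accepted: List[Token] = []
        for tok in draft:
            expected = target_next(seq + accepted)
            if tok != expected:
                accepted.append(expected)  # first mismatch: keep the target's token
                break
            accepted.append(tok)
        else:
            # Every draft token matched: the same pass yields one extra token.
            accepted.append(target_next(seq + accepted))
        seq.extend(accepted)
    return seq[:goal], passes


if __name__ == "__main__":
    out, passes = speculative_decode([1, 2], n=20)
    print(out == greedy_decode([1, 2], n=20), passes)
```

Because every kept token is the one the target itself would have chosen, the output matches plain greedy decoding exactly, which is the sense in which this kind of speedup costs no quality. The gain comes from needing fewer target passes than generated tokens; how many fewer depends on how often the drafter guesses right.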