#python#deep_learning#inference#llm#nlp#pytorch#transformer
Nano-vLLM is a small, fast, and easy-to-understand engine for running large language models offline. It reaches inference speeds comparable to bigger systems like vLLM while using only about 1,200 lines of clean Python code, which makes it simple to read and modify. It includes performance optimizations such as prefix caching and tensor parallelism. You can install it easily and run models like Qwen3-0.6B on your own GPU. It is a good fit if you want fast, efficient inference without a complex setup: ideal for learning, research, or small deployments on limited hardware.
https://github.com/GeeeekExplorer/nano-vllm
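Nano-vLLM's own code isn't reproduced here. As a rough illustration of the prefix-caching idea, this is a minimal sketch assuming hash-based block caching in the style vLLM popularized. The prompt is split into fixed-size token blocks, and each full block is hashed together with the hash of the block before it. A new request then reuses any blocks whose hashes are already cached, so the KV entries for a shared prefix (such as a system prompt) would not need to be recomputed. `PrefixCache`, `block_hashes`, and `block_size=4` are hypothetical. No real KV tensors, reference counting, or eviction are modeled, and Nano-vLLM's actual block manager may differ.

```python
from hashlib import sha256


def block_hashes(token_ids, block_size):
    """Hash each *full* block, chaining in the previous block's hash.

    Because of the chaining, a block's hash identifies the whole prefix up to
    and including that block, not just the block's own tokens.
    """
    hashes, prev = [], b""
    num_full = len(token_ids) // block_size  # trailing partial block: not hashed or allocated here
    for i in range(num_full):
        block = tuple(token_ids[i * block_size:(i + 1) * block_size])
        digest = sha256(prev + repr(block).encode()).hexdigest()
        hashes.append(digest)
        prev = digest.encode()
    return hashes


class PrefixCache:
    """Hypothetical toy map from prefix-block hash to a physical KV-cache block id."""

    def __init__(self, block_size=4):
        self.block_size = block_size
        self.blocks = {}  # hash -> physical block id (stand-in for KV memory)
        self.next_id = 0

    def allocate(self, token_ids):
        """Return (block_ids, num_cached_tokens) for a new request's prompt."""
        block_ids, cached_tokens = [], 0
        for h in block_hashes(token_ids, self.block_size):
            if h in self.blocks:
                # Prefix hit: this block's KV entries could be reused instead of recomputed.
                cached_tokens += self.block_size
            else:
                # Miss: hand out a fresh block id (no eviction or ref counting in this sketch).
                self.blocks[h] = self.next_id
                self.next_id += 1
            block_ids.append(self.blocks[h])
        return block_ids, cached_tokens


cache = PrefixCache(block_size=4)
system_prompt = [1, 2, 3, 4, 5, 6, 7, 8]  # shared prefix: two full blocks
ids_a, hit_a = cache.allocate(system_prompt + [9, 10, 11, 12])
ids_b, hit_b = cache.allocate(system_prompt + [20, 21, 22, 23])
print(hit_a, hit_b)  # 0 8
print(ids_a, ids_b)  # [0, 1, 2] [0, 1, 3]
```

The chaining is the key design choice. Two prompts that share identical token blocks at different positions, or after different earlier text, get different hashes. Only a true common prefix is ever reused.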
#dl
📱 Zeus: New PyTorch Ecosystem Tool
Zeus is an open-source toolkit for measuring and optimizing the power consumption of deep learning workloads.
🖥Github
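Zeus's actual API isn't shown here. As a toolkit-agnostic illustration of what measuring power consumption boils down to: energy is the integral of power over time. Given timestamped power readings in watts, a trapezoidal sum yields joules. `energy_joules` and the sample readings below are hypothetical, not Zeus code. Real tools may read hardware energy counters directly instead of integrating power samples.

```python
def energy_joules(timestamps_s, power_w):
    """Integrate power samples (watts) over time (seconds) with the trapezoidal rule."""
    if len(timestamps_s) != len(power_w):
        raise ValueError("need one power sample per timestamp")
    total = 0.0
    for i in range(1, len(timestamps_s)):
        dt = timestamps_s[i] - timestamps_s[i - 1]
        if dt < 0:
            raise ValueError("timestamps must be non-decreasing")
        # Area of the trapezoid between consecutive samples: mean power * elapsed time.
        total += 0.5 * (power_w[i - 1] + power_w[i]) * dt
    return total


# Hypothetical readings: a GPU ramping from 100 W to 300 W, then holding steady.
ts = [0.0, 1.0, 2.0, 3.0]
pw = [100.0, 300.0, 300.0, 300.0]
e = energy_joules(ts, pw)
print(f"{e:.1f} J, average {e / (ts[-1] - ts[0]):.1f} W")  # 800.0 J, average 266.7 W
```

The sampling interval matters. Coarse sampling can miss short power spikes, which is one reason a measurement tool may prefer hardware energy counters when the GPU exposes them.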
#dl
Park, Chanwook, Sourav Saha, Jiachen Guo, Hantao Zhang, Xiaoyu Xie, Miguel A. Bessa, Dong Qian, et al. 2025. “Unifying Machine Learning and Interpolation Theory via Interpolating Neural Networks.” Nature Communications 16 (1): 1–12.
https://www.nature.com/articles/s41467-025-63790-8
#dl
A few cool ideas in this model.
Introducing Gemma 3n: The developer guide - Google Developers Blog
https://developers.googleblog.com/en/introducing-gemma-3n-developer-guide/
#dl
There is a new toolkit called SCALE that lets you compile CUDA code to run on AMD GPUs.
https://docs.scale-lang.com/manual/how-to-use/
I don't know who is more pissed off, NVIDIA or AMD.
#dl
This repo is really nice.
yuanchenyang/smalldiffusion: Simple and readable code for training and sampling from diffusion models
https://github.com/yuanchenyang/smalldiffusion
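As a rough sketch of what training a diffusion model involves, assuming the common noise-prediction objective with a sigma (noise-level) schedule: corrupt clean data `x0` as `x0 + sigma * eps` and train a model to recover `eps` from the noisy input. This is written from scratch in NumPy, not taken from smalldiffusion, whose schedules and training loop may differ. `geometric_sigmas`, `noise_prediction_loss`, and the two toy "models" are hypothetical.

```python
import numpy as np


def geometric_sigmas(n, sigma_min=0.01, sigma_max=10.0):
    """Noise levels spaced evenly in log-space (one common choice of schedule)."""
    return np.exp(np.linspace(np.log(sigma_min), np.log(sigma_max), n))


def noise_prediction_loss(model, x0, sigmas, rng):
    """One Monte Carlo estimate of E||model(x0 + sigma*eps, sigma) - eps||^2.

    Assumes x0 has shape (batch, features).
    """
    idx = rng.integers(len(sigmas), size=x0.shape[0])  # random noise level per example
    sigma = sigmas[idx][:, None]                        # shape (batch, 1) for broadcasting
    eps = rng.standard_normal(x0.shape)                 # the noise the model must predict
    x_noisy = x0 + sigma * eps                          # forward (noising) process
    pred = model(x_noisy, sigma)
    return np.mean((pred - eps) ** 2)


rng = np.random.default_rng(0)
x0 = rng.standard_normal((4096, 2))                     # toy 2-D "dataset"
sigmas = geometric_sigmas(10)

zero_model = lambda x, s: np.zeros_like(x)              # always predicts "no noise"
oracle = lambda x, s: (x - x0) / s                      # cheats by knowing x0

loss_zero = noise_prediction_loss(zero_model, x0, sigmas, rng)    # near 1.0, since E[eps**2] = 1
loss_oracle = noise_prediction_loss(oracle, x0, sigmas, rng)      # essentially 0
print(loss_zero, loss_oracle)
```

A trained network sits between these two baselines. Sampling then runs the process in reverse, stepping from the largest sigma down to the smallest and using the predicted noise to denoise a little at each step.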
#dl
Google and USC researchers benchmarked a prompt-based forecasting method, and the results are amazing.
Cao D, Jia F, Arik SO, Pfister T, Zheng Y, Ye W, et al. TEMPO: Prompt-based Generative Pre-trained Transformer for time series forecasting. arXiv [cs.LG]. 2023. Available: http://arxiv.org/abs/2310.04948