Post #140

@dps_build

DPS Build

Views: 429

Posted: Apr 2, 2023, 04:25 AM

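After switching to mmap(), a 30B LLM used less than 6 GB of memory. The principle: a given call does not need to touch all of the weights, so lazy loading can greatly reduce memory consumption. https://github.com/ggerganov/llama.cpp/discussions/638?sort=top#discussioncomment-5492916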
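To make the lazy-loading idea concrete, here is a minimal Python sketch. It is not llama.cpp's actual loader: the flat float32 file layout and the helper names `write_weights` and `read_weight_slice` are assumptions made up for illustration. It only shows that a memory-mapped file can be sliced without reading the whole file up front, since the OS pages in just the regions that are accessed.

```python
import mmap
import os
import struct
import tempfile

FLOAT_SIZE = 4  # bytes per little-endian float32


def write_weights(path, count):
    """Write `count` float32 values (0.0, 1.0, 2.0, ...) to a toy weights file.

    The flat layout is a made-up stand-in, not a real model format.
    """
    with open(path, "wb") as f:
        for i in range(count):
            f.write(struct.pack("<f", float(i)))


def read_weight_slice(path, start, length):
    """Return `length` weights beginning at index `start`.

    The file is memory-mapped read-only, so only the pages backing the
    requested byte range need to be brought into memory by the OS.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            raw = mm[start * FLOAT_SIZE:(start + length) * FLOAT_SIZE]
    return list(struct.unpack(f"<{length}f", raw))


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "toy_weights.bin")
    write_weights(path, 1_000_000)  # roughly a 4 MB file
    print(read_weight_slice(path, 500_000, 4))
```

Resident memory then tracks the pages actually touched rather than the file size, and those pages live in the OS page cache, so they can be shared across processes and evicted under memory pressure.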