Post #410

@amneumarkt

Am Neumarkt 😱

Views: 230

Published: 19 Oct 2022, 17:55

Content

#ml https://developer.nvidia.com/blog/how-optimize-data-transfers-cuda-cc/ I find this post very useful. I had always wondered what happens after my dataloader has prepared everything for the GPU. I didn't know that, when the data sits in ordinary pageable host memory, CUDA first has to copy it into a page-locked (pinned) buffer before transferring it to the device. I used to set pin_memory=True in a PyTorch DataLoader and benchmark it. To be honest, I only observed very small improvements in most of my experiments, so I stopped caring about pin_memory. After some digging, I also realized that getting a performance gain from pin_memory=True in DataLoader is tricky. If we neither use multiprocessing nor reuse the page-locked memory, it is hard to expect any performance gain. (Some other notes: https://datumorphism.leima.is/cards/machine-learning/practice/cuda-memory/)
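For anyone who wants to repeat the benchmark, here is a minimal sketch of the kind of comparison I mean. The helper name time_epoch, the synthetic dataset and all sizes are assumptions made up for illustration, not the setup from my experiments. It times one pass over a DataLoader including the host-to-device copies, and falls back to the CPU when no GPU is available.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_epoch(pin_memory: bool, num_workers: int = 0) -> float:
    """Time one pass over a synthetic dataset, including host-to-device copies."""
    use_cuda = torch.cuda.is_available()
    device = torch.device("cuda" if use_cuda else "cpu")

    # Synthetic data; the shapes are arbitrary and only serve the illustration.
    dataset = TensorDataset(
        torch.randn(2048, 3, 32, 32),
        torch.randint(0, 10, (2048,)),
    )
    loader = DataLoader(
        dataset,
        batch_size=256,
        num_workers=num_workers,
        # Pinning only matters for GPU transfers, so skip it when CUDA is absent.
        pin_memory=pin_memory and use_cuda,
    )

    if use_cuda:
        torch.cuda.synchronize()
    start = time.perf_counter()
    for x, y in loader:
        # non_blocking=True can only make the copy asynchronous
        # when the source tensor lives in pinned memory.
        x = x.to(device, non_blocking=True)
        y = y.to(device, non_blocking=True)
    if use_cuda:
        # Wait for pending copies so the measurement covers them.
        torch.cuda.synchronize()
    return time.perf_counter() - start


if __name__ == "__main__":
    for pin in (False, True):
        print(f"pin_memory={pin}: {time_epoch(pin):.4f} s")
```

With num_workers=0 and no compute to overlap the copies with, I would not expect a big difference between the two runs, which matches what I saw. Passing a larger num_workers to time_epoch is the natural next thing to try.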