Training tricks for a massive GPU cluster: the main ideas seem to be fairly tight control over the mini-batch size, plus a 2D-Torus all-reduce to synchronize gradient updates across the GPUs. Just submitted to arXiv, from the SONY team. The paper's title is also a nice touch: ImageNet/ResNet-50 Training in 224 Seconds.
This work: Tesla V100 x1088, InfiniBand EDR x2, 91.62% GPU scaling efficiency https://arxiv.org/abs/1811.05233
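To make the 2D-Torus all-reduce idea concrete, here is a minimal single-process simulation. It assumes the GPUs are laid out as a rows x cols grid and that the collective runs in three phases: a reduce-scatter along each row, an all-reduce down each column, and an all-gather along each row. This is my reading of the scheme, not the authors' code. The function name `torus_2d_allreduce` is hypothetical, and no real network traffic is modeled; the sketch only shows that the three phases produce the same result as a plain all-reduce.

```python
from typing import List

Grad = List[float]


def torus_2d_allreduce(grads: List[List[Grad]]) -> List[List[Grad]]:
    """Simulate a 2D-Torus all-reduce on a rows x cols grid of workers.

    grads[r][c] is the gradient vector held by the worker at row r, column c.
    The vector length must split evenly into `cols` chunks.
    Returns a grid where every worker holds the element-wise sum of all vectors.
    """
    rows, cols = len(grads), len(grads[0])
    n = len(grads[0][0])
    assert n % cols == 0, "vector length must split evenly into cols chunks"
    chunk = n // cols

    # Phase 1: horizontal reduce-scatter. Within each row, worker c ends up
    # owning the row-wise sum of chunk c only.
    owned = [[[sum(grads[r][k][i] for k in range(cols))
               for i in range(c * chunk, (c + 1) * chunk)]
              for c in range(cols)]
             for r in range(rows)]

    # Phase 2: vertical all-reduce. Workers in the same column own the same
    # chunk index, so summing down the column yields the global sum of it.
    col_sums = [[sum(owned[r][c][i] for r in range(rows)) for i in range(chunk)]
                for c in range(cols)]
    owned = [[list(col_sums[c]) for c in range(cols)] for _ in range(rows)]

    # Phase 3: horizontal all-gather. Each row concatenates its chunks so
    # every worker receives the full reduced vector.
    result = []
    for r in range(rows):
        full = [x for c in range(cols) for x in owned[r][c]]
        result.append([list(full) for _ in range(cols)])
    return result


if __name__ == "__main__":
    # 2 x 2 grid, each worker holding a length-2 gradient vector.
    grid = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]]]
    out = torus_2d_allreduce(grid)
    print(out[1][0])  # [16.0, 20.0]
```

The appeal of the decomposition, assuming each phase is itself implemented as a ring, is that an X x Y grid needs about 2(X-1) + 2(Y-1) sequential communication steps instead of the 2(XY-1) steps of a single ring spanning all XY GPUs, which matters at a scale of a thousand or more GPUs.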