Post #52

@DataScienceArchive

Data Science Archive

Views: 1,000
Published: 2018/11/14 07:34
Post content

Training tricks for a massive GPU cluster: the approach appears to combine careful control of the mini-batch size with a 2D-Torus all-reduce for synchronizing gradient updates across GPUs. Just submitted to arXiv by a team at SONY. The paper's title is fun too: ImageNet/ResNet-50 Training in 224 Seconds. This work: Tesla V100 x1088, Infiniband EDR x2, 91.62% GPU scaling efficiency. https://arxiv.org/abs/1811.05233
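
For anyone curious what a 2D-Torus all-reduce does, here is a rough single-process Python sketch. It assumes the GPUs sit in a rows x cols grid and that the collective runs in three phases: reduce-scatter along each row, all-reduce along each column, then all-gather along each row. The function name `torus_all_reduce_2d` and the list-of-lists layout are hypothetical, made up for illustration. The sketch only models what each phase leaves on each GPU; it does not reproduce the real ring message passing or any communication library, and it is not the authors' code.

```python
def torus_all_reduce_2d(grads, rows, cols):
    """Simulate a 2D-torus all-reduce (sum) on a rows x cols grid of GPUs.

    grads[r][c] is the local gradient (a list of numbers) on GPU (r, c).
    All gradients must have the same length, divisible by cols.
    Returns result[r][c], the gradient each GPU holds afterwards.
    """
    n = len(grads[0][0])
    assert n % cols == 0, "gradient length must be divisible by cols"
    chunk = n // cols

    # Phase 1: reduce-scatter along each row.
    # GPU (r, c) ends up owning chunk c, summed over the GPUs in row r.
    owned = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            lo, hi = c * chunk, (c + 1) * chunk
            owned[r][c] = [
                sum(grads[r][k][i] for k in range(cols)) for i in range(lo, hi)
            ]

    # Phase 2: all-reduce along each column.
    # GPUs in column c all hold chunk c, so summing over rows
    # gives the global sum for that chunk.
    for c in range(cols):
        col_sum = [sum(owned[r][c][i] for r in range(rows)) for i in range(chunk)]
        for r in range(rows):
            owned[r][c] = list(col_sum)

    # Phase 3: all-gather along each row.
    # Each GPU collects the chunks from its row to rebuild the full gradient.
    result = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        full = [x for c in range(cols) for x in owned[r][c]]
        for c in range(cols):
            result[r][c] = list(full)
    return result


if __name__ == "__main__":
    rows, cols = 2, 3
    # Toy gradients: 6 elements per GPU, distinct per GPU.
    grads = [[[(r * cols + c) * 10 + i for i in range(6)] for c in range(cols)]
             for r in range(rows)]
    out = torus_all_reduce_2d(grads, rows, cols)
    print(out[0][0])
```

After the three phases every GPU holds the same vector: the elementwise sum of all inputs. In this model the column-wise phase only touches 1/cols of the gradient per GPU, which is the intuition for why the scheme is attractive on large grids.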