no code implementations • 30 Mar 2021 • Masahiro Tanaka, Kenjiro Taura, Toshihiro Hanawa, Kentaro Torisawa
RaNNC also achieved higher training throughput than GPipe in all of the settings we tried, both for pre-training the enlarged BERT model (GPipe with hybrid parallelism) and for training the enlarged ResNet models (GPipe with model parallelism).
no code implementations • COLING 2016 • Junta Mizuno, Masahiro Tanaka, Kiyonori Ohtake, Jong-Hoon Oh, Julien Kloetzer, Chikara Hashimoto, Kentaro Torisawa
We demonstrate our large-scale NLP systems: WISDOM X, DISAANA, and D-SUMM.