2 code implementations • 2 Oct 2024 • Shubham Toshniwal, Wei Du, Ivan Moshkov, Branislav Kisacanin, Alexan Ayrapetyan, Igor Gitman
However, most of the cutting-edge progress in mathematical reasoning with LLMs has become closed-source due to a lack of access to training data.
Ranked #3 on Arithmetic Reasoning on GSM8K (using extra training data)
1 code implementation • 17 Jun 2024 • Nvidia: Bo Adler, Niket Agarwal, Ashwath Aithal, Dong H. Anh, Pallab Bhattacharya, Annika Brundyn, Jared Casper, Bryan Catanzaro, Sharon Clay, Jonathan Cohen, Sirshak Das, Ayush Dattagupta, Olivier Delalleau, Leon Derczynski, Yi Dong, Daniel Egert, Ellie Evans, Aleksander Ficek, Denys Fridman, Shaona Ghosh, Boris Ginsburg, Igor Gitman, Tomasz Grzegorzek, Robert Hero, Jining Huang, Vibhu Jawa, Joseph Jennings, Aastha Jhunjhunwala, John Kamalu, Sadaf Khan, Oleksii Kuchaiev, Patrick LeGresley, Hui Li, Jiwei Liu, Zihan Liu, Eileen Long, Ameya Sunil Mahabaleshwarkar, Somshubra Majumdar, James Maki, Miguel Martinez, Maer Rodrigues de Melo, Ivan Moshkov, Deepak Narayanan, Sean Narenthiran, Jesus Navarro, Phong Nguyen, Osvald Nitski, Vahid Noroozi, Guruprasad Nutheti, Christopher Parisien, Jupinder Parmar, Mostofa Patwary, Krzysztof Pawelec, Wei Ping, Shrimai Prabhumoye, Rajarshi Roy, Trisha Saar, Vasanth Rao Naik Sabavat, Sanjeev Satheesh, Jane Polak Scowcroft, Jason Sewall, Pavel Shamis, Gerald Shen, Mohammad Shoeybi, Dave Sizer, Misha Smelyanskiy, Felipe Soares, Makesh Narsimhan Sreedhar, Dan Su, Sandeep Subramanian, Shengyang Sun, Shubham Toshniwal, Hao Wang, Zhilin Wang, Jiaxuan You, Jiaqi Zeng, Jimmy Zhang, Jing Zhang, Vivienne Zhang, Yian Zhang, Chen Zhu
We release the Nemotron-4 340B model family, including Nemotron-4-340B-Base, Nemotron-4-340B-Instruct, and Nemotron-4-340B-Reward.
1 code implementation • 15 Feb 2024 • Shubham Toshniwal, Ivan Moshkov, Sean Narenthiran, Daria Gitman, Fei Jia, Igor Gitman
Building on the recent progress in open-source LLMs, our proposed prompting novelty, and some brute-force scaling, we construct OpenMathInstruct-1, a math instruction tuning dataset with 1.8M problem-solution pairs.
Ranked #1 on Math Word Problem Solving on MAWPS (using extra training data)
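As a rough usage sketch, the released dataset can be pulled from the Hugging Face Hub; the dataset id and field names below are assumptions to verify against the official release.

```python
# Hypothetical usage sketch: loading OpenMathInstruct-1 from the
# Hugging Face Hub (dataset id and field names assumed; verify first).
from datasets import load_dataset

ds = load_dataset("nvidia/OpenMathInstruct-1", split="train")

# Each record pairs a math problem with a generated solution.
example = ds[0]
print(example["question"])
print(example["generated_solution"])
```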
no code implementations • 27 Jun 2023 • Igor Gitman, Vitaly Lavrukhin, Aleksandr Laptev, Boris Ginsburg
Second, we demonstrate that it is possible to combine base and adapted models to achieve strong results on both original and target data.
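The snippet above does not spell out the combination method; one common way to combine a base and an adapted checkpoint is linear weight interpolation, sketched below with hypothetical checkpoint paths. The paper's actual strategy may differ.

```python
# Illustrative sketch only: linear interpolation of two checkpoints
# ("model soup" style). Not necessarily the paper's combination method.
import torch

def interpolate_state_dicts(base_sd, adapted_sd, alpha=0.5):
    """Blend base and adapted weights: alpha=0 keeps the base model,
    alpha=1 keeps the adapted one."""
    return {
        name: (1.0 - alpha) * base_sd[name] + alpha * adapted_sd[name]
        for name in base_sd
    }

base_sd = torch.load("base_asr.pt")        # hypothetical checkpoint paths
adapted_sd = torch.load("adapted_asr.pt")
merged = interpolate_state_dicts(base_sd, adapted_sd, alpha=0.5)
torch.save(merged, "merged_asr.pt")
```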
1 code implementation • 18 Mar 2023 • Aleksandr Laptev, Vladimir Bataev, Igor Gitman, Boris Ginsburg
This paper presents a framework based on Weighted Finite-State Transducers (WFST) to simplify the development of modifications for RNN-Transducer (RNN-T) loss.
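As a toy illustration of the WFST formalism itself (not the paper's framework), the sketch below encodes a small weighted transducer in the tropical semiring and finds its cheapest accepting path.

```python
# Toy weighted finite-state transducer: weights are costs (e.g. negative
# log-probabilities), and search returns the minimum-cost accepting path.
# Purely illustrative; it does not reproduce the paper's WFST toolchain.
import heapq

# arcs[state] = list of (next_state, input_label, output_label, weight)
arcs = {
    0: [(1, "a", "A", 0.5), (1, "b", "B", 1.2)],
    1: [(2, "c", "C", 0.3)],
}
final_states = {2}

def shortest_path(arcs, final_states, start=0):
    """Dijkstra over the WFST: returns (cost, output) of the cheapest
    accepting path."""
    heap = [(0.0, start, "")]
    best = {}
    while heap:
        cost, state, out = heapq.heappop(heap)
        if state in final_states:
            return cost, out
        if best.get(state, float("inf")) <= cost:
            continue
        best[state] = cost
        for nxt, _inp, olab, w in arcs.get(state, []):
            heapq.heappush(heap, (cost + w, nxt, out + olab))
    return None

print(shortest_path(arcs, final_states))  # (0.8, 'AC')
```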
1 code implementation • NeurIPS 2019 • Igor Gitman, Hunter Lang, Pengchuan Zhang, Lin Xiao
The use of momentum in stochastic gradient methods has become a widespread practice in machine learning.
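For readers unfamiliar with the practice, here is a minimal sketch of the classic heavy-ball momentum update on a toy quadratic; this is the textbook form, not the paper's specific analysis.

```python
# Minimal sketch of the heavy-ball (Polyak) momentum update.
import numpy as np

def sgd_momentum_step(w, grad, velocity, lr=0.1, beta=0.9):
    """v <- beta * v + grad; w <- w - lr * v."""
    velocity = beta * velocity + grad
    w = w - lr * velocity
    return w, velocity

# Toy quadratic f(w) = 0.5 * ||w||^2, so grad = w.
w = np.array([1.0, -2.0])
v = np.zeros_like(w)
for _ in range(50):
    w, v = sgd_momentum_step(w, grad=w, velocity=v)
print(w)  # converges toward the minimum at the origin
```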
no code implementations • WS 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Carl Case, Paulius Micikevicius
We present OpenSeq2Seq - an open-source toolkit for training sequence-to-sequence models.
Automatic Speech Recognition (ASR) +4
3 code implementations • 25 May 2018 • Oleksii Kuchaiev, Boris Ginsburg, Igor Gitman, Vitaly Lavrukhin, Jason Li, Huyen Nguyen, Carl Case, Paulius Micikevicius
We present OpenSeq2Seq - a TensorFlow-based toolkit for training sequence-to-sequence models that features distributed and mixed-precision training.
Automatic Speech Recognition (ASR) +4
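As an illustration of the mixed-precision technique the toolkit features (shown in PyTorch rather than the toolkit's TensorFlow, purely for brevity), a minimal training-loop sketch:

```python
# Hedged sketch of mixed-precision training; illustrates the technique
# only, not the OpenSeq2Seq implementation.
import torch
from torch.cuda.amp import autocast, GradScaler

model = torch.nn.Linear(512, 512).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
scaler = GradScaler()  # rescales the loss to avoid FP16 underflow

for _ in range(10):
    x = torch.randn(32, 512, device="cuda")
    optimizer.zero_grad()
    with autocast():                 # forward pass in an FP16/FP32 mix
        loss = model(x).pow(2).mean()
    scaler.scale(loss).backward()    # backprop on the scaled loss
    scaler.step(optimizer)           # unscale grads, then optimizer step
    scaler.update()                  # adjust the loss scale dynamically
```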
1 code implementation • 28 Apr 2018 • Igor Gitman, Jieshi Chen, Eric Lei, Artur Dubrawski
In this paper, we propose two novel approaches to solving this problem.
no code implementations • 9 Jan 2018 • Igor Gitman, Deepak Dilipkumar, Ben Parr
The basic idea of both of these algorithms is to make each step of gradient descent proportional to the current weight norm and independent of the gradient magnitude.
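That rule translates directly into a few lines of code; the sketch below is a generic instantiation of the sentence above, with the step-size constant gamma chosen arbitrarily.

```python
# Step length is tied to the weight norm; the gradient only supplies a
# direction. gamma and eps are illustrative values.
import numpy as np

def proportional_update(w, grad, gamma=0.01, eps=1e-8):
    """w <- w - gamma * ||w|| * grad / ||grad||."""
    return w - gamma * np.linalg.norm(w) * grad / (np.linalg.norm(grad) + eps)
```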
no code implementations • ICLR 2018 • Boris Ginsburg, Igor Gitman, Yang You
Using LARS, we scaled AlexNet and ResNet-50 to a batch size of 16K.
no code implementations • 24 Sep 2017 • Igor Gitman, Boris Ginsburg
However, it is not clear if these algorithms could replace BN in practical, large-scale applications.
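For context, here is a minimal sketch contrasting batch normalization with weight normalization (one of the commonly studied BN alternatives), using standard PyTorch utilities; it is a reading aid, not the paper's experimental setup.

```python
# Contrast batch normalization with weight normalization.
import torch
import torch.nn as nn

x = torch.randn(32, 64)

# Batch normalization: normalizes activations using batch statistics.
bn_layer = nn.Sequential(nn.Linear(64, 128), nn.BatchNorm1d(128))

# Weight normalization: reparameterizes weights as w = g * v / ||v||,
# removing the dependence on batch statistics.
wn_layer = nn.utils.weight_norm(nn.Linear(64, 128))

print(bn_layer(x).shape, wn_layer(x).shape)  # both: torch.Size([32, 128])
```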
12 code implementations • 13 Aug 2017 • Yang You, Igor Gitman, Boris Ginsburg
Using LARS, we scaled AlexNet up to a batch size of 8K, and ResNet-50 to a batch size of 32K without loss in accuracy.
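A simplified sketch of the LARS rule follows; it omits momentum and uses illustrative hyperparameters, so treat it as a reading aid rather than the reference implementation.

```python
# Simplified layer-wise adaptive rate scaling (LARS): each layer's step
# is set by a "trust ratio" based on its own weight and gradient norms,
# rather than by a single global learning rate.
import torch

def lars_step(params, lr=0.1, eta=0.001, weight_decay=1e-4, eps=1e-9):
    """One LARS update over parameter tensors (momentum omitted for
    brevity; the full algorithm includes it)."""
    with torch.no_grad():
        for w in params:
            if w.grad is None:
                continue
            g = w.grad
            trust = eta * w.norm() / (g.norm() + weight_decay * w.norm() + eps)
            w -= lr * trust * (g + weight_decay * w)
```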