Search Results for author: Sandeep Subramanian

Found 24 papers, 9 papers with code

NVIDIA NeMo’s Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21

no code implementations • WMT (EMNLP) 2021 • Sandeep Subramanian, Oleksii Hrinchuk, Virginia Adams, Oleksii Kuchaiev

This paper provides an overview of NVIDIA NeMo’s neural machine translation systems for the constrained data track of the WMT21 News and Biomedical Shared Translation Tasks.

Data Augmentation Knowledge Distillation +3

NVIDIA NeMo Offline Speech Translation Systems for IWSLT 2022

no code implementations • IWSLT (ACL) 2022 • Oleksii Hrinchuk, Vahid Noroozi, Ashwinkumar Ganesan, Sarah Campbell, Sandeep Subramanian, Somshubra Majumdar, Oleksii Kuchaiev

Our cascade system consists of 1) Conformer RNN-T automatic speech recognition model, 2) punctuation-capitalization model based on pre-trained T5 encoder, 3) ensemble of Transformer neural machine translation models fine-tuned on TED talks.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4

Nemotron-4 15B Technical Report

no code implementations • 26 Feb 2024 • Jupinder Parmar, Shrimai Prabhumoye, Joseph Jennings, Mostofa Patwary, Sandeep Subramanian, Dan Su, Chen Zhu, Deepak Narayanan, Aastha Jhunjhunwala, Ayush Dattagupta, Vibhu Jawa, Jiwei Liu, Ameya Mahabaleshwarkar, Osvald Nitski, Annika Brundyn, James Maki, Miguel Martinez, Jiaxuan You, John Kamalu, Patrick Legresley, Denys Fridman, Jared Casper, Ashwath Aithal, Oleksii Kuchaiev, Mohammad Shoeybi, Jonathan Cohen, Bryan Catanzaro

We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens.

Language Modelling

Mixtral of Experts

3 code implementations • 8 Jan 2024 • Albert Q. Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de Las Casas, Emma Bou Hanna, Florian Bressand, Gianna Lengyel, Guillaume Bour, Guillaume Lample, Lélio Renard Lavaud, Lucile Saulnier, Marie-Anne Lachaux, Pierre Stock, Sandeep Subramanian, Sophia Yang, Szymon Antoniak, Teven Le Scao, Théophile Gervet, Thibaut Lavril, Thomas Wang, Timothée Lacroix, William El Sayed

In particular, Mixtral vastly outperforms Llama 2 70B on mathematics, code generation, and multilingual benchmarks.

Ranked #9 on Question Answering on PIQA

Code Generation Common Sense Reasoning +4

605

Retrieval meets Long Context Large Language Models

no code implementations • 4 Oct 2023 • Peng Xu, Wei Ping, Xianchao Wu, Lawrence McAfee, Chen Zhu, Zihan Liu, Sandeep Subramanian, Evelina Bakhturina, Mohammad Shoeybi, Bryan Catanzaro

Perhaps surprisingly, we find that LLM with 4K context window using simple retrieval-augmentation at generation can achieve comparable performance to finetuned LLM with 16K context window via positional interpolation on long context tasks, while taking much less computation.

16k 4k +4

Finding the Right Recipe for Low Resource Domain Adaptation in Neural Machine Translation

no code implementations • 2 Jun 2022 • Virginia Adams, Sandeep Subramanian, Mike Chrzanowski, Oleksii Hrinchuk, Oleksii Kuchaiev

General translation models often still struggle to generate accurate translations in specialized domains.

8k Domain Adaptation +3

NVIDIA NeMo Neural Machine Translation Systems for English-German and English-Russian News and Biomedical Tasks at WMT21

no code implementations • 16 Nov 2021 • Sandeep Subramanian, Oleksii Hrinchuk, Virginia Adams, Oleksii Kuchaiev

This paper provides an overview of NVIDIA NeMo's neural machine translation systems for the constrained data track of the WMT21 News and Biomedical Shared Translation Tasks.

Data Augmentation Knowledge Distillation +3

Multi-scale Transformer Language Models

no code implementations • 1 May 2020 • Sandeep Subramanian, Ronan Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau

We investigate multi-scale transformer language models that learn representations of text at multiple scales, and present three different architectures that have an inductive bias to handle the hierarchical nature of language.

Inductive Bias Language Modelling

On Extractive and Abstractive Neural Document Summarization with Transformer Language Models

1 code implementation • EMNLP 2020 • Sandeep Subramanian, Raymond Li, Jonathan Pilault, Christopher Pal

We present a method to produce abstractive summaries of long documents that exceed several thousand words via neural abstractive summarization.

Ranked #18 on Text Summarization on Pubmed

Abstractive Text Summarization Document Summarization +1

Do Neural Dialog Systems Use the Conversation History Effectively? An Empirical Study

1 code implementation • ACL 2019 • Chinnadhurai Sankar, Sandeep Subramanian, Christopher Pal, Sarath Chandar, Yoshua Bengio

Neural generative models have become increasingly popular when building conversational agents.

State-Reification Networks: Improving Generalization by Modeling the Distribution of Hidden Representations

no code implementations • 26 May 2019 • Alex Lamb, Jonathan Binas, Anirudh Goyal, Sandeep Subramanian, Ioannis Mitliagkas, Denis Kazakov, Yoshua Bengio, Michael C. Mozer

Machine learning promises methods that generalize well from finite labeled data.

Multiple-Attribute Text Rewriting

no code implementations • ICLR 2019 • Guillaume Lample, Sandeep Subramanian, Eric Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".

Attribute Disentanglement +2

Towards Text Generation with Adversarially Learned Neural Outlines

no code implementations • NeurIPS 2018 • Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron C. Courville, Chris Pal

We generate outlines with an adversarial model trained to approximate the distribution of sentences in a latent space induced by general-purpose sentence encoders.

Sentence Text Generation

Multiple-Attribute Text Style Transfer

3 code implementations • 1 Nov 2018 • Sandeep Subramanian, Guillaume Lample, Eric Michael Smith, Ludovic Denoyer, Marc'Aurelio Ranzato, Y-Lan Boureau

The dominant approach to unsupervised "style transfer" in text is based on the idea of learning a latent representation, which is independent of the attributes specifying its "style".

Attribute Disentanglement +3

222

A Framework towards Domain Specific Video Summarization

no code implementations • 24 Sep 2018 • Vishal Kaushal, Sandeep Subramanian, Suraj Kothawade, Rishabh Iyer, Ganesh Ramakrishnan

We propose a novel framework for domain specific video summarization.

Video Summarization

Fortified Networks: Improving the Robustness of Deep Networks by Modeling the Manifold of Hidden Representations

1 code implementation • ICLR 2019 • Alex Lamb, Jonathan Binas, Anirudh Goyal, Dmitriy Serdyuk, Sandeep Subramanian, Ioannis Mitliagkas, Yoshua Bengio

Deep networks have achieved impressive results across a variety of important tasks.

Learning General Purpose Distributed Sentence Representations via Large Scale Multi-task Learning

4 code implementations • ICLR 2018 • Sandeep Subramanian, Adam Trischler, Yoshua Bengio, Christopher J. Pal

In this work, we present a simple, effective multi-task learning framework for sentence representations that combines the inductive biases of diverse training objectives in a single model.

Ranked #1 on Semantic Textual Similarity on SentEval

Multi-Task Learning Natural Language Inference +2

2,279

A Deep Reinforcement Learning Chatbot (Short Version)

no code implementations • 20 Jan 2018 • Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeswar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

We present MILABOT: a deep reinforcement learning chatbot developed by the Montreal Institute for Learning Algorithms (MILA) for the Amazon Alexa Prize competition.

Chatbot reinforcement-learning +3

A Deep Reinforcement Learning Chatbot

no code implementations • 7 Sep 2017 • Iulian V. Serban, Chinnadhurai Sankar, Mathieu Germain, Saizheng Zhang, Zhouhan Lin, Sandeep Subramanian, Taesup Kim, Michael Pieper, Sarath Chandar, Nan Rosemary Ke, Sai Rajeshwar, Alexandre de Brebisson, Jose M. R. Sotelo, Dendi Suhubdy, Vincent Michalski, Alexandre Nguyen, Joelle Pineau, Yoshua Bengio

By applying reinforcement learning to crowdsourced data and real-world user interactions, the system has been trained to select an appropriate response from the models in its ensemble.

Chatbot reinforcement-learning +3

Neural Models for Key Phrase Detection and Question Generation

no code implementations • 14 Jun 2017 • Sandeep Subramanian, Tong Wang, Xingdi Yuan, Saizheng Zhang, Yoshua Bengio, Adam Trischler

We propose a two-stage neural model to tackle question generation from documents.

Question Answering Question Generation +2

Adversarial Generation of Natural Language

no code implementations • WS 2017 • Sai Rajeswar, Sandeep Subramanian, Francis Dutil, Christopher Pal, Aaron Courville

Generative Adversarial Networks (GANs) have gathered a lot of attention from the computer vision community, yielding impressive results for image generation.

Image Generation Language Modelling +1

Deep Complex Networks

9 code implementations • ICLR 2018 • Chiheb Trabelsi, Olexa Bilaniuk, Ying Zhang, Dmitriy Serdyuk, Sandeep Subramanian, João Felipe Santos, Soroush Mehri, Negar Rostamzadeh, Yoshua Bengio, Christopher J. Pal

Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models.

Ranked #3 on Music Transcription on MusicNet

Image Classification Music Transcription +1

700

Machine Comprehension by Text-to-Text Neural Question Generation

4 code implementations • WS 2017 • Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler

We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers.

Question Answering Question Generation +4

216

Neural Architectures for Named Entity Recognition

43 code implementations • NAACL 2016 • Guillaume Lample, Miguel Ballesteros, Sandeep Subramanian, Kazuya Kawakami, Chris Dyer

State-of-the-art named entity recognition systems rely heavily on hand-crafted features and domain-specific knowledge in order to learn effectively from the small, supervised training corpora that are available.

Ranked #8 on Named Entity Recognition (NER) on CoNLL++

Named Entity Recognition

13,553
