Search Results for author: Dzmitry Bahdanau

Found 44 papers, 27 papers with code

Theano: A Python framework for fast computation of mathematical expressions

1 code implementation 9 May 2016 The Theano Development Team, Rami Al-Rfou, Guillaume Alain, Amjad Almahairi, Christof Angermueller, Dzmitry Bahdanau, Nicolas Ballas, Frédéric Bastien, Justin Bayer, Anatoly Belikov, Alexander Belopolsky, Yoshua Bengio, Arnaud Bergeron, James Bergstra, Valentin Bisson, Josh Bleecher Snyder, Nicolas Bouchard, Nicolas Boulanger-Lewandowski, Xavier Bouthillier, Alexandre de Brébisson, Olivier Breuleux, Pierre-Luc Carrier, Kyunghyun Cho, Jan Chorowski, Paul Christiano, Tim Cooijmans, Marc-Alexandre Côté, Myriam Côté, Aaron Courville, Yann N. Dauphin, Olivier Delalleau, Julien Demouth, Guillaume Desjardins, Sander Dieleman, Laurent Dinh, Mélanie Ducoffe, Vincent Dumoulin, Samira Ebrahimi Kahou, Dumitru Erhan, Ziye Fan, Orhan Firat, Mathieu Germain, Xavier Glorot, Ian Goodfellow, Matt Graham, Caglar Gulcehre, Philippe Hamel, Iban Harlouchet, Jean-Philippe Heng, Balázs Hidasi, Sina Honari, Arjun Jain, Sébastien Jean, Kai Jia, Mikhail Korobov, Vivek Kulkarni, Alex Lamb, Pascal Lamblin, Eric Larsen, César Laurent, Sean Lee, Simon Lefrancois, Simon Lemieux, Nicholas Léonard, Zhouhan Lin, Jesse A. Livezey, Cory Lorenz, Jeremiah Lowin, Qianli Ma, Pierre-Antoine Manzagol, Olivier Mastropietro, Robert T. McGibbon, Roland Memisevic, Bart van Merriënboer, Vincent Michalski, Mehdi Mirza, Alberto Orlandi, Christopher Pal, Razvan Pascanu, Mohammad Pezeshki, Colin Raffel, Daniel Renshaw, Matthew Rocklin, Adriana Romero, Markus Roth, Peter Sadowski, John Salvatier, François Savard, Jan Schlüter, John Schulman, Gabriel Schwartz, Iulian Vlad Serban, Dmitriy Serdyuk, Samira Shabanian, Étienne Simon, Sigurd Spieckermann, S. Ramana Subramanyam, Jakub Sygnowski, Jérémie Tanguay, Gijs van Tulder, Joseph Turian, Sebastian Urban, Pascal Vincent, Francesco Visin, Harm de Vries, David Warde-Farley, Dustin J. Webb, Matthew Willson, Kelvin Xu, Lijun Xue, Li Yao, Saizheng Zhang, Ying Zhang

Since its introduction, it has been one of the most used CPU and GPU mathematical compilers - especially in the machine learning community - and has shown steady performance improvements.
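
For readers unfamiliar with the library, here is a minimal sketch of the core Theano workflow the abstract refers to: symbolic variables are assembled into an expression graph, which `theano.function` then compiles into optimized CPU or GPU code.

```python
# Minimal sketch of Theano's declare-build-compile workflow.
import theano
import theano.tensor as T

x = T.dmatrix('x')               # symbolic double-precision matrices
y = T.dmatrix('y')
z = T.tanh(x + y)                # an expression graph, nothing computed yet

f = theano.function([x, y], z)   # compiled to optimized CPU/GPU code
print(f([[1.0, 2.0]], [[3.0, 4.0]]))
```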

BIG-bench Machine Learning · Clustering +2

StarCoder: may the source be with you!

4 code implementations 9 May 2023 Raymond Li, Loubna Ben allal, Yangtian Zi, Niklas Muennighoff, Denis Kocetkov, Chenghao Mou, Marc Marone, Christopher Akiki, Jia Li, Jenny Chim, Qian Liu, Evgenii Zheltonozhskii, Terry Yue Zhuo, Thomas Wang, Olivier Dehaene, Mishig Davaadorj, Joel Lamy-Poirier, João Monteiro, Oleh Shliazhko, Nicolas Gontier, Nicholas Meade, Armel Zebaze, Ming-Ho Yee, Logesh Kumar Umapathi, Jian Zhu, Benjamin Lipkin, Muhtasham Oblokulov, Zhiruo Wang, Rudra Murthy, Jason Stillerman, Siva Sankalp Patel, Dmitry Abulkhanov, Marco Zocca, Manan Dey, Zhihan Zhang, Nour Fahmy, Urvashi Bhattacharyya, Wenhao Yu, Swayam Singh, Sasha Luccioni, Paulo Villegas, Maxim Kunakov, Fedor Zhdanov, Manuel Romero, Tony Lee, Nadav Timor, Jennifer Ding, Claire Schlesinger, Hailey Schoelkopf, Jan Ebert, Tri Dao, Mayank Mishra, Alex Gu, Jennifer Robinson, Carolyn Jane Anderson, Brendan Dolan-Gavitt, Danish Contractor, Siva Reddy, Daniel Fried, Dzmitry Bahdanau, Yacine Jernite, Carlos Muñoz Ferrandis, Sean Hughes, Thomas Wolf, Arjun Guha, Leandro von Werra, Harm de Vries

The BigCode community, an open-scientific collaboration working on the responsible development of Large Language Models for Code (Code LLMs), introduces StarCoder and StarCoderBase: 15.5B parameter models with 8K context length, infilling capabilities and fast large-batch inference enabled by multi-query attention.
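
As a quick illustration of the multi-query attention mentioned above, here is a hedged PyTorch sketch: all query heads share a single key/value head, which shrinks the KV cache and speeds up large-batch inference. Shapes and weights are illustrative, not StarCoder's actual configuration.

```python
# Hedged sketch of multi-query attention: H query heads share one
# key/value head, shrinking the KV cache for fast large-batch decoding.
import torch
import torch.nn.functional as F

def multi_query_attention(x, wq, wk, wv, n_heads):
    B, T, D = x.shape
    d = D // n_heads
    q = (x @ wq).view(B, T, n_heads, d).transpose(1, 2)   # (B, H, T, d)
    k = (x @ wk).view(B, T, 1, d).transpose(1, 2)         # single shared head
    v = (x @ wv).view(B, T, 1, d).transpose(1, 2)
    att = (q @ k.transpose(-2, -1)) / d ** 0.5            # broadcasts over H
    causal = torch.triu(torch.ones(T, T, dtype=torch.bool), 1)
    att = att.masked_fill(causal, float('-inf'))          # causal masking
    out = F.softmax(att, dim=-1) @ v                      # (B, H, T, d)
    return out.transpose(1, 2).reshape(B, T, D)

x = torch.randn(2, 8, 64)                                 # toy sizes
wq, wk, wv = torch.randn(64, 64), torch.randn(64, 8), torch.randn(64, 8)
print(multi_query_attention(x, wq, wk, wv, n_heads=8).shape)  # (2, 8, 64)
```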

8k · Code Generation

BabyAI: A Platform to Study the Sample Efficiency of Grounded Language Learning

6 code implementations ICLR 2019 Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Salem Lahlou, Lucas Willems, Chitwan Saharia, Thien Huu Nguyen, Yoshua Bengio

Allowing humans to interactively train artificial agents to understand language instructions is desirable for both practical and scientific reasons, but given the poor data efficiency of the current learning methods, this goal may require substantial research efforts.
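
As a platform, BabyAI exposes its levels as Gym environments whose observations pair a partial grid view with a natural-language mission; a hedged usage sketch follows (the level name follows BabyAI's naming scheme and may differ across versions).

```python
# Hedged usage sketch; assumes the classic gym API and a BabyAI install.
# The level name follows BabyAI's scheme and may differ across versions.
import gym
import babyai  # importing registers the BabyAI-* environments

env = gym.make("BabyAI-GoToRedBall-v0")
obs = env.reset()
print(obs["mission"])        # natural-language instruction, e.g. "go to the red ball"
print(obs["image"].shape)    # partial egocentric grid observation
```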

Grounded language learning

Attention-Based Models for Speech Recognition

14 code implementations NeurIPS 2015 Jan Chorowski, Dzmitry Bahdanau, Dmitriy Serdyuk, Kyunghyun Cho, Yoshua Bengio

Recurrent sequence generators conditioned on input data through an attention mechanism have recently shown very good performance on a range of tasks including machine translation, handwriting synthesis and image caption generation.
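
The attention mechanism in question scores each encoder state against the current decoder state and normalizes the scores into weights; below is a minimal NumPy sketch of this content-based (additive) scoring, with illustrative parameter shapes.

```python
# NumPy sketch of content-based (additive) attention: compare the
# decoder state with every encoder state, normalize scores to weights,
# and return the weighted context vector. Shapes are illustrative.
import numpy as np

def additive_attention(s, H, W, V, w):
    # s: decoder state (d,); H: encoder states (T, d)
    scores = np.tanh(s @ W + H @ V) @ w        # one score per encoder step
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                       # softmax -> attention weights
    return alpha @ H                           # context vector (d,)

rng = np.random.default_rng(0)
d, T = 4, 6
s, H = rng.normal(size=d), rng.normal(size=(T, d))
W, V, w = rng.normal(size=(d, d)), rng.normal(size=(d, d)), rng.normal(size=d)
print(additive_attention(s, H, W, V, w).shape)   # (4,)
```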

Machine Translation · Speech Recognition +1

BabyAI 1.1

3 code implementations 24 Jul 2020 David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

This increases reinforcement learning sample efficiency by up to 3 times and improves imitation learning performance on the hardest level from 77% to 90.4%.

Computational Efficiency · Imitation Learning +2

LLM2Vec: Large Language Models Are Secretly Powerful Text Encoders

2 code implementations 9 Apr 2024 Parishad BehnamGhader, Vaibhav Adlakha, Marius Mosbach, Dzmitry Bahdanau, Nicolas Chapados, Siva Reddy

We outperform encoder-only models by a large margin on word-level tasks and reach a new unsupervised state-of-the-art performance on the Massive Text Embeddings Benchmark (MTEB).
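
To make the encoder idea concrete, the sketch below shows only the simplest ingredient: mean-pooling a decoder-only LM's hidden states into a fixed-size embedding. The full LLM2Vec recipe additionally enables bidirectional attention and adds masked-token and contrastive training; "gpt2" here is just a small stand-in model.

```python
# Hedged sketch of the pooling step only: mean-pool a decoder-only LM's
# hidden states into one embedding. The full LLM2Vec recipe additionally
# enables bidirectional attention and adds masked-token + contrastive
# training; "gpt2" is a small stand-in for the models used in the paper.
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModel.from_pretrained("gpt2")

def embed(text):
    batch = tok(text, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**batch).last_hidden_state   # (1, T, D)
    return hidden.mean(dim=1).squeeze(0)            # mean pooling -> (D,)

print(embed("large language models as text encoders").shape)
```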

Contrastive Learning

PICARD: Parsing Incrementally for Constrained Auto-Regressive Decoding from Language Models

2 code implementations EMNLP 2021 Torsten Scholak, Nathan Schucher, Dzmitry Bahdanau

Large pre-trained language models for textual data have an unconstrained output space; at each decoding step, they can produce any of 10,000s of sub-word tokens.
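
PICARD tackles this by checking candidate tokens against an incremental parser at each decoding step. The sketch below shows the general logit-masking pattern, with a hypothetical `is_valid_prefix` predicate standing in for PICARD's incremental SQL parser.

```python
# Hedged sketch of constrained decoding: mask out candidate tokens that
# would make the partial output invalid. is_valid_prefix is a toy
# stand-in for PICARD's incremental parser.
import torch

def constrained_step(logits, prefix_ids, vocab, is_valid_prefix):
    # Start with everything masked; re-admit tokens that keep the prefix valid.
    mask = torch.full_like(logits, float('-inf'))
    k = min(50, logits.numel())                    # check top candidates only
    for tid in torch.topk(logits, k).indices:
        if is_valid_prefix(prefix_ids + [int(tid)], vocab):
            mask[tid] = 0.0
    return logits + mask                           # invalid tokens stay at -inf

# Toy usage: the "parser" simply rejects any prefix containing ';'.
vocab = ["SELECT", " *", " FROM", " t", ";"]
ok = lambda ids, v: all(';' not in v[i] for i in ids)
masked = constrained_step(torch.randn(len(vocab)), [0, 1, 2], vocab, ok)
print(vocab[int(torch.argmax(masked))])            # never prints ';'
```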

Dialogue State Tracking · Semantic Parsing +3

End-to-End Attention-based Large Vocabulary Speech Recognition

1 code implementation 18 Aug 2015 Dzmitry Bahdanau, Jan Chorowski, Dmitriy Serdyuk, Philemon Brakel, Yoshua Bengio

Many of the current state-of-the-art Large Vocabulary Continuous Speech Recognition Systems (LVCSR) are hybrids of neural networks and Hidden Markov Models (HMMs).

Acoustic Modelling · Language Modelling +2

Task Loss Estimation for Sequence Prediction

1 code implementation 19 Nov 2015 Dzmitry Bahdanau, Dmitriy Serdyuk, Philémon Brakel, Nan Rosemary Ke, Jan Chorowski, Aaron Courville, Yoshua Bengio

Our idea is that this score can be interpreted as an estimate of the task loss, and that the estimation error may be used as a consistent surrogate loss.
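
Concretely, the idea can be pictured as in the toy sketch below: the model emits a score trained to predict the true, non-differentiable task loss, and the squared estimation error supplies gradients. This is a hedged illustration, not the paper's exact training procedure.

```python
# Toy sketch: a model-produced score is trained to estimate the true,
# non-differentiable task loss; the squared estimation error supplies
# gradients. Not the paper's exact training procedure.
import torch

score = torch.tensor(2.3, requires_grad=True)   # model's task-loss estimate
task_loss = torch.tensor(4.0)                   # e.g. measured edit distance
surrogate = (score - task_loss) ** 2            # estimation error as surrogate
surrogate.backward()
print(score.grad)
```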

Language Modelling · speech-recognition +1

On the Properties of Neural Machine Translation: Encoder-Decoder Approaches

2 code implementations 3 Sep 2014 Kyunghyun Cho, Bart van Merrienboer, Dzmitry Bahdanau, Yoshua Bengio

In this paper, we focus on analyzing the properties of neural machine translation using two models: the RNN Encoder-Decoder and a newly proposed gated recursive convolutional neural network.
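
For context, a minimal PyTorch sketch of the RNN Encoder-Decoder pattern analyzed in the paper follows: one GRU compresses the source into a fixed-length vector and a second GRU decodes conditioned on it. Sizes are illustrative, and the gated recursive convolutional model is not shown.

```python
# Minimal PyTorch sketch of the RNN Encoder-Decoder pattern: one GRU
# compresses the source into a fixed vector, a second GRU decodes
# conditioned on it. Sizes are illustrative; the grConv model is omitted.
import torch
import torch.nn as nn

enc = nn.GRU(input_size=16, hidden_size=32, batch_first=True)
dec = nn.GRU(input_size=16, hidden_size=32, batch_first=True)

src = torch.randn(2, 7, 16)    # (batch, source length, embedding dim)
_, h = enc(src)                # fixed-length summary, shape (1, 2, 32)
tgt = torch.randn(2, 5, 16)    # teacher-forced target embeddings
out, _ = dec(tgt, h)           # decode conditioned on the summary
print(out.shape)               # (2, 5, 32)
```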

Machine Translation · Sentence +1

DuoRAT: Towards Simpler Text-to-SQL Models

1 code implementation NAACL 2021 Torsten Scholak, Raymond Li, Dzmitry Bahdanau, Harm de Vries, Chris Pal

Recent neural text-to-SQL models can effectively translate natural language questions to corresponding SQL queries on unseen databases.

Text-To-SQL

Systematic Generalization: What Is Required and Can It Be Learned?

2 code implementations ICLR 2019 Dzmitry Bahdanau, Shikhar Murty, Michael Noukhovitch, Thien Huu Nguyen, Harm de Vries, Aaron Courville

Numerous models for grounded language understanding have been recently proposed, including (i) generic models that can be easily adapted to any given task and (ii) intuitively appealing modular models that require background knowledge to be instantiated.

Systematic Generalization · Visual Question Answering (VQA)

CLOSURE: Assessing Systematic Generalization of CLEVR Models

3 code implementations 12 Dec 2019 Dzmitry Bahdanau, Harm de Vries, Timothy J. O'Donnell, Shikhar Murty, Philippe Beaudoin, Yoshua Bengio, Aaron Courville

In this work, we study how systematic the generalization of such models is, that is to which extent they are capable of handling novel combinations of known linguistic constructs.

Few-Shot Learning · Systematic Generalization +1

Systematic Generalization with Edge Transformers

1 code implementation NeurIPS 2021 Leon Bergen, Timothy J. O'Donnell, Dzmitry Bahdanau

Recent research suggests that systematic generalization in natural language understanding remains a challenge for state-of-the-art neural models such as Transformers and Graph Neural Networks.

Dependency Parsing · Natural Language Understanding +3

Understanding by Understanding Not: Modeling Negation in Language Models

1 code implementation NAACL 2021 Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville

To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus.
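
The unlikelihood term can be sketched as follows: tokens that should not follow a negated premise are pushed toward low probability via a -log(1 - p) penalty. This follows the general form of unlikelihood training; the mining of negated generic sentences from raw text is not shown.

```python
# Hedged sketch of the unlikelihood term: push the probability of
# tokens that should NOT follow a negated premise toward zero.
import torch

def unlikelihood_loss(logits, negative_targets):
    # logits: (T, V); negative_targets: (T,) ids of tokens to discourage
    p = torch.softmax(logits, dim=-1)
    p_neg = p.gather(1, negative_targets.unsqueeze(1)).squeeze(1)
    return -torch.log1p(-p_neg.clamp(max=1 - 1e-6)).mean()   # -log(1 - p)

print(unlikelihood_loss(torch.randn(4, 10), torch.tensor([1, 3, 5, 7])))
```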

Language Modelling · Negation

LAGr: Labeling Aligned Graphs for Improving Systematic Generalization in Semantic Parsing

1 code implementation 14 Oct 2021 Dora Jambor, Dzmitry Bahdanau

In this work, we show that better systematic generalization can be achieved by producing the meaning representation (MR) directly as a graph and not as a sequence.

Semantic Parsing · Systematic Generalization

LAGr: Label Aligned Graphs for Better Systematic Generalization in Semantic Parsing

1 code implementation ACL 2022 Dora Jambor, Dzmitry Bahdanau

In this work, we show that better systematic generalization can be achieved by producing the meaning representation directly as a graph and not as a sequence.

Semantic Parsing · Systematic Generalization

Learning to Understand Goal Specifications by Modelling Reward

1 code implementation ICLR 2019 Dzmitry Bahdanau, Felix Hill, Jan Leike, Edward Hughes, Arian Hosseini, Pushmeet Kohli, Edward Grefenstette

Recent work has shown that deep reinforcement-learning agents can learn to follow language-like instructions from infrequent environment rewards.

PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation

1 code implementation 22 Oct 2023 Gaurav Sahu, Olga Vechtomova, Dzmitry Bahdanau, Issam H. Laradji

Our specific PromptMix method consists of two steps: 1) generate challenging text augmentations near class boundaries; however, generating borderline examples increases the risk of false positives in the dataset, so we 2) relabel the text augmentations using a prompting-based LLM classifier to enhance the correctness of labels in the generated data.
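
A hedged sketch of those two steps is below, written around a hypothetical `llm(prompt) -> str` completion helper (not a real API): step 1 asks for borderline examples between two classes, step 2 relabels each one with a prompting-based classifier.

```python
# Hedged sketch of the two PromptMix steps around a hypothetical
# llm(prompt) -> str completion helper (not a real API).
def promptmix(llm, class_a, class_b, n=5):
    gen_prompt = (f"Write {n} short texts that are hard to classify, each "
                  f"mixing traits of '{class_a}' and '{class_b}', one per line.")
    candidates = llm(gen_prompt).splitlines()        # step 1: borderline examples
    relabeled = []
    for text in filter(None, candidates):
        label = llm(f"Answer '{class_a}' or '{class_b}'. Class of: {text}")
        relabeled.append((text.strip(), label.strip()))  # step 2: LLM relabeling
    return relabeled
```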

Data Augmentation · Language Modelling +3

Automated curriculum generation for Policy Gradients from Demonstrations

1 code implementation 1 Dec 2019 Anirudh Srinivasan, Dzmitry Bahdanau, Maxime Chevalier-Boisvert, Yoshua Bengio

In this paper, we present a technique that improves the process of training an agent (using RL) for instruction following.

Instruction Following

Sequence Tutor: Conservative Fine-Tuning of Sequence Generation Models with KL-control

no code implementations ICML 2017 Natasha Jaques, Shixiang Gu, Dzmitry Bahdanau, José Miguel Hernández-Lobato, Richard E. Turner, Douglas Eck

This paper proposes a general method for improving the structure and quality of sequences generated by a recurrent neural network (RNN), while maintaining information originally learned from data, as well as sample diversity.
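
The KL-control idea at the heart of the method can be summarized in one line: the policy earns task reward but pays a penalty for drifting from the pretrained prior, which preserves what was learned from data. A toy sketch with illustrative numbers:

```python
# Toy sketch of KL-control: task reward minus a beta-weighted penalty
# for diverging from the pretrained prior's log-probabilities.
import torch

def kl_controlled_reward(task_reward, logp_policy, logp_prior, beta=0.1):
    return task_reward - beta * (logp_policy - logp_prior)

print(kl_controlled_reward(torch.tensor(1.0),
                           torch.tensor(-2.1),    # policy log-prob of sequence
                           torch.tensor(-2.5)))   # prior log-prob of sequence
```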

Reinforcement Learning (RL)

End-to-end Continuous Speech Recognition using Attention-based Recurrent NN: First Results

no code implementations 4 Dec 2014 Jan Chorowski, Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio

We replace the Hidden Markov Model (HMM), which is traditionally used in continuous speech recognition, with a bi-directional recurrent neural network encoder coupled to a recurrent neural network decoder that directly emits a stream of phonemes.

speech-recognition · Speech Recognition

Overcoming the Curse of Sentence Length for Neural Machine Translation using Automatic Segmentation

no code implementations WS 2014 Jean Pouget-Abadie, Dzmitry Bahdanau, Bart van Merrienboer, Kyunghyun Cho, Yoshua Bengio

The authors of (Cho et al., 2014a) have shown that the recently introduced neural network translation systems suffer from a significant drop in translation quality when translating long sentences, unlike existing phrase-based translation systems.

Machine Translation · Sentence +1

Combating False Negatives in Adversarial Imitation Learning

no code implementations 2 Feb 2020 Konrad Zolna, Chitwan Saharia, Leonard Boussioux, David Yu-Tung Hui, Maxime Chevalier-Boisvert, Dzmitry Bahdanau, Yoshua Bengio

In adversarial imitation learning, a discriminator is trained to differentiate agent episodes from expert demonstrations representing the desired behavior.
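
For reference, a standard adversarial-imitation discriminator objective looks like the hedged sketch below (expert transitions labeled 1, agent transitions 0); the paper's specific fix for false negatives is not reproduced here.

```python
# Hedged sketch of the discriminator objective in adversarial imitation
# learning: expert transitions are labeled 1, agent transitions 0.
import torch
import torch.nn as nn

disc = nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 1))
bce = nn.BCEWithLogitsLoss()

def discriminator_loss(expert_batch, agent_batch):
    pos = bce(disc(expert_batch), torch.ones(expert_batch.shape[0], 1))
    neg = bce(disc(agent_batch), torch.zeros(agent_batch.shape[0], 1))
    return pos + neg

# Toy 8-dimensional state-action features for both sources.
print(discriminator_loss(torch.randn(16, 8), torch.randn(16, 8)))
```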

Imitation Learning

Towards Ecologically Valid Research on Language User Interfaces

no code implementations 28 Jul 2020 Harm de Vries, Dzmitry Bahdanau, Christopher Manning

To this end, we describe what we deem an ideal methodology for machine learning research on LUIs and categorize five common ways in which recent benchmarks deviate from it.

BIG-bench Machine Learning · valid

Jointly Learning Truth-Conditional Denotations and Groundings using Parallel Attention

no code implementations 14 Apr 2021 Leon Bergen, Dzmitry Bahdanau, Timothy J. O'Donnell

We present a model that jointly learns the denotations of words together with their groundings using a truth-conditional semantics.

Question Answering · Visual Question Answering

Compositional Generalization in Dependency Parsing

no code implementations ACL 2022 Emily Goodwin, Siva Reddy, Timothy J. O'Donnell, Dzmitry Bahdanau

To test compositional generalization in semantic parsing, Keysers et al. (2020) introduced Compositional Freebase Queries (CFQ).

Dependency Parsing · Semantic Parsing

On the Compositional Generalization Gap of In-Context Learning

no code implementations 15 Nov 2022 Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville

In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.

In-Context Learning · Semantic Parsing

The Stack: 3 TB of permissively licensed source code

no code implementations 20 Nov 2022 Denis Kocetkov, Raymond Li, Loubna Ben allal, Jia Li, Chenghao Mou, Carlos Muñoz Ferrandis, Yacine Jernite, Margaret Mitchell, Sean Hughes, Thomas Wolf, Dzmitry Bahdanau, Leandro von Werra, Harm de Vries

Large Language Models (LLMs) play an ever-increasing role in the field of Artificial Intelligence (AI), not only for natural language processing but also for code understanding and generation.
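
The dataset itself is published on the Hugging Face Hub; a hedged loading sketch follows, where the dataset id is real but the per-language directory layout reflects the release-time dataset card and may have changed since.

```python
# Hedged sketch: stream one language subset of The Stack from the
# Hugging Face Hub. Dataset id is real; the data_dir layout follows the
# release-time dataset card and may have changed since.
from datasets import load_dataset

ds = load_dataset("bigcode/the-stack", data_dir="data/python",
                  split="train", streaming=True)
print(next(iter(ds))["content"][:200])   # first 200 characters of a file
```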

RepoFusion: Training Code Models to Understand Your Repository

no code implementations 19 Jun 2023 Disha Shrivastava, Denis Kocetkov, Harm de Vries, Dzmitry Bahdanau, Torsten Scholak

We find these results to be a novel and compelling demonstration of the gains that training with repository context can bring.

Code Completion

In-Context Learning for Text Classification with Many Labels

no code implementations 19 Sep 2023 Aristides Milios, Siva Reddy, Dzmitry Bahdanau

We analyze the performance across number of in-context examples and different model scales, showing that larger models are necessary to effectively and consistently make use of larger context lengths for ICL.

In-Context Learning · intent-classification +6

MAGNIFICo: Evaluating the In-Context Learning Ability of Large Language Models to Generalize to Novel Interpretations

1 code implementation 18 Oct 2023 Arkil Patel, Satwik Bhattamishra, Siva Reddy, Dzmitry Bahdanau

Additionally, our analysis uncovers the semantic predispositions in LLMs and reveals the impact of recency bias for information presented in long contexts.

In-Context Learning · Semantic Parsing +1

Evaluating In-Context Learning of Libraries for Code Generation

no code implementations 16 Nov 2023 Arkil Patel, Siva Reddy, Dzmitry Bahdanau, Pradeep Dasigi

Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability.

Code Generation · In-Context Learning
