no code implementations • EMNLP 2021 • Meghana Moorthy Bhat, Alessandro Sordoni, Subhabrata Mukherjee
While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process.
no code implementations • COLING 2022 • Michaela Socolof, Jacob Louis Hoover, Richard Futrell, Alessandro Sordoni, Timothy J. O’Donnell
Morphological systems across languages vary when it comes to the relation between form and meaning.
1 code implementation • ACL 2022 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie Zhou, Aaron Courville
We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task.
no code implementations • 2 Oct 2024 • Arian Hosseini, Alessandro Sordoni, Daniel Toyama, Aaron Courville, Rishabh Agarwal
We study the depth of grade-school math (GSM) problem-solving capabilities of LLMs.
1 code implementation • 2 Oct 2024 • Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, Nicolas Le Roux
In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps.
no code implementations • 13 Aug 2024 • Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, Alessandro Sordoni
The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task.
no code implementations • 20 Jul 2024 • Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni
To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context.
1 code implementation • 24 May 2024 • Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn
We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: the first makes the model robust on continuous embedding attacks computed on an adversarial behaviour dataset; the second ensures the usefulness of the final model by fine-tuning on utility data.
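As a sketch, the two-part objective can be written as follows (the weight $\lambda$, perturbation bound $\epsilon$, loss $\ell$, and dataset names are illustrative notation, not the paper's):

$$\mathcal{L}_{\text{C-AdvUL}}(\theta) \;=\; \mathbb{E}_{(x,y)\sim\mathcal{D}_{\text{adv}}}\Big[\max_{\|\delta\|\leq\epsilon} \ell\big(f_\theta(\operatorname{emb}(x)+\delta),\, y\big)\Big] \;+\; \lambda\, \mathbb{E}_{(x,y)\sim\mathcal{D}_{\text{util}}}\big[\ell(f_\theta(x),\, y)\big].$$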
1 code implementation • 18 May 2024 • Oleksiy Ostapenko, Zhan Su, Edoardo Maria Ponti, Laurent Charlin, Nicolas Le Roux, Matheus Pereira, Lucas Caccia, Alessandro Sordoni
The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks.
no code implementations • 9 Feb 2024 • Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal
Common self-improvement approaches for large language models (LLMs), such as STaR, iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability.
no code implementations • 9 Oct 2023 • Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni
To encourage a more structural generation of CoT steps, we propose a hierarchical generation scheme: we let the LM generate a planning token at the start of each reasoning step, intuitively serving as a high-level plan of the current step, and add their embeddings to the model parameters.
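A minimal sketch of the planning-token idea (sizes, token ids, and helper names are illustrative, not the paper's): new embedding rows are appended after the base vocabulary, and one planning token precedes each reasoning step.

```python
import torch
import torch.nn as nn

vocab_size, n_plan_tokens, d_model = 32000, 8, 512
# Extra embedding rows after the base vocabulary act as planning tokens.
embed = nn.Embedding(vocab_size + n_plan_tokens, d_model)

def insert_planning_tokens(steps, plan_ids):
    """Prepend one planning-token id to each reasoning step."""
    out = []
    for plan, step in zip(plan_ids, steps):
        out.append(vocab_size + plan)   # planning tokens live after the base vocab
        out.extend(step)
    return torch.tensor(out)

steps = [[101, 102, 103], [104, 105]]   # token ids of two reasoning steps
seq = insert_planning_tokens(steps, plan_ids=[0, 3])
hidden = embed(seq)                     # (7, 512); flows into the LM as usual
```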
1 code implementation • NeurIPS 2023 • Alessandro Sordoni, Xingdi Yuan, Marc-Alexandre Côté, Matheus Pereira, Adam Trischler, Ziang Xiao, Arian Hosseini, Friederike Niedtner, Nicolas Le Roux
Thus, they can be seen as stochastic language layers in a language network, where the learnable parameters are the natural language prompts at each layer.
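A minimal sketch of prompts as stochastic "language layers" (the `llm()` callable and the prompt-template strings are hypothetical stand-ins, not the paper's API): each layer's learnable parameter is its prompt.

```python
def layer(llm, prompt, x):
    """One language layer: output = LLM sample given the prompt plus input."""
    return llm(f"{prompt}\n\nInput: {x}\nOutput:")

def two_layer_network(llm, prompt1, prompt2, x):
    h = layer(llm, prompt1, x)      # layer 1's sampled text output
    return layer(llm, prompt2, h)   # layer 2 consumes layer 1's output

# "Training" optimizes prompt1 and prompt2 (e.g., by having an LLM propose
# and score candidate edits) rather than updating continuous weights.
```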
no code implementations • 15 Nov 2022 • Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville
In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.
1 code implementation • NeurIPS 2023 • Lucas Caccia, Edoardo Ponti, Zhan Su, Matheus Pereira, Nicolas Le Roux, Alessandro Sordoni
We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation and propose $\texttt{MHR}$-$\mu$, which discards routing and fine-tunes the average of the pre-trained adapters on each downstream task.
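A minimal sketch of this recipe (the adapter keys and shapes are illustrative): average the multi-task pre-trained adapters, then fine-tune the averaged adapter separately on each downstream task, with no routing.

```python
import torch

def average_adapters(adapter_state_dicts):
    """Element-wise mean of a list of adapter state dicts."""
    keys = adapter_state_dicts[0].keys()
    return {k: torch.stack([sd[k] for sd in adapter_state_dicts]).mean(dim=0)
            for k in keys}

adapters = [{"lora_A": torch.randn(8, 512), "lora_B": torch.randn(512, 8)}
            for _ in range(4)]          # stand-ins for pre-trained adapters
init = average_adapters(adapters)       # initialization for per-task fine-tuning
```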
no code implementations • 2 Jun 2022 • Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni
We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.
no code implementations • ICLR 2022 • Benjamin LeBrun, Alessandro Sordoni, Timothy J. O'Donnell
To address this gap, we develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages from which we can exactly compute sequence probabilities.
1 code implementation • ACL 2022 • He Bai, Tong Wang, Alessandro Sordoni, Peng Shi
Class-based language models (LMs) have long been used to address context sparsity in $n$-gram LMs.
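For reference, the classical class-based factorization predicts a word through its class (the paper's exact formulation may differ):

$$p(w_t \mid h_t) \;=\; p\big(c(w_t) \mid h_t\big)\, p\big(w_t \mid c(w_t),\, h_t\big),$$

where $h_t$ is the history and $c(w_t)$ is the class of word $w_t$.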
2 code implementations • 28 Feb 2022 • Edoardo M. Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy
By jointly learning these and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills.
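A minimal sketch of instantiating each task's network as the average of its active skills (the allocation matrix $Z$ and the flat per-skill parameter vectors are illustrative; the paper learns both jointly):

```python
import torch

n_tasks, n_skills, n_params = 3, 5, 10
Z = torch.tensor([[1., 0., 1., 0., 0.],
                  [0., 1., 1., 0., 1.],
                  [1., 1., 0., 1., 0.]])      # task-skill allocation matrix
skills = torch.randn(n_skills, n_params)      # one parameter vector per skill

# Each task's parameters: average over the skills allocated to that task.
task_params = (Z @ skills) / Z.sum(dim=1, keepdim=True)  # shape (3, 10)
```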
no code implementations • NAACL 2022 • Ian Porada, Alessandro Sordoni, Jackie Chi Kit Cheung
Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question.
no code implementations • ICLR 2022 • Shawn Tan, Chin-wei Huang, Alessandro Sordoni, Aaron Courville
Additionally, since the support of the marginal $q(z)$ is bounded and the support of the prior $p(z)$ is not, we propose renormalising the prior distribution over the support of $q(z)$.
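One way to write the renormalised prior implied above (notation assumed, not the paper's):

$$\tilde{p}(z) \;=\; \frac{p(z)\, \mathbb{1}\!\left[z \in \operatorname{supp} q\right]}{\int_{\operatorname{supp} q} p(z')\, \mathrm{d}z'}.$$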
no code implementations • 29 Sep 2021 • Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville
In this work, we argue that representations induced by self-supervised learning (SSL) methods should both be expressive and learnable.
1 code implementation • CoNLL (EMNLP) 2021 • Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche
By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias.
no code implementations • 25 Jun 2021 • Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet
We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views.
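The decomposition rests on the chain rule of mutual information: splitting one view into subviews $y_{1:K}$ gives

$$I(x;\, y_{1:K}) \;=\; \sum_{k=1}^{K} I\big(x;\, y_k \mid y_{1:k-1}\big),$$

so each smaller conditional term can be estimated with its own bound.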
1 code implementation • NAACL 2021 • Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville
To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus.
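A sketch of the combined objective (the weight $\alpha$ and the token-level form are assumptions, not the paper's exact loss): for a corpus sentence $x$ and a negated generic sentence $\tilde{x}$,

$$\mathcal{L}(\theta) \;=\; -\log p_\theta(x) \;-\; \alpha \sum_{t} \log\big(1 - p_\theta(\tilde{x}_t \mid \tilde{x}_{<t})\big).$$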
1 code implementation • EMNLP 2021 • Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell
Are pairs of words that tend to occur together also likely to stand in a linguistic dependency?
no code implementations • 1 Jan 2021 • Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Remi Tachet des Combes, Philip Bachman
In this paper, we transform each view into a set of subviews and then decompose the original MI bound into a sum of bounds involving conditional MI between the subviews.
no code implementations • NAACL 2021 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Siva Reddy, Aaron Courville
In the present work, we propose a new syntax-aware language model: Syntactic Ordered Memory (SOM).
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Shawn Tan, Yikang Shen, Timothy J. O'Donnell, Alessandro Sordoni, Aaron Courville
We model the recursive production property of context-free grammars for natural and synthetic languages.
1 code implementation • EMNLP 2020 • Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer
We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task, and we validate their effectiveness in experiments controlled for source and target data size.
no code implementations • 3 Mar 2020 • Igor Shalyminov, Alessandro Sordoni, Adam Atkinson, Hannes Schulz
Domain adaptation has recently become a key problem in dialogue systems research.
no code implementations • 20 Dec 2019 • Laura Dietz, Bhaskar Mitra, Jeremy Pickens, Hana Anber, Sandeep Avula, Asia Biega, Adrian Boteanu, Shubham Chatterjee, Jeff Dalton, Shiri Dori-Hacohen, John Foley, Henry Feild, Ben Gamari, Rosie Jones, Pallika Kanani, Sumanta Kashyapi, Widad Machmouchi, Matthew Mitsui, Steve Nole, Alexandre Tachard Passos, Jordan Ramsdell, Adam Roegiest, David Smith, Alessandro Sordoni
The vision of HIPstIR is that early-stage information retrieval (IR) researchers get together to develop a future for non-mainstream ideas and research agendas in IR.
no code implementations • EACL 2021 • Yadollah Yaghoobzadeh, Soroush Mehri, Remi Tachet, T. J. Hazen, Alessandro Sordoni
Neural NLP models tend to rely on spurious correlations between labels and input features to perform their tasks.
1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville
Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory.
1 code implementation • NeurIPS 2019 • Tsendsuren Munkhdalai, Alessandro Sordoni, Tong Wang, Adam Trischler
We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning.
3 code implementations • ICLR 2019 • Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon
Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks.
no code implementations • NeurIPS 2018 • Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron C. Courville, Chris Pal
We generate outlines with an adversarial model trained to approximate the distribution of sentences in a latent space induced by general-purpose sentence encoders.
7 code implementations • ICLR 2019 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Aaron Courville
When a larger constituent ends, all of the smaller constituents that are nested within it must also be closed.
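The paper enforces this nesting with a monotone gating activation; a sketch of the master forget gate (weight names illustrative) is

$$\tilde{f}_t \;=\; \operatorname{cumax}\big(W_{\tilde{f}}\, x_t + U_{\tilde{f}}\, h_{t-1} + b_{\tilde{f}}\big), \qquad \operatorname{cumax}(\cdot) \;=\; \operatorname{cumsum}\big(\operatorname{softmax}(\cdot)\big),$$

whose values increase monotonically across the hidden dimension, so erasing a neuron forces all neurons ranked below it to be erased as well, mirroring nested constituents closing together.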
1 code implementation • 11 Jul 2018 • Philip Bachman, Riashat Islam, Alessandro Sordoni, Zafarali Ahmed
We introduce a deep generative model for functions.
2 code implementations • 29 Jun 2018 • Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler
We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments.
no code implementations • ICML 2018 • Nan Rosemary Ke, Konrad Zolna, Alessandro Sordoni, Zhouhan Lin, Adam Trischler, Yoshua Bengio, Joelle Pineau, Laurent Charlin, Chris Pal
We evaluate this method on several types of tasks with different attributes.
2 code implementations • ACL 2018 • Yikang Shen, Zhouhan Lin, Athul Paul Jacob, Alessandro Sordoni, Aaron Courville, Yoshua Bengio
In this work, we propose a novel constituency parsing scheme.
3 code implementations • ICML 2018 • Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville
Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data.
1 code implementation • NeurIPS 2017 • Anirudh Goyal, Alessandro Sordoni, Marc-Alexandre Côté, Nan Rosemary Ke, Yoshua Bengio
Stochastic recurrent models have been successful in capturing the variability observed in natural sequential data such as speech.
2 code implementations • ICLR 2018 • Dmitriy Serdyuk, Nan Rosemary Ke, Alessandro Sordoni, Adam Trischler, Chris Pal, Yoshua Bengio
We propose a simple technique for encouraging generative RNNs to plan ahead.
no code implementations • ICML 2017 • Philip Bachman, Alessandro Sordoni, Adam Trischler
We introduce a model that learns active learning algorithms via metalearning.
4 code implementations • WS 2017 • Xingdi Yuan, Tong Wang, Caglar Gulcehre, Alessandro Sordoni, Philip Bachman, Sandeep Subramanian, Saizheng Zhang, Adam Trischler
We propose a recurrent neural model that generates natural-language questions from documents, conditioned on answers.
no code implementations • 8 Dec 2016 • Philip Bachman, Alessandro Sordoni, Adam Trischler
We develop a general problem setting for training and testing the ability of agents to gather information efficiently.
2 code implementations • WS 2017 • Adam Trischler, Tong Wang, Xingdi Yuan, Justin Harris, Alessandro Sordoni, Philip Bachman, Kaheer Suleman
We present NewsQA, a challenging machine comprehension dataset of over 100,000 human-generated question-answer pairs.
1 code implementation • 7 Jun 2016 • Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio
We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document.
9 code implementations • 19 May 2016 • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio
Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as that found between the utterances in a dialogue.
7 code implementations • 17 Jul 2015 • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau
We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models.
4 code implementations • 8 Jul 2015 • Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie
Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.
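A minimal sketch of a hierarchical recurrent encoder (dimensions are illustrative): a query-level GRU encodes each query, and a session-level GRU consumes the query encodings in order, so the session state is sensitive to query order while avoiding per-query data sparsity.

```python
import torch
import torch.nn as nn

d_emb, d_q, d_s = 64, 128, 256
query_rnn = nn.GRU(d_emb, d_q, batch_first=True)    # encodes one query
session_rnn = nn.GRU(d_q, d_s, batch_first=True)    # encodes the query sequence

session = [torch.randn(1, 5, d_emb), torch.randn(1, 3, d_emb)]  # two embedded queries
q_states = torch.stack([query_rnn(q)[1][-1] for q in session], dim=1)  # (1, 2, 128)
_, s_state = session_rnn(q_states)  # session state conditions the next-query decoder
```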
no code implementations • IJCNLP 2015 • Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret Mitchell, Jianfeng Gao, Bill Dolan
We introduce Discriminative BLEU (ΔBLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs.
no code implementations • HLT 2015 • Alessandro Sordoni, Michel Galley, Michael Auli, Chris Brockett, Yangfeng Ji, Margaret Mitchell, Jian-Yun Nie, Jianfeng Gao, Bill Dolan
We present a novel response generation system that can be trained end to end on large quantities of unstructured Twitter conversations.