Search Results for author: Alessandro Sordoni

Found 55 papers, 29 papers with code

Self-training with Few-shot Rationalization

no code implementations • EMNLP 2021 • Meghana Moorthy Bhat, Alessandro Sordoni, Subhabrata Mukherjee

While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process.

Decision Making • Natural Language Understanding

Unsupervised Dependency Graph Network

1 code implementation • ACL 2022 • Yikang Shen, Shawn Tan, Alessandro Sordoni, Peng Li, Jie Zhou, Aaron Courville

We introduce a new model, the Unsupervised Dependency Graph Network (UDGN), that can induce dependency structures from raw corpora and the masked language modeling task.

Language Modelling • Masked Language Modeling • +3

VinePPO: Unlocking RL Potential For LLM Reasoning Through Refined Credit Assignment

1 code implementation • 2 Oct 2024 • Amirhossein Kazemnejad, Milad Aghajohari, Eva Portelance, Alessandro Sordoni, Siva Reddy, Aaron Courville, Nicolas Le Roux

In this work, we systematically evaluate the efficacy of value networks and reveal their significant shortcomings in reasoning-heavy LLM tasks, showing that they barely outperform a random baseline when comparing alternative steps.

GSM8K • Math • +1

A Survey on Model MoErging: Recycling and Routing Among Specialized Experts for Collaborative Learning

no code implementations • 13 Aug 2024 • Prateek Yadav, Colin Raffel, Mohammed Muqeeth, Lucas Caccia, Haokun Liu, Tianlong Chen, Mohit Bansal, Leshem Choshen, Alessandro Sordoni

The availability of performant pre-trained models has led to a proliferation of fine-tuned expert models that are specialized to a particular domain or task.

Survey

Improving Context-Aware Preference Modeling for Language Models

no code implementations • 20 Jul 2024 • Silviu Pitis, Ziang Xiao, Nicolas Le Roux, Alessandro Sordoni

To address these challenges, we consider the two-step preference modeling procedure that first resolves the under-specification by selecting a context, and then evaluates preference with respect to the chosen context.

Efficient Adversarial Training in LLMs with Continuous Attacks

1 code implementation • 24 May 2024 • Sophie Xhonneux, Alessandro Sordoni, Stephan Günnemann, Gauthier Gidel, Leo Schwinn

We propose a fast adversarial training algorithm (C-AdvUL) composed of two losses: the first makes the model robust on continuous embedding attacks computed on an adversarial behaviour dataset; the second ensures the usefulness of the final model by fine-tuning on utility data.
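
A minimal sketch of that two-loss objective, assuming a Hugging Face-style causal LM and a single-step (FGSM-style) embedding attack; the paper's exact attack and batch format may differ:

    import torch

    def c_advul_step(model, adv_batch, utility_batch, eps=0.1, alpha=1.0):
        # Loss 1 (robustness): perturb the continuous token embeddings of the
        # adversarial-behaviour batch in the direction of the loss gradient.
        embeds = model.get_input_embeddings()(adv_batch["input_ids"])
        delta = torch.zeros_like(embeds, requires_grad=True)
        clean_loss = model(inputs_embeds=embeds + delta,
                           attention_mask=adv_batch["attention_mask"],
                           labels=adv_batch["labels"]).loss
        (grad,) = torch.autograd.grad(clean_loss, delta)
        adv_embeds = (embeds + eps * grad.sign()).detach()
        adv_loss = model(inputs_embeds=adv_embeds,
                         attention_mask=adv_batch["attention_mask"],
                         labels=adv_batch["labels"]).loss
        # Loss 2 (usefulness): ordinary fine-tuning loss on utility data.
        util_loss = model(**utility_batch).loss
        return adv_loss + alpha * util_loss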

Towards Modular LLMs by Building and Reusing a Library of LoRAs

1 code implementation • 18 May 2024 • Oleksiy Ostapenko, Zhan Su, Edoardo Maria Ponti, Laurent Charlin, Nicolas Le Roux, Matheus Pereira, Lucas Caccia, Alessandro Sordoni

The growing number of parameter-efficient adaptations of a base large language model (LLM) calls for studying whether we can reuse such trained adapters to improve performance for new tasks.

Language Modelling • Large Language Model

V-STaR: Training Verifiers for Self-Taught Reasoners

no code implementations • 9 Feb 2024 • Arian Hosseini, Xingdi Yuan, Nikolay Malkin, Aaron Courville, Alessandro Sordoni, Rishabh Agarwal

Common self-improvement approaches for large language models (LLMs), such as STaR, iteratively fine-tune LLMs on self-generated solutions to improve their problem-solving ability.

Code Generation • Math

Guiding Language Model Reasoning with Planning Tokens

no code implementations • 9 Oct 2023 • Xinyi Wang, Lucas Caccia, Oleksiy Ostapenko, Xingdi Yuan, William Yang Wang, Alessandro Sordoni

To encourage more structured generation of CoT steps, we propose a hierarchical generation scheme: we let the LM generate a planning token at the start of each reasoning step, intuitively serving as a high-level plan of the current step, and add these tokens' embeddings to the model parameters.
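
A sketch of the mechanics under stated assumptions (Hugging Face APIs; the planning-token names here are hypothetical): new special tokens are added to the vocabulary, so their embeddings become trainable parameters, and each reasoning step is prefixed with its plan token.

    from transformers import AutoTokenizer, AutoModelForCausalLM

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    # Hypothetical planning tokens, one per high-level plan type.
    plan_tokens = ["<plan_add>", "<plan_sub>", "<plan_mul>"]
    tokenizer.add_special_tokens({"additional_special_tokens": plan_tokens})
    model.resize_token_embeddings(len(tokenizer))  # new embeddings are trainable

    def insert_planning_tokens(plans, steps):
        # Prefix each reasoning step with the planning token for its plan.
        return " ".join(f"{plan} {step}" for plan, step in zip(plans, steps))

    cot = insert_planning_tokens(["<plan_add>", "<plan_mul>"],
                                 ["3 + 4 = 7", "7 * 2 = 14"])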

Language Modelling • Math

On the Compositional Generalization Gap of In-Context Learning

no code implementations • 15 Nov 2022 • Arian Hosseini, Ankit Vani, Dzmitry Bahdanau, Alessandro Sordoni, Aaron Courville

In this work, we look at the gap between the in-distribution (ID) and out-of-distribution (OOD) performance of such models in semantic parsing tasks with in-context learning.

In-Context Learning • Semantic Parsing

Multi-Head Adapter Routing for Cross-Task Generalization

1 code implementation • NeurIPS 2023 • Lucas Caccia, Edoardo Ponti, Zhan Su, Matheus Pereira, Nicolas Le Roux, Alessandro Sordoni

We find that routing is most beneficial during multi-task pre-training rather than during few-shot adaptation and propose $\texttt{MHR}$-$\mu$, which discards routing and fine-tunes the average of the pre-trained adapters on each downstream task.
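
A minimal sketch of that $\texttt{MHR}$-$\mu$ initialization, assuming the adapters are PyTorch modules with identical architectures (names hypothetical):

    import copy
    import torch

    def average_adapters(adapters):
        # Uniformly average the parameters of the pre-trained adapters;
        # the resulting module is then fine-tuned on the downstream task.
        avg = copy.deepcopy(adapters[0])
        with torch.no_grad():
            for name, param in avg.named_parameters():
                stacked = torch.stack(
                    [dict(a.named_parameters())[name] for a in adapters])
                param.copy_(stacked.mean(dim=0))
        return avg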

parameter-efficient fine-tuning

Using Representation Expressiveness and Learnability to Evaluate Self-Supervised Learning Methods

no code implementations • 2 Jun 2022 • Yuchen Lu, Zhen Liu, Aristide Baratin, Romain Laroche, Aaron Courville, Alessandro Sordoni

We address the problem of evaluating the quality of self-supervised learning (SSL) models without access to supervised labels, while being agnostic to the architecture, learning algorithm or data manipulation used during training.

Domain Generalization • Self-Supervised Learning

Evaluating Distributional Distortion in Neural Language Modeling

no code implementations • ICLR 2022 • Benjamin LeBrun, Alessandro Sordoni, Timothy J. O'Donnell

To address this gap, we develop a controlled evaluation scheme which uses generative models trained on natural data as artificial languages from which we can exactly compute sequence probabilities.
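
The "exact sequence probabilities" follow from the chain rule of an autoregressive model; a minimal sketch assuming a Hugging Face-style LM:

    import torch
    import torch.nn.functional as F

    def sequence_logprob(model, input_ids):
        # Exact log-probability of a sequence: the sum of conditional
        # token log-probabilities given each prefix.
        with torch.no_grad():
            logits = model(input_ids).logits            # (1, seq_len, vocab)
        logp = F.log_softmax(logits[:, :-1], dim=-1)    # predicts token t+1
        target = input_ids[:, 1:].unsqueeze(-1)
        return logp.gather(-1, target).sum().item()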

Language Modelling

Better Language Model with Hypernym Class Prediction

1 code implementation • ACL 2022 • He Bai, Tong Wang, Alessandro Sordoni, Peng Shi

Class-based language models (LMs) have long been devised to address context sparsity in $n$-gram LMs.

Language Modelling

Combining Modular Skills in Multitask Learning

2 code implementations • 28 Feb 2022 • Edoardo M. Ponti, Alessandro Sordoni, Yoshua Bengio, Siva Reddy

By jointly learning these skills and a task-skill allocation matrix, the network for each task is instantiated as the average of the parameters of active skills.
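
A toy sketch of that allocation idea (shapes and names are illustrative, not the paper's code): each task's layer is the normalized, allocation-weighted average of the skill parameters.

    import torch

    n_tasks, n_skills, d = 8, 4, 16
    skills = torch.randn(n_skills, d, d) * 0.02          # one weight matrix per skill
    alloc_logits = torch.zeros(n_tasks, n_skills, requires_grad=True)

    def task_weight(task_id):
        # Soft task-skill allocation row, normalized so the result is an
        # average of the active skills' parameters.
        alloc = torch.sigmoid(alloc_logits[task_id])
        alloc = alloc / alloc.sum()
        return torch.einsum("s,sij->ij", alloc, skills)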

Instruction Following • reinforcement-learning • +2

Does Pre-training Induce Systematic Inference? How Masked Language Models Acquire Commonsense Knowledge

no code implementations • NAACL 2022 • Ian Porada, Alessandro Sordoni, Jackie Chi Kit Cheung

Transformer models pre-trained with a masked-language-modeling objective (e.g., BERT) encode commonsense knowledge as evidenced by behavioral probes; however, the extent to which this knowledge is acquired by systematic inference over the semantics of the pre-training corpora is an open question.

Language Modelling • Masked Language Modeling • +1

Learning to Dequantise with Truncated Flows

no code implementations • ICLR 2022 • Shawn Tan, Chin-wei Huang, Alessandro Sordoni, Aaron Courville

Additionally, since the support of the marginal $q(z)$ is bounded and the support of the prior $p(z)$ is not, we propose renormalising the prior distribution over the support of $q(z)$.
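
A plausible formalisation of that renormalisation, writing $\mathcal{S} = \operatorname{supp}\, q(z)$: the truncated prior is $\tilde{p}(z) = p(z)\,\mathbf{1}[z \in \mathcal{S}] \,/\, \int_{\mathcal{S}} p(z')\, \mathrm{d}z'$, which integrates to one over the bounded support of $q$.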

Variational Inference

Learnability and Expressiveness in Self-Supervised Learning

no code implementations • 29 Sep 2021 • Yuchen Lu, Zhen Liu, Alessandro Sordoni, Aristide Baratin, Romain Laroche, Aaron Courville

In this work, we argue that representations induced by self-supervised learning (SSL) methods should both be expressive and learnable.

Data Augmentation • Self-Supervised Learning

Self-training with Few-shot Rationalization: Teacher Explanations Aid Student in Few-shot NLU

no code implementations • 17 Sep 2021 • Meghana Moorthy Bhat, Alessandro Sordoni, Subhabrata Mukherjee

While pre-trained language models have obtained state-of-the-art performance for several natural language understanding tasks, they are quite opaque in terms of their decision-making process.

Decision Making • Natural Language Understanding

The Emergence of the Shape Bias Results from Communicative Efficiency

1 code implementation • CoNLL (EMNLP) 2021 • Eva Portelance, Michael C. Frank, Dan Jurafsky, Alessandro Sordoni, Romain Laroche

By the age of two, children tend to assume that new word categories are based on objects' shape, rather than their color or texture; this assumption is called the shape bias.

Decomposed Mutual Information Estimation for Contrastive Representation Learning

no code implementations • 25 Jun 2021 • Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Phil Bachman, Remi Tachet

We propose decomposing the full MI estimation problem into a sum of smaller estimation problems by splitting one of the views into progressively more informed subviews and by applying the chain rule on MI between the decomposed views.
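
The chain rule being applied: splitting a view $y$ into subviews $y_1, \ldots, y_K$ gives $I(x; y_1, \ldots, y_K) = \sum_{k=1}^{K} I(x; y_k \mid y_1, \ldots, y_{k-1})$, so the full MI becomes a sum of smaller conditional-MI estimation problems, one per progressively more informed subview.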

Data Augmentation • Dialogue Generation • +2

Understanding by Understanding Not: Modeling Negation in Language Models

1 code implementation • NAACL 2021 • Arian Hosseini, Siva Reddy, Dzmitry Bahdanau, R Devon Hjelm, Alessandro Sordoni, Aaron Courville

To improve language models in this regard, we propose to augment the language modeling objective with an unlikelihood objective that is based on negated generic sentences from a raw text corpus.
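
A minimal sketch of an unlikelihood term on negated sentences, assuming token-level logits; the paper's exact formulation may differ:

    import torch
    import torch.nn.functional as F

    def unlikelihood_loss(logits, negated_ids):
        # Push probability mass away from tokens of negated generic
        # sentences: minimize -log(1 - p(token)) rather than -log p(token).
        logp = F.log_softmax(logits, dim=-1)
        tok_logp = logp.gather(-1, negated_ids.unsqueeze(-1)).squeeze(-1)
        p = tok_logp.exp().clamp(max=1 - 1e-6)   # guard against log(0)
        return -torch.log1p(-p).mean()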

Language Modelling • Negation

Linguistic Dependencies and Statistical Dependence

1 code implementation • EMNLP 2021 • Jacob Louis Hoover, Alessandro Sordoni, Wenyu Du, Timothy J. O'Donnell

Are pairs of words that tend to occur together also likely to stand in a linguistic dependency?

Decomposing Mutual Information for Representation Learning

no code implementations • 1 Jan 2021 • Alessandro Sordoni, Nouha Dziri, Hannes Schulz, Geoff Gordon, Remi Tachet des Combes, Philip Bachman

In this paper, we transform each view into a set of subviews and then decompose the original MI bound into a sum of bounds involving conditional MI between the subviews.

Dialogue Generation • Representation Learning

Exploring and Predicting Transferability across NLP Tasks

1 code implementation • EMNLP 2020 • Tu Vu, Tong Wang, Tsendsuren Munkhdalai, Alessandro Sordoni, Adam Trischler, Andrew Mattarella-Micke, Subhransu Maji, Mohit Iyyer

We also develop task embeddings that can be used to predict the most transferable source tasks for a given target task, and we validate their effectiveness in experiments controlled for source and target data size.
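
A sketch of how such task embeddings could drive source-task selection (hypothetical names; the paper's similarity measure may differ):

    import numpy as np

    def rank_source_tasks(task_embs, target_emb):
        # Rank candidate source tasks by cosine similarity between their
        # task embeddings and the target task's embedding.
        def cos(a, b):
            return float(np.dot(a, b) /
                         (np.linalg.norm(a) * np.linalg.norm(b)))
        return sorted(task_embs,
                      key=lambda t: cos(task_embs[t], target_emb),
                      reverse=True)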

Language Modelling • Part-Of-Speech Tagging • +4

Ordered Memory

1 code implementation • NeurIPS 2019 • Yikang Shen, Shawn Tan, Arian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville

Inspired by Ordered Neurons (Shen et al., 2018), we introduce a new attention-based mechanism and use its cumulative probability to control the writing and erasing operations of the memory.
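
A toy sketch of the cumulative-probability gate (an interpretation of the mechanism described above, not the paper's implementation):

    import torch

    def gated_memory_update(memory, candidate, attn_logits):
        # The attention distribution over slots is turned into a monotone
        # gate via its cumulative sum; slots past the attended position are
        # erased and overwritten with the candidate content.
        probs = torch.softmax(attn_logits, dim=-1)        # (n_slots,)
        gate = torch.cumsum(probs, dim=-1).unsqueeze(-1)  # monotone in [0, 1]
        return (1 - gate) * memory + gate * candidate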

ListOps

Metalearned Neural Memory

1 code implementation • NeurIPS 2019 • Tsendsuren Munkhdalai, Alessandro Sordoni, Tong Wang, Adam Trischler

We augment recurrent neural networks with an external memory mechanism that builds upon recent progress in metalearning.

Question Answering • reinforcement-learning • +2

An Empirical Study of Example Forgetting during Deep Neural Network Learning

3 code implementations • ICLR 2019 • Mariya Toneva, Alessandro Sordoni, Remi Tachet des Combes, Adam Trischler, Yoshua Bengio, Geoffrey J. Gordon

Inspired by the phenomenon of catastrophic forgetting, we investigate the learning dynamics of neural networks as they train on single classification tasks.
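
The paper's central statistic is simple to compute; a minimal sketch of counting forgetting events from per-epoch correctness records:

    import numpy as np

    def forgetting_events(correct_history):
        # correct_history: (n_epochs, n_examples) 0/1 matrix, 1 when the
        # example is classified correctly at that epoch. A forgetting event
        # is a correct -> incorrect transition between consecutive epochs.
        h = np.asarray(correct_history)
        return ((h[:-1] == 1) & (h[1:] == 0)).sum(axis=0)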

General Classification

Towards Text Generation with Adversarially Learned Neural Outlines

no code implementations • NeurIPS 2018 • Sandeep Subramanian, Sai Rajeswar Mudumba, Alessandro Sordoni, Adam Trischler, Aaron C. Courville, Chris Pal

We generate outlines with an adversarial model trained to approximate the distribution of sentences in a latent space induced by general-purpose sentence encoders.

Sentence • Text Generation

Counting to Explore and Generalize in Text-based Games

2 code implementations • 29 Jun 2018 • Xingdi Yuan, Marc-Alexandre Côté, Alessandro Sordoni, Romain Laroche, Remi Tachet des Combes, Matthew Hausknecht, Adam Trischler

We propose a recurrent RL agent with an episodic exploration mechanism that helps discover good policies in text-based game environments.
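
A minimal sketch of an episodic count-based exploration bonus, consistent with the title's "counting" idea (the agent's actual bonus form is an assumption here):

    from collections import Counter

    class EpisodicCountBonus:
        # Intrinsic reward that decays with how often a state has been
        # visited within the current episode.
        def __init__(self, beta=1.0):
            self.beta = beta
            self.counts = Counter()

        def reset(self):  # call at the start of every episode
            self.counts.clear()

        def bonus(self, state_hash):
            self.counts[state_hash] += 1
            return self.beta / self.counts[state_hash] ** 0.5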

text-based games

Augmented CycleGAN: Learning Many-to-Many Mappings from Unpaired Data

3 code implementations • ICML 2018 • Amjad Almahairi, Sai Rajeswar, Alessandro Sordoni, Philip Bachman, Aaron Courville

Learning inter-domain mappings from unpaired data can improve performance in structured prediction tasks, such as image segmentation, by reducing the need for paired data.

Image Segmentation • Semantic Segmentation • +1

Towards Information-Seeking Agents

no code implementations • 8 Dec 2016 • Philip Bachman, Alessandro Sordoni, Adam Trischler

We develop a general problem setting for training and testing the ability of agents to gather information efficiently.

reinforcement-learning • Reinforcement Learning • +1

Iterative Alternating Neural Attention for Machine Reading

1 code implementation • 7 Jun 2016 • Alessandro Sordoni, Philip Bachman, Adam Trischler, Yoshua Bengio

We propose a novel neural attention architecture to tackle machine comprehension tasks, such as answering Cloze-style queries with respect to a document.

Ranked #3 on Question Answering on Children's Book Test (Accuracy-NE metric)

Question Answering • Reading Comprehension

A Hierarchical Latent Variable Encoder-Decoder Model for Generating Dialogues

9 code implementations • 19 May 2016 • Iulian Vlad Serban, Alessandro Sordoni, Ryan Lowe, Laurent Charlin, Joelle Pineau, Aaron Courville, Yoshua Bengio

Sequential data often possesses a hierarchical structure with complex dependencies between subsequences, such as that found between the utterances in a dialogue.

Decoder • Response Generation

Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models

7 code implementations • 17 Jul 2015 • Iulian V. Serban, Alessandro Sordoni, Yoshua Bengio, Aaron Courville, Joelle Pineau

We investigate the task of building open domain, conversational dialogue systems based on large dialogue corpora using generative models.

Decoder • Word Embeddings

A Hierarchical Recurrent Encoder-Decoder For Generative Context-Aware Query Suggestion

4 code implementations • 8 Jul 2015 • Alessandro Sordoni, Yoshua Bengio, Hossein Vahabi, Christina Lioma, Jakob G. Simonsen, Jian-Yun Nie

Our novel hierarchical recurrent encoder-decoder architecture allows the model to be sensitive to the order of queries in the context while avoiding data sparsity.
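
A toy sketch of the two-level recurrence (illustrative shapes, not the released code): a query-level GRU encodes each query's tokens, and a session-level GRU consumes the query encodings in order, which makes the representation sensitive to query order.

    import torch

    d = 128
    query_rnn = torch.nn.GRU(d, d, batch_first=True)    # encodes one query's tokens
    session_rnn = torch.nn.GRU(d, d, batch_first=True)  # runs over queries in order

    def encode_session(queries):
        # queries: list of (1, n_tokens, d) token-embedding tensors,
        # in the order they were issued within the session.
        encs = [query_rnn(q)[1][-1] for q in queries]   # final state per query: (1, d)
        encs = torch.stack(encs, dim=1)                 # (1, n_queries, d)
        _, h = session_rnn(encs)
        return h[-1]                                    # conditions the decoder (not shown)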

Decoder

deltaBLEU: A Discriminative Metric for Generation Tasks with Intrinsically Diverse Targets

no code implementations • IJCNLP 2015 • Michel Galley, Chris Brockett, Alessandro Sordoni, Yangfeng Ji, Michael Auli, Chris Quirk, Margaret Mitchell, Jianfeng Gao, Bill Dolan

We introduce Discriminative BLEU (deltaBLEU), a novel metric for intrinsic evaluation of generated text in tasks that admit a diverse range of possible outputs.

Sentence
