Search Results for author: Hady Elsahar

Machine learning is shifting towards general-purpose pretrained generative models, trained in a self-supervised manner on large amounts of data, which can then be applied to solve a large number of tasks.

Abstractive Text Summarization Code Generation

On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting

2 code implementations • 1 Jun 2022 • Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

Here we explore the theoretical connections between the two paradigms, Reward Maximization (RM) and Distribution Matching (DM), and show that methods such as KL-control developed for RM can also be construed as belonging to DM.

Language Modelling Reinforcement Learning (RL) +1

Zero-Shot Question Generation from Knowledge Graphs for Unseen Predicates and Entity Types

1 code implementation • NAACL 2018 • Hady Elsahar, Christophe Gravier, Frederique Laforest

We present a neural model for question generation from knowledge base triples in a "Zero-Shot" setup, that is generating questions for triples containing predicates, subject types or object types that were not seen at training time.

Ranked #14 on Zero-shot Text Search on BEIR

Knowledge Graphs Question Generation +2

Energy-Based Models for Code Generation under Compilability Constraints

1 code implementation • 9 Jun 2021 • Tomasz Korbak, Hady Elsahar, Marc Dymetman, Germán Kruszewski

Neural language models can be successfully trained on source code, leading to applications such as code completion.

Code Completion Code Generation

T-REx: A Large Scale Alignment of Natural Language with Knowledge Base Triples

1 code implementation • LREC 2018 • Hady Elsahar, Pavlos Vougiouklis, Arslen Remaci, Christophe Gravier, Jonathon Hare, Frederique Laforest, Elena Simperl

Entity Linking Knowledge Base Population +3

Unsupervised Open Relation Extraction

1 code implementation • 22 Jan 2018 • Hady Elsahar, Elena Demidova, Simon Gottschalk, Christophe Gravier, Frederique Laforest

We explore methods to extract relations between named entities from free text in an unsupervised setting.

Clustering Relation +2

Neural Wikipedian: Generating Textual Summaries from Knowledge Base Triples

1 code implementation • 1 Nov 2017 • Pavlos Vougiouklis, Hady Elsahar, Lucie-Aimée Kaffee, Christophe Gravier, Frederique Laforest, Jonathon Hare, Elena Simperl

We explore the problem of generating natural language summaries for Semantic Web data.

Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata

1 code implementation • NAACL 2018 • Lucie-Aimée Kaffee, Hady Elsahar, Pavlos Vougiouklis, Christophe Gravier, Frédérique Laforest, Jonathon Hare, Elena Simperl

While Wikipedia exists in 287 languages, its content is unevenly distributed among them.

Sentence

High Recall Open IE for Relation Discovery

no code implementations • IJCNLP 2017 • Hady Elsahar, Christophe Gravier, Frederique Laforest

Relation Discovery discovers predicates (relation types) from a text corpus relying on the co-occurrence of two named entities in the same sentence.

Open Information Extraction Relation +3

Unsupervised Aspect-Based Multi-Document Abstractive Summarization

no code implementations • WS 2019 • Maximin Coavoux, Hady Elsahar, Matthias Gallé

User-generated reviews of products or services provide valuable information to customers.

Abstractive Text Summarization Clustering +4

To Annotate or Not? Predicting Performance Drop under Domain Shift

no code implementations • IJCNLP 2019 • Hady Elsahar, Matthias Gallé

In this paper, we study the problem of predicting the performance drop of modern NLP models under domain-shift, in the absence of any target domain labels.

General Classification POS +3

Self-Supervised and Controlled Multi-Document Opinion Summarization

no code implementations • EACL 2021 • Hady Elsahar, Maximin Coavoux, Matthias Gallé, Jos Rozen

We address the problem of unsupervised abstractive summarization of collections of user generated reviews with self-supervision and control.

Abstractive Text Summarization Opinion Summarization

Symbol-Shift Equivariant Neural Networks

no code implementations • 1 Jan 2021 • David Salinas, Hady Elsahar

Neural networks have been shown to have poor compositionality abilities: while they can produce sophisticated output given sufficient data, they perform patchy generalization and fail to generalize to new symbols (e.g., replacing a name in a sentence with a less frequent one or one not yet seen).

Question Answering Sentence +1

References in Wikipedia: The Editors' Perspective

no code implementations • 24 Feb 2021 • Lucie-Aimée Kaffee, Hady Elsahar

References are an essential part of Wikipedia.

On Reward Maximization and Distribution Matching for Fine-Tuning Language Models

no code implementations • 29 Sep 2021 • Tomasz Korbak, Hady Elsahar, Germán Kruszewski, Marc Dymetman

The availability of large pre-trained models is changing the landscape of Machine Learning research and practice, moving from a "training from scratch" to a "fine-tuning" paradigm.

Language Modelling Reinforcement Learning (RL) +1

Unsupervised and Distributional Detection of Machine-Generated Text

no code implementations • 4 Nov 2021 • Matthias Gallé, Jos Rozen, Germán Kruszewski, Hady Elsahar

We propose a method to detect machine-generated documents by leveraging repeated higher-order n-grams, which we show are over-represented in machine-generated text compared to human-written text.

Text Generation

Sampling from Discrete Energy-Based Models with Quality/Efficiency Trade-offs

no code implementations • 10 Dec 2021 • Bryan Eikema, Germán Kruszewski, Hady Elsahar, Marc Dymetman

We show that we can sample from such EBMs with arbitrary precision at the cost of sampling efficiency.

Paraphrase Generation

Documenting Geographically and Contextually Diverse Data Sources: The BigScience Catalogue of Language Data and Resources

no code implementations • 25 Jan 2022 • Angelina McMillan-Major, Zaid Alyafeai, Stella Biderman, Kimbo Chen, Francesco De Toni, Gérard Dupont, Hady Elsahar, Chris Emezue, Alham Fikri Aji, Suzana Ilić, Nurulaqilla Khamis, Colin Leong, Maraim Masoud, Aitor Soroa, Pedro Ortiz Suarez, Zeerak Talat, Daniel van Strien, Yacine Jernite

In recent years, large-scale data collection efforts have prioritized the amount of data collected in order to improve the modeling capabilities of large language models.
