Search Results for author: Fabio Petroni

Found 37 papers, 23 papers with code

Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks

9 code implementations • NeurIPS 2020 • Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, Douwe Kiela

Large pre-trained language models have been shown to store factual knowledge in their parameters, and achieve state-of-the-art results when fine-tuned on downstream NLP tasks.

Ranked #4 on Question Answering on WebQuestions

Fact Verification Question Answering +3

125,059

Language Models as Knowledge Bases?

1 code implementation • IJCNLP 2019 • Fabio Petroni, Tim Rocktäschel, Patrick Lewis, Anton Bakhtin, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel

Recent progress in pretraining language models on large textual corpora led to a surge of improvements for downstream NLP tasks.

Language Modelling Open-Domain Question Answering

1,304

Scalable Zero-shot Entity Linking with Dense Entity Retrieval

3 code implementations • EMNLP 2020 • Ledell Wu, Fabio Petroni, Martin Josifoski, Sebastian Riedel, Luke Zettlemoyer

This paper introduces a conceptually simple, scalable, and highly effective BERT-based entity linking model, along with an extensive evaluation of its accuracy-speed trade-off.

Entity Embeddings Entity Linking +3

1,132

KILT: a Benchmark for Knowledge Intensive Language Tasks

3 code implementations • NAACL 2021 • Fabio Petroni, Aleksandra Piktus, Angela Fan, Patrick Lewis, Majid Yazdani, Nicola De Cao, James Thorne, Yacine Jernite, Vladimir Karpukhin, Jean Maillard, Vassilis Plachouras, Tim Rocktäschel, Sebastian Riedel

We test both task-specific and general baselines, evaluating downstream performance in addition to the ability of the models to provide provenance.

Ranked #3 on Entity Linking on KILT: WNED-CWEB

Entity Linking Fact Checking +4

884

Autoregressive Entity Retrieval

2 code implementations • ICLR 2021 • Nicola De Cao, Gautier Izacard, Sebastian Riedel, Fabio Petroni

For instance, encyclopedias such as Wikipedia are structured by entities (e.g., one per Wikipedia article).

Ranked #1 on Entity Linking on Derczynski

Entity Disambiguation Entity Linking +3

738

Multilingual Autoregressive Entity Linking

1 code implementation • 23 Mar 2021 • Nicola De Cao, Ledell Wu, Kashyap Popat, Mikel Artetxe, Naman Goyal, Mikhail Plekhanov, Luke Zettlemoyer, Nicola Cancedda, Sebastian Riedel, Fabio Petroni

Moreover, in a zero-shot setting on languages with no training data at all, mGENRE treats the target language as a latent variable that is marginalized at prediction time.

Ranked #2 on Entity Disambiguation on Mewsli-9 (using extra training data)

Entity Disambiguation Entity Linking

738

The Web Is Your Oyster -- Knowledge-Intensive NLP against a Very Large Web Corpus

2 code implementations • 18 Dec 2021 • Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Dmytro Okhonko, Samuel Broscheit, Gautier Izacard, Patrick Lewis, Barlas Oğuz, Edouard Grave, Wen-tau Yih, Sebastian Riedel

In order to address increasing demands of real-world applications, the research for knowledge-intensive NLP (KI-NLP) should advance by capturing the challenges of a truly open-domain environment: web-scale knowledge, lack of structure, inconsistent quality and noise.

Common Sense Reasoning Retrieval

551

A Memory Efficient Baseline for Open Domain Question Answering

1 code implementation • 30 Dec 2020 • Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Sebastian Riedel, Edouard Grave

Recently, retrieval systems based on dense representations have led to important improvements in open-domain question answering, and related tasks.

Dimensionality Reduction Open-Domain Question Answering +3

525

Atlas: Few-shot Learning with Retrieval Augmented Language Models

1 code implementation • 5 Aug 2022 • Gautier Izacard, Patrick Lewis, Maria Lomeli, Lucas Hosseini, Fabio Petroni, Timo Schick, Jane Dwivedi-Yu, Armand Joulin, Sebastian Riedel, Edouard Grave

Retrieval augmented models are known to excel at knowledge intensive tasks without the need for as many parameters, but it is unclear whether they work in few-shot settings.

Ranked #1 on Question Answering on Natural Questions

Fact Checking Few-Shot Learning +6

477

MiniHack the Planet: A Sandbox for Open-Ended Reinforcement Learning Research

1 code implementation • 27 Sep 2021 • Mikayel Samvelyan, Robert Kirk, Vitaly Kurin, Jack Parker-Holder, Minqi Jiang, Eric Hambro, Fabio Petroni, Heinrich Küttler, Edward Grefenstette, Tim Rocktäschel

By leveraging the full set of entities and environment dynamics from NetHack, one of the richest grid-based video games, MiniHack allows designing custom RL testbeds that are fast and convenient to use.

NetHack reinforcement-learning +2

451

Autoregressive Search Engines: Generating Substrings as Document Identifiers

2 code implementations • 22 Apr 2022 • Michele Bevilacqua, Giuseppe Ottaviano, Patrick Lewis, Wen-tau Yih, Sebastian Riedel, Fabio Petroni

Knowledge-intensive language tasks require NLP systems to both provide the correct answer and retrieve supporting evidence for it in a given corpus.

Information Retrieval Retrieval

270

Lost in the Middle: How Language Models Use Long Contexts

4 code implementations • 6 Jul 2023 • Nelson F. Liu, Kevin Lin, John Hewitt, Ashwin Paranjape, Michele Bevilacqua, Fabio Petroni, Percy Liang

While recent language models have the ability to take long contexts as input, relatively little is known about how well they use longer context.

Language Modelling Position +2

261

Improving Wikipedia Verifiability with AI

1 code implementation • 8 Jul 2022 • Fabio Petroni, Samuel Broscheit, Aleksandra Piktus, Patrick Lewis, Gautier Izacard, Lucas Hosseini, Jane Dwivedi-Yu, Maria Lomeli, Timo Schick, Pierre-Emmanuel Mazaré, Armand Joulin, Edouard Grave, Sebastian Riedel

Hence, maintaining and improving the quality of Wikipedia references is an important challenge and there is a pressing need for better tools to assist humans in this effort.

Citation Recommendation Fact Checking

181

SAFE: Self-Attentive Function Embeddings for Binary Similarity

3 code implementations • 13 Nov 2018 • Luca Massarelli, Giuseppe Antonio Di Luna, Fabio Petroni, Leonardo Querzoni, Roberto Baldoni

We report the results from a quantitative and qualitative analysis that show how SAFE provides a noticeable performance improvement with respect to previous solutions.

Malware Analysis Vulnerability Detection

164

EditEval: An Instruction-Based Benchmark for Text Improvements

1 code implementation • 27 Sep 2022 • Jane Dwivedi-Yu, Timo Schick, Zhengbao Jiang, Maria Lomeli, Patrick Lewis, Gautier Izacard, Edouard Grave, Sebastian Riedel, Fabio Petroni

Evaluation of text generation to date has primarily focused on content created sequentially, rather than improvements on a piece of text.

Text Generation

138

Cutting Down on Prompts and Parameters: Simple Few-Shot Learning with Language Models

2 code implementations • Findings (ACL) 2022 • Robert L. Logan IV, Ivana Balažević, Eric Wallace, Fabio Petroni, Sameer Singh, Sebastian Riedel

Prompting language models (LMs) with training examples and task descriptions has been seen as critical to recent successes in few-shot learning.

Few-Shot Learning Prompt Engineering

106

GenIE: Generative Information Extraction

1 code implementation • NAACL 2022 • Martin Josifoski, Nicola De Cao, Maxime Peyrard, Fabio Petroni, Robert West

Structured and grounded representation of text is typically formalized by closed information extraction, the problem of extracting an exhaustive set of (subject, relation, object) triplets that are consistent with a predefined set of entities and relations from a knowledge base schema.

Learning To Recognize Procedural Activities with Distant Supervision

1 code implementation • CVPR 2022 • Xudong Lin, Fabio Petroni, Gedas Bertasius, Marcus Rohrbach, Shih-Fu Chang, Lorenzo Torresani

In this paper we consider the problem of classifying fine-grained, multi-step activities (e.g., cooking different recipes, making disparate home improvements, creating various forms of arts and crafts) from long videos spanning up to several minutes.

Ranked #3 on Video Classification on Breakfast

Action Classification Language Modelling +1

Entity Tagging: Extracting Entities in Text Without Mention Supervision

1 code implementation • 13 Sep 2022 • Christina Du, Kashyap Popat, Louis Martin, Fabio Petroni

Detection and disambiguation of all entities in text is a crucial task for a wide range of applications.

Entity Linking

Concept Matching for Low-Resource Classification

1 code implementation • 1 Jun 2020 • Federico Errica, Ludovic Denoyer, Bora Edizel, Fabio Petroni, Vassilis Plachouras, Fabrizio Silvestri, Sebastian Riedel

We propose a model to tackle classification tasks in the presence of very little training data.

General Classification text-classification +1

EDIN: An End-to-end Benchmark and Pipeline for Unknown Entity Discovery and Indexing

1 code implementation • 25 May 2022 • Nora Kassner, Fabio Petroni, Mikhail Plekhanov, Sebastian Riedel, Nicola Cancedda

This paper created the Unknown Entity Discovery and Indexing (EDIN) benchmark, where unknown entities, that is, entities without a description in the knowledge base and labeled mentions, have to be integrated into an existing entity linking system.

Entity Linking Novel Concepts +1

Can discrete information extraction prompts generalize across language models?

1 code implementation • 20 Feb 2023 • Nathanaël Carraz Rakotonirina, Roberto Dessì, Fabio Petroni, Sebastian Riedel, Marco Baroni

We study whether automatically-induced prompts that effectively extract information from a language model can also be used, out-of-the-box, to probe other language models for the same information.

Language Modelling slot-filling +1

How Decoding Strategies Affect the Verifiability of Generated Text

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Luca Massarelli, Fabio Petroni, Aleksandra Piktus, Myle Ott, Tim Rocktäschel, Vassilis Plachouras, Fabrizio Silvestri, Sebastian Riedel

A generated sentence is verifiable if it can be corroborated or disproved by Wikipedia, and we find that the verifiability of generated text strongly depends on the decoding strategy.

Language Modelling Natural Language Understanding +2

Unsupervised Features Extraction for Binary Similarity Using Graph Embedding Neural Networks

no code implementations • 23 Oct 2018 • Roberto Baldoni, Giuseppe Antonio Di Luna, Luca Massarelli, Fabio Petroni, Leonardo Querzoni

Furthermore, we report on a qualitative analysis of function embeddings.

Graph Embedding Malware Analysis +3

attr2vec: Jointly Learning Word and Contextual Attribute Embeddings with Factorization Machines

no code implementations • NAACL 2018 • Fabio Petroni, Vassilis Plachouras, Timothy Nugent, Jochen L. Leidner

Our experimental results on a text classification task demonstrate that using attr2vec to jointly learn embeddings for words and Part-of-Speech (POS) tags improves results compared to learning the embeddings independently.

Attribute Dependency Parsing +6

A Comparison of Two Paraphrase Models for Taxonomy Augmentation

no code implementations • NAACL 2018 • Vassilis Plachouras, Fabio Petroni, Timothy Nugent, Jochen L. Leidner

Our results show that paraphrasing is a viable method to enrich a taxonomy with more terms, and that Moses consistently outperforms the sequence-to-sequence neural model.

Document Classification Machine Translation +3

CORE: Context-Aware Open Relation Extraction with Factorization Machines

no code implementations • EMNLP 2015 • Fabio Petroni, Luciano Del Corro, Rainer Gemulla

Open Information Extraction Relation +1

How Context Affects Language Models' Factual Predictions

no code implementations • AKBC 2020 • Fabio Petroni, Patrick Lewis, Aleksandra Piktus, Tim Rocktäschel, Yuxiang Wu, Alexander H. Miller, Sebastian Riedel

When pre-trained on large unsupervised textual corpora, language models are able to store and retrieve factual knowledge to some extent, making it possible to use them directly for zero-shot cloze-style question answering.

Information Retrieval Language Modelling +4

Video Understanding as Machine Translation

no code implementations • 12 Jun 2020 • Bruno Korbar, Fabio Petroni, Rohit Girdhar, Lorenzo Torresani

With the advent of large-scale multimodal video datasets, especially sequences with audio or transcribed speech, there has been a growing interest in self-supervised learning of video representations.

Machine Translation Metric Learning +6

Automatic Music Production Using Generative Adversarial Networks

no code implementations • 1 Jan 2021 • Giorgio Barnabò, Giovanni Trappolini, Lorenzo Lastilla, Cesare Campagnano, Angela Fan, Fabio Petroni, Fabrizio Silvestri

In this work, we propose a novel framework for automatic music arrangement from raw audio in the frequency domain.

Image-to-Image Translation Music Generation

Generating Fact Checking Briefs

no code implementations • EMNLP 2020 • Angela Fan, Aleksandra Piktus, Fabio Petroni, Guillaume Wenzek, Marzieh Saeidi, Andreas Vlachos, Antoine Bordes, Sebastian Riedel

Fact checking at scale is difficult -- while the number of active fact checking websites is growing, it remains too small for the needs of the contemporary media ecosystem.

Fact Checking Question Answering

NeurIPS 2020 EfficientQA Competition: Systems, Analyses and Lessons Learned

no code implementations • 1 Jan 2021 • Sewon Min, Jordan Boyd-Graber, Chris Alberti, Danqi Chen, Eunsol Choi, Michael Collins, Kelvin Guu, Hannaneh Hajishirzi, Kenton Lee, Jennimaria Palomaki, Colin Raffel, Adam Roberts, Tom Kwiatkowski, Patrick Lewis, Yuxiang Wu, Heinrich Küttler, Linqing Liu, Pasquale Minervini, Pontus Stenetorp, Sebastian Riedel, Sohee Yang, Minjoon Seo, Gautier Izacard, Fabio Petroni, Lucas Hosseini, Nicola De Cao, Edouard Grave, Ikuya Yamada, Sonse Shimaoka, Masatoshi Suzuki, Shumpei Miyawaki, Shun Sato, Ryo Takahashi, Jun Suzuki, Martin Fajcik, Martin Docekal, Karel Ondrej, Pavel Smrz, Hao Cheng, Yelong Shen, Xiaodong Liu, Pengcheng He, Weizhu Chen, Jianfeng Gao, Barlas Oguz, Xilun Chen, Vladimir Karpukhin, Stan Peshterliev, Dmytro Okhonko, Michael Schlichtkrull, Sonal Gupta, Yashar Mehdad, Wen-tau Yih

We review the EfficientQA competition from NeurIPS 2020.

Open-Domain Question Answering Retrieval

Multi-task Retrieval for Knowledge-Intensive Tasks

no code implementations • ACL 2021 • Jean Maillard, Vladimir Karpukhin, Fabio Petroni, Wen-tau Yih, Barlas Oğuz, Veselin Stoyanov, Gargi Ghosh

Retrieving relevant contexts from a large corpus is a crucial step for tasks such as open-domain question answering and fact checking.

Fact Checking Open-Domain Question Answering +1

CycleDRUMS: Automatic Drum Arrangement For Bass Lines Using CycleGAN

no code implementations • 1 Apr 2021 • Giorgio Barnabò, Giovanni Trappolini, Lorenzo Lastilla, Cesare Campagnano, Angela Fan, Fabio Petroni, Fabrizio Silvestri

The two main research threads in computer-based music generation are: the construction of autonomous music-making systems, and the design of computer-based environments to assist musicians.

Image-to-Image Translation Music Generation +2

Boosted Dense Retriever

no code implementations • NAACL 2022 • Patrick Lewis, Barlas Oğuz, Wenhan Xiong, Fabio Petroni, Wen-tau Yih, Sebastian Riedel

DrBoost is trained in stages: each component model is learned sequentially and specialized by focusing only on retrieval mistakes made by the current ensemble.

Quantization Retrieval

Open Vocabulary Extreme Classification Using Generative Models

no code implementations • Findings (ACL) 2022 • Daniel Simig, Fabio Petroni, Pouya Yanki, Kashyap Popat, Christina Du, Sebastian Riedel, Majid Yazdani

To develop systems that simplify this process, we introduce the task of open vocabulary XMC (OXMC): given a piece of content, predict a set of labels, some of which may be outside of the known tag set.

Classification Extreme Multi-Label Classification +2

PEER: A Collaborative Language Model

no code implementations • 24 Aug 2022 • Timo Schick, Jane Dwivedi-Yu, Zhengbao Jiang, Fabio Petroni, Patrick Lewis, Gautier Izacard, Qingfei You, Christoforos Nalmpantis, Edouard Grave, Sebastian Riedel

Textual content is often the output of a collaborative writing process: We start with an initial draft, ask for suggestions, and repeatedly make changes.

Language Modelling
