Search Results for author: Rodrigo Nogueira

Found 38 papers, 25 papers with code

InPars: Data Augmentation for Information Retrieval using Large Language Models

1 code implementation 10 Feb 2022 Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Rodrigo Nogueira

In this work, we harness the few-shot capabilities of large pretrained language models as synthetic data generators for IR tasks.

Data Augmentation Information Retrieval +2

To Tune or Not To Tune? Zero-shot Models for Legal Case Entailment

1 code implementation 7 Feb 2022 Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto de Alencar Lotufo, Rodrigo Nogueira

For that, we participated in the legal case entailment task of COLIEE 2021, in which we use such models with no adaptations to the target domain.

Pretrained Language Models

Sequence-to-Sequence Models for Extracting Information from Registration and Legal Documents

1 code implementation 14 Jan 2022 Ramon Pires, Fábio C. de Souza, Guilherme Rosa, Roberto A. Lotufo, Rodrigo Nogueira

A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts.

Open Information Extraction Question Answering

mMARCO: A Multilingual Version of the MS MARCO Passage Ranking Dataset

1 code implementation 31 Aug 2021 Luiz Henrique Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira

In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset comprising 13 languages that was created using machine translation.

Information Retrieval Machine Translation +3

A cost-benefit analysis of cross-lingual transfer methods

2 code implementations 14 May 2021 Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira

An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner.

Cross-Lingual Transfer Translation

Investigating the Limitations of Transformers with Simple Arithmetic Tasks

1 code implementation 25 Feb 2021 Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin

In this work, we investigate if the surface form of a number has any influence on how sequence-to-sequence language models learn simple arithmetic tasks such as addition and subtraction across a wide range of values.

Pretrained Language Models

Pyserini: An Easy-to-Use Python Toolkit to Support Replicable IR Research with Sparse and Dense Representations

1 code implementation 19 Feb 2021 Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira

Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture.

Information Retrieval

The Expando-Mono-Duo Design Pattern for Text Ranking with Pretrained Sequence-to-Sequence Models

1 code implementation 14 Jan 2021 Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin

We propose a design pattern for tackling text ranking problems, dubbed "Expando-Mono-Duo", that has been empirically validated for a number of ad hoc retrieval tasks in different domains.

Document Ranking

Designing Templates for Eliciting Commonsense Knowledge from Pretrained Sequence-to-Sequence Models

no code implementations COLING 2020 Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

While internalized "implicit knowledge" in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question.

Multiple-choice Natural Language Understanding +1

Scientific Claim Verification with VERT5ERINI

no code implementations EACL (Louhi) 2021 Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin

This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain.

Pretrained Transformers for Text Ranking: BERT and Beyond

1 code implementation NAACL 2021 Jimmy Lin, Rodrigo Nogueira, Andrew Yates

There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size).

Information Retrieval

PTT5: Pretraining and validating the T5 model on Brazilian Portuguese data

3 code implementations 20 Aug 2020 Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo

In natural language processing (NLP), there is a need for more resources in Portuguese, since much of the data used in the state-of-the-art research is in other languages.

Lite Training Strategies for Portuguese-English and English-Portuguese Translation

1 code implementation WMT (EMNLP) 2020 Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini

Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models.

Machine Translation Translation

Covidex: Neural Ranking Models and Keyword Search Infrastructure for the COVID-19 Open Research Dataset

1 code implementation EMNLP (sdp) 2020 Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin

We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset

no code implementations ACL 2020 Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, Jimmy Lin

The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen Institute for AI.

Decision Making

Rapidly Bootstrapping a Question Answering Dataset for COVID-19

1 code implementation 23 Apr 2020 Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin

We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge.

Question Answering

Rapidly Deploying a Neural Search Engine for the COVID-19 Open Research Dataset: Preliminary Thoughts and Lessons Learned

1 code implementation 10 Apr 2020 Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, Jimmy Lin

We present the Neural Covidex, a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.

Decision Making

Conversational Question Reformulation via Sequence-to-Sequence Architectures and Pretrained Language Models

no code implementations 4 Apr 2020 Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).

Pretrained Language Models Task-Oriented Dialogue Systems

TTTTTackling WinoGrande Schemas

no code implementations 18 Mar 2020 Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin

We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis.

Electricity Theft Detection with self-attention

1 code implementation 14 Feb 2020 Paulo Finardi, Israel Campiotti, Gustavo Plensack, Rafael Derradi de Souza, Rodrigo Nogueira, Gustavo Pinheiro, Roberto Lotufo

In this work, we propose a novel self-attention model to address electricity theft detection on an imbalanced, realistic dataset of daily electricity consumption provided by the State Grid Corporation of China.

Navigation-Based Candidate Expansion and Pretrained Language Models for Citation Recommendation

no code implementations 23 Jan 2020 Rodrigo Nogueira, Zhiying Jiang, Kyunghyun Cho, Jimmy Lin

Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration.

Citation Recommendation Domain Adaptation +3

Meta Answering for Machine Reading

no code implementations 11 Nov 2019 Benjamin Borschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, Lierni Sestorain Saralegu

We investigate a framework for machine reading, inspired by real world information-seeking problems, where a meta question answering system interacts with a black box environment.

Question Answering Reading Comprehension

Multi-Stage Document Ranking with BERT

2 code implementations 31 Oct 2019 Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, Jimmy Lin

The advent of deep neural networks pre-trained via language modeling tasks has spurred a number of successful applications in natural language processing.

Document Ranking Language Modelling

Portuguese Named Entity Recognition using BERT-CRF

1 code implementation 23 Sep 2019 Fábio Souza, Rodrigo Nogueira, Roberto Lotufo

Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering.

Named Entity Recognition NER +1

Learning Representations and Agents for Information Retrieval

no code implementations 16 Aug 2019 Rodrigo Nogueira

We argue, however, that although this approach has been very successful for tasks such as machine translation, storing the world's knowledge as parameters of a learning machine can be very hard.

Information Retrieval Machine Translation +1

Document Expansion by Query Prediction

4 code implementations 17 Apr 2019 Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho

One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to or representative of the documents' content. From the perspective of a question answering system, this might comprise questions the document can potentially answer.

Passage Re-Ranking Question Answering +1

Multi-agent query reformulation: Challenges and the role of diversity

no code implementations ICLR Workshop drlStructPred 2019 Rodrigo Nogueira, Jannis Bulian, Massimiliano Ciaramita

We investigate methods to efficiently learn diverse strategies in reinforcement learning for a generative structured prediction problem: query reformulation.

Question Answering reinforcement-learning +1

Passage Re-ranking with BERT

6 code implementations 13 Jan 2019 Rodrigo Nogueira, Kyunghyun Cho

Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference.

Ranked #2 on Passage Re-Ranking on MS MARCO (using extra training data)

Passage Re-Ranking Passage Retrieval +1

Learning to Coordinate Multiple Reinforcement Learning Agents for Diverse Query Reformulation

no code implementations ICLR 2019 Rodrigo Nogueira, Jannis Bulian, Massimiliano Ciaramita

We propose a method to efficiently learn diverse strategies in reinforcement learning for query reformulation in the tasks of document retrieval and question answering.

Question Answering reinforcement-learning

Task-Oriented Query Reformulation with Reinforcement Learning

2 code implementations EMNLP 2017 Rodrigo Nogueira, Kyunghyun Cho

In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned.

reinforcement-learning

End-to-End Goal-Driven Web Navigation

1 code implementation NeurIPS 2016 Rodrigo Nogueira, Kyunghyun Cho

We propose goal-driven web navigation as a benchmark task for evaluating an agent's ability to understand natural language and plan in partially observed environments.

Decision Making Question Answering
