1 code implementation • EMNLP (sustainlp) 2020 • Ji Xin, Rodrigo Nogueira, YaoLiang Yu, Jimmy Lin
Pre-trained language models such as BERT have shown their effectiveness in various tasks.
1 code implementation • 10 Feb 2022 • Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Rodrigo Nogueira
In this work, we harness the few-shot capabilities of large pretrained language models as synthetic data generators for IR tasks.
1 code implementation • 7 Feb 2022 • Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto de Alencar Lotufo, Rodrigo Nogueira
For that, we participated in the legal case entailment task of COLIEE 2021, in which we use such models with no adaptations to the target domain.
1 code implementation • 14 Jan 2022 • Ramon Pires, Fábio C. de Souza, Guilherme Rosa, Roberto A. Lotufo, Rodrigo Nogueira
A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts.
no code implementations • 4 Sep 2021 • Leandro Rodrigues de Souza, Rodrigo Nogueira, Roberto Lotufo
Pretrained multilingual models have become a de facto default approach for zero-shot cross-lingual transfer.
1 code implementation • 31 Aug 2021 • Luiz Henrique Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira
In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset comprising 13 languages that was created using machine translation.
2 code implementations • 14 May 2021 • Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner.
1 code implementation • 26 Apr 2021 • Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto Lotufo, Rodrigo Nogueira
We describe our single submission to task 1 of COLIEE 2021.
1 code implementation • 25 Feb 2021 • Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
In this work, we investigate if the surface form of a number has any influence on how sequence-to-sequence language models learn simple arithmetic tasks such as addition and subtraction across a wide range of values.
1 code implementation • 19 Feb 2021 • Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira
Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture.
1 code implementation • 14 Jan 2021 • Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin
We propose a design pattern for tackling text ranking problems, dubbed "Expando-Mono-Duo", that has been empirically validated for a number of ad hoc retrieval tasks in different domains.
no code implementations • COLING 2020 • Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
While internalized "implicit knowledge" in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question.
no code implementations • EACL (Louhi) 2021 • Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin
This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain.
1 code implementation • NAACL 2021 • Jimmy Lin, Rodrigo Nogueira, Andrew Yates
There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size).
1 code implementation • 19 Sep 2020 • Gabriela Surita, Rodrigo Nogueira, Roberto Lotufo
What are the latent questions on some textual data?
3 code implementations • 20 Aug 2020 • Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo
In natural language processing (NLP), there is a need for more resources in Portuguese, since much of the data used in the state-of-the-art research is in other languages.
1 code implementation • WMT (EMNLP) 2020 • Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini
Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models.
1 code implementation • EMNLP (sdp) 2020 • Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.
no code implementations • ACL 2020 • Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, Jimmy Lin
The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen Institute for AI.
no code implementations • 5 May 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
Conversational search plays a vital role in conversational information seeking.
1 code implementation • 23 Apr 2020 • Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge.
1 code implementation • 10 Apr 2020 • Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, Jimmy Lin
We present the Neural Covidex, a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.
no code implementations • 4 Apr 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
no code implementations • 18 Mar 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis.
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
We investigate this observation further by varying target words to probe the model's use of latent knowledge.
Ranked #1 on Ad-Hoc Information Retrieval on TREC Robust04
1 code implementation • 14 Feb 2020 • Paulo Finardi, Israel Campiotti, Gustavo Plensack, Rafael Derradi de Souza, Rodrigo Nogueira, Gustavo Pinheiro, Roberto Lotufo
In this work, we propose a novel self-attention mechanism model to address electricity theft detection on an imbalanced, realistic dataset of daily electricity consumption provided by the State Grid Corporation of China.
no code implementations • 23 Jan 2020 • Rodrigo Nogueira, Zhiying Jiang, Kyunghyun Cho, Jimmy Lin
Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration.
no code implementations • 11 Nov 2019 • Benjamin Borschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, Lierni Sestorain Saralegu
We investigate a framework for machine reading, inspired by real world information-seeking problems, where a meta question answering system interacts with a black box environment.
2 code implementations • 31 Oct 2019 • Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, Jimmy Lin
The advent of deep neural networks pre-trained via language modeling tasks has spurred a number of successful applications in natural language processing.
1 code implementation • 23 Sep 2019 • Fábio Souza, Rodrigo Nogueira, Roberto Lotufo
Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering.
no code implementations • 16 Aug 2019 • Rodrigo Nogueira
We argue, however, that although this approach has been very successful for tasks such as machine translation, storing the world's knowledge as parameters of a learning machine can be very hard.
4 code implementations • 17 Apr 2019 • Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho
One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to or representative of the documents' content. From the perspective of a question answering system, this might comprise questions the document can potentially answer.
Ranked #1 on Passage Re-Ranking on MS MARCO
no code implementations • ICLR Workshop drlStructPred 2019 • Rodrigo Nogueira, Jannis Bulian, Massimiliano Ciaramita
We investigate methods to efficiently learn diverse strategies in reinforcement learning for a generative structured prediction problem: query reformulation.
6 code implementations • 13 Jan 2019 • Rodrigo Nogueira, Kyunghyun Cho
Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2017), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference.
Ranked #2 on Passage Re-Ranking on MS MARCO (using extra training data)
no code implementations • ICLR 2019 • Rodrigo Nogueira, Jannis Bulian, Massimiliano Ciaramita
We propose a method to efficiently learn diverse strategies in reinforcement learning for query reformulation in the tasks of document retrieval and question answering.
no code implementations • 24 Sep 2018 • Kexin Huang, Rodrigo Nogueira
Epistasis (gene-gene interaction) is crucial to predicting genetic disease.
2 code implementations • EMNLP 2017 • Rodrigo Nogueira, Kyunghyun Cho
In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned.
1 code implementation • NeurIPS 2016 • Rodrigo Nogueira, Kyunghyun Cho
We propose goal-driven web navigation as a benchmark task for evaluating an agent's ability to understand natural language and plan in partially observed environments.