1 code implementation • EMNLP (sustainlp) 2020 • Ji Xin, Rodrigo Nogueira, YaoLiang Yu, Jimmy Lin
Pre-trained language models such as BERT have shown their effectiveness in various tasks.
1 code implementation • 26 Apr 2023 • Hansi Zeng, Surya Kallumadi, Zaid Alibadi, Rodrigo Nogueira, Hamed Zamani
Developing a universal model that can efficiently and effectively respond to a wide range of information access requests -- from retrieval to recommendation to question answering -- has been a long-standing goal in the information retrieval community.
no code implementations • 16 Apr 2023 • Ramon Pires, Hugo Abonizio, Thales Sales Almeida, Rodrigo Nogueira
By evaluating on datasets originally conceived in the target language as well as translated ones, we study the contributions of language-specific pretraining in terms of 1) capturing linguistic nuances and structures inherent to the target language, and 2) enriching the model's knowledge about a domain or culture.
no code implementations • 3 Apr 2023 • Jimmy Lin, David Alfonso-Hermelo, Vitor Jeronymo, Ehsan Kamalloo, Carlos Lassance, Rodrigo Nogueira, Odunayo Ogundepo, Mehdi Rezagholizadeh, Nandan Thakur, Jheng-Hong Yang, Xinyu Zhang
The advent of multilingual language models has generated a resurgence of interest in cross-lingual information retrieval (CLIR), which is the task of searching documents in one language with queries from another.
1 code implementation • 29 Mar 2023 • Desnes Nunes, Ricardo Primi, Ramon Pires, Roberto Lotufo, Rodrigo Nogueira
The present study aims to explore the capabilities of Language Models (LMs) in tackling high-stakes multiple-choice tests, represented here by the Exame Nacional do Ensino Médio (ENEM), a multidisciplinary entrance examination widely adopted by Brazilian universities.
1 code implementation • 28 Mar 2023 • Vitor Jeronymo, Roberto Lotufo, Rodrigo Nogueira
This paper reports on a study of cross-lingual information retrieval (CLIR) using the mT5-XXL reranker on the NeuCLIR track of TREC 2022.
1 code implementation • 25 Jan 2023 • Fernando Ferraretto, Thiago Laitz, Roberto Lotufo, Rodrigo Nogueira
Recent work has shown that inducing a large language model (LLM) to generate explanations prior to outputting an answer is an effective strategy to improve performance on a wide range of reasoning tasks.
1 code implementation • 4 Jan 2023 • Vitor Jeronymo, Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Jakub Zavrel, Rodrigo Nogueira
Recently, InPars introduced a method to efficiently use large language models (LLMs) in information retrieval tasks: via few-shot examples, an LLM is induced to generate relevant queries for documents.
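The few-shot setup can be sketched as a simple prompt-construction step. A minimal sketch follows; the template and the example document–query pairs are illustrative placeholders, not the exact InPars prompt:

```python
def build_prompt(few_shot_pairs, target_document):
    """Build a few-shot prompt that induces an LLM to generate a
    relevant query for `target_document`.  Each (document, query)
    pair in `few_shot_pairs` is shown to the model as an example."""
    lines = []
    for doc, query in few_shot_pairs:
        lines.append(f"Document: {doc}")
        lines.append(f"Relevant query: {query}")
        lines.append("")  # blank line between examples
    lines.append(f"Document: {target_document}")
    lines.append("Relevant query:")  # the LLM completes this line
    return "\n".join(lines)

pairs = [
    ("The cheetah is the fastest land animal.", "fastest land animal"),
    ("Water boils at 100 degrees Celsius at sea level.", "boiling point of water"),
]
prompt = build_prompt(pairs, "BERT is a pretrained transformer encoder.")
```

Sampling a completion of the final line from the LLM yields a synthetic (query, document) pair that can be used to train a retriever.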
1 code implementation • 19 Dec 2022 • Jayr Pereira, Robson Fidalgo, Roberto Lotufo, Rodrigo Nogueira
This paper proposes a question-answering system that can answer questions whose supporting evidence is spread over multiple (potentially long) documents.
1 code implementation • 12 Dec 2022 • Guilherme Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira
We find that the number of parameters and early query-document interactions of cross-encoders play a significant role in the generalization ability of retrieval models.
no code implementations • 26 Oct 2022 • Thales Sales Almeida, Thiago Laitz, João Seródio, Luiz Henrique Bonifacio, Roberto Lotufo, Rodrigo Nogueira
We compare our system with Microsoft's Biomedical Search and show that our design choices led to a much more cost-effective system with competitive QPS, while having close to state-of-the-art results on a wide range of public benchmarks.
no code implementations • 27 Sep 2022 • Vitor Jeronymo, Mauricio Nascimento, Roberto Lotufo, Rodrigo Nogueira
Robust 2004 is an information retrieval benchmark whose large number of judgments per query makes it a reliable evaluation dataset.
1 code implementation • COLING 2022 • Hugo Abonizio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
The zero-shot cross-lingual ability of models pretrained on multilingual and even monolingual corpora is an intriguing empirical result that has spurred many explanatory hypotheses.
1 code implementation • 24 Aug 2022 • Mirelle Bueno, Carlos Gemmell, Jeffrey Dalton, Roberto Lotufo, Rodrigo Nogueira
Our experimental results show that generating step-by-step rationales and introducing marker tokens are both required for effective extrapolation.
no code implementations • 9 Aug 2022 • Vitor Jeronymo, Guilherme Rosa, Surya Kallumadi, Roberto Lotufo, Rodrigo Nogueira
In this work we describe our submission to the product ranking task of the Amazon KDD Cup 2022.
1 code implementation • 6 Jun 2022 • Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira
Due to latency constraints, this has made distilled and dense models the go-to choice for deployment in real-world retrieval applications.
Ranked #1 on Citation Prediction on SciDocs (BEIR)
1 code implementation • 30 May 2022 • Guilherme Moraes Rosa, Luiz Bonifacio, Vitor Jeronymo, Hugo Abonizio, Roberto Lotufo, Rodrigo Nogueira
Recent work has shown that language models scaled to billions of parameters, such as GPT-3, perform remarkably well in zero-shot and few-shot scenarios.
1 code implementation • 10 Feb 2022 • Luiz Bonifacio, Hugo Abonizio, Marzieh Fadaee, Rodrigo Nogueira
In this work, we harness the few-shot capabilities of large pretrained language models as synthetic data generators for IR tasks.
1 code implementation • 7 Feb 2022 • Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto de Alencar Lotufo, Rodrigo Nogueira
For that, we participated in the legal case entailment task of COLIEE 2021, in which we use such models with no adaptations to the target domain.
1 code implementation • 14 Jan 2022 • Ramon Pires, Fábio C. de Souza, Guilherme Rosa, Roberto A. Lotufo, Rodrigo Nogueira
A typical information extraction pipeline consists of token- or span-level classification models coupled with a series of pre- and post-processing scripts.
no code implementations • 4 Sep 2021 • Leandro Rodrigues de Souza, Rodrigo Nogueira, Roberto Lotufo
Pretrained multilingual models have become a de facto default approach for zero-shot cross-lingual transfer.
1 code implementation • 31 Aug 2021 • Luiz Bonifacio, Vitor Jeronymo, Hugo Queiroz Abonizio, Israel Campiotti, Marzieh Fadaee, Roberto Lotufo, Rodrigo Nogueira
In this work, we present mMARCO, a multilingual version of the MS MARCO passage ranking dataset comprising 13 languages, created using machine translation.
2 code implementations • 14 May 2021 • Guilherme Moraes Rosa, Luiz Henrique Bonifacio, Leandro Rodrigues de Souza, Roberto Lotufo, Rodrigo Nogueira
An effective method for cross-lingual transfer is to fine-tune a bilingual or multilingual model on a supervised dataset in one language and evaluate it on another language in a zero-shot manner.
1 code implementation • 26 Apr 2021 • Guilherme Moraes Rosa, Ruan Chaves Rodrigues, Roberto Lotufo, Rodrigo Nogueira
We describe our single submission to task 1 of COLIEE 2021.
1 code implementation • 25 Feb 2021 • Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
In this work, we investigate if the surface form of a number has any influence on how sequence-to-sequence language models learn simple arithmetic tasks such as addition and subtraction across a wide range of values.
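As a concrete example of varying the surface form, a positional encoding that spells out each digit's power of ten can be generated as below. This is a minimal sketch of the idea; the exact notation studied in the paper may differ:

```python
def to_10e_form(n: int) -> str:
    """Render a non-negative integer with explicit positional markers,
    e.g. 832 -> "8 10e2 3 10e1 2 10e0", making each digit's place
    value explicit to a sequence-to-sequence model."""
    digits = str(n)
    last = len(digits) - 1
    return " ".join(f"{d} 10e{last - i}" for i, d in enumerate(digits))
```

Feeding operands in such an explicit form removes the burden of inferring place value from token position alone.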
1 code implementation • 19 Feb 2021 • Jimmy Lin, Xueguang Ma, Sheng-Chieh Lin, Jheng-Hong Yang, Ronak Pradeep, Rodrigo Nogueira
Pyserini is an easy-to-use Python toolkit that supports replicable IR research by providing effective first-stage retrieval in a multi-stage ranking architecture.
1 code implementation • 14 Jan 2021 • Ronak Pradeep, Rodrigo Nogueira, Jimmy Lin
We propose a design pattern for tackling text ranking problems, dubbed "Expando-Mono-Duo", that has been empirically validated for a number of ad hoc retrieval tasks in different domains.
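A rough sketch of the "Mono" and "Duo" reranking stages follows, with `mono_score` and `duo_score` as hypothetical stand-ins for the trained pointwise and pairwise rerankers; the "Expando" stage, which enriches documents before first-stage retrieval, is omitted:

```python
def mono_duo_rerank(query, candidates, mono_score, duo_score,
                    k_mono=1000, k_duo=50):
    """Mono stage: score each candidate pointwise against the query.
    Duo stage: rerank the head of the list by aggregating pairwise
    preferences among the top-k_duo candidates."""
    ranked = sorted(candidates[:k_mono],
                    key=lambda d: mono_score(query, d), reverse=True)
    top = ranked[:k_duo]

    def wins(i):
        # How often candidate i is preferred over each other candidate.
        return sum(duo_score(query, top[i], top[j])
                   for j in range(len(top)) if j != i)

    # sorted() is stable, so ties preserve the mono-stage order.
    order = sorted(range(len(top)), key=wins, reverse=True)
    return [top[i] for i in order] + ranked[k_duo:]
```

The pairwise stage is quadratic in `k_duo`, which is why it is applied only to the head of the pointwise ranking.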
no code implementations • COLING 2020 • Jheng-Hong Yang, Sheng-Chieh Lin, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
While internalized "implicit knowledge" in pretrained transformers has led to fruitful progress in many natural language understanding tasks, how to most effectively elicit such knowledge remains an open question.
no code implementations • EACL (Louhi) 2021 • Ronak Pradeep, Xueguang Ma, Rodrigo Nogueira, Jimmy Lin
This work describes the adaptation of a pretrained sequence-to-sequence model to the task of scientific claim verification in the biomedical domain.
1 code implementation • NAACL 2021 • Jimmy Lin, Rodrigo Nogueira, Andrew Yates
There are two themes that pervade our survey: techniques for handling long documents, beyond typical sentence-by-sentence processing in NLP, and techniques for addressing the tradeoff between effectiveness (i.e., result quality) and efficiency (e.g., query latency, model and index size).
1 code implementation • 19 Sep 2020 • Gabriela Surita, Rodrigo Nogueira, Roberto Lotufo
What are the latent questions on some textual data?
3 code implementations • 20 Aug 2020 • Diedre Carmo, Marcos Piau, Israel Campiotti, Rodrigo Nogueira, Roberto Lotufo
In natural language processing (NLP), there is a need for more resources in Portuguese, since much of the data used in the state-of-the-art research is in other languages.
1 code implementation • WMT (EMNLP) 2020 • Alexandre Lopes, Rodrigo Nogueira, Roberto Lotufo, Helio Pedrini
Despite the widespread adoption of deep learning for machine translation, it is still expensive to develop high-quality translation models.
1 code implementation • EMNLP (sdp) 2020 • Edwin Zhang, Nikhil Gupta, Raphael Tang, Xiao Han, Ronak Pradeep, Kuang Lu, Yue Zhang, Rodrigo Nogueira, Kyunghyun Cho, Hui Fang, Jimmy Lin
We present Covidex, a search engine that exploits the latest neural ranking models to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.
no code implementations • ACL 2020 • Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, Jimmy Lin
The Neural Covidex is a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset (CORD-19) curated by the Allen Institute for AI.
no code implementations • 5 May 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
Conversational search plays a vital role in conversational information seeking.
1 code implementation • 23 Apr 2020 • Raphael Tang, Rodrigo Nogueira, Edwin Zhang, Nikhil Gupta, Phuong Cam, Kyunghyun Cho, Jimmy Lin
We present CovidQA, the beginnings of a question answering dataset specifically designed for COVID-19, built by hand from knowledge gathered from Kaggle's COVID-19 Open Research Dataset Challenge.
1 code implementation • 10 Apr 2020 • Edwin Zhang, Nikhil Gupta, Rodrigo Nogueira, Kyunghyun Cho, Jimmy Lin
We present the Neural Covidex, a search engine that exploits the latest neural ranking architectures to provide information access to the COVID-19 Open Research Dataset curated by the Allen Institute for AI.
no code implementations • 4 Apr 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
This paper presents an empirical study of conversational question reformulation (CQR) with sequence-to-sequence architectures and pretrained language models (PLMs).
no code implementations • 18 Mar 2020 • Sheng-Chieh Lin, Jheng-Hong Yang, Rodrigo Nogueira, Ming-Feng Tsai, Chuan-Ju Wang, Jimmy Lin
We applied the T5 sequence-to-sequence model to tackle the AI2 WinoGrande Challenge by decomposing each example into two input text strings, each containing a hypothesis, and using the probabilities assigned to the "entailment" token as a score of the hypothesis.
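The decomposition described above can be sketched as follows; `entailment_score` is a stand-in for the probability the fine-tuned T5 model assigns to the "entailment" token:

```python
def decompose(sentence: str, option1: str, option2: str):
    """Split an example (a sentence with a "_" blank and two candidate
    fillers) into two hypothesis strings, one per candidate."""
    return sentence.replace("_", option1), sentence.replace("_", option2)

def pick_answer(sentence, option1, option2, entailment_score):
    """Choose the candidate whose filled-in hypothesis receives the
    higher score from `entailment_score`, a placeholder for the model's
    probability of the "entailment" token."""
    h1, h2 = decompose(sentence, option1, option2)
    return option1 if entailment_score(h1) >= entailment_score(h2) else option2
```

In the sketch, a dummy scorer can stand in for the model; in the actual system, each hypothesis is scored independently and the higher-scoring candidate is returned.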
2 code implementations • Findings of the Association for Computational Linguistics 2020 • Rodrigo Nogueira, Zhiying Jiang, Jimmy Lin
We investigate this observation further by varying target words to probe the model's use of latent knowledge.
Ranked #1 on Ad-Hoc Information Retrieval on TREC Robust04
1 code implementation • 14 Feb 2020 • Paulo Finardi, Israel Campiotti, Gustavo Plensack, Rafael Derradi de Souza, Rodrigo Nogueira, Gustavo Pinheiro, Roberto Lotufo
In this work we propose a novel self-attention mechanism model to address electricity theft detection on an imbalanced, realistic dataset of daily electricity consumption provided by the State Grid Corporation of China.
no code implementations • 23 Jan 2020 • Rodrigo Nogueira, Zhiying Jiang, Kyunghyun Cho, Jimmy Lin
Citation recommendation systems for the scientific literature, to help authors find papers that should be cited, have the potential to speed up discoveries and uncover new routes for scientific exploration.
no code implementations • 11 Nov 2019 • Benjamin Borschinger, Jordan Boyd-Graber, Christian Buck, Jannis Bulian, Massimiliano Ciaramita, Michelle Chen Huebscher, Wojciech Gajewski, Yannic Kilcher, Rodrigo Nogueira, Lierni Sestorain Saralegu
We investigate a framework for machine reading, inspired by real world information-seeking problems, where a meta question answering system interacts with a black box environment.
2 code implementations • 31 Oct 2019 • Rodrigo Nogueira, Wei Yang, Kyunghyun Cho, Jimmy Lin
The advent of deep neural networks pre-trained via language modeling tasks has spurred a number of successful applications in natural language processing.
1 code implementation • 23 Sep 2019 • Fábio Souza, Rodrigo Nogueira, Roberto Lotufo
Recent advances in language representation using neural networks have made it viable to transfer the learned internal states of a trained model to downstream natural language processing tasks, such as named entity recognition (NER) and question answering.
no code implementations • 16 Aug 2019 • Rodrigo Nogueira
We argue, however, that although this approach has been very successful for tasks such as machine translation, storing the world's knowledge as parameters of a learning machine can be very hard.
4 code implementations • 17 Apr 2019 • Rodrigo Nogueira, Wei Yang, Jimmy Lin, Kyunghyun Cho
One technique to improve the retrieval effectiveness of a search engine is to expand documents with terms that are related to or representative of the documents' content. From the perspective of a question answering system, this might comprise questions the document can potentially answer.
Ranked #1 on Passage Re-Ranking on TREC-PM
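The expansion idea can be sketched as a pre-indexing step; `generate_queries` is a stand-in for the trained sequence-to-sequence prediction model:

```python
def expand_document(doc_text, generate_queries, num_queries=3):
    """doc2query-style expansion: append model-predicted queries to the
    document text before indexing, so that lexical matchers such as
    BM25 can match query vocabulary the original text lacks.
    `generate_queries` stands in for a trained seq2seq model."""
    predicted = generate_queries(doc_text, num_queries)
    return doc_text + " " + " ".join(predicted)
```

Because expansion happens at indexing time, the extra cost is paid once per document rather than once per query.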
no code implementations • ICLR Workshop drlStructPred 2019 • Rodrigo Nogueira, Jannis Bulian, Massimiliano Ciaramita
We investigate methods to efficiently learn diverse strategies in reinforcement learning for a generative structured prediction problem: query reformulation.
6 code implementations • 13 Jan 2019 • Rodrigo Nogueira, Kyunghyun Cho
Recently, neural models pretrained on a language modeling task, such as ELMo (Peters et al., 2018), OpenAI GPT (Radford et al., 2018), and BERT (Devlin et al., 2018), have achieved impressive results on various natural language processing tasks such as question-answering and natural language inference.
Ranked #3 on Passage Re-Ranking on MS MARCO (using extra training data)
no code implementations • ICLR 2019 • Rodrigo Nogueira, Jannis Bulian, Massimiliano Ciaramita
We propose a method to efficiently learn diverse strategies in reinforcement learning for query reformulation in the tasks of document retrieval and question answering.
no code implementations • 24 Sep 2018 • Kexin Huang, Rodrigo Nogueira
Epistasis (gene-gene interaction) is crucial to predicting genetic disease.
2 code implementations • EMNLP 2017 • Rodrigo Nogueira, Kyunghyun Cho
In this work, we introduce a query reformulation system based on a neural network that rewrites a query to maximize the number of relevant documents returned.
1 code implementation • NeurIPS 2016 • Rodrigo Nogueira, Kyunghyun Cho
We propose goal-driven web navigation as a benchmark task for evaluating an agent with abilities to understand natural language and plan in partially observed environments.