no code implementations • EMNLP (sustainlp) 2020 • Parul Awasthy, Bishwaranjan Bhattacharjee, John Kender, Radu Florian
Transfer learning is a popular technique to learn a task using less training data and fewer compute resources.
no code implementations • ACL (CASE) 2021 • Ken Barker, Parul Awasthy, Jian Ni, Radu Florian
The NLI reranker uses a textual representation of target types that allows it to score the strength with which a type is implied by a text, without requiring training data for the types.
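Scoring type implication with an NLI model amounts to entailment-based zero-shot classification. The sketch below illustrates the idea with an off-the-shelf MNLI checkpoint and a hand-written hypothesis template; both are assumptions for illustration, not the paper's exact model or verbalization.

```python
# Illustrative zero-shot type scoring via NLI; the model and template are
# assumptions, not the system described in the paper.
from transformers import pipeline

nli_scorer = pipeline("zero-shot-classification", model="facebook/bart-large-mnli")

text = "Thousands marched downtown demanding higher wages."
candidate_types = ["protest", "armed conflict", "election", "natural disaster"]

# Each target type is verbalized into a hypothesis; the NLI model scores how
# strongly the text entails it, with no labeled examples for the types.
result = nli_scorer(
    text,
    candidate_labels=candidate_types,
    hypothesis_template="This text describes a {} event.",
)
for label, score in zip(result["labels"], result["scores"]):
    print(f"{label}: {score:.3f}")
```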
no code implementations • ACL (CASE) 2021 • Parul Awasthy, Jian Ni, Ken Barker, Radu Florian
In this paper, we present the event detection models and systems we have developed for Multilingual Protest News Detection - Shared Task 1 at CASE 2021.
no code implementations • 27 Feb 2025 • Parul Awasthy, Aashka Trivedi, Yulong Li, Mihaela Bornea, David Cox, Abraham Daniels, Martin Franz, Gabe Goodhart, Bhavani Iyer, Vishwajeet Kumar, Luis Lastras, Scott McCarley, Rudra Murthy, Vignesh P, Sara Rosenthal, Salim Roukos, Jaydeep Sen, Sukriti Sharma, Avirup Sil, Kate Soule, Arafat Sultan, Radu Florian
We introduce the Granite Embedding models, a family of encoder-based embedding models designed for retrieval tasks, spanning dense-retrieval and sparse retrieval architectures, with both English and Multilingual capabilities.
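For readers unfamiliar with dense retrieval, the sketch below shows how such an embedding model is typically queried: documents and queries are embedded into the same space and ranked by similarity. The checkpoint name is assumed from the public Granite Embedding release, and the sentence-transformers API is used only for illustration.

```python
# Minimal dense-retrieval sketch; the checkpoint name is an assumption based on
# the public Granite Embedding release.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("ibm-granite/granite-embedding-125m-english")

docs = [
    "Granite Embedding models target dense and sparse retrieval.",
    "AMR parsing maps sentences to semantic graphs.",
]
query = "Which models are designed for retrieval?"

doc_emb = model.encode(docs, normalize_embeddings=True)
query_emb = model.encode(query, normalize_embeddings=True)

# With normalized embeddings, cosine similarity reduces to a dot product.
scores = util.dot_score(query_emb, doc_emb)[0]
best = int(scores.argmax())
print(docs[best], float(scores[best]))
```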
1 code implementation • 17 Sep 2024 • Young-suk Lee, Chulaka Gunasekara, Danish Contractor, Ramón Fernandez Astudillo, Radu Florian
We introduce a technique for multi-document grounded multi-turn synthetic dialog generation that incorporates three main ideas.
no code implementations • 17 Jun 2024 • Jasper Xian, Saron Samuel, Faraz Khoubsirat, Ronak Pradeep, Md Arafat Sultan, Radu Florian, Salim Roukos, Avirup Sil, Christopher Potts, Omar Khattab
We develop a method for training small-scale (under 100M parameter) neural information retrieval models with as few as 10 gold relevance labels.
1 code implementation • 26 Apr 2024 • Teresa Lynn, Malik H. Altakrori, Samar Mohamed Magdy, Rocktim Jyoti Das, Chenyang Lyu, Mohamed Nasr, Younes Samih, Kirill Chirkunov, Alham Fikri Aji, Preslav Nakov, Shantanu Godbole, Salim Roukos, Radu Florian, Nizar Habash
The rapid evolution of Natural Language Processing (NLP) has favoured major languages such as English, leaving a significant gap for many others due to limited resources.
1 code implementation • 2 Apr 2024 • Sara Rosenthal, Avirup Sil, Radu Florian, Salim Roukos
We present ClapNQ, a benchmark Long-form Question Answering dataset for the full RAG pipeline.
no code implementations • 27 Feb 2024 • Keshav Ramji, Young-suk Lee, Ramón Fernandez Astudillo, Md Arafat Sultan, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos
It is often desirable for Large Language Models (LLMs) to capture multiple objectives when providing a response.
1 code implementation • 21 Oct 2023 • Young-suk Lee, Md Arafat Sultan, Yousef El-Kurdi, Tahira Naseem, Asim Munawar, Radu Florian, Salim Roukos, Ramón Fernandez Astudillo
Using in-context learning (ICL) for data generation, techniques such as Self-Instruct (Wang et al., 2023) or the follow-up Alpaca (Taori et al., 2023) can train strong conversational agents with only a small amount of human supervision.
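The core of such ICL data generation is a few-shot prompt built from a handful of seed tasks that an LLM is asked to continue with new instructions. A minimal, model-agnostic sketch of that prompt construction follows; the format and field names are illustrative, not those used by Self-Instruct or this paper.

```python
# Sketch of Self-Instruct-style prompt construction; any LLM completion backend
# could consume the resulting prompt (the call is left as a comment).
def build_self_instruct_prompt(seed_tasks, n_new=5):
    lines = [f"Below are example tasks. Write {n_new} new, diverse tasks in the same format.\n"]
    for i, task in enumerate(seed_tasks, 1):
        lines.append(f"Task {i}:")
        lines.append(f"Instruction: {task['instruction']}")
        lines.append(f"Output: {task['output']}\n")
    lines.append(f"Task {len(seed_tasks) + 1}:")
    return "\n".join(lines)

seed_tasks = [
    {"instruction": "Summarize the paragraph in one sentence.", "output": "..."},
    {"instruction": "Translate the sentence into French.", "output": "..."},
]
prompt = build_self_instruct_prompt(seed_tasks)
# new_tasks = llm.generate(prompt)  # hypothetical call to any LLM backend
print(prompt)
```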
no code implementations • 26 May 2023 • Sadhana Kumaravel, Tahira Naseem, Ramon Fernandez Astudillo, Radu Florian, Salim Roukos
We evaluate our oracle and parser using the Abstract Meaning Representation (AMR) parsing 3.0 corpus.
no code implementations • 24 Apr 2023 • Young-suk Lee, Ramón Fernandez Astudillo, Radu Florian, Tahira Naseem, Salim Roukos
Language models instruction-fine-tuned on a collection of instruction-annotated datasets (FLAN) have proven highly effective at improving model performance and generalization to unseen tasks.
1 code implementation • 1 Mar 2023 • Jon Saad-Falcon, Omar Khattab, Keshav Santhanam, Radu Florian, Martin Franz, Salim Roukos, Avirup Sil, Md Arafat Sultan, Christopher Potts
Many information retrieval tasks require large labeled datasets for fine-tuning.
1 code implementation • 23 Jan 2023 • Avirup Sil, Jaydeep Sen, Bhavani Iyer, Martin Franz, Kshitij Fadnis, Mihaela Bornea, Sara Rosenthal, Scott McCarley, Rong Zhang, Vishwajeet Kumar, Yulong Li, Md Arafat Sultan, Riyaz Bhat, Radu Florian, Salim Roukos
The field of Question Answering (QA) has made remarkable progress in recent years, thanks to the advent of large pre-trained language models, newer realistic benchmark datasets with leaderboards, and novel algorithms for key components such as retrievers and readers.
no code implementations • 2 Dec 2022 • Keshav Santhanam, Jon Saad-Falcon, Martin Franz, Omar Khattab, Avirup Sil, Radu Florian, Md Arafat Sultan, Salim Roukos, Matei Zaharia, Christopher Potts
Neural information retrieval (IR) systems have progressed rapidly in recent years, in large part due to the release of publicly available benchmarking tasks.
no code implementations • 16 Jun 2022 • Scott McCarley, Mihaela Bornea, Sara Rosenthal, Anthony Ferritto, Md Arafat Sultan, Avirup Sil, Radu Florian
Recent machine reading comprehension datasets include extractive and boolean questions but current approaches do not offer integrated support for answering both question types.
no code implementations • 15 May 2022 • Md Arafat Sultan, Avirup Sil, Radu Florian
Machine learning models are prone to overfitting their training (source) domains, which is commonly believed to be the reason why they falter in novel target domains.
2 code implementations • NAACL 2022 • Andrew Drozdov, Jiawei Zhou, Radu Florian, Andrew McCallum, Tahira Naseem, Yoon Kim, Ramon Fernandez Astudillo
These alignments are learned separately from parser training and require a complex pipeline of rule-based components, pre-processing, and post-processing to satisfy domain-specific constraints.
no code implementations • 20 Apr 2022 • Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avirup Sil, Vittorio Castelli, Radu Florian, Salim Roukos
Neural passage retrieval is a new and promising approach in open retrieval question answering.
no code implementations • 26 Feb 2022 • Jian Ni, Gaetano Rossiello, Alfio Gliozzo, Radu Florian
Relation extraction (RE) is an important information extraction task which provides essential information to many NLP applications such as knowledge base population and question answering.
1 code implementation • NAACL 2022 • Tahira Naseem, Austin Blodgett, Sadhana Kumaravel, Tim O'Gorman, Young-suk Lee, Jeffrey Flanigan, Ramón Fernandez Astudillo, Radu Florian, Salim Roukos, Nathan Schneider
Despite extensive research on parsing of English sentences into Abstract Meaning Representation (AMR) graphs, which are compared to gold graphs via the Smatch metric, full-document parsing into a unified graph representation lacks a well-defined representation and evaluation.
no code implementations • 15 Dec 2021 • Mihaela Bornea, Ramon Fernandez Astudillo, Tahira Naseem, Nandana Mihindukulasooriya, Ibrahim Abdelaziz, Pavan Kapanipathi, Radu Florian, Salim Roukos
We propose a transition-based system to transpile Abstract Meaning Representation (AMR) into SPARQL for Knowledge Base Question Answering (KBQA).
Tasks: Abstract Meaning Representation, Knowledge Base Question Answering, +1
no code implementations • 14 Dec 2021 • Sara Rosenthal, Mihaela Bornea, Avirup Sil, Radu Florian, Scott McCarley
Existing datasets that contain boolean questions, such as BoolQ and TyDi QA, provide the user with a YES/NO response to the question.
3 code implementations • NAACL 2022 • Young-suk Lee, Ramon Fernandez Astudillo, Thanh Lam Hoang, Tahira Naseem, Radu Florian, Salim Roukos
AMR parsing has experienced an unprecedented increase in performance in the last three years, due to a mixture of effects including architecture improvements and transfer learning.
Ranked #1 on AMR Parsing on LDC2020T02 (using extra training data)
1 code implementation • EMNLP 2021 • Jiawei Zhou, Tahira Naseem, Ramón Fernandez Astudillo, Young-suk Lee, Radu Florian, Salim Roukos
We provide a detailed comparison with recent progress in AMR parsing and show that the proposed parser retains the desirable properties of previous transition-based approaches, while being simpler and reaching the new parsing state of the art for AMR 2.0, without the need for graph re-categorization.
Ranked #9 on AMR Parsing on LDC2017T10 (using extra training data)
no code implementations • ACL 2021 • Haoyang Wen, Anthony Ferritto, Heng Ji, Radu Florian, Avirup Sil
Existing models on Machine Reading Comprehension (MRC) require complex model architecture for effectively modeling long texts with paragraph representation and classification, thereby making inference computationally inefficient for production use.
1 code implementation • NAACL 2021 • Jiawei Zhou, Tahira Naseem, Ramón Fernandez Astudillo, Radu Florian
In this work, we propose a transition-based system that combines hard-attention over sentences with a target-side action pointer mechanism to decouple source tokens from node representations and address alignments.
Ranked #1 on AMR Parsing on LDC2014T12
no code implementations • EACL 2021 • Janaki Sheth, Young-suk Lee, Ramon Fernandez Astudillo, Tahira Naseem, Radu Florian, Salim Roukos, Todd Ward
We develop high-performance multilingual Abstract Meaning Representation (AMR) systems by projecting English AMR annotations to other languages with weak supervision.
no code implementations • 10 Dec 2020 • Mihaela Bornea, Lin Pan, Sara Rosenthal, Radu Florian, Avirup Sil
Prior work on multilingual question answering has mostly focused on using large multilingual pre-trained language models (LM) to perform zero-shot language-wise learning: train a QA model on English and test on other languages.
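As a concrete picture of the zero-shot setup, the sketch below runs a multilingual reader fine-tuned only on English SQuAD-style data on a French question; the checkpoint is a public community model used for illustration, not this paper's system.

```python
# Zero-shot cross-lingual QA illustration: English-only QA fine-tuning, applied
# to another language through a multilingual encoder.
from transformers import pipeline

qa = pipeline("question-answering", model="deepset/xlm-roberta-large-squad2")

context = "La tour Eiffel a été achevée en 1889 à Paris."
question = "Quand la tour Eiffel a-t-elle été achevée ?"

print(qa(question=question, context=context))
```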
no code implementations • 2 Dec 2020 • Revanth Gangi Reddy, Bhavani Iyer, Md Arafat Sultan, Rong Zhang, Avi Sil, Vittorio Castelli, Radu Florian, Salim Roukos
End-to-end question answering (QA) requires both information retrieval (IR) over a large document collection and machine reading comprehension (MRC) on the retrieved passages.
no code implementations • COLING 2020 • Anthony Ferritto, Sara Rosenthal, Mihaela Bornea, Kazi Hasan, Rishav Chakravarti, Salim Roukos, Radu Florian, Avi Sil
We also show how M-GAAMA can be used in downstream tasks by incorporating it into an end-to-end QA system using CFO (Chakravarti et al., 2019).
no code implementations • COLING 2020 • Rishav Chakravarti, Anthony Ferritto, Bhavani Iyer, Lin Pan, Radu Florian, Salim Roukos, Avi Sil
Building on top of the powerful BERTQA model, GAAMA provides a ~2.0% absolute boost in F1 over the industry-scale state-of-the-art (SOTA) system on NQ.
no code implementations • COLING 2020 • Yousef El-Kurdi, Hiroshi Kanayama, Efsun Sarioglu Kayi, Vittorio Castelli, Todd Ward, Radu Florian
We present scalable Universal Dependency (UD) treebank synthesis techniques that exploit advances in language representation modeling which leverage vast amounts of unlabeled general-purpose multilingual text.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Young-suk Lee, Ramon Fernandez Astudillo, Tahira Naseem, Revanth Gangi Reddy, Radu Florian, Salim Roukos
Abstract Meaning Representation (AMR) parsing has experienced a notable growth in performance in the last two years, due both to the impact of transfer learning and the development of novel architectures specific to AMR.
Ranked #2 on AMR Parsing on LDC2014T12
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Ramon Fernandez Astudillo, Miguel Ballesteros, Tahira Naseem, Austin Blodgett, Radu Florian
Modeling the parser state is key to good performance in transition-based parsing.
Ranked #19 on AMR Parsing on LDC2017T10
no code implementations • 16 Oct 2020 • Jian Ni, Taesun Moon, Parul Awasthy, Radu Florian
Relation extraction (RE) is one of the most important tasks in information extraction, as it provides essential information for many NLP applications.
no code implementations • EMNLP 2020 • Rong Zhang, Revanth Gangi Reddy, Md Arafat Sultan, Vittorio Castelli, Anthony Ferritto, Radu Florian, Efsun Sarioglu Kayi, Salim Roukos, Avirup Sil, Todd Ward
Transfer learning techniques are particularly useful in NLP tasks where a sizable amount of high-quality annotated data is difficult to obtain.
no code implementations • EMNLP 2020 • Anthony Ferritto, Lin Pan, Rishav Chakravarti, Salim Roukos, Radu Florian, J. William Murdock, Avi Sil
We introduce ARES (A Reading Comprehension Ensembling Service): a novel Machine Reading Comprehension (MRC) demonstration system which utilizes an ensemble of models to increase F1 by 2.3 points.
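ARES's exact combination scheme is not described in this snippet; as a toy illustration of span-level MRC ensembling, the sketch below averages start/end scores from several readers and picks the best valid span.

```python
# Toy span-level ensembling sketch (not ARES's actual algorithm): average the
# start/end logits of multiple readers, then search for the best span.
import numpy as np

def ensemble_best_span(start_logits_list, end_logits_list, max_len=30):
    start = np.mean(start_logits_list, axis=0)
    end = np.mean(end_logits_list, axis=0)
    best, best_score = (0, 0), -np.inf
    for i in range(len(start)):
        for j in range(i, min(i + max_len, len(end))):
            score = start[i] + end[j]
            if score > best_score:
                best, best_score = (i, j), score
    return best, best_score

# Two hypothetical readers scoring a six-token passage.
m1 = (np.random.randn(6), np.random.randn(6))
m2 = (np.random.randn(6), np.random.randn(6))
print(ensemble_best_span([m1[0], m2[0]], [m1[1], m2[1]]))
```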
no code implementations • 15 Sep 2020 • Parul Awasthy, Tahira Naseem, Jian Ni, Taesun Moon, Radu Florian
The task of event detection and classification is central to most information retrieval applications.
no code implementations • 15 Sep 2020 • Parul Awasthy, Taesun Moon, Jian Ni, Radu Florian
Named Entity Recognition (NER) is an essential precursor task for many natural language applications, such as relation extraction or event extraction.
1 code implementation • ACL 2020 • Manuel Mager, Ramon Fernandez Astudillo, Tahira Naseem, Md. Arafat Sultan, Young-suk Lee, Radu Florian, Salim Roukos
Abstract Meaning Representations (AMRs) are broad-coverage sentence-level semantic graphs.
Ranked #10 on AMR-to-Text Generation on LDC2017T10
no code implementations • 19 Nov 2019 • Taesun Moon, Parul Awasthy, Jian Ni, Radu Florian
In this paper we investigate a single Named Entity Recognition model, based on a multilingual BERT, that is trained jointly on many languages simultaneously, and is able to decode these languages with better accuracy than models trained only on one language.
Ranked #1 on Cross-Lingual NER on CoNLL Dutch
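The joint-training recipe in the multilingual NER entry above boils down to pooling data from all languages under one tag set and fine-tuning a single multilingual encoder on the mixture. The sketch below shows only the data-pooling step with toy examples; the real system trains an mBERT-style token classifier on CoNLL-scale corpora.

```python
# Sketch of multilingual NER joint training: pool per-language examples under a
# shared tag set, shuffle, and fine-tune one multilingual token classifier
# (the sentences here are toy stand-ins, not CoNLL data).
import random

data_by_lang = {
    "en": [(["Radu", "works", "at", "IBM"], ["B-PER", "O", "O", "B-ORG"])],
    "de": [(["Angela", "besucht", "Berlin"], ["B-PER", "O", "B-LOC"])],
    "nl": [(["Jan", "woont", "in", "Amsterdam"], ["B-PER", "O", "O", "B-LOC"])],
}

pool = [example for examples in data_by_lang.values() for example in examples]
random.shuffle(pool)
for tokens, tags in pool:
    print(list(zip(tokens, tags)))
```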
2 code implementations • ACL 2020 • Vittorio Castelli, Rishav Chakravarti, Saswati Dana, Anthony Ferritto, Radu Florian, Martin Franz, Dinesh Garg, Dinesh Khandelwal, Scott McCarley, Mike McCawley, Mohamed Nasr, Lin Pan, Cezar Pendus, John Pitrelli, Saurabh Pujar, Salim Roukos, Andrzej Sakrajda, Avirup Sil, Rosario Uceda-Sosa, Todd Ward, Rong Zhang
We introduce TechQA, a domain-adaptation question answering dataset for the technical support domain.
no code implementations • IJCNLP 2019 • Jian Ni, Radu Florian
Relation extraction (RE) seeks to detect and classify semantic relationships between entities, which provides useful information for many NLP applications.
no code implementations • 30 Oct 2019 • Anthony Ferritto, Lin Pan, Rishav Chakravarti, Salim Roukos, Radu Florian, J. William Murdock, Avirup Sil
Many of the top question answering systems today utilize ensembling to improve their performance on tasks such as the Stanford Question Answering Dataset (SQuAD) and Natural Questions (NQ) challenges.
no code implementations • 11 Sep 2019 • Lin Pan, Rishav Chakravarti, Anthony Ferritto, Michael Glass, Alfio Gliozzo, Salim Roukos, Radu Florian, Avirup Sil
Existing literature on Question Answering (QA) mostly focuses on algorithmic novelty, data augmentation, or increasingly large pre-trained language models like XLNet and RoBERTa.
Ranked #5 on Question Answering on Natural Questions (long)
no code implementations • IJCNLP 2019 • Rishav Chakravarti, Cezar Pendus, Andrzej Sakrajda, Anthony Ferritto, Lin Pan, Michael Glass, Vittorio Castelli, J. William Murdock, Radu Florian, Salim Roukos, Avirup Sil
This paper introduces a novel orchestration framework, called CFO (COMPUTATION FLOW ORCHESTRATOR), for building, experimenting with, and deploying interactive NLP (Natural Language Processing) and IR (Information Retrieval) systems to production environments.
no code implementations • ACL 2019 • Tahira Naseem, Abhishek Shah, Hui Wan, Radu Florian, Salim Roukos, Miguel Ballesteros
Our work involves enriching the Stack-LSTM transition-based AMR parser (Ballesteros and Al-Onaizan, 2017) by augmenting training with Policy Learning and rewarding the Smatch score of sampled graphs.
Ranked #24 on AMR Parsing on LDC2017T10
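The policy-learning idea in the entry above is essentially REINFORCE with Smatch as the reward: sample a transition sequence, score the resulting graph against gold, and scale the sequence's negative log-probability by the baseline-subtracted reward. A toy sketch follows; the parser, the Smatch scorer, and the numbers are stand-ins.

```python
# Toy REINFORCE-with-Smatch sketch; not the actual Stack-LSTM parser or scorer.
import torch

def reinforce_loss(action_log_probs, reward, baseline=0.0):
    # Higher-Smatch samples get their actions reinforced more strongly.
    return -(reward - baseline) * action_log_probs.sum()

# Hypothetical probabilities of a sampled transition sequence and the Smatch
# score of the graph it produced against the gold graph.
sampled_probs = torch.tensor([0.9, 0.7, 0.8], requires_grad=True)
action_log_probs = torch.log(sampled_probs)
smatch_reward = 0.83

loss = reinforce_loss(action_log_probs, smatch_reward, baseline=0.75)
loss.backward()
print(loss.item(), sampled_probs.grad)
```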
no code implementations • 6 Sep 2018 • Linfeng Song, Zhiguo Wang, Mo Yu, Yue Zhang, Radu Florian, Daniel Gildea
Multi-hop reading comprehension focuses on one type of factoid question, where a system needs to properly integrate multiple pieces of evidence to correctly answer a question.
Ranked #2 on Question Answering on COMPLEXQUESTIONS
no code implementations • ACL 2018 • Gourab Kundu, Avirup Sil, Radu Florian, Wael Hamza
We propose an entity-centric neural cross-lingual coreference model that builds on multi-lingual embeddings and language-independent features.
no code implementations • 5 Dec 2017 • Avirup Sil, Gourab Kundu, Radu Florian, Wael Hamza
A major challenge in Entity Linking (EL) is making effective use of contextual information to disambiguate mentions to Wikipedia that might refer to different entities in different contexts.
Ranked #3 on Entity Disambiguation on TAC2010
no code implementations • ACL 2016 • Avirup Sil, Radu Florian
Entity linking (EL) is the task of disambiguating mentions in text by associating them with entries in a predefined database of mentions (persons, organizations, etc.).
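At its simplest, EL has two steps: generate candidate entries for a mention, then rank them by how well the mention's context matches each entry. The sketch below is a deliberately tiny heuristic stand-in (alias table plus word overlap); the systems in these papers use learned neural scorers.

```python
# Minimal entity-linking sketch: alias-table candidate generation plus a naive
# context-overlap ranker (illustrative only; not the papers' models).
ALIAS_TABLE = {
    "florian": ["Radu_Florian", "Florian_(given_name)"],
    "ibm": ["IBM"],
}

ENTITY_CONTEXT = {
    "Radu_Florian": "researcher natural language processing ibm",
    "Florian_(given_name)": "masculine given name latin origin",
    "IBM": "technology company computing",
}

def link(mention, sentence):
    candidates = ALIAS_TABLE.get(mention.lower(), [])
    ctx = set(sentence.lower().split())
    # Rank candidates by word overlap between the sentence and each entry's description.
    scored = [(len(ctx & set(ENTITY_CONTEXT[c].split())), c) for c in candidates]
    return max(scored)[1] if scored else None

print(link("Florian", "Florian published papers on NLP at IBM Research"))
```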
no code implementations • ACL 2017 • Jian Ni, Georgiana Dinu, Radu Florian
However, annotating NER data by humans is expensive and time-consuming, and can be quite difficult for a new language.
no code implementations • EMNLP 2016 • Jian Ni, Radu Florian
Experimental results show that the proposed approaches are effective in improving the accuracy of such systems on unseen entities, especially when a system is applied to a new domain or it is trained with little training data (up to 18.3 F1 score improvement).
Tasks: Multilingual Named Entity Recognition, Named Entity Recognition, +3
no code implementations • EMNLP 2017 • Lifu Huang, Avirup Sil, Heng Ji, Radu Florian
Slot Filling (SF) aims to extract the values of certain types of attributes (or slots, such as person:cities_of_residence) for a given entity from a large collection of source documents.
no code implementations • 13 Mar 2017 • Georgiana Dinu, Wael Hamza, Radu Florian
This paper describes an application of reinforcement learning to the mention detection task.
10 code implementations • 13 Feb 2017 • Zhiguo Wang, Wael Hamza, Radu Florian
Natural language sentence matching is a fundamental technology for a variety of tasks.
Ranked #17 on Paraphrase Identification on Quora Question Pairs (Accuracy metric)
1 code implementation • 13 Dec 2016 • Zhiguo Wang, Haitao Mi, Wael Hamza, Radu Florian
Based on this dataset, we propose a Multi-Perspective Context Matching (MPCM) model, which is an end-to-end system that directly predicts the answer beginning and ending points in a passage.
Ranked #3 on Open-Domain Question Answering on SQuAD1.1
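The span-prediction idea in the MPCM entry above (directly predicting answer beginning and ending points in the passage) reduces to two scoring heads over contextual token representations. The sketch below shows only those heads; the multi-perspective matching layers of the actual model are omitted.

```python
# Sketch of begin/end span prediction over token encodings (the real MPCM model
# adds multi-perspective context-matching layers not shown here).
import torch
import torch.nn as nn

class SpanHead(nn.Module):
    def __init__(self, hidden_size=128):
        super().__init__()
        self.start = nn.Linear(hidden_size, 1)
        self.end = nn.Linear(hidden_size, 1)

    def forward(self, token_states):           # (batch, seq_len, hidden)
        start_logits = self.start(token_states).squeeze(-1)
        end_logits = self.end(token_states).squeeze(-1)
        return start_logits, end_logits

head = SpanHead()
states = torch.randn(1, 20, 128)               # stand-in passage encodings
start_logits, end_logits = head(states)
print(start_logits.argmax(-1), end_logits.argmax(-1))
```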
no code implementations • ACL 2013 • Lu Wang, Hema Raghavan, Vittorio Castelli, Radu Florian, Claire Cardie
We consider the problem of using sentence compression techniques to facilitate query-focused multi-document summarization.
no code implementations • 24 Feb 2016 • Thien Huu Nguyen, Avirup Sil, Georgiana Dinu, Radu Florian
One of the key challenges in natural language processing (NLP) is to yield good performance across application domains and languages.
no code implementations • TACL 2016 • Md. Arafat Sultan, Vittorio Castelli, Radu Florian
Answer sentence ranking and answer extraction are two key challenges in question answering that have traditionally been treated in isolation, i.e., as independent tasks.