Search Results for author: Cornelia Caragea

Found 81 papers, 39 papers with code

Improving Stance Detection with Multi-Dataset Learning and Knowledge Distillation

1 code implementation EMNLP 2021 Yingjie Li, Chenye Zhao, Cornelia Caragea

To address these challenges, first, we evaluate multi-target and multi-dataset training settings by training one model on each dataset and on datasets of different domains, respectively.

Knowledge Distillation Stance Detection

CancerEmo: A Dataset for Fine-Grained Emotion Detection

no code implementations EMNLP 2020 Tiberiu Sosea, Cornelia Caragea

Emotions are an important element of human nature, often affecting the overall wellbeing of a person.

EnsyNet: A Dataset for Encouragement and Sympathy Detection

1 code implementation LREC 2022 Tiberiu Sosea, Cornelia Caragea

More and more people turn to Online Health Communities to seek social support during their illnesses.

Detecting Optimism in Tweets using Knowledge Distillation and Linguistic Analysis of Optimism

no code implementations LREC 2022 Ștefan Cobeli, Ioan-Bogdan Iordache, Shweta Yadav, Cornelia Caragea, Liviu P. Dinu, Dragoș Iliescu

Later, we devised a multi-task knowledge distillation framework to simultaneously learn the target task of optimism detection with the help of the auxiliary tasks of sentiment analysis and hate speech detection.

Hate Speech Detection Knowledge Distillation +1

On the Use of Web Search to Improve Scientific Collections

no code implementations EMNLP (sdp) 2020 Krutarth Patel, Cornelia Caragea, Sujatha Das Gollapalli

We were able to obtain ~267,000 unique research papers through our fully-automated framework using ~76,000 queries, resulting in almost 200,000 more papers than the number of queries.

Knowledge Distillation with BERT for Image Tag-Based Privacy Prediction

no code implementations RANLP 2021 Chenye Zhao, Cornelia Caragea

Moreover, we utilize the idea of knowledge distillation to improve tag representations in a semi-supervised learning task.

Knowledge Distillation TAG

Towards Summarizing Healthcare Questions in Low-Resource Setting

no code implementations COLING 2022 Shweta Yadav, Cornelia Caragea

The current advancement in abstractive document summarization depends to a large extent on a considerable amount of human-annotated datasets.

Data Augmentation Diversity +1

Multimodal Semi-supervised Learning for Disaster Tweet Classification

1 code implementation COLING 2022 Iustin Sirbu, Tiberiu Sosea, Cornelia Caragea, Doina Caragea, Traian Rebedea

In this paper, we investigate how to leverage the copious amounts of unlabeled data generated on social media by disaster eyewitnesses and affected individuals during disaster events.

Classification

SciER: An Entity and Relation Extraction Dataset for Datasets, Methods, and Tasks in Scientific Documents

1 code implementation 28 Oct 2024 Qi Zhang, Zhijia Chen, Huitong Pan, Cornelia Caragea, Longin Jan Latecki, Eduard Dragut

In this paper, we release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.

Relation Relation Extraction +1

Stanceformer: Target-Aware Transformer for Stance Detection

no code implementations 9 Oct 2024 Krishna Garg, Cornelia Caragea

The task of Stance Detection involves discerning the stance expressed in a text towards a specific subject or target.

Aspect-Based Sentiment Analysis Sentiment Analysis +1

How Hard is this Test Set? NLI Characterization by Exploiting Training Dynamics

1 code implementation 4 Oct 2024 Adrian Cosma, Stefan Ruseti, Mihai Dascalu, Cornelia Caragea

Natural Language Inference (NLI) evaluation is crucial for assessing language understanding models; however, popular datasets suffer from systematic spurious correlations that artificially inflate actual model performance.

Natural Language Inference

On the Design Space Between Transformers and Recursive Neural Nets

no code implementations 3 Sep 2024 Jishnu Ray Chowdhury, Cornelia Caragea

In this paper, we study two classes of models, Recursive Neural Networks (RvNNs) and Transformers, and show that a tight connection between them emerges from the development of two recent models - Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR).

Inductive Bias

Co-training for Low Resource Scientific Natural Language Inference

no code implementations 20 Jun 2024 Mobashir Sadat, Cornelia Caragea

Scientific Natural Language Inference (NLI) is the task of predicting the semantic relation between a pair of sentences extracted from research articles.

Natural Language Inference

A Novel Cartography-Based Curriculum Learning Method Applied on RoNLI: The First Romanian Natural Language Inference Corpus

1 code implementation 20 May 2024 Eduard Poesina, Cornelia Caragea, Radu Tudor Ionescu

Natural language inference (NLI), the task of recognizing the entailment relationship in sentence pairs, is an actively studied topic serving as a proxy for natural language understanding.

Machine Translation Natural Language Inference +5

EIVEN: Efficient Implicit Attribute Value Extraction using Multimodal LLM

no code implementations 13 Apr 2024 Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea

To address these issues, we introduce EIVEN, a data- and parameter-efficient generative framework that pioneers the use of multimodal LLM for implicit attribute value extraction.

Attribute Attribute Value Extraction +1

MSciNLI: A Diverse Benchmark for Scientific Natural Language Inference

1 code implementation 11 Apr 2024 Mobashir Sadat, Cornelia Caragea

Furthermore, we show that domain shift degrades the performance of scientific NLI models which demonstrates the diverse characteristics of different domains in our dataset.

Diversity Natural Language Inference +2

Investigating Recurrent Transformers with Dynamic Halt

1 code implementation 1 Feb 2024 Jishnu Ray Chowdhury, Cornelia Caragea

In this paper, we comprehensively study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism: (1) the approach of incorporating a depth-wise recurrence similar to Universal Transformers; and (2) the approach of incorporating a chunk-wise temporal recurrence like Temporal Latent Bottleneck.

Language Modelling ListOps

Language Models (Mostly) Do Not Consider Emotion Triggers When Predicting Emotion

1 code implementation 16 Nov 2023 Smriti Singh, Cornelia Caragea, Junyi Jessy Li

Situations and events evoke emotions in humans, but to what extent do they inform the prediction of emotion detection models?

DeCrisisMB: Debiased Semi-Supervised Learning for Crisis Tweet Classification via Memory Bank

1 code implementation 23 Oct 2023 Henry Peng Zou, Yue Zhou, Weizhi Zhang, Cornelia Caragea

During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support.

Semi-Supervised Text Classification

CrisisMatch: Semi-Supervised Few-Shot Learning for Fine-Grained Disaster Tweet Classification

1 code implementation 23 Oct 2023 Henry Peng Zou, Yue Zhou, Cornelia Caragea, Doina Caragea

The shared real-time information about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations.

Few-Shot Learning

MarginMatch: Improving Semi-Supervised Learning with Pseudo-Margins

1 code implementation CVPR 2023 Tiberiu Sosea, Cornelia Caragea

We introduce MarginMatch, a new SSL approach combining consistency regularization and pseudo-labeling, with its main novelty arising from the use of unlabeled data training dynamics to measure pseudo-label quality.

Pseudo Label

Unsupervised Extractive Summarization of Emotion Triggers

1 code implementation 2 Jun 2023 Tiberiu Sosea, Hongli Zhan, Junyi Jessy Li, Cornelia Caragea

Second, we develop new unsupervised learning models that can jointly detect emotions and summarize their triggers.

Abstractive Text Summarization Extractive Summarization +1

Beam Tree Recursive Cells

1 code implementation 31 May 2023 Jishnu Ray Chowdhury, Cornelia Caragea

We propose Beam Tree Recursive Cell (BT-Cell) - a backpropagation-friendly framework to extend Recursive Neural Networks (RvNNs) with beam search for latent structure induction.

ListOps

Monotonic Location Attention for Length Generalization

1 code implementation 31 May 2023 Jishnu Ray Chowdhury, Cornelia Caragea

We explore different ways to utilize position-based cross-attention in seq2seq networks to enable length generalization in algorithmic tasks.

Position

Data Augmentation for Low-Resource Keyphrase Generation

1 code implementation 29 May 2023 Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea

Very few works address the problem of keyphrase generation in low-resource settings, but they still rely on a lot of additional unlabeled data for pretraining and on automatic methods for pseudo-annotations.

Data Augmentation Keyphrase Generation

DMDD: A Large-Scale Dataset for Dataset Mentions Detection

no code implementations 19 May 2023 Huitong Pan, Qi Zhang, Eduard Dragut, Cornelia Caragea, Longin Jan Latecki

We use DMDD to establish baseline performance for dataset mention detection and linking.

Diversity

Neural Keyphrase Generation: Analysis and Evaluation

no code implementations 27 Apr 2023 Tuhin Kundu, Jishnu Ray Chowdhury, Cornelia Caragea

Keyphrase generation aims at generating topical phrases from a given text either by copying from the original text (present keyphrases) or by producing new keyphrases (absent keyphrases) that capture the semantic meaning of the text.

Decoder Keyphrase Generation +2

Semantic Tokenizer for Enhanced Natural Language Processing

no code implementations 24 Apr 2023 Sandeep Mehta, Darpan Shah, Ravindra Kulkarni, Cornelia Caragea

Traditionally, NLP performance improvement has been focused on improving models and increasing the number of model parameters.

Sentence Sentence Embeddings

Learning to Infer from Unlabeled Data: A Semi-supervised Learning Approach for Robust Natural Language Inference

1 code implementation 5 Nov 2022 Mobashir Sadat, Cornelia Caragea

However, despite its substantial success on single sentence classification tasks where the challenge in making use of unlabeled data is to assign "good enough" pseudo-labels, for NLI tasks, the nature of unlabeled data is more complex: one of the sentences in the pair (usually the hypothesis) along with the class label are missing from the data and require human annotations, which makes SSL for NLI more challenging.

Language Modelling Natural Language Inference +3

Why Do You Feel This Way? Summarizing Triggers of Emotions in Social Media Posts

1 code implementation 22 Oct 2022 Hongli Zhan, Tiberiu Sosea, Cornelia Caragea, Junyi Jessy Li

This paper takes a novel angle, namely, emotion detection and trigger summarization, aiming to both detect perceived emotions in text, and summarize events and their appraisals that trigger each emotion.

Emotion Detection and Trigger Summarization

A Data Cartography based MixUp for Pre-trained Language Models

1 code implementation NAACL 2022 Seo Yeon Park, Cornelia Caragea

MixUp is a data augmentation strategy where additional samples are generated during training by combining random pairs of training samples and their labels.

Data Augmentation Language Modelling

SciNLI: A Corpus for Natural Language Inference on Scientific Text

1 code implementation ACL 2022 Mobashir Sadat, Cornelia Caragea

Existing Natural Language Inference (NLI) datasets, while being instrumental in the advancement of Natural Language Understanding (NLU) research, are not related to scientific text.

Natural Language Inference Natural Language Understanding +1

On the Evaluation of Answer-Agnostic Paragraph-level Multi-Question Generation

1 code implementation 9 Mar 2022 Jishnu Ray Chowdhury, Debanjan Mahata, Cornelia Caragea

Second, we compare different strategies to utilize a pre-trained seq2seq model to generate and select a set of questions related to a given paragraph.

Question Generation Question-Generation

Keyphrase Generation Beyond the Boundaries of Title and Abstract

1 code implementation 13 Dec 2021 Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea

Unlike prior large-scale datasets, FullTextKP includes the full text of the articles along with the title and abstract.

Decoder Keyphrase Generation

KPDrop: Improving Absent Keyphrase Generation

1 code implementation 2 Dec 2021 Jishnu Ray Chowdhury, Seoyeon Park, Tuhin Kundu, Cornelia Caragea

Keyphrase generation is the task of generating phrases (keyphrases) that summarize the main topics of a given document.

Keyphrase Generation

Generating Summaries for Scientific Paper Review

no code implementations 28 Sep 2021 Ana Sabina Uban, Cornelia Caragea

In this paper, we explore automatic review summary generation for scientific papers.

DeepZensols: Deep Natural Language Processing Framework

2 code implementations 8 Sep 2021 Paul Landes, Barbara Di Eugenio, Cornelia Caragea

Reproducing results in publications by distributing publicly available source code is becoming ever more popular.

Stance Detection in COVID-19 Tweets

no code implementations ACL 2021 Kyle Glandt, Sarthak Khanal, Yingjie Li, Doina Caragea, Cornelia Caragea

The prevalence of the COVID-19 pandemic in day-to-day life has yielded large amounts of stance detection data on social media sites, as users turn to social media to share their views regarding various issues related to the pandemic, e.g., stay-at-home mandates and wearing face masks when out in public.

Domain Adaptation Stance Detection

eMLM: A New Pre-training Objective for Emotion Related Tasks

1 code implementation ACL 2021 Tiberiu Sosea, Cornelia Caragea

BERT has been shown to be extremely effective on a wide variety of natural language processing tasks, including sentiment analysis and emotion detection.

Language Modelling Sentiment Analysis

Emotion analysis and detection during COVID-19

no code implementations LREC 2022 Tiberiu Sosea, Chau Pham, Alexander Tekle, Cornelia Caragea, Junyi Jessy Li

Crises such as natural disasters, global pandemics, and social unrest continuously threaten our world and emotionally affect millions of people worldwide in distinct ways.

Domain Adaptation Emotion Recognition

Modeling Hierarchical Structures with Continuous Recursive Neural Networks

1 code implementation10 Jun 2021 Jishnu Ray Chowdhury, Cornelia Caragea

We also show that CRvNN performs comparably or better than prior latent structure models on real-world tasks such as sentiment analysis and natural language inference.

ListOps Natural Language Inference +1

Target-Aware Data Augmentation for Stance Detection

no code implementations NAACL 2021 Yingjie Li, Cornelia Caragea

The goal of stance detection is to identify whether the author of a text is in favor of, neutral toward, or against a specific target.

Data Augmentation Language Modelling +3

Identifying Medical Self-Disclosure in Online Communities

no code implementations NAACL 2021 Mina Valizadeh, Pardis Ranjbar-Noiey, Cornelia Caragea, Natalie Parde

Self-disclosure in online health conversations may offer a host of benefits, including earlier detection and treatment of medical issues that may have otherwise gone unaddressed.

Exploiting Position and Contextual Word Embeddings for Keyphrase Extraction from Scientific Papers

no code implementations EACL 2021 Krutarth Patel, Cornelia Caragea

Keyphrases associated with research papers provide an effective way to find useful information in the large and growing scholarly digital collections.

Keyphrase Extraction Position +1

Scientific Keyphrase Identification and Classification by Pre-Trained Language Models Intermediate Task Transfer Learning

no code implementations COLING 2020 Seoyeon Park, Cornelia Caragea

Scientific keyphrase identification and classification is the task of detecting and classifying keyphrases from scholarly text with their types from a set of predefined classes.

Classification POS +2

Identifying Documents In-Scope of a Collection from Web Archives

no code implementations 2 Sep 2020 Krutarth Patel, Cornelia Caragea, Mark Phillips, Nathaniel Fox

Web archive data usually contains high-quality documents that are very useful for creating specialized collections of documents, e.g., scientific digital libraries and repositories of technical reports.

Interpretable Multi-Step Reasoning with Knowledge Extraction on Complex Healthcare Question Answering

no code implementations 6 Aug 2020 Ye Liu, Shaika Chowdhury, Chenwei Zhang, Cornelia Caragea, Philip S. Yu

Unlike most other QA tasks that focus on linguistic understanding, HeadQA requires deeper reasoning involving not only knowledge extraction, but also complex reasoning with healthcare knowledge.

Multiple-choice Question Answering

Cross-Lingual Disaster-related Multi-label Tweet Classification with Manifold Mixup

1 code implementation ACL 2020 Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea

Distinguishing informative and actionable messages from a social media platform like Twitter is critical for facilitating disaster management.

General Classification Management +2

Dynamic Classification in Web Archiving Collections

no code implementations LREC 2020 Krutarth Patel, Cornelia Caragea, Mark Phillips

The Web archived data usually contains high-quality documents that are very useful for creating specialized collections of documents.

Classification General Classification

Detecting Perceived Emotions in Hurricane Disasters

1 code implementation ACL 2020 Shrey Desai, Cornelia Caragea, Junyi Jessy Li

Natural disasters (e.g., hurricanes) affect millions of people each year, causing widespread destruction in their wake.

On Identifying Hashtags in Disaster Twitter Data

1 code implementation 5 Jan 2020 Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea

Moreover, only a small number of tweets that contain actionable hashtags are useful for disaster response.

Disaster Response Multi-Task Learning

The Myth of Double-Blind Review Revisited: ACL vs. EMNLP

no code implementations IJCNLP 2019 Cornelia Caragea, Ana Uban, Liviu P. Dinu

We study this question on the ACL and EMNLP paper collections and present an analysis on how well deep learning techniques can infer the authors of a paper.

Keyphrase Extraction from Disaster-related Tweets

no code implementations 17 Oct 2019 Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea

Previously, joint training of two different layers of a stacked Recurrent Neural Network for keyword discovery and keyphrase extraction had been shown to be effective in extracting keyphrases from general Twitter data.

Keyphrase Extraction POS +1

Image Privacy Prediction Using Deep Neural Networks

1 code implementation 8 Mar 2019 Ashwini Tonge, Cornelia Caragea

Thus, automatically predicting images' privacy to warn users about private or sensitive content before uploading these images on social networking sites has become a necessity in our current interconnected world.

Object Recognition TAG

Dynamic Deep Multi-modal Fusion for Image Privacy Prediction

no code implementations 27 Feb 2019 Ashwini Tonge, Cornelia Caragea

In this paper, we propose an approach for fusing object, scene context, and image tags modalities derived from convolutional neural networks for accurately predicting the privacy of images shared online.

Fine-Grained Emotion Detection in Health-Related Online Posts

no code implementations EMNLP 2018 Hamed Khanpour, Cornelia Caragea

Detecting fine-grained emotions in online health communities provides insightful information about patients' emotional states.

Emotion Recognition

Exploring Optimism and Pessimism in Twitter Using Deep Learning

no code implementations EMNLP 2018 Cornelia Caragea, Liviu P. Dinu, Bogdan Dumitru

Identifying optimistic and pessimistic viewpoints and users from Twitter is useful for providing better social support to those who need such support, and for minimizing the negative influence among users and maximizing the spread of positive attitudes and ideas.

Deep Learning

Identifying Empathetic Messages in Online Health Communities

no code implementations IJCNLP 2017 Hamed Khanpour, Cornelia Caragea, Prakhar Biyani

Empathy captures one's ability to correlate with and understand others' emotional states and experiences.

PositionRank: An Unsupervised Approach to Keyphrase Extraction from Scholarly Documents

no code implementations ACL 2017 Corina Florescu, Cornelia Caragea

In this paper, we propose PositionRank, an unsupervised model for keyphrase extraction from scholarly documents that incorporates information from all positions of a word's occurrences into a biased PageRank.

Information Retrieval Keyphrase Extraction

Privacy Prediction of Images Shared on Social Media Sites Using Deep Features

no code implementations 29 Oct 2015 Ashwini Tonge, Cornelia Caragea

In this paper, we present an approach to image privacy prediction that uses deep features and deep image tags as feature representations.

Entity-Specific Sentiment Classification of Yahoo News Comments

no code implementations 11 Jun 2015 Prakhar Biyani, Cornelia Caragea, Narayan Bhamidipati

However, the problem of classifying the sentiment of user comments on news sites has not been addressed yet.

Classification General Classification +2

Keyword and Keyphrase Extraction Using Centrality Measures on Collocation Networks

no code implementations 25 Jan 2014 Shibamouli Lahiri, Sagnik Ray Choudhury, Cornelia Caragea

Keyword and keyphrase extraction is an important problem in natural language processing, with applications ranging from summarization to semantic search to document clustering.

Clustering Keyphrase Extraction
