1 code implementation • EMNLP 2021 • Yingjie Li, Chenye Zhao, Cornelia Caragea
To address these challenges, first, we evaluate multi-target and multi-dataset training settings by training one model on each dataset and on datasets of different domains, respectively.
no code implementations • EMNLP 2020 • Tiberiu Sosea, Cornelia Caragea
Emotions are an important element of human nature, often affecting the overall wellbeing of a person.
1 code implementation • LREC 2022 • Tiberiu Sosea, Cornelia Caragea
More and more people turn to Online Health Communities to seek social support during their illnesses.
no code implementations • LREC 2022 • Ștefan Cobeli, Ioan-Bogdan Iordache, Shweta Yadav, Cornelia Caragea, Liviu P. Dinu, Dragoș Iliescu
Later, we devised a multi-task knowledge distillation framework to simultaneously learn the target task of optimism detection with the help of the auxiliary tasks of sentiment analysis and hate speech detection.
no code implementations • EMNLP (sdp) 2020 • Krutarth Patel, Cornelia Caragea, Sujatha Das Gollapalli
We were able to obtain ~267,000 unique research papers through our fully-automated framework using ~76,000 queries, resulting in almost 200,000 more papers than the number of queries.
no code implementations • RANLP 2021 • Chenye Zhao, Cornelia Caragea
Moreover, we utilize the idea of knowledge distillation to improve tag representations in a semi-supervised learning task.
1 code implementation • COLING 2022 • Paul Landes, Kunal Patel, Sean S. Huang, Adam Webb, Barbara Di Eugenio, Cornelia Caragea
The process by which sections in a document are demarcated and labeled is known as section identification.
Ranked #2 on Classification on MedSecId
no code implementations • COLING 2022 • Shweta Yadav, Cornelia Caragea
The current advancement in abstractive document summarization depends to a large extent on a considerable amount of human-annotated datasets.
1 code implementation • COLING 2022 • Iustin Sirbu, Tiberiu Sosea, Cornelia Caragea, Doina Caragea, Traian Rebedea
In this paper, we investigate how to leverage the copious amounts of unlabeled data generated on social media by disaster eyewitnesses and affected individuals during disaster events.
1 code implementation • Findings (EMNLP) 2021 • Mahshid Hosseini, Cornelia Caragea
Empathy is the link between self and others.
1 code implementation • 28 Oct 2024 • Qi Zhang, Zhijia Chen, Huitong Pan, Cornelia Caragea, Longin Jan Latecki, Eduard Dragut
In this paper, we release a new entity and relation extraction dataset for entities related to datasets, methods, and tasks in scientific articles.
no code implementations • 9 Oct 2024 • Krishna Garg, Cornelia Caragea
The task of Stance Detection involves discerning the stance expressed in a text towards a specific subject or target.
1 code implementation • 4 Oct 2024 • Adrian Cosma, Stefan Ruseti, Mihai Dascalu, Cornelia Caragea
Natural Language Inference (NLI) evaluation is crucial for assessing language understanding models; however, popular datasets suffer from systematic spurious correlations that artificially inflate actual model performance.
no code implementations • 3 Sep 2024 • Jishnu Ray Chowdhury, Cornelia Caragea
In this paper, we study two classes of models, Recursive Neural Networks (RvNNs) and Transformers, and show that a tight connection between them emerges from the development of two recent models: Continuous Recursive Neural Networks (CRvNN) and Neural Data Routers (NDR).
1 code implementation • 6 Jul 2024 • Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki
FlowLearn contains complex scientific flowcharts and simulated flowcharts.
Optical Character Recognition (OCR) • Visual Question Answering (VQA)
no code implementations • 20 Jun 2024 • Mobashir Sadat, Cornelia Caragea
Scientific Natural Language Inference (NLI) is the task of predicting the semantic relation between a pair of sentences extracted from research articles.
no code implementations • 20 Jun 2024 • Huitong Pan, Qi Zhang, Cornelia Caragea, Eduard Dragut, Longin Jan Latecki
To the best of our knowledge, SciDMT is the largest corpus for scientific entity mention detection.
1 code implementation • 20 May 2024 • Eduard Poesina, Cornelia Caragea, Radu Tudor Ionescu
Natural language inference (NLI), the task of recognizing the entailment relationship in sentence pairs, is an actively studied topic serving as a proxy for natural language understanding.
1 code implementation • 24 Apr 2024 • Henry Peng Zou, Vinay Samuel, Yue Zhou, Weizhi Zhang, Liancheng Fang, Zihe Song, Philip S. Yu, Cornelia Caragea
To address these limitations, we present ImplicitAVE, the first publicly available multimodal dataset for implicit attribute value extraction.
no code implementations • 13 Apr 2024 • Henry Peng Zou, Gavin Heqing Yu, Ziwei Fan, Dan Bu, Han Liu, Peng Dai, Dongmei Jia, Cornelia Caragea
To address these issues, we introduce EIVEN, a data- and parameter-efficient generative framework that pioneers the use of multimodal LLM for implicit attribute value extraction.
1 code implementation • 11 Apr 2024 • Mobashir Sadat, Cornelia Caragea
Furthermore, we show that domain shift degrades the performance of scientific NLI models, which demonstrates the diverse characteristics of different domains in our dataset.
1 code implementation • 1 Feb 2024 • Jishnu Ray Chowdhury, Cornelia Caragea
In this paper, we comprehensively study the inductive biases of two major approaches to augmenting Transformers with a recurrent mechanism: (1) the approach of incorporating a depth-wise recurrence similar to Universal Transformers; and (2) the approach of incorporating a chunk-wise temporal recurrence like Temporal Latent Bottleneck.
1 code implementation • 16 Nov 2023 • Smriti Singh, Cornelia Caragea, Junyi Jessy Li
Situations and events evoke emotions in humans, but to what extent do they inform the prediction of emotion detection models?
1 code implementation • 23 Oct 2023 • Henry Peng Zou, Yue Zhou, Weizhi Zhang, Cornelia Caragea
During crisis events, people often use social media platforms such as Twitter to disseminate information about the situation, warnings, advice, and support.
1 code implementation • 23 Oct 2023 • Henry Peng Zou, Yue Zhou, Cornelia Caragea, Doina Caragea
The shared real-time information about natural disasters on social media platforms like Twitter and Facebook plays a critical role in informing volunteers, emergency managers, and response organizations.
1 code implementation • 23 Oct 2023 • Henry Peng Zou, Cornelia Caragea
However, existing approaches based on pseudo-labeling suffer from the issues of pseudo-label bias and error accumulation.
1 code implementation • CVPR 2023 • Tiberiu Sosea, Cornelia Caragea
We introduce MarginMatch, a new SSL approach combining consistency regularization and pseudo-labeling, with its main novelty arising from the use of unlabeled data training dynamics to measure pseudo-label quality.
no code implementations • 16 Aug 2023 • Tiberiu Sosea, Junyi Jessy Li, Cornelia Caragea
This contempt is in some cases expressed as sarcasm or irony.
1 code implementation • 2 Jun 2023 • Tiberiu Sosea, Hongli Zhan, Junyi Jessy Li, Cornelia Caragea
Second, we develop new unsupervised learning models that can jointly detect emotions and summarize their triggers.
1 code implementation • 31 May 2023 • Jishnu Ray Chowdhury, Cornelia Caragea
We propose Beam Tree Recursive Cell (BT-Cell), a backpropagation-friendly framework to extend Recursive Neural Networks (RvNNs) with beam search for latent structure induction.
1 code implementation • 31 May 2023 • Jishnu Ray Chowdhury, Cornelia Caragea
We explore different ways to utilize position-based cross-attention in seq2seq networks to enable length generalization in algorithmic tasks.
1 code implementation • 29 May 2023 • Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea
Very few works address the problem of keyphrase generation in low-resource settings, but they still rely on a lot of additional unlabeled data for pretraining and on automatic methods for pseudo-annotations.
no code implementations • 19 May 2023 • Huitong Pan, Qi Zhang, Eduard Dragut, Cornelia Caragea, Longin Jan Latecki
We use DMDD to establish baseline performance for dataset mention detection and linking.
no code implementations • 27 Apr 2023 • Tuhin Kundu, Jishnu Ray Chowdhury, Cornelia Caragea
Keyphrase generation aims at generating topical phrases from a given text either by copying from the original text (present keyphrases) or by producing new keyphrases (absent keyphrases) that capture the semantic meaning of the text.
no code implementations • 24 Apr 2023 • Sandeep Mehta, Darpan Shah, Ravindra Kulkarni, Cornelia Caragea
Traditionally, NLP performance improvement has been focused on improving models and increasing the number of model parameters.
1 code implementation • 5 Nov 2022 • Mobashir Sadat, Cornelia Caragea
For example, a paper can be assigned to several topics in a hierarchy tree.
Hierarchical Multi-label Classification • Multi-Label Text Classification
1 code implementation • 5 Nov 2022 • Mobashir Sadat, Cornelia Caragea
However, despite its substantial success on single sentence classification tasks where the challenge in making use of unlabeled data is to assign "good enough" pseudo-labels, for NLI tasks, the nature of unlabeled data is more complex: one of the sentences in the pair (usually the hypothesis), along with the class label, is missing from the data and requires human annotation, which makes SSL for NLI more challenging.
1 code implementation • 22 Oct 2022 • Hongli Zhan, Tiberiu Sosea, Cornelia Caragea, Junyi Jessy Li
This paper takes a novel angle, namely, emotion detection and trigger summarization, aiming both to detect perceived emotions in text and to summarize the events and appraisals that trigger each emotion.
1 code implementation • NAACL 2022 • Seo Yeon Park, Cornelia Caragea
MixUp is a data augmentation strategy where additional samples are generated during training by combining random pairs of training samples and their labels.
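The combination step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the commonly used formulation in which a pair of samples and their one-hot labels are linearly interpolated with a coefficient drawn from a Beta distribution. The function names `mixup_pair` and `mixup_batch` and the default `alpha=0.4` are hypothetical choices made for this example.

```python
import random


def mixup_pair(x1, y1, x2, y2, lam):
    """Linearly interpolate two feature vectors and their one-hot labels."""
    x = [lam * a + (1.0 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1.0 - lam) * b for a, b in zip(y1, y2)]
    return x, y


def mixup_batch(xs, ys, alpha=0.4, rng=None):
    """Mix each sample in a batch with a randomly chosen partner.

    alpha is an assumed Beta(alpha, alpha) parameter for the mixing
    coefficient; one coefficient is shared across the whole batch.
    """
    rng = rng or random.Random(0)
    lam = rng.betavariate(alpha, alpha)
    partners = list(range(len(xs)))
    rng.shuffle(partners)  # random pairing of training samples
    mixed = [
        mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
        for i, j in enumerate(partners)
    ]
    return [m[0] for m in mixed], [m[1] for m in mixed], lam
```

Because labels are interpolated with the same coefficient as the inputs, each mixed label remains a valid probability distribution over classes.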
no code implementations • ACL 2022 • Seo Yeon Park, Cornelia Caragea
A well-calibrated neural model produces confidence (probability outputs) closely approximated by the expected accuracy.
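One common way to quantify the gap between confidence and accuracy is the expected calibration error (ECE). The sketch below is an assumption about how such a measurement might look, not the metric definition taken from the paper: predictions are grouped into equal-width confidence bins, and the absolute difference between mean confidence and accuracy in each bin is averaged, weighted by bin size. The function name and the default of 10 bins are illustrative.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Weighted average of |mean confidence - accuracy| over confidence bins.

    confidences: predicted probability of the chosen class, each in [0, 1].
    correct: booleans, True where the prediction matched the gold label.
    """
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # A confidence of exactly 1.0 falls into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))

    ece = 0.0
    for members in bins:
        if not members:
            continue
        avg_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(1 for _, ok in members if ok) / len(members)
        ece += (len(members) / n) * abs(avg_conf - accuracy)
    return ece
```

A model that reports 0.9 confidence and is right 90% of the time scores close to zero, while a model that always reports full confidence but is right half the time scores 0.5.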
1 code implementation • ACL 2022 • Mobashir Sadat, Cornelia Caragea
Existing Natural Language Inference (NLI) datasets, while being instrumental in the advancement of Natural Language Understanding (NLU) research, are not related to scientific text.
Natural Language Inference Natural Language Understanding +1
1 code implementation • 9 Mar 2022 • Jishnu Ray Chowdhury, Debanjan Mahata, Cornelia Caragea
Second, we compare different strategies to utilize a pre-trained seq2seq model to generate and select a set of questions related to a given paragraph.
1 code implementation • 13 Dec 2021 • Krishna Garg, Jishnu Ray Chowdhury, Cornelia Caragea
Unlike prior large-scale datasets, FullTextKP includes the full text of the articles along with the title and abstract.
1 code implementation • 2 Dec 2021 • Jishnu Ray Chowdhury, Seoyeon Park, Tuhin Kundu, Cornelia Caragea
Keyphrase generation is the task of generating phrases (keyphrases) that summarize the main topics of a given document.
no code implementations • 28 Sep 2021 • Ana Sabina Uban, Cornelia Caragea
In this paper, we explore automatic review summary generation for scientific papers.
2 code implementations • 8 Sep 2021 • Paul Landes, Barbara Di Eugenio, Cornelia Caragea
Reproducing results in publications by distributing publicly available source code is becoming ever more popular.
no code implementations • ACL 2021 • Kyle Glandt, Sarthak Khanal, Yingjie Li, Doina Caragea, Cornelia Caragea
The prevalence of the COVID-19 pandemic in day-to-day life has yielded large amounts of stance detection data on social media sites, as users turn to social media to share their views regarding various issues related to the pandemic, e.g., stay-at-home mandates and wearing face masks when out in public.
1 code implementation • ACL 2021 • Tiberiu Sosea, Cornelia Caragea
BERT has been shown to be extremely effective on a wide variety of natural language processing tasks, including sentiment analysis and emotion detection.
no code implementations • LREC 2022 • Tiberiu Sosea, Chau Pham, Alexander Tekle, Cornelia Caragea, Junyi Jessy Li
Crises such as natural disasters, global pandemics, and social unrest continuously threaten our world and emotionally affect millions of people worldwide in distinct ways.
1 code implementation • 10 Jun 2021 • Jishnu Ray Chowdhury, Cornelia Caragea
We also show that CRvNN performs comparably to or better than prior latent structure models on real-world tasks such as sentiment analysis and natural language inference.
no code implementations • NAACL 2021 • Yingjie Li, Cornelia Caragea
The goal of stance detection is to identify whether the author of a text is in favor of, neutral toward, or against a specific target.
no code implementations • NAACL 2021 • Mina Valizadeh, Pardis Ranjbar-Noiey, Cornelia Caragea, Natalie Parde
Self-disclosure in online health conversations may offer a host of benefits, including earlier detection and treatment of medical issues that may have otherwise gone unaddressed.
no code implementations • EACL 2021 • Krutarth Patel, Cornelia Caragea
Keyphrases associated with research papers provide an effective way to find useful information in the large and growing scholarly digital collections.
no code implementations • COLING 2020 • Seoyeon Park, Cornelia Caragea
Scientific keyphrase identification and classification is the task of detecting and classifying keyphrases from scholarly text with their types from a set of predefined classes.
no code implementations • 2 Sep 2020 • Krutarth Patel, Cornelia Caragea, Mark Phillips, Nathaniel Fox
Web archive data usually contains high-quality documents that are very useful for creating specialized collections of documents, e.g., scientific digital libraries and repositories of technical reports.
no code implementations • 6 Aug 2020 • Ye Liu, Shaika Chowdhury, Chenwei Zhang, Cornelia Caragea, Philip S. Yu
Unlike most other QA tasks that focus on linguistic understanding, HeadQA requires deeper reasoning involving not only knowledge extraction, but also complex reasoning with healthcare knowledge.
1 code implementation • ACL 2020 • Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea
Distinguishing informative and actionable messages from a social media platform like Twitter is critical for facilitating disaster management.
no code implementations • LREC 2020 • Krutarth Patel, Cornelia Caragea, Mark Phillips
Web-archived data usually contains high-quality documents that are very useful for creating specialized collections of documents.
1 code implementation • ACL 2020 • Shrey Desai, Cornelia Caragea, Junyi Jessy Li
Natural disasters (e.g., hurricanes) affect millions of people each year, causing widespread destruction in their wake.
1 code implementation • 5 Jan 2020 • Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea
Moreover, only a small number of tweets that contain actionable hashtags are useful for disaster response.
no code implementations • IJCNLP 2019 • Yingjie Li, Cornelia Caragea
Stance detection aims to detect whether the opinion holder is in support of or against a given target.
no code implementations • IJCNLP 2019 • Cornelia Caragea, Ana Uban, Liviu P. Dinu
We study this question on the ACL and EMNLP paper collections and present an analysis of how well deep learning techniques can infer the authors of a paper.
no code implementations • 17 Oct 2019 • Jishnu Ray Chowdhury, Cornelia Caragea, Doina Caragea
Previously, joint training of two different layers of a stacked Recurrent Neural Network for keyword discovery and keyphrase extraction had been shown to be effective in extracting keyphrases from general Twitter data.
1 code implementation • 20 Jun 2019 • Athar Sefid, Jian Wu, Allen C. Ge, Jing Zhao, Lu Liu, Cornelia Caragea, Prasenjit Mitra, C. Lee Giles
We introduce a system designed to match scholarly document entities with noisy metadata against a reference dataset.
1 code implementation • 8 Mar 2019 • Ashwini Tonge, Cornelia Caragea
Thus, automatically predicting images' privacy to warn users about private or sensitive content before uploading these images on social networking sites has become a necessity in our current interconnected world.
no code implementations • 27 Feb 2019 • Ashwini Tonge, Cornelia Caragea
In this paper, we propose an approach for fusing object, scene context, and image tags modalities derived from convolutional neural networks for accurately predicting the privacy of images shared online.
no code implementations • EMNLP 2018 • Hamed Khanpour, Cornelia Caragea
Detecting fine-grained emotions in online health communities provides insightful information about patients' emotional states.
no code implementations • EMNLP 2018 • Cornelia Caragea, Liviu P. Dinu, Bogdan Dumitru
Identifying optimistic and pessimistic viewpoints and users from Twitter is useful for providing better social support to those who need such support, and for minimizing the negative influence among users and maximizing the spread of positive attitudes and ideas.
no code implementations • IJCNLP 2017 • Hamed Khanpour, Cornelia Caragea, Prakhar Biyani
Empathy captures one's ability to correlate with and understand others' emotional states and experiences.
no code implementations • ACL 2017 • Corina Florescu, Cornelia Caragea
In this paper, we propose PositionRank, an unsupervised model for keyphrase extraction from scholarly documents that incorporates information from all positions of a word's occurrences into a biased PageRank.
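The idea of a position-biased PageRank can be illustrated with a short sketch. It rests on assumptions not stated in the summary above: each word's bias is taken to be the sum of the inverse positions of its occurrences, normalized to a distribution, and the graph links words that co-occur within a small window. The function name, the window size, and the damping factor are hypothetical. Steps a full keyphrase extractor would need, such as filtering words and scoring multi-word candidates, are omitted.

```python
def position_biased_pagerank(tokens, window=2, damping=0.85, iters=50):
    """Score words with PageRank whose teleport distribution favors early words."""
    # Bias: a word seen at positions 1 and 3 gets weight 1/1 + 1/3 (assumed form).
    weights = {}
    for pos, tok in enumerate(tokens, start=1):
        weights[tok] = weights.get(tok, 0.0) + 1.0 / pos
    total = sum(weights.values())
    bias = {w: v / total for w, v in weights.items()}

    # Undirected co-occurrence graph; edge weight counts co-occurrences.
    edges = {w: {} for w in weights}
    for i, a in enumerate(tokens):
        for b in tokens[i + 1 : i + window]:
            if a != b:
                edges[a][b] = edges[a].get(b, 0.0) + 1.0
                edges[b][a] = edges[b].get(a, 0.0) + 1.0
    out_weight = {w: sum(nb.values()) for w, nb in edges.items()}

    # Power iteration: jump to a word according to the position bias with
    # probability (1 - damping), otherwise follow a weighted edge.
    scores = {w: 1.0 / len(weights) for w in weights}
    for _ in range(iters):
        updated = {}
        for w in weights:
            incoming = sum(
                edges[u][w] / out_weight[u] * scores[u] for u in edges[w]
            )
            updated[w] = (1.0 - damping) * bias[w] + damping * incoming
        scores = updated
    return scores
```

In this sketch a word that appears early and often receives both a larger teleport probability and more incoming edge mass, so it ranks above a word that appears once near the end. Isolated words with no neighbors simply keep their bias share; handling them more carefully is left out for brevity.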
no code implementations • 29 Oct 2015 • Ashwini Tonge, Cornelia Caragea
In this paper, we present an approach to image privacy prediction that uses deep features and deep image tags as feature representations.
no code implementations • 11 Jun 2015 • Prakhar Biyani, Cornelia Caragea, Narayan Bhamidipati
However, the problem of classifying the sentiment of user comments on news sites has not been addressed yet.
no code implementations • 25 Jan 2014 • Shibamouli Lahiri, Sagnik Ray Choudhury, Cornelia Caragea
Keyword and keyphrase extraction is an important problem in natural language processing, with applications ranging from summarization to semantic search to document clustering.