This paper presents our approach to the MuP 2022 shared task on Multi-Perspective Scientific Document Summarization, where the objective is to explore methods for generating multi-perspective summaries of scientific papers.
To complement this evaluation, we propose a dynamic thresholding technique that adjusts the classifier’s sensitivity as a function of the number of posts a user has.
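A minimal sketch of how such a dynamic threshold could look, assuming the decision threshold is relaxed for users with few posts and interpolated back to a base value as more posts accumulate; the function names, the linear schedule, and the direction of adjustment are illustrative assumptions, not the paper's exact formulation.

```python
def dynamic_threshold(num_posts: int,
                      base: float = 0.5,
                      floor: float = 0.3,
                      saturation: int = 50) -> float:
    """Return a per-user decision threshold that grows with the amount of
    available evidence (number of posts), saturating at `base`.
    All constants here are hypothetical defaults."""
    ratio = min(num_posts, saturation) / saturation
    return floor + (base - floor) * ratio


def classify(score: float, num_posts: int) -> bool:
    """Flag a user when the classifier score exceeds their dynamic threshold."""
    return score >= dynamic_threshold(num_posts)
```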
We introduce LADR (Lexically-Accelerated Dense Retrieval), a simple yet effective approach that improves the efficiency of existing dense retrieval models without compromising retrieval effectiveness.
Query-focused summarization (QFS) is a challenging task in natural language processing that generates summaries to address specific queries.
Recent Transformer-based summarization models have provided a promising approach to abstractive summarization.
Automatically generating short summaries from users' online mental health posts could save counselors' reading time and reduce their fatigue, so that they can provide timely responses to those seeking help to improve their mental state.
Recent interest in tackling this problem has motivated the curation of scientific datasets, arXiv-Long and PubMed-Long, containing human-written summaries of 400-600 words, thereby providing a venue for research on generating long/extended summaries.
Some of these platforms, such as Reachout, are dedicated forums where users register to seek help.
Recent summarization models consist of millions of parameters, and their performance is highly dependent on the abundance of training data.
Despite the recent successes of transformer-based models in terms of effectiveness on a variety of tasks, their decisions often remain opaque to humans.
We then present our results on three long summarization datasets, arXiv-Long, PubMed-Long, and Longsumm.
Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.
With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus.
This paper presents our systems for SemEval 2020 Shared Task 11: Detection of Propaganda Techniques in News Articles.
Offensive language detection is an important and challenging task in natural language processing.
We present an elegant and effective approach for addressing limitations in existing multi-label classification models by incorporating interaction matching, a concept shown to be useful for ad-hoc search result ranking.
The sequence-to-sequence (seq2seq) network is a well-established model for the text summarization task.
We show that the proposed heuristics can be used to build a training curriculum that down-weights difficult samples early in the training process.
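A minimal sketch of this kind of curriculum weighting, assuming a per-sample difficulty score in [0, 1] and a linear ramp over epochs; the heuristic and the schedule shape are illustrative assumptions, not the exact curriculum proposed here.

```python
def sample_weight(difficulty: float, epoch: int, total_epochs: int) -> float:
    """Down-weight difficult samples early in training and ramp their
    contribution toward full weight as training progresses.

    difficulty: heuristic score in [0, 1], higher means harder.
    """
    progress = epoch / max(total_epochs - 1, 1)   # 0.0 at start, 1.0 at end
    return 1.0 - difficulty * (1.0 - progress)    # hard samples start small


# Example: a sample with difficulty 0.8 contributes with weight 0.2 at epoch 0
# and with full weight 1.0 at the final epoch; easy samples always count fully.
```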
We also observe that the performance is additive with the current leading first-stage retrieval methods, further narrowing the gap between inexpensive and cost-prohibitive passage ranking approaches.
Deep pretrained transformer networks are effective at various ranking tasks, such as question answering and ad-hoc document ranking.
This allows medical practitioners to easily identify and learn from the reports in which their interpretation most substantially differed from that of the attending physician (who finalized the report).
While billions of non-English speaking users rely on search engines every day, the problem of ad-hoc information retrieval is rarely studied for non-English languages.
Automatically generating accurate summaries from clinical reports could save a clinician's time, improve summary coverage, and reduce errors.
We call this joint approach CEDR (Contextualized Embeddings for Document Ranking).
Self-reported diagnosis statements have been widely employed in studying language related to mental health in social media.
Mental health is a significant and growing public health concern.
In recent years, online communities have formed around suicide and self-harm prevention.
Neural abstractive summarization models have led to promising results in summarizing relatively short documents.
SemEval 2018 Task 7 focuses on relation extraction and classification in scientific literature.
We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts.
Clinical TempEval 2017 (SemEval 2017 Task 12) addresses the task of cross-domain temporal extraction from clinical text.
We present a framework for scientific summarization which takes advantage of the citations and the scientific discourse structure.
Citation texts are sometimes uninformative, or even inaccurate, on their own; they need appropriate context from the referenced paper to reflect its exact contributions.
Medical errors are a leading cause of death in the US; as such, preventing these errors is paramount to improving health care.
Our analysis of the moderators' interactions with the users further indicates that, without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely responses to users in need.
Finally, we propose an alternative metric for summarization evaluation, based on the content relevance between a system-generated summary and the corresponding human-written summaries.
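A minimal sketch of a content-relevance score between a system summary and a set of reference summaries, using average unigram-overlap F1; the actual metric proposed here may use a different relevance function, so this is only an illustration of the idea.

```python
from collections import Counter


def content_relevance(system_summary: str, references: list[str]) -> float:
    """Average unigram-overlap F1 between the system summary and each
    human-written reference summary (hypothetical stand-in relevance function)."""
    sys_tokens = Counter(system_summary.lower().split())
    scores = []
    for ref in references:
        ref_tokens = Counter(ref.lower().split())
        overlap = sum((sys_tokens & ref_tokens).values())
        if overlap == 0:
            scores.append(0.0)
            continue
        precision = overlap / sum(sys_tokens.values())
        recall = overlap / sum(ref_tokens.values())
        scores.append(2 * precision * recall / (precision + recall))
    return sum(scores) / len(scores) if scores else 0.0
```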
With the rapid growth of social media, there is increasing potential to augment traditional public health surveillance methods with data from social media.