Search Results for author: Arman Cohan

Found 67 papers, 42 papers with code

Zero- and Few-Shot NLP with Pretrained Language Models

no code implementations ACL 2022 Iz Beltagy, Arman Cohan, Robert Logan IV, Sewon Min, Sameer Singh

The ability to efficiently learn from little-to-no data is critical to applying NLP to tasks where data collection is costly or otherwise difficult.

Few-Shot Learning

Struc-Bench: Are Large Language Models Really Good at Generating Complex Structured Data?

1 code implementation 16 Sep 2023 Xiangru Tang, Yiming Zong, Jason Phang, Yilun Zhao, Wangchunshu Zhou, Arman Cohan, Mark Gerstein

In this study, we assess the capability of current LLMs in generating complex structured data and propose a structure-aware fine-tuning approach as a solution to improve this ability.

ODSum: New Benchmarks for Open Domain Multi-Document Summarization

no code implementations 16 Sep 2023 Yijie Zhou, Kejian Shi, Wencai Zhang, Yixin Liu, Yilun Zhao, Arman Cohan

Open-domain Multi-Document Summarization (ODMDS) is a critical tool for condensing vast arrays of documents into coherent, concise summaries.

Document Summarization Multi-Document Summarization +1

Peek Across: Improving Multi-Document Modeling via Cross-Document Question-Answering

1 code implementation 24 May 2023 Avi Caciularu, Matthew E. Peters, Jacob Goldberger, Ido Dagan, Arman Cohan

The integration of multi-document pre-training objectives into language models has resulted in remarkable improvements in multi-document downstream tasks.

Question Answering Text Generation

A Controllable QA-based Framework for Decontextualization

no code implementations 24 May 2023 Benjamin Newman, Luca Soldaini, Raymond Fok, Arman Cohan, Kyle Lo

We propose a question-answering framework for decontextualization that allows for better handling of user information needs and preferences when determining the scope of rewriting.

Question Answering

Large Language Models are Effective Table-to-Text Generators, Evaluators, and Feedback Providers

1 code implementation 24 May 2023 Yilun Zhao, Haowei Zhang, Shengyun Si, Linyong Nan, Xiangru Tang, Arman Cohan

In this paper, we study the capabilities of LLMs for table-to-text generation tasks, particularly aiming to investigate their performance in generating natural language statements that can be logically entailed by a provided table.

Table-to-Text Generation

On Learning to Summarize with Large Language Models as References

1 code implementation 23 May 2023 Yixin Liu, Alexander R. Fabbri, PengFei Liu, Dragomir Radev, Arman Cohan

Therefore, we investigate a new learning paradigm of text summarization models that considers the LLMs as the reference or the gold-standard oracle on commonly used summarization datasets such as the CNN/DailyMail dataset.

Contrastive Learning Text Summarization

Enhancing Few-shot Text-to-SQL Capabilities of Large Language Models: A Study on Prompt Design Strategies

no code implementations 21 May 2023 Linyong Nan, Yilun Zhao, Weijin Zou, Narutatsu Ri, Jaesung Tae, Ellen Zhang, Arman Cohan, Dragomir Radev

In-context learning (ICL) has emerged as a new approach to various natural language processing tasks, utilizing large language models (LLMs) to make predictions based on context that has been supplemented with a few examples or task-specific instructions.

Question Answering Text-To-SQL

Inference-time Re-ranker Relevance Feedback for Neural Information Retrieval

no code implementations 19 May 2023 Revanth Gangi Reddy, Pradeep Dasigi, Md Arafat Sultan, Arman Cohan, Avirup Sil, Heng Ji, Hannaneh Hajishirzi

Neural information retrieval often adopts a retrieve-and-rerank framework: a bi-encoder network first retrieves K (e.g., 100) candidates that are then re-ranked using a more powerful cross-encoder model to rank the better candidates higher.

Information Retrieval Retrieval

TESS: Text-to-Text Self-Conditioned Simplex Diffusion

1 code implementation 15 May 2023 Rabeeh Karimi Mahabadi, Jaesung Tae, Hamish Ivison, James Henderson, Iz Beltagy, Matthew E. Peters, Arman Cohan

Diffusion models have emerged as a powerful paradigm for generation, obtaining strong performance in various domains with continuous-valued inputs.

Natural Language Understanding Paraphrase Generation +3

LongEval: Guidelines for Human Evaluation of Faithfulness in Long-form Summarization

1 code implementation 30 Jan 2023 Kalpesh Krishna, Erin Bransom, Bailey Kuehl, Mohit Iyyer, Pradeep Dasigi, Arman Cohan, Kyle Lo

Motivated by our survey, we present LongEval, a set of guidelines for human evaluation of faithfulness in long-form summaries that addresses the following challenges: (1) How can we achieve high inter-annotator agreement on faithfulness scores?

SciRepEval: A Multi-Format Benchmark for Scientific Document Representations

1 code implementation 23 Nov 2022 Amanpreet Singh, Mike D'Arcy, Arman Cohan, Doug Downey, Sergey Feldman

However, existing benchmarks for evaluating these representations fail to capture the diversity of relevant tasks.

SciFact-Open: Towards open-domain scientific claim verification

1 code implementation 25 Oct 2022 David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Iz Beltagy, Lucy Lu Wang, Hannaneh Hajishirzi

While research on scientific claim verification has led to the development of powerful systems that appear to approach human performance, these approaches have yet to be tested in a realistic setting against large corpora of scientific literature.

Claim Verification Information Retrieval +1

Embedding Recycling for Language Models

1 code implementation 11 Jul 2022 Jon Saad-Falcon, Amanpreet Singh, Luca Soldaini, Mike D'Arcy, Arman Cohan, Doug Downey

Real-world applications of neural language models often involve running many different models over the same corpus.

Question Answering Text Classification

Improving the Generalizability of Depression Detection by Leveraging Clinical Questionnaires

1 code implementation ACL 2022 Thong Nguyen, Andrew Yates, Ayah Zirikly, Bart Desmet, Arman Cohan

In dataset-transfer experiments on three social media datasets, we find that grounding the model in PHQ9's symptoms substantially improves its ability to generalize to out-of-distribution data compared to a standard BERT-based approach.

Depression Detection Domain Generalization

Generating Scientific Claims for Zero-Shot Scientific Fact Checking

1 code implementation ACL 2022 Dustin Wright, David Wadden, Kyle Lo, Bailey Kuehl, Arman Cohan, Isabelle Augenstein, Lucy Lu Wang

To address this challenge, we propose scientific claim generation, the task of generating one or more atomic and verifiable claims from scientific sentences, and demonstrate its usefulness in zero-shot fact checking for biomedical claims.

Fact Checking

MultiVerS: Improving scientific claim verification with weak supervision and full-document context

2 code implementations Findings (NAACL) 2022 David Wadden, Kyle Lo, Lucy Lu Wang, Arman Cohan, Iz Beltagy, Hannaneh Hajishirzi

Our approach outperforms two competitive baselines on three scientific claim verification datasets, with particularly strong performance in zero / few-shot domain adaptation experiments.

Claim Verification Domain Adaptation +1

Multi-Vector Models with Textual Guidance for Fine-Grained Scientific Document Similarity

1 code implementation NAACL 2022 Sheshera Mysore, Arman Cohan, Tom Hope

We present a new scientific document similarity model based on matching fine-grained aspects of texts.

PRIMERA: Pyramid-based Masked Sentence Pre-training for Multi-document Summarization

2 code implementations ACL 2022 Wen Xiao, Iz Beltagy, Giuseppe Carenini, Arman Cohan

We introduce PRIMERA, a pre-trained model for multi-document representation with a focus on summarization that reduces the need for dataset-specific architectures and large amounts of fine-tuning labeled data.

Abstractive Text Summarization Document Summarization +1

FLEX: Unifying Evaluation for Few-Shot NLP

2 code implementations NeurIPS 2021 Jonathan Bragg, Arman Cohan, Kyle Lo, Iz Beltagy

Few-shot NLP research is highly active, yet conducted in disjoint research threads with evaluation suites that lack challenging-yet-realistic testing setups and fail to employ careful experimental design.

Experimental Design Few-Shot Learning +1

Beyond Paragraphs: NLP for Long Sequences

1 code implementation NAACL 2021 Iz Beltagy, Arman Cohan, Hannaneh Hajishirzi, Sewon Min, Matthew E. Peters

In this tutorial, we aim at bringing interested NLP researchers up to speed about the recent and ongoing techniques for document-level representation learning.

Representation Learning

CDLM: Cross-Document Language Modeling

2 code implementations Findings (EMNLP) 2021 Avi Caciularu, Arman Cohan, Iz Beltagy, Matthew E. Peters, Arie Cattan, Ido Dagan

We introduce a new pretraining approach geared for multi-document language modeling, incorporating two key ideas into the masked language modeling self-supervised objective.

Citation Recommendation Coreference Resolution +6

ABNIRML: Analyzing the Behavior of Neural IR Models

2 code implementations 2 Nov 2020 Sean MacAvaney, Sergey Feldman, Nazli Goharian, Doug Downey, Arman Cohan

Pretrained contextualized language models such as BERT and T5 have established a new state-of-the-art for ad-hoc search.

Language Modelling

SLEDGE-Z: A Zero-Shot Baseline for COVID-19 Literature Search

no code implementations EMNLP 2020 Sean MacAvaney, Arman Cohan, Nazli Goharian

With worldwide concerns surrounding the Severe Acute Respiratory Syndrome Coronavirus 2 (SARS-CoV-2), there is a rapidly growing body of scientific literature on the virus.

Re-Ranking

SLEDGE: A Simple Yet Effective Baseline for COVID-19 Scientific Knowledge Search

1 code implementation 5 May 2020 Sean MacAvaney, Arman Cohan, Nazli Goharian

In this work, we present a search system called SLEDGE, which utilizes SciBERT to effectively re-rank articles.

Fact or Fiction: Verifying Scientific Claims

2 code implementations EMNLP 2020 David Wadden, Shanchuan Lin, Kyle Lo, Lucy Lu Wang, Madeleine van Zuylen, Arman Cohan, Hannaneh Hajishirzi

We introduce scientific claim verification, a new task to select abstracts from the research literature containing evidence that SUPPORTS or REFUTES a given scientific claim, and to identify rationales justifying each decision.

Claim Verification Domain Adaptation +1

SPECTER: Document-level Representation Learning using Citation-informed Transformers

5 code implementations ACL 2020 Arman Cohan, Sergey Feldman, Iz Beltagy, Doug Downey, Daniel S. Weld

We propose SPECTER, a new method to generate document-level embedding of scientific documents based on pretraining a Transformer language model on a powerful signal of document-level relatedness: the citation graph.

Citation Prediction Document Classification +3

Longformer: The Long-Document Transformer

16 code implementations 10 Apr 2020 Iz Beltagy, Matthew E. Peters, Arman Cohan

To address this limitation, we introduce the Longformer with an attention mechanism that scales linearly with sequence length, making it easy to process documents of thousands of tokens or longer.

Language Modelling Question Answering +1

Ranking Significant Discrepancies in Clinical Reports

no code implementations 18 Jan 2020 Sean MacAvaney, Arman Cohan, Nazli Goharian, Ross Filice

This allows medical practitioners to easily identify and learn from the reports in which their interpretation most substantially differed from that of the attending physician (who finalized the report).

SUPP.AI: Finding Evidence for Supplement-Drug Interactions

1 code implementation ACL 2020 Lucy Lu Wang, Oyvind Tafjord, Arman Cohan, Sarthak Jain, Sam Skjonsberg, Carissa Schoenick, Nick Botner, Waleed Ammar

We fine-tune the contextualized word representations of the RoBERTa language model using labeled DDI data, and apply the fine-tuned model to identify supplement interactions.

General Classification Language Modelling

Pretrained Language Models for Sequential Sentence Classification

1 code implementation IJCNLP 2019 Arman Cohan, Iz Beltagy, Daniel King, Bhavana Dalvi, Daniel S. Weld

As a step toward better document-level understanding, we explore classification of a sequence of sentences into their corresponding categories, a task that requires understanding sentences in context of the document.

Classification General Classification +1

Ontology-Aware Clinical Abstractive Summarization

no code implementations 14 May 2019 Sean MacAvaney, Sajad Sotudeh, Arman Cohan, Nazli Goharian, Ish Talati, Ross W. Filice

Automatically generating accurate summaries from clinical reports could save a clinician's time, improve summary coverage, and reduce errors.

Abstractive Text Summarization

Structural Scaffolds for Citation Intent Classification in Scientific Publications

1 code implementation NAACL 2019 Arman Cohan, Waleed Ammar, Madeleine van Zuylen, Field Cady

Identifying the intent of a citation in scientific papers (e.g., background information, use of methods, comparing results) is critical for machine reading of individual publications and automated analysis of the scientific literature.

Citation Intent Classification Classification +5

SciBERT: A Pretrained Language Model for Scientific Text

5 code implementations IJCNLP 2019 Iz Beltagy, Kyle Lo, Arman Cohan

Obtaining large-scale annotated data for NLP tasks in the scientific domain is challenging and expensive.

Ranked #1 on Sentence Classification on Paper Field (using extra training data)

Citation Intent Classification Dependency Parsing +6

Depression and Self-Harm Risk Assessment in Online Forums

no code implementations EMNLP 2017 Andrew Yates, Arman Cohan, Nazli Goharian

We propose methods for identifying posts in support communities that may indicate a risk of self-harm, and demonstrate that our approach outperforms strong previously proposed methods for identifying such posts.

Identifying Harm Events in Clinical Care through Medical Narratives

no code implementations 15 Aug 2017 Arman Cohan, Allan Fong, Raj Ratwani, Nazli Goharian

Preventable medical errors are estimated to be among the leading causes of injury and death in the United States.

Scientific document summarization via citation contextualization and scientific discourse

no code implementations 12 Jun 2017 Arman Cohan, Nazli Goharian

We present a framework for scientific summarization which takes advantage of the citations and the scientific discourse structure.

Document Summarization Scientific Document Summarization +1

Contextualizing Citations for Scientific Summarization using Word Embeddings and Domain Knowledge

no code implementations 23 May 2017 Arman Cohan, Nazli Goharian

Citation texts are sometimes not very informative or in some cases inaccurate by themselves; they need the appropriate context from the referenced paper to reflect its exact contributions.

Word Embeddings

Scientific Article Summarization Using Citation-Context and Article's Discourse Structure

1 code implementation EMNLP 2015 Arman Cohan, Nazli Goharian

We propose a summarization approach for scientific articles which takes advantage of citation-context and the document discourse model.

A Neural Attention Model for Categorizing Patient Safety Events

no code implementations 23 Feb 2017 Arman Cohan, Allan Fong, Nazli Goharian, Raj Ratwani

Medical errors are leading causes of death in the US and as such, prevention of these errors is paramount to promoting health care.

Triaging Content Severity in Online Mental Health Forums

no code implementations 22 Feb 2017 Arman Cohan, Sydney Young, Andrew Yates, Nazli Goharian

Our analysis on the interaction of the moderators with the users further indicates that without an automatic way to identify critical content, it is indeed challenging for the moderators to provide timely response to the users in need.

Revisiting Summarization Evaluation for Scientific Articles

1 code implementation LREC 2016 Arman Cohan, Nazli Goharian

Finally, we propose an alternative metric for summarization evaluation which is based on the content relevance between a system generated summary and the corresponding human written summaries.

Text Summarization
