no code implementations • COLING 2022 • Fei-Tzin Lee, Miguel Ballesteros, Feng Nan, Kathleen McKeown
Large pretrained language models offer powerful generation capabilities, but cannot be reliably controlled at a sub-sentential level.
no code implementations • VarDial (COLING) 2020 • Alyssa Hwang, William R. Frey, Kathleen McKeown
Researchers in natural language processing have developed large, robust resources for understanding formal Standard American English (SAE), but we lack similar resources for variations of English, such as slang and African American English (AAE).
no code implementations • EMNLP 2020 • Chris Kedzie, Kathleen McKeown
We study the degree to which neural sequence-to-sequence models exhibit fine-grained controllability when performing natural language generation from a meaning representation.
no code implementations • NAACL (ACL) 2022 • Kasturi Bhattacharjee, Rashmi Gangadharaiah, Kathleen McKeown, Dan Roth
Users often leave feedback on a myriad of aspects of a product which, if leveraged successfully, can help yield useful insights that can lead to further improvements down the line.
no code implementations • COLING 2022 • Elsbeth Turcan, David Wan, Faisal Ladhak, Petra Galuscakova, Sukanta Sen, Svetlana Tchistiakova, Weijia Xu, Marine Carpuat, Kenneth Heafield, Douglas Oard, Kathleen McKeown
Query-focused summaries of foreign-language, retrieved documents can help a user understand whether a document is actually relevant to the query term.
1 code implementation • EMNLP 2021 • Manling Li, Tengfei Ma, Mo Yu, Lingfei Wu, Tian Gao, Heng Ji, Kathleen McKeown
Timeline Summarization identifies major events from a news collection and describes them following temporal order, with key dates tagged.
1 code implementation • 2 Mar 2024 • Melanie Subbiah, Sean Zhang, Lydia B. Chilton, Kathleen McKeown
We evaluate recent Large Language Models (LLMs) on the challenging task of summarizing short stories, which can be lengthy and include nuanced subtext or scrambled timelines.
no code implementations • 28 Feb 2024 • Alyssa Hwang, Kalpit Dixit, Miguel Ballesteros, Yassine Benajiba, Vittorio Castelli, Markus Dreyer, Mohit Bansal, Kathleen McKeown
We present NewsQs (news-cues), a dataset that provides question-answer pairs for multiple news documents.
no code implementations • 26 Feb 2024 • Todd Morrill, Zhaoyuan Deng, Yanda Chen, Amith Ananthram, Colin Wayne Leach, Kathleen McKeown
Based on these results showing the utility of social orientation tags for dialogue outcome prediction tasks, we release our data sets, code, and models that are fine-tuned to predict social orientation tags on dialogue utterances.
1 code implementation • 20 Feb 2024 • Liyan Tang, Igor Shalyminov, Amy Wing-mei Wong, Jon Burnsky, Jake W. Vincent, Yu'an Yang, Siffi Singh, Song Feng, Hwanjun Song, Hang Su, Lijia Sun, Yi Zhang, Saab Mansour, Kathleen McKeown
We find that there are diverse errors and error distributions in model-generated summaries and that non-LLM based metrics can capture all error types better than LLM-based evaluators.
no code implementations • 19 Feb 2024 • Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He
Pre-trained language models (LMs) are capable of in-context learning (ICL): they can adapt to a task with only a few examples given in the prompt without any parameter update.
1 code implementation • 14 Nov 2023 • Yusen Zhang, Nan Zhang, Yixin Liu, Alexander Fabbri, Junru Liu, Ryo Kamoi, Xiaoxin Lu, Caiming Xiong, Jieyu Zhao, Dragomir Radev, Kathleen McKeown, Rui Zhang
However, current work in summarization metrics and Large Language Models (LLMs) evaluation has not explored fair abstractive summarization.
1 code implementation • 29 Aug 2023 • Zachary Horvitz, Ajay Patel, Chris Callison-Burch, Zhou Yu, Kathleen McKeown
Our parameter-efficient approach, ParaGuide, leverages paraphrase-conditioned diffusion models alongside gradient-based guidance from both off-the-shelf classifiers and strong existing style embedders to transform the style of text while preserving semantic information.
no code implementations • 10 Aug 2023 • Alexander Hanbo Li, Mingyue Shang, Evangelia Spiliopoulou, Jie Ma, Patrick Ng, Zhiguo Wang, Bonan Min, William Wang, Kathleen McKeown, Vittorio Castelli, Dan Roth, Bing Xiang
We present a novel approach for structured data-to-text generation that addresses the limitations of existing methods that primarily focus on specific types of structured data.
no code implementations • 17 Jul 2023 • Yanda Chen, Ruiqi Zhong, Narutatsu Ri, Chen Zhao, He He, Jacob Steinhardt, Zhou Yu, Kathleen McKeown
To answer these questions, we propose to evaluate counterfactual simulatability of natural language explanations: whether an explanation can enable humans to precisely infer the model's outputs on diverse counterfactuals of the explained input.
1 code implementation • 29 May 2023 • Gengyu Wang, Kate Harwood, Lawrence Chillrud, Amith Ananthram, Melanie Subbiah, Kathleen McKeown
We present a new fact-checking benchmark, Check-COVID, that requires systems to verify claims about COVID-19 from news using evidence from scientific articles.
1 code implementation • 28 May 2023 • Griffin Adams, Alexander R. Fabbri, Faisal Ladhak, Kathleen McKeown, Noémie Elhadad
Similarly, on 1k samples from CNN / DM, we show that prompting GPT-3 to follow EDU plans outperforms sampling-based methods by 1.05 ROUGE-2 F1 points.
1 code implementation • 27 May 2023 • Adam Storek, Melanie Subbiah, Kathleen McKeown
To address this problem, unsupervised selective rationalization produces rationales alongside predictions by chaining two jointly-trained components, a rationale generator and a predictor.
no code implementations • 23 May 2023 • Kung-Hsiang Huang, Hou Pong Chan, Kathleen McKeown, Heng Ji
We present a novel task, identifying manipulation of news on social media, which aims to detect manipulation in social media posts and identify manipulated or inserted information.
no code implementations • 23 May 2023 • Nicholas Deas, Jessi Grieser, Shana Kleiner, Desmond Patton, Elsbeth Turcan, Kathleen McKeown
We evaluate how well LLMs understand African American Language (AAL) in comparison to their performance on White Mainstream English (WME), the encouraged "standard" form of English taught in American classrooms.
no code implementations • 22 May 2023 • Ajay Patel, Delip Rao, Ansh Kothary, Kathleen McKeown, Chris Callison-Burch
Style representation learning builds content-independent representations of author style in text.
1 code implementation • 6 Mar 2023 • David Wan, Mengwen Liu, Kathleen McKeown, Markus Dreyer, Mohit Bansal
We present a systematic study of the effect of generation techniques such as beam search and nucleus sampling on faithfulness in abstractive summarization.
1 code implementation • 31 Jan 2023 • Tianyi Zhang, Faisal Ladhak, Esin Durmus, Percy Liang, Kathleen McKeown, Tatsunori B. Hashimoto
Large language models (LLMs) have shown promise for automatic summarization but the reasons behind their successes are poorly understood.
1 code implementation • 31 Jan 2023 • Melanie Subbiah, Amrita Bhattacharjee, Yilun Hua, Tharindu Kumarage, Huan Liu, Kathleen McKeown
Manipulated news online is a growing problem which necessitates the use of automated systems to curtail its spread.
1 code implementation • 25 Jan 2023 • Kung-Hsiang Huang, Siffi Singh, Xiaofei Ma, Wei Xiao, Feng Nan, Nicholas Dingwall, William Yang Wang, Kathleen McKeown
Missing information is a common issue of dialogue summarization where some information in the reference summaries is not covered in the generated summaries.
no code implementations • 20 Dec 2022 • Yukun Huang, Yanda Chen, Zhou Yu, Kathleen McKeown
We propose to combine in-context learning objectives with language modeling objectives to distill both the ability to read in-context examples and task knowledge to the smaller models.
1 code implementation • 21 Nov 2022 • Noah Bergam, Emily Allaway, Kathleen McKeown
As a natural extension of this political stance detection, we propose the more specialized task of legal stance detection with our new dataset SC-stance, which matches written opinions to legal questions.
no code implementations • COLING (CreativeSumm) 2022 • Divyansh Agarwal, Alexander R. Fabbri, Simeng Han, Wojciech Kryściński, Faisal Ladhak, Bryan Li, Kathleen McKeown, Dragomir Radev, Tianyi Zhang, Sam Wiseman
We detail the process of curating these datasets for the task, as well as the metrics used for the evaluation of the submissions.
no code implementations • 9 Nov 2022 • Hardy Hardy, Miguel Ballesteros, Faisal Ladhak, Muhammad Khalifa, Vittorio Castelli, Kathleen McKeown
Summarizing novel chapters is a difficult task due to the input length and the fact that sentences that appear in the desired summaries draw content from multiple places throughout the chapter.
no code implementations • 18 Oct 2022 • Sharon Levy, Emily Allaway, Melanie Subbiah, Lydia Chilton, Desmond Patton, Kathleen McKeown, William Yang Wang
Understanding what constitutes safe text is an important issue in natural language processing and can often prevent the deployment of models deemed harmful and unsafe.
no code implementations • 17 Oct 2022 • Alex Mei, Anisha Kabir, Sharon Levy, Melanie Subbiah, Emily Allaway, John Judge, Desmond Patton, Bruce Bimber, Kathleen McKeown, William Yang Wang
An increasingly prevalent problem for intelligent technologies is text safety, as uncontrolled systems may generate recommendations to their users that lead to injury or life-threatening consequences.
1 code implementation • 16 Sep 2022 • Yanda Chen, Chen Zhao, Zhou Yu, Kathleen McKeown, He He
In-context learning (ICL) suffers from oversensitivity to the prompt, making it unreliable in real-world scenarios.
no code implementations • 23 May 2022 • Anish Saha, Amith Ananthram, Emily Allaway, Heng Ji, Kathleen McKeown
Practitioners from many disciplines (e.g., political science) use expert-crafted taxonomies to make sense of large, unlabeled corpora.
no code implementations • 23 May 2022 • Emily Allaway, Jena D. Hwang, Chandra Bhagavatula, Kathleen McKeown, Doug Downey, Yejin Choi
Generics express generalizations about the world (e.g., birds can fly) that are not universally true (e.g., newborn birds and penguins cannot fly).
1 code implementation • 13 Apr 2022 • Griffin Adams, Han-Chin Shing, Qing Sun, Christopher Winestock, Kathleen McKeown, Noémie Elhadad
In real-world scenarios with naturally occurring datasets, reference summaries are noisy and may contain information that cannot be inferred from the source text.
1 code implementation • Findings (ACL) 2022 • Chao Zhao, Tenghao Huang, Somnath Basu Roy Chowdhury, Muthu Kumar Chandrasekaran, Kathleen McKeown, Snigdha Chaturvedi
A common method for extractive multi-document news summarization is to re-formulate it as a single-document summarization problem by concatenating all documents as a single meta-document.
1 code implementation • 10 Mar 2022 • Kung-Hsiang Huang, Kathleen McKeown, Preslav Nakov, Yejin Choi, Heng Ji
Despite recent advances in detecting fake news generated by neural models, their results are not readily applicable to effective detection of human-written disinformation.
no code implementations • 27 Nov 2021 • Fei-Tzin Lee, Chris Kedzie, Nakul Verma, Kathleen McKeown
Prior work in AMR-based summarization has automatically merged the individual sentence graphs into a document graph, but the method of merging and its effects on summary content selection have not been independently evaluated.
no code implementations • EMNLP 2021 • Muhammad Khalifa, Miguel Ballesteros, Kathleen McKeown
Dialogue summarization comes with its own peculiar challenges as opposed to news or scientific articles summarization.
1 code implementation • EMNLP (sustainlp) 2021 • Gengyu Wang, Xiaochen Hou, Diyi Yang, Kathleen McKeown, Jing Huang
Large pre-trained language models (PLMs) have led to great success on various commonsense question answering (QA) tasks in an end-to-end fashion.
2 code implementations • ACL 2022 • Faisal Ladhak, Esin Durmus, He He, Claire Cardie, Kathleen McKeown
Despite recent progress in abstractive summarization, systems still suffer from faithfulness errors.
no code implementations • ACL 2021 • Muhao Chen, Hongming Zhang, Qiang Ning, Manling Li, Heng Ji, Kathleen McKeown, Dan Roth
This tutorial targets researchers and practitioners who are interested in AI technologies that help machines understand natural language text, particularly real-world events described in the text.
no code implementations • ACL 2021 • Yi Fung, Christopher Thomas, Revanth Gangi Reddy, Sandeep Polisetty, Heng Ji, Shih-Fu Chang, Kathleen McKeown, Mohit Bansal, Avi Sil
To defend against machine-generated fake news, an effective mechanism is urgently needed.
no code implementations • ACL 2021 • Yanda Chen, Chris Kedzie, Suraj Nair, Petra Galuščáková, Rui Zhang, Douglas W. Oard, Kathleen McKeown
This paper proposes an approach to cross-language sentence selection in a low-resource setting.
1 code implementation • NAACL 2021 • Elsbeth Turcan, Smaranda Muresan, Kathleen McKeown
The problem of detecting psychological stress in online posts, and more broadly, of detecting people in distress or in need of help, is a sensitive application for which the ability to interpret models is vital.
1 code implementation • NAACL 2021 • Emily Allaway, Malavika Srikanth, Kathleen McKeown
Stance detection on social media can help to identify and understand slanted news or commentary in everyday life.
1 code implementation • ACL 2021 • Feng Nan, Cicero Nogueira dos Santos, Henghui Zhu, Patrick Ng, Kathleen McKeown, Ramesh Nallapati, Dejiao Zhang, Zhiguo Wang, Andrew O. Arnold, Bing Xiang
A commonly observed problem with the state-of-the-art abstractive summarization models is that the generated summaries can be factually inconsistent with the input documents.
no code implementations • EACL 2021 • David Wan, Chris Kedzie, Faisal Ladhak, Elsbeth Turcan, Petra Galuščáková, Elena Zotkina, Zhengping Jiang, Peter Bell, Kathleen McKeown
Typical ASR systems segment the input audio into utterances using purely acoustic information, which may not resemble the sentence-like units that are expected by conventional machine translation (MT) systems for Spoken Language Translation.
2 code implementations • NAACL 2021 • Dejiao Zhang, Feng Nan, Xiaokai Wei, Shangwen Li, Henghui Zhu, Kathleen McKeown, Ramesh Nallapati, Andrew Arnold, Bing Xiang
Unsupervised clustering aims at discovering the semantic categories of data according to some distance measured in the representation space.
Ranked #1 on Short Text Clustering on AG News
1 code implementation • EACL 2021 • Feng Nan, Ramesh Nallapati, Zhiguo Wang, Cicero Nogueira dos Santos, Henghui Zhu, Dejiao Zhang, Kathleen McKeown, Bing Xiang
A key challenge for abstractive summarization is ensuring factual consistency of the generated summary with respect to the original document.
no code implementations • EACL 2021 • Kailash Karthik Saravanakumar, Miguel Ballesteros, Muthu Kumar Chandrasekaran, Kathleen McKeown
We propose a method for online news stream clustering that is a variant of the non-parametric streaming K-means algorithm.
no code implementations • 4 Dec 2020 • Amith Ananthram, Emily Allaway, Kathleen McKeown
General purpose relation extraction has recently seen considerable gains in part due to a massively data-intensive distant supervision technique from Soares et al. (2019) that produces state-of-the-art results across many benchmarks.
1 code implementation • COLING 2020 • Efsun Sarioglu Kayi, Linyong Nan, Bohan Qu, Mona Diab, Kathleen McKeown
We adopt cross-lingual embeddings constructed using different methods to extract features of the tweets, including a few state-of-the-art contextual embeddings such as BERT, RoBERTa and XLM-R. We train classifiers of different architectures on the extracted features.
no code implementations • COLING 2020 • Amith Ananthram, Emily Allaway, Kathleen McKeown
General purpose relation extraction has recently seen considerable gains in part due to a massively data-intensive distant supervision technique from Soares et al. (2019) that produces state-of-the-art results across many benchmarks.
no code implementations • Findings of the Association for Computational Linguistics 2020 • Dejiao Zhang, Ramesh Nallapati, Henghui Zhu, Feng Nan, Cicero Nogueira dos Santos, Kathleen McKeown, Bing Xiang
Unsupervised domain adaptation addresses the problem of leveraging labeled data in a source domain to learn a well-performing model in a target domain where labels are unavailable.
1 code implementation • WMT (EMNLP) 2020 • David Wan, Chris Kedzie, Faisal Ladhak, Marine Carpuat, Kathleen McKeown
In this paper, we present both autoregressive and non-autoregressive models for lexically constrained APE, demonstrating that our approach enables preservation of 95% of the terminologies and also improves translation quality on English-German benchmarks.
no code implementations • 19 Oct 2020 • David Wan, Zhengping Jiang, Chris Kedzie, Elsbeth Turcan, Peter Bell, Kathleen McKeown
In this work, we focus on improving ASR output segmentation in the context of low-resource language speech-to-text translation.
1 code implementation • EMNLP 2020 • Emily Allaway, Kathleen McKeown
Stance detection is an important component of understanding hidden influences in everyday life.
1 code implementation • Findings of the Association for Computational Linguistics 2020 • Faisal Ladhak, Esin Durmus, Claire Cardie, Kathleen McKeown
As a set of baselines for further studies, we evaluate the performance of existing cross-lingual abstractive summarization methods on our dataset.
no code implementations • EACL 2021 • Emily Allaway, Kathleen McKeown
Ideological attitudes and stance are often expressed through subtle meanings of words and phrases.
1 code implementation • ACL 2020 • Faisal Ladhak, Bryan Li, Yaser Al-Onaizan, Kathleen McKeown
We present a new summarization task, generating summaries of novel chapters using summary/chapter pairs from online study guides.
no code implementations • EMNLP 2020 • Miguel Ballesteros, Rishita Anubhai, Shuai Wang, Nima Pourdamghani, Yogarshi Vyas, Jie Ma, Parminder Bhatia, Kathleen McKeown, Yaser Al-Onaizan
In this paper, we propose a neural architecture and a set of training methods for ordering events by predicting temporal relations.
1 code implementation • WS 2019 • Chris Kedzie, Kathleen McKeown
Deep neural networks (DNN) are quickly becoming the de facto standard modeling method for many natural language generation (NLG) tasks.
1 code implementation • WS 2019 • Elsbeth Turcan, Kathleen McKeown
Stress is a nigh-universal human experience, particularly in the online world.
no code implementations • IJCNLP 2019 • Serina Chang, Kathleen McKeown
In this paper, we pose the question: do people talk about women and men in different ways?
no code implementations • 19 Aug 2019 • Ruiqi Zhong, Steven Shao, Kathleen McKeown
While the general task of textual sentiment classification has been widely studied, much less research looks specifically at sentiment between a specified source and target.
1 code implementation • NAACL 2019 • Tuhin Chakrabarty, Christopher Hidey, Kathleen McKeown
Claims are the central component of an argument.
no code implementations • 24 Feb 2019 • Aditi Chaudhary, Siddharth Dalmia, Junjie Hu, Xinjian Li, Austin Matthews, Aldrian Obaja Muis, Naoki Otani, Shruti Rijhwani, Zaid Sheikh, Nidhi Vyas, Xinyi Wang, Jiateng Xie, Ruochen Xu, Chunting Zhou, Peter J. Jansen, Yiming Yang, Lori Levin, Florian Metze, Teruko Mitamura, David R. Mortensen, Graham Neubig, Eduard Hovy, Alan W. Black, Jaime Carbonell, Graham V. Horwood, Shabnam Tafreshi, Mona Diab, Efsun S. Kayi, Noura Farra, Kathleen McKeown
This paper describes the ARIEL-CMU submissions to the Low Resource Human Language Technologies (LoReHLT) 2018 evaluations for the tasks Machine Translation (MT), Entity Discovery and Linking (EDL), and detection of Situation Frames in Text and Speech (SF Text and Speech).
2 code implementations • EMNLP 2018 • Chris Kedzie, Kathleen McKeown, Hal Daume III
We carry out experiments with deep learning models of summarization across the domains of news, personal stories, meetings, and medical articles in order to understand how content selection is performed.
no code implementations • WS 2018 • Rohan Kshirsagar, Tyus Cukuvac, Kathleen McKeown, Susan McGregor
We present a neural-network based approach to classifying online hate speech in general, as well as racist and sexist speech in particular.
1 code implementation • EMNLP 2018 • Serina Chang, Ruiqi Zhong, Ethan Adams, Fei-Tzin Lee, Siddharth Varia, Desmond Patton, William Frey, Chris Kedzie, Kathleen McKeown
Gang-involved youth in cities such as Chicago have increasingly turned to social media to post about their experiences and intents online.
no code implementations • 23 Jul 2018 • Philipp Blandfort, Desmond Patton, William R. Frey, Svebor Karaman, Surabhi Bhargava, Fei-Tzin Lee, Siddharth Varia, Chris Kedzie, Michael B. Gaskell, Rossano Schifanella, Kathleen McKeown, Shih-Fu Chang
In this paper we partnered computer scientists with social work researchers, who have domain expertise in gang violence, to analyze how public tweets with images posted by youth who mention gang associations on Twitter can be leveraged to automatically detect psychosocial factors and conditions that could potentially assist social workers and violence outreach workers in prevention and early intervention programs.
no code implementations • IJCNLP 2017 • Or Biran, Kathleen McKeown
RDF ontologies provide structured data on entities in many domains and continue to grow in size and diversity.
no code implementations • 13 Aug 2017 • Tao Yu, Christopher Hidey, Owen Rambow, Kathleen McKeown
This model outperforms many deep learning models and achieves comparable results to other deep learning models with complex architectures on sentiment analysis datasets.
no code implementations • EACL 2017 • Noura Farra, Kathleen McKeown
We consider entity-level sentiment analysis in Arabic, a morphologically rich language with increasing resources.
no code implementations • COLING 2016 • Terra Blevins, Robert Kwiatkowski, Jamie MacBeth, Kathleen McKeown, Desmond Patton, Owen Rambow
Violence is a serious problem for cities like Chicago and has been exacerbated by the use of social media by gang-involved youths for taunting rival gangs.
no code implementations • 28 Sep 2016 • Desmond Upton Patton, Kathleen McKeown, Owen Rambow, Jamie MacBeth
The U.S. has the highest rate of firearm-related deaths when compared to other industrialized countries.
no code implementations • 12 May 2016 • Chris Kedzie, Fernando Diaz, Kathleen McKeown
We present a system based on sequential decision making for the online summarization of massive document streams, such as those found on the web.
no code implementations • LREC 2012 • Jacob Andreas, Sara Rosenthal, Kathleen McKeown
We introduce a new corpus of sentence-level agreement and disagreement annotations over LiveJournal and Wikipedia threads.