no code implementations • 30 Oct 2024 • Haoyi Qiu, Alexander R. Fabbri, Divyansh Agarwal, Kung-Hsiang Huang, Sarah Tan, Nanyun Peng, Chien-Sheng Wu
To address these, we introduce CASA, a benchmark designed to assess LLM agents' sensitivity to cultural and social norms across two web-based tasks: online shopping and social discussion forums.
no code implementations • 24 Apr 2024 • Divyansh Agarwal, Alexander R. Fabbri, Ben Risher, Philippe Laban, Shafiq Joty, Chien-Sheng Wu
We measure the mitigation effect of 7 black-box defense strategies, along with finetuning an open-source model to defend against leakage attempts.
no code implementations • 25 Sep 2023 • Tuhin Chakrabarty, Philippe Laban, Divyansh Agarwal, Smaranda Muresan, Chien-Sheng Wu
Inspired by the Torrance Test of Creative Thinking (TTCT), which measures creativity as a process, we use the Consensual Assessment Technique [3] and propose the Torrance Test of Creative Writing (TTCW) to evaluate creativity as a product.
no code implementations • 10 Jul 2023 • Md Junaid Mahmood, Pranaw Raj, Divyansh Agarwal, Suruchi Kumari, Pravendra Singh
To evaluate the performance of our proposed approach, we conduct experiments on two publicly available medical image classification benchmark datasets: the skin lesion classification dataset (ISIC 2018) and the blood cell classification dataset (BCCD).
1 code implementation • 23 May 2023 • Philippe Laban, Wojciech Kryściński, Divyansh Agarwal, Alexander R. Fabbri, Caiming Xiong, Shafiq Joty, Chien-Sheng Wu
To address this, we propose a new protocol for inconsistency detection benchmark creation and implement it in a 10-domain benchmark called SummEdits.
1 code implementation • 17 Dec 2022 • Rui Meng, Ye Liu, Semih Yavuz, Divyansh Agarwal, Lifu Tu, Ning Yu, JianGuo Zhang, Meghana Bhat, Yingbo Zhou
In this study, we aim to develop unsupervised methods for improving dense retrieval models.
no code implementations • COLING (CreativeSumm) 2022 • Divyansh Agarwal, Alexander R. Fabbri, Simeng Han, Wojciech Kryściński, Faisal Ladhak, Bryan Li, Kathleen McKeown, Dragomir Radev, Tianyi Zhang, Sam Wiseman
We detail the process of curating these datasets for the task, as well as the metrics used for the evaluation of the submissions.
2 code implementations • 18 May 2021 • Wojciech Kryściński, Nazneen Rajani, Divyansh Agarwal, Caiming Xiong, Dragomir Radev
The majority of available text summarization datasets include short-form source documents that lack long-range causal and temporal dependencies, and often contain strong layout and stylistic biases.
no code implementations • 30 Nov 2020 • Divyansh Agarwal, Yuta Baba, Pratik Sachdeva, Tanya Tandon, Thomas Vetterli, Aziz Alghunaim
Tarjimly aims to overcome the barriers by providing a platform capable of matching bilingual volunteers to displaced persons or aid workers in need of translating.
no code implementations • 6 Aug 2018 • Divyansh Agarwal, Nancy R. Zhang
The advantage of Semblance lies in its distribution-free formulation and its ability to detect niche features by placing greater emphasis on similarity between observation pairs that fall at the outskirts of the data distribution, as opposed to those that fall towards the center.