Search Results for author: Shabnam Behzad

Found 11 papers, 8 papers with code

Data Checklist: On Unit-Testing Datasets with Usable Information

1 code implementation • 6 Aug 2024 • Heidi C. Zhang, Shabnam Behzad, Kawin Ethayarajh, Dan Jurafsky

Model checklists (Ribeiro et al., 2020) have emerged as a useful tool for understanding the behavior of LLMs, analogous to unit-testing in software engineering.

MultiMUC: Multilingual Template Filling on MUC-4

1 code implementation • 29 Jan 2024 • William Gantt, Shabnam Behzad, Hannah Youngeun An, Yunmo Chen, Aaron Steven White, Benjamin Van Durme, Mahsa Yarmohammadi

We introduce MultiMUC, the first multilingual parallel corpus for template filling, comprising translations of the classic MUC-4 template filling benchmark into five languages: Arabic, Chinese, Farsi, Korean, and Russian.

Machine Translation, Translation

GENTLE: A Genre-Diverse Multilayer Challenge Set for English NLP and Linguistic Evaluation

1 code implementation • 3 Jun 2023 • Tatsuya Aoyama, Shabnam Behzad, Luke Gessler, Lauren Levine, Jessica Lin, Yang Janet Liu, Siyao Peng, Yilun Zhu, Amir Zeldes

We evaluate state-of-the-art NLP systems on GENTLE and find severe degradation for at least some genres in their performance on all tasks, which indicates GENTLE's utility as an evaluation dataset for NLP systems.

Coreference Resolution, Dependency Parsing (+2 more)

AMALGUM -- A Free, Balanced, Multilayer English Web Corpus

1 code implementation • LREC 2020 • Luke Gessler, Siyao Peng, Yang Liu, Yilun Zhu, Shabnam Behzad, Amir Zeldes

We present a freely available, genre-balanced English web corpus totaling 4M tokens and featuring a large number of high-quality automatic annotation layers, including dependency trees, non-named entity annotations, coreference resolution, and discourse trees in Rhetorical Structure Theory.

Coreference Resolution

A Cross-Genre Ensemble Approach to Robust Reddit Part of Speech Tagging

1 code implementation • LREC 2020 • Shabnam Behzad, Amir Zeldes

However, when these models are applied to other corpora with different genres, and especially user-generated data from the Web, we see substantial drops in performance.

Part-of-Speech Tagging
