Search Results for author: Marzena Karpinska

Found 11 papers, 6 papers with code

Revisiting Statistical Laws of Semantic Shift in Romance Cognates

no code implementations • COLING 2022 • Yoshifumi Kawasaki, Maëlys Salingre, Marzena Karpinska, Hiroya Takamura, Ryo Nagata

This article revisits statistical relationships across Romance cognates between lexical semantic shift and six intra-linguistic variables, such as frequency and polysemy.

Word Embeddings

FABLES: Evaluating faithfulness and content selection in book-length summarization

3 code implementations • 1 Apr 2024 • Yekyung Kim, Yapei Chang, Marzena Karpinska, Aparna Garimella, Varun Manjunatha, Kyle Lo, Tanya Goyal, Mohit Iyyer

While LLM-based auto-raters have proven reliable for factuality and coherence in other settings, we implement several LLM raters of faithfulness and find that none correlates strongly with human annotations, especially with regard to detecting unfaithful claims.

Long-Context Understanding

Large language models effectively leverage document-level context for literary translation, but critical errors persist

1 code implementation • 6 Apr 2023 • Marzena Karpinska, Mohit Iyyer

Large language models (LLMs) are competitive with the state of the art on a wide range of sentence-level translation datasets.

Sentence Translation

Paraphrasing evades detectors of AI-generated text, but retrieval is an effective defense

1 code implementation • NeurIPS 2023 • Kalpesh Krishna, Yixiao Song, Marzena Karpinska, John Wieting, Mohit Iyyer

To increase the robustness of AI-generated text detection to paraphrase attacks, we introduce a simple defense that relies on retrieving semantically similar generations and must be maintained by a language model API provider.
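The retrieval idea can be illustrated with a toy sketch: the API provider stores every generation it produces, and a query text is flagged if it is sufficiently similar to any stored generation, even after paraphrasing. All names here are hypothetical, and a real deployment would use semantic embeddings rather than this bag-of-words similarity:

```python
import math
from collections import Counter


def bow_vector(text):
    # Toy stand-in for a semantic embedding: word-count vector.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class RetrievalDetector:
    """Toy retrieval-based detector (hypothetical API, for illustration only).

    The provider records each generation; a text is flagged as AI-generated
    if its similarity to some stored generation exceeds a threshold.
    """

    def __init__(self, threshold=0.6):
        self.corpus = []
        self.threshold = threshold

    def record_generation(self, text):
        self.corpus.append((text, bow_vector(text)))

    def is_ai_generated(self, text):
        query = bow_vector(text)
        return any(cosine(query, vec) >= self.threshold for _, vec in self.corpus)
```

A paraphrase that swaps a word or two still shares most of its vocabulary with the stored original, so its similarity stays high, which is why this style of defense survives paraphrase attacks that fool per-text detectors.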

Language Modelling • Outlier Detection • +3

Exploring Document-Level Literary Machine Translation with Parallel Paragraphs from World Literature

1 code implementation • 25 Oct 2022 • Katherine Thai, Marzena Karpinska, Kalpesh Krishna, Bill Ray, Moira Inghilleri, John Wieting, Mohit Iyyer

Using Par3, we discover that expert literary translators prefer reference human translations over machine-translated paragraphs at a rate of 84%, while state-of-the-art automatic MT metrics do not correlate with those preferences.

Machine Translation • Translation

DEMETR: Diagnosing Evaluation Metrics for Translation

1 code implementation • 25 Oct 2022 • Marzena Karpinska, Nishant Raj, Katherine Thai, Yixiao Song, Ankita Gupta, Mohit Iyyer

While machine translation evaluation metrics based on string overlap (e.g., BLEU) have their limitations, their computations are transparent: the BLEU score assigned to a particular candidate translation can be traced back to the presence or absence of certain words.
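That traceability can be seen in the simplest ingredient of BLEU, clipped unigram precision; full BLEU also combines higher-order n-gram precisions with a brevity penalty, so this is only a minimal sketch in pure Python:

```python
from collections import Counter


def unigram_precision(candidate, reference):
    """Clipped unigram precision, the simplest building block of BLEU.

    Every point of credit is traceable to a specific candidate word
    being present (or absent) in the reference -- the transparency
    property described above.
    """
    cand_words = candidate.split()
    ref_counts = Counter(reference.split())
    matched = 0
    for word, count in Counter(cand_words).items():
        # Clip each word's credit at its count in the reference.
        matched += min(count, ref_counts[word])
    return matched / len(cand_words)


# All three candidate words occur in the reference: precision 1.0.
print(unigram_precision("the cat sat", "the cat sat down"))
# "dog" is absent from the reference, and we can point to exactly
# that word as the reason the score drops to 2/3.
print(unigram_precision("the dog sat", "the cat sat down"))
```

Learned metrics, by contrast, give no such word-level audit trail, which is the contrast DEMETR probes.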

Machine Translation • Translation

The Perils of Using Mechanical Turk to Evaluate Open-Ended Text Generation

no code implementations • EMNLP 2021 • Marzena Karpinska, Nader Akoury, Mohit Iyyer

Recent text generation research has increasingly focused on open-ended domains such as story and poetry generation.

Text Generation

NarrativeTime: Dense Temporal Annotation on a Timeline

no code implementations • 29 Aug 2019 • Anna Rogers, Marzena Karpinska, Ankita Gupta, Vladislav Lialin, Gregory Smelkov, Anna Rumshisky

For the past decade, temporal annotation has been sparse: only a small portion of event pairs in a text was annotated.

Chunking
