Search Results for author: Winston Wu

Found 29 papers, 7 papers with code

Deciphering and Characterizing Out-of-Vocabulary Words for Morphologically Rich Languages

no code implementations • COLING 2022 • Georgie Botev, Arya D. McCarthy, Winston Wu, David Yarowsky

This paper presents a detailed foundational empirical case study of the nature of out-of-vocabulary words encountered in modern text in a moderate-resource language such as Bulgarian, and a multi-faceted distributional analysis of the underlying word-formation processes that can aid in their compositional translation, tagging, parsing, language modeling, and other NLP tasks.

Language Modelling Machine Translation

On Pronunciations in Wiktionary: Extraction and Experiments on Multilingual Syllabification and Stress Prediction

no code implementations • RANLP (BUCC) 2021 • Winston Wu, David Yarowsky

We constructed parsers for five non-English editions of Wiktionary; combined with pronunciations from the English edition, the extracted data comprise over 5.3 million IPA pronunciations, the largest pronunciation lexicon of its kind.

On the Robustness of Cognate Generation Models

no code implementations • LREC 2022 • Winston Wu, David Yarowsky

We evaluate two popular neural cognate generation models’ robustness to several types of human-plausible noise (deletion, duplication, swapping, and keyboard errors, as well as a new type of error, phonological errors).

Vocal Bursts Type Prediction

Known Words Will Do: Unknown Concept Translation via Lexical Relations

no code implementations • loresmt (COLING) 2022 • Winston Wu, David Yarowsky

Translating into low-resource languages is challenging due to the scarcity of training data.

Translation

Sequence Models for Computational Etymology of Borrowings

no code implementations • Findings (ACL) 2021 • Winston Wu, Kevin Duh, David Yarowsky

MOKA: Moral Knowledge Augmentation for Moral Event Extraction

1 code implementation • 16 Nov 2023 • Xinliang Frederick Zhang, Winston Wu, Nick Beauchamp, Lu Wang

News media employ moral language to create memorable stories, and readers often engage with content that aligns with their values.

Event Extraction Moral Scenarios

Crossing the Aisle: Unveiling Partisan and Counter-Partisan Events in News Reporting

1 code implementation • 28 Oct 2023 • Kaijian Zou, Xinliang Frederick Zhang, Winston Wu, Nick Beauchamp, Lu Wang

We benchmark PAC to highlight the challenges of this task.

You Are What You Annotate: Towards Better Models through Annotator Representations

1 code implementation • 24 May 2023 • Naihao Deng, Xinliang Frederick Zhang, Siyang Liu, Winston Wu, Lu Wang, Rada Mihalcea

Annotator disagreement is ubiquitous in natural language processing (NLP) tasks.

EASE: An Easily-Customized Annotation System Powered by Efficiency Enhancement Mechanisms

no code implementations • 23 May 2023 • Naihao Deng, YiKai Liu, Mingye Chen, Winston Wu, Siyang Liu, Yulong Chen, Yue Zhang, Rada Mihalcea

Our results show that our system can meet the diverse needs of NLP researchers and significantly accelerate the annotation process.

Active Learning

Has It All Been Solved? Open NLP Research Questions Not Solved by Large Language Models

no code implementations • 21 May 2023 • Oana Ignat, Zhijing Jin, Artem Abzaliev, Laura Biester, Santiago Castro, Naihao Deng, Xinyi Gao, Aylin Gunal, Jacky He, Ashkan Kazemi, Muhammad Khalifa, Namho Koh, Andrew Lee, Siyang Liu, Do June Min, Shinka Mori, Joan Nwatu, Veronica Perez-Rosas, Siqi Shen, Zekun Wang, Winston Wu, Rada Mihalcea

Not surprisingly, this has, in turn, made many NLP researchers, especially those at the beginning of their careers, worry about which NLP research area they should focus on.

Institutional ownership and liquidity commonality: evidence from Australia

no code implementations • 7 Nov 2022 • Reza Bradrania, Robert Elliott, Winston Wu

We find that commonality in liquidity is higher for large stocks compared to small stocks in the cross-section of stocks, and the spread between the two has increased over the past two decades.

Late Fusion with Triplet Margin Objective for Multimodal Ideology Prediction and Analysis

no code implementations • 4 Nov 2022 • Changyuan Qiu, Winston Wu, Xinliang Frederick Zhang, Lu Wang

In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content.

Evaluating Neural Model Robustness for Machine Comprehension

no code implementations • EACL 2021 • Winston Wu, Dustin Arendt, Svitlana Volkova

We evaluate neural model robustness to adversarial attacks using different types of linguistic unit perturbations (character and word), and propose a new method for strategic sentence-level perturbations.

Adversarial Attack Reading Comprehension

Neural Transduction for Multilingual Lexical Translation

no code implementations • COLING 2020 • Dylan Lewis, Winston Wu, Arya D. McCarthy, David Yarowsky

We present a method for completing multilingual translation dictionaries.

Translation

Wiktionary Normalization of Translations and Morphological Information

no code implementations • COLING 2020 • Winston Wu, David Yarowsky

We extend the Yawipa Wiktionary Parser (Wu and Yarowsky, 2020) to extract and normalize translations from etymology glosses, and morphological form-of relations, resulting in 300K unique translations and over 4 million instances of 168 annotated morphological relations.

Translation

The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education

no code implementations • WS 2020 • Huda Khayrallah, Jacob Bremerman, Arya D. McCarthy, Kenton Murray, Winston Wu, Matt Post

This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE).

Machine Translation Translation

Computational Etymology and Word Emergence

no code implementations • LREC 2020 • Winston Wu, David Yarowsky

We developed an extensible, comprehensive Wiktionary parser that improves over several existing parsers.

Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks

no code implementations • 1 May 2020 • Winston Wu, Dustin Arendt, Svitlana Volkova

We evaluate machine comprehension models' robustness to noise and adversarial attacks by performing novel perturbations at the character, word, and sentence level.

Reading Comprehension Sentence

JHUBC's Submission to LT4HALA EvaLatin 2020

no code implementations • LREC 2020 • Winston Wu, Garrett Nicolai

We describe the JHUBC submission to the EvaLatin shared task on lemmatization and part-of-speech tagging for Latin.

Lemmatization Part-Of-Speech Tagging

The Johns Hopkins University Bible Corpus: 1600+ Tongues for Typological Exploration

no code implementations • LREC 2020 • Arya D. McCarthy, Rachel Wicks, Dylan Lewis, Aaron Mueller, Winston Wu, Oliver Adams, Garrett Nicolai, Matt Post, David Yarowsky

The corpus consists of over 4000 unique translations of the Christian Bible and counting.

Multilingual Dictionary Based Construction of Core Vocabulary

no code implementations • LREC 2020 • Winston Wu, Garrett Nicolai, David Yarowsky

We propose a new functional definition and construction method for core vocabulary sets for multiple applications based on the relative coverage of a target concept in thousands of bilingual dictionaries.

Cognate Prediction Machine Translation

Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages

no code implementations • LREC 2020 • Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky

Exploiting the broad translation of the Bible into the world's languages, we train and distribute morphosyntactic tools for approximately one thousand languages, vastly outstripping previous distributions of tools devoted to the processing of inflectional morphology.

Translation

An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages

no code implementations • LREC 2020 • Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky

We find that best practices in this domain are highly language-specific: adding more languages to a training set is often better, but too many harm performance; the best number depends on the source language.

Low-Resource Neural Machine Translation Translation

Modeling Color Terminology Across Thousands of Languages

1 code implementation • IJCNLP 2019 • Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky

There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969).