Search Results for author: Winston Wu

Found 29 papers, 7 papers with code

Deciphering and Characterizing Out-of-Vocabulary Words for Morphologically Rich Languages

no code implementations COLING 2022 Georgie Botev, Arya D. McCarthy, Winston Wu, David Yarowsky

This paper presents a detailed foundational empirical case study of the nature of out-of-vocabulary words encountered in modern text in a moderate-resource language such as Bulgarian, and a multi-faceted distributional analysis of the underlying word-formation processes that can aid in their compositional translation, tagging, parsing, language modeling, and other NLP tasks.

Language Modelling Machine Translation +1

On Pronunciations in Wiktionary: Extraction and Experiments on Multilingual Syllabification and Stress Prediction

no code implementations RANLP (BUCC) 2021 Winston Wu, David Yarowsky

We constructed parsers for five non-English editions of Wiktionary, which combined with pronunciations from the English edition, comprises over 5.3 million IPA pronunciations, the largest pronunciation lexicon of its kind.

On the Robustness of Cognate Generation Models

no code implementations LREC 2022 Winston Wu, David Yarowsky

We evaluate two popular neural cognate generation models’ robustness to several types of human-plausible noise (deletion, duplication, swapping, and keyboard errors, as well as a new type of error, phonological errors).

Vocal Bursts Type Prediction

MOKA: Moral Knowledge Augmentation for Moral Event Extraction

1 code implementation 16 Nov 2023 Xinliang Frederick Zhang, Winston Wu, Nick Beauchamp, Lu Wang

News media employ moral language to create memorable stories, and readers often engage with content that aligns with their values.

Event Extraction Moral Scenarios

EASE: An Easily-Customized Annotation System Powered by Efficiency Enhancement Mechanisms

no code implementations 23 May 2023 Naihao Deng, YiKai Liu, Mingye Chen, Winston Wu, Siyang Liu, Yulong Chen, Yue Zhang, Rada Mihalcea

Our results show that our system can meet the diverse needs of NLP researchers and significantly accelerate the annotation process.

Active Learning

Institutional ownership and liquidity commonality: evidence from Australia

no code implementations 7 Nov 2022 Reza Bradrania, Robert Elliott, Winston Wu

We find that commonality in liquidity is higher for large stocks compared to small stocks in the cross-section of stocks, and the spread between the two has increased over the past two decades.

Late Fusion with Triplet Margin Objective for Multimodal Ideology Prediction and Analysis

no code implementations 4 Nov 2022 Changyuan Qiu, Winston Wu, Xinliang Frederick Zhang, Lu Wang

In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content.

Evaluating Neural Model Robustness for Machine Comprehension

no code implementations EACL 2021 Winston Wu, Dustin Arendt, Svitlana Volkova

We evaluate neural model robustness to adversarial attacks using different types of linguistic unit perturbations (character and word), and propose a new method for strategic sentence-level perturbations.

Adversarial Attack Reading Comprehension +2

Wiktionary Normalization of Translations and Morphological Information

no code implementations COLING 2020 Winston Wu, David Yarowsky

We extend the Yawipa Wiktionary Parser (Wu and Yarowsky, 2020) to extract and normalize translations from etymology glosses, and morphological form-of relations, resulting in 300K unique translations and over 4 million instances of 168 annotated morphological relations.

Translation

The JHU Submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education

no code implementations WS 2020 Huda Khayrallah, Jacob Bremerman, Arya D. McCarthy, Kenton Murray, Winston Wu, Matt Post

This paper presents the Johns Hopkins University submission to the 2020 Duolingo Shared Task on Simultaneous Translation and Paraphrase for Language Education (STAPLE).

Machine Translation Translation

Computational Etymology and Word Emergence

no code implementations LREC 2020 Winston Wu, David Yarowsky

We developed an extensible, comprehensive Wiktionary parser that improves over several existing parsers.

Evaluating Neural Machine Comprehension Model Robustness to Noisy Inputs and Adversarial Attacks

no code implementations 1 May 2020 Winston Wu, Dustin Arendt, Svitlana Volkova

We evaluate machine comprehension models' robustness to noise and adversarial attacks by performing novel perturbations at the character, word, and sentence level.

Reading Comprehension Sentence

JHUBC's Submission to LT4HALA EvaLatin 2020

no code implementations LREC 2020 Winston Wu, Garrett Nicolai

We describe the JHUBC submission to the EvaLatin Shared task on lemmatization and part-of-speech tagging for Latin.

Lemmatization Part-Of-Speech Tagging +1

Multilingual Dictionary Based Construction of Core Vocabulary

no code implementations LREC 2020 Winston Wu, Garrett Nicolai, David Yarowsky

We propose a new functional definition and construction method for core vocabulary sets for multiple applications based on the relative coverage of a target concept in thousands of bilingual dictionaries.

Cognate Prediction Machine Translation +1

Fine-grained Morphosyntactic Analysis and Generation Tools for More Than One Thousand Languages

no code implementations LREC 2020 Garrett Nicolai, Dylan Lewis, Arya D. McCarthy, Aaron Mueller, Winston Wu, David Yarowsky

Exploiting the broad translation of the Bible into the world's languages, we train and distribute morphosyntactic tools for approximately one thousand languages, vastly outstripping previous distributions of tools devoted to the processing of inflectional morphology.

Translation

An Analysis of Massively Multilingual Neural Machine Translation for Low-Resource Languages

no code implementations LREC 2020 Aaron Mueller, Garrett Nicolai, Arya D. McCarthy, Dylan Lewis, Winston Wu, David Yarowsky

We find that best practices in this domain are highly language-specific: adding more languages to a training set is often better, but too many harms performance; the best number depends on the source language.

Low-Resource Neural Machine Translation Translation

Modeling Color Terminology Across Thousands of Languages

1 code implementation IJCNLP 2019 Arya D. McCarthy, Winston Wu, Aaron Mueller, Bill Watson, David Yarowsky

There is an extensive history of scholarship into what constitutes a "basic" color term, as well as a broadly attested acquisition sequence of basic color terms across many languages, as articulated in the seminal work of Berlin and Kay (1969).
