Search Results for author: Kevin Duh

Found 95 papers, 18 papers with code

CLIRMatrix: A massively large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval

no code implementations EMNLP 2020 Shuo Sun, Kevin Duh

We present CLIRMatrix, a massively large collection of bilingual and multilingual datasets for Cross-Lingual Information Retrieval extracted automatically from Wikipedia.

Information Retrieval

Non-autoregressive End-to-end Speech Translation with Parallel Autoregressive Rescoring

no code implementations 9 Sep 2021 Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

We propose a unified NAR E2E-ST framework called Orthros, which has an NAR decoder and an auxiliary shallow AR decoder on top of the shared encoder.

Language Modelling Translation

Self-Guided Curriculum Learning for Neural Machine Translation

no code implementations 10 May 2021 Lei Zhou, Liang Ding, Kevin Duh, Shinji Watanabe, Ryohei Sasano, Koichi Takeda

In the field of machine learning, the well-trained model is assumed to be able to recover the training labels, i.e., the synthetic labels predicted by the model should be as close to the ground-truth labels as possible.

Curriculum Learning Machine Translation +1

Adaptive Mixed Component LDA for Low Resource Topic Modeling

no code implementations EACL 2021 Suzanna Sia, Kevin Duh

Probabilistic topic models in low data resource scenarios are faced with less reliable estimates due to sparsity of discrete word co-occurrence counts, and do not have the luxury of retraining word or topic embeddings using neural methods.

Topic Models

Orthros: Non-autoregressive End-to-end Speech Translation with Dual-decoder

no code implementations 25 Oct 2020 Hirofumi Inaguma, Yosuke Higuchi, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

Fast inference speed is an important goal towards real-world deployment of speech translation (ST) systems.

Translation

Very Deep Transformers for Neural Machine Translation

3 code implementations 18 Aug 2020 Xiaodong Liu, Kevin Duh, Liyuan Liu, Jianfeng Gao

We explore the application of very deep Transformer models for Neural Machine Translation (NMT).

Ranked #1 on Machine Translation on WMT2014 English-French (using extra training data)

Machine Translation Translation

CLIReval: Evaluating Machine Translation as a Cross-Lingual Information Retrieval Task

1 code implementation ACL 2020 Shuo Sun, Suzanna Sia, Kevin Duh

We present CLIReval, an easy-to-use toolkit for evaluating machine translation (MT) with the proxy task of cross-lingual information retrieval (CLIR).

Document Translation Information Retrieval +2

Modeling Document Interactions for Learning to Rank with Regularized Self-Attention

no code implementations 8 May 2020 Shuo Sun, Kevin Duh

Learning to rank is an important task that has been successfully deployed in many real-world information retrieval systems.

Information Retrieval Learning-To-Rank

Benchmarking Neural and Statistical Machine Translation on Low-Resource African Languages

no code implementations LREC 2020 Kevin Duh, Paul McNamee, Matt Post, Brian Thompson

In this study, we benchmark state-of-the-art statistical and neural machine translation systems on two African languages which do not have large amounts of resources: Somali and Swahili.

Machine Translation Translation

When Does Unsupervised Machine Translation Work?

no code implementations 12 Apr 2020 Kelly Marchisio, Kevin Duh, Philipp Koehn

We additionally find that unsupervised MT performance declines when source and target languages use different scripts, and observe very poor performance on authentic low-resource language pairs.

Translation Unsupervised Machine Translation

Distill, Adapt, Distill: Training Small, In-Domain Models for Neural Machine Translation

no code implementations WS 2020 Mitchell A. Gordon, Kevin Duh

We explore best practices for training small, memory efficient machine translation models with sequence-level knowledge distillation in the domain adaptation setting.

Domain Adaptation Knowledge Distillation +2

Multilingual End-to-End Speech Translation

1 code implementation 1 Oct 2019 Hirofumi Inaguma, Kevin Duh, Tatsuya Kawahara, Shinji Watanabe

In this paper, we propose a simple yet effective framework for multilingual end-to-end speech translation (ST), in which speech utterances in source languages are directly translated to the desired target languages with a universal sequence-to-sequence architecture.

automatic-speech-recognition Machine Translation +3

Broad-Coverage Semantic Parsing as Transduction

no code implementations IJCNLP 2019 Sheng Zhang, Xutai Ma, Kevin Duh, Benjamin Van Durme

We unify different broad-coverage semantic parsing tasks under a transduction paradigm, and propose an attention-based neural framework that incrementally builds a meaning representation via a sequence of semantic relations.

AMR Parsing UCCA Parsing

JHU System Description for the MADAR Arabic Dialect Identification Shared Task

no code implementations WS 2019 Tom Lippincott, Pamela Shapiro, Kevin Duh, Paul McNamee

Our submission to the MADAR shared task on Arabic dialect identification employed a language modeling technique called Prediction by Partial Matching, an ensemble of neural architectures, and sources of additional data for training word embeddings and auxiliary language models.

Dialect Identification Language Modelling +1

JHU 2019 Robustness Task System Description

no code implementations WS 2019 Matt Post, Kevin Duh

We describe the JHU submissions to the French–English, Japanese–English, and English–Japanese Robustness Task at WMT 2019.

Comparing Pipelined and Integrated Approaches to Dialectal Arabic Neural Machine Translation

no code implementations WS 2019 Pamela Shapiro, Kevin Duh

When translating diglossic languages such as Arabic, situations may arise where we would like to translate a text but do not know which dialect it is.

Dialect Identification Machine Translation +1

A Call for Prudent Choice of Subword Merge Operations in Neural Machine Translation

no code implementations WS 2019 Shuoyang Ding, Adithya Renduchintala, Kevin Duh

Most neural machine translation systems are built upon subword units extracted by methods such as Byte-Pair Encoding (BPE) or wordpiece.

Machine Translation Translation

AMR Parsing as Sequence-to-Graph Transduction

1 code implementation ACL 2019 Sheng Zhang, Xutai Ma, Kevin Duh, Benjamin Van Durme

Our experimental results outperform all previously reported SMATCH scores, on both AMR 2.0 (76.3% F1 on LDC2017T10) and AMR 1.0 (70.2% F1 on LDC2014T12).

AMR Parsing

Query Expansion for Cross-Language Question Re-Ranking

no code implementations 16 Apr 2019 Muhammad Mahbubur Rahman, Sorami Hisamoto, Kevin Duh

Community question-answering (CQA) platforms have become very popular forums for asking and answering questions daily.

Community Question Answering Re-Ranking +1

The JHU Machine Translation Systems for WMT 2018

no code implementations WS 2018 Philipp Koehn, Kevin Duh, Brian Thompson

We report on the efforts of the Johns Hopkins University to develop neural machine translation systems for the shared task for news translation organized around the Conference for Machine Translation (WMT) 2018.

Machine Translation Translation

Cross-lingual Decompositional Semantic Parsing

no code implementations EMNLP 2018 Sheng Zhang, Xutai Ma, Rachel Rudinger, Kevin Duh, Benjamin Van Durme

We introduce the task of cross-lingual decompositional semantic parsing: mapping content provided in a source language into a decompositional semantic analysis based on a target language.

Semantic Parsing

Stochastic Answer Networks for SQuAD 2.0

5 code implementations 24 Sep 2018 Xiaodong Liu, Wei Li, Yuwei Fang, Aerin Kim, Kevin Duh, Jianfeng Gao

This paper presents an extension of the Stochastic Answer Network (SAN), one of the state-of-the-art machine reading comprehension models, to be able to judge whether a question is unanswerable or not.

Machine Reading Comprehension Question Answering

Freezing Subnetworks to Analyze Domain Adaptation in Neural Machine Translation

1 code implementation WS 2018 Brian Thompson, Huda Khayrallah, Antonios Anastasopoulos, Arya D. McCarthy, Kevin Duh, Rebecca Marvin, Paul McNamee, Jeremy Gwinnup, Tim Anderson, Philipp Koehn

To better understand the effectiveness of continued training, we analyze the major components of a neural machine translation system (the encoder, decoder, and each embedding space) and consider each component's contribution to, and capacity for, domain adaptation.

Domain Adaptation Machine Translation +1

Character-Aware Decoder for Translation into Morphologically Rich Languages

no code implementations WS 2019 Adithya Renduchintala, Pamela Shapiro, Kevin Duh, Philipp Koehn

Neural machine translation (NMT) systems operate primarily on words (or sub-words), ignoring lower-level patterns of morphology.

Machine Translation Translation

BPE and CharCNNs for Translation of Morphology: A Cross-Lingual Comparison and Analysis

no code implementations 5 Sep 2018 Pamela Shapiro, Kevin Duh

Neural Machine Translation (NMT) in low-resource settings and of morphologically rich languages is made difficult in part by data sparsity of vocabulary words.

Machine Translation Translation

Regularized Training Objective for Continued Training for Domain Adaptation in Neural Machine Translation

1 code implementation WS 2018 Huda Khayrallah, Brian Thompson, Kevin Duh, Philipp Koehn

Supervised domain adaptation, where a large generic corpus and a smaller in-domain corpus are both available for training, is a challenge for neural machine translation (NMT).

Domain Adaptation Machine Translation +1

How Do Source-side Monolingual Word Embeddings Impact Neural Machine Translation?

no code implementations 5 Jun 2018 Shuoyang Ding, Kevin Duh

Using pre-trained word embeddings as input layer is a common practice in many natural language processing (NLP) tasks, but it is largely neglected for neural machine translation (NMT).

Machine Translation Translation +1

Cross-Lingual Learning-to-Rank with Shared Representations

no code implementations NAACL 2018 Shota Sasaki, Shuo Sun, Shigehiko Schamoni, Kevin Duh, Kentaro Inui

Cross-lingual information retrieval (CLIR) is a document retrieval task where the documents are written in a language different from that of the user's query.

Information Retrieval Learning-To-Rank +1

Morphological Word Embeddings for Arabic Neural Machine Translation in Low-Resource Settings

no code implementations WS 2018 Pamela Shapiro, Kevin Duh

Neural machine translation has achieved impressive results in the last few years, but its success has been limited to settings with large amounts of parallel data.

Machine Translation Translation +2

Halo: Learning Semantics-Aware Representations for Cross-Lingual Information Extraction

no code implementations SEMEVAL 2018 Hongyuan Mei, Sheng Zhang, Kevin Duh, Benjamin Van Durme

Cross-lingual information extraction (CLIE) is an important and challenging task, especially in low resource scenarios.

Stochastic Answer Networks for Natural Language Inference

3 code implementations 21 Apr 2018 Xiaodong Liu, Kevin Duh, Jianfeng Gao

We propose a stochastic answer network (SAN) to explore multi-step inference strategies in Natural Language Inference.

Natural Language Inference

Cross-lingual Semantic Parsing

no code implementations 21 Apr 2018 Sheng Zhang, Kevin Duh, Benjamin Van Durme

We introduce the task of cross-lingual semantic parsing: mapping content provided in a source language into a meaning representation based on a target language.

Semantic Parsing

Stochastic Answer Networks for Machine Reading Comprehension

5 code implementations ACL 2018 Xiaodong Liu, Yelong Shen, Kevin Duh, Jianfeng Gao

We propose a simple yet robust stochastic answer network (SAN) that simulates multi-step reasoning in machine reading comprehension.

Machine Reading Comprehension Question Answering

An Empirical Analysis of Multiple-Turn Reasoning Strategies in Reading Comprehension Tasks

no code implementations IJCNLP 2017 Yelong Shen, Xiaodong Liu, Kevin Duh, Jianfeng Gao

Using a state-of-the-art RC model, we empirically investigate the performance of single-turn and multiple-turn reasoning on the SQuAD and MS MARCO datasets.

Reading Comprehension

Inference is Everything: Recasting Semantic Resources into a Unified Evaluation Framework

no code implementations IJCNLP 2017 Aaron Steven White, Pushpendre Rastogi, Kevin Duh, Benjamin Van Durme

We propose to unify a variety of existing semantic classification tasks, such as semantic role labeling, anaphora resolution, and paraphrase detection, under the heading of Recognizing Textual Entailment (RTE).

General Classification Image Captioning +2

A Multi-task Learning Approach to Adapting Bilingual Word Embeddings for Cross-lingual Named Entity Recognition

no code implementations IJCNLP 2017 Dingquan Wang, Nanyun Peng, Kevin Duh

We show how to adapt bilingual word embeddings (BWE's) to bootstrap a cross-lingual name-entity recognition (NER) system in a language with no labeled data.

Cross-Lingual Transfer Multi-Task Learning +3

Selective Decoding for Cross-lingual Open Information Extraction

no code implementations IJCNLP 2017 Sheng Zhang, Kevin Duh, Benjamin Van Durme

Cross-lingual open information extraction is the task of distilling facts from the source language into representations in the target language.

Machine Translation Open Information Extraction

Streaming Word Embeddings with the Space-Saving Algorithm

2 code implementations 24 Apr 2017 Chandler May, Kevin Duh, Benjamin Van Durme, Ashwin Lall

We develop a streaming (one-pass, bounded-memory) word embedding algorithm based on the canonical skip-gram with negative sampling algorithm implemented in word2vec.

Word Embeddings

DyNet: The Dynamic Neural Network Toolkit

4 code implementations 15 Jan 2017 Graham Neubig, Chris Dyer, Yoav Goldberg, Austin Matthews, Waleed Ammar, Antonios Anastasopoulos, Miguel Ballesteros, David Chiang, Daniel Clothiaux, Trevor Cohn, Kevin Duh, Manaal Faruqui, Cynthia Gan, Dan Garrette, Yangfeng Ji, Lingpeng Kong, Adhiguna Kuncoro, Gaurav Kumar, Chaitanya Malaviya, Paul Michel, Yusuke Oda, Matthew Richardson, Naomi Saphra, Swabha Swayamdipta, Pengcheng Yin

In the static declaration strategy that is used in toolkits like Theano, CNTK, and TensorFlow, the user first defines a computation graph (a symbolic representation of the computation), and then examples are fed into an engine that executes this computation and computes its derivatives.

graph construction

Ordinal Common-sense Inference

no code implementations TACL 2017 Sheng Zhang, Rachel Rudinger, Kevin Duh, Benjamin Van Durme

Humans have the capacity to draw common-sense inferences from natural language: various things that are likely but not certain to hold based on established discourse, and are rarely stated explicitly.

Common Sense Reasoning Natural Language Inference

Robsut Wrod Reocginiton via semi-Character Recurrent Neural Network

1 code implementation 7 Aug 2016 Keisuke Sakaguchi, Kevin Duh, Matt Post, Benjamin Van Durme

Inspired by the findings from the Cmabrigde Uinervtisy effect, we propose a word recognition model based on a semi-character level recurrent neural network (scRNN).

Spelling Correction

Dependency Parsing with LSTMs: An Empirical Evaluation

no code implementations 22 Apr 2016 Adhiguna Kuncoro, Yuichiro Sawai, Kevin Duh, Yuji Matsumoto

We propose a transition-based dependency parser using Recurrent Neural Networks with Long Short-Term Memory (LSTM) units.

Dependency Parsing

Depth-Gated LSTM

no code implementations 16 Aug 2015 Kaisheng Yao, Trevor Cohn, Katerina Vylomova, Kevin Duh, Chris Dyer

This gate is a function of the lower layer's memory cell, the input to this layer, and the past memory cell of this layer.

Language Modelling Machine Translation +1

Incorporating Both Distributional and Relational Semantics in Word Representations

no code implementations 18 Dec 2014 Daniel Fried, Kevin Duh

We investigate the hypothesis that word representations ought to incorporate both distributional and relational semantics.

Knowledge Base Completion

Incorporating Both Distributional and Relational Semantics in Word Representations

no code implementations 14 Dec 2014 Daniel Fried, Kevin Duh

We investigate the hypothesis that word representations ought to incorporate both distributional and relational semantics.

Knowledge Base Completion

Parsing Chinese Synthetic Words with a Character-based Dependency Model

no code implementations LREC 2014 Fei Cheng, Kevin Duh, Yuji Matsumoto

Synthetic word analysis is a potentially important but relatively unexplored problem in Chinese natural language processing.

Chinese Word Segmentation
