Search Results for author: Hongyu Gong

Found 43 papers, 13 papers with code

Rich Syntactic and Semantic Information Helps Unsupervised Text Style Transfer

no code implementations INLG (ACL) 2020 Hongyu Gong, Linfeng Song, Suma Bhat

Text style transfer aims to change an input sentence to an output sentence by changing its text style while preserving the content.

Sentence Style Transfer +2

PIE: A Parallel Idiomatic Expression Corpus for Idiomatic Sentence Generation and Paraphrasing

no code implementations ACL (MWE) 2021 Jianing Zhou, Hongyu Gong, Suma Bhat

Idiomatic expressions (IE) play an important role in natural language, and have long been a “pain in the neck” for NLP systems.

Sentence Text Generation

Findings of the IWSLT 2022 Evaluation Campaign

no code implementations IWSLT (ACL) 2022 Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.

Speech-to-Speech Translation Translation

Multilingual Speech-to-Speech Translation into Multiple Target Languages

no code implementations 17 Jul 2023 Hongyu Gong, Ning Dong, Sravya Popuri, Vedanuj Goswami, Ann Lee, Juan Pino

Despite a few studies on multilingual S2ST, their focus is the multilinguality on the source side, i.e., the translation from multiple source languages to one target language.

Language Identification Speech-to-Speech Translation +1

Pre-training for Speech Translation: CTC Meets Optimal Transport

1 code implementation 27 Jan 2023 Phuong-Hang Le, Hongyu Gong, Changhan Wang, Juan Pino, Benjamin Lecouteux, Didier Schwab

Nevertheless, CTC is only a partial solution and thus, in our second contribution, we propose a novel pre-training method combining CTC and optimal transport to further reduce this gap.

Multi-Task Learning Speech-to-Text Translation +1

Improving Speech-to-Speech Translation Through Unlabeled Text

no code implementations 26 Oct 2022 Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov, Hongyu Gong

Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm due to the significant scarcity of S2ST data.

Machine Translation speech-recognition +3

Idiomatic Expression Paraphrasing without Strong Supervision

no code implementations 16 Dec 2021 Jianing Zhou, Ziheng Zeng, Hongyu Gong, Suma Bhat

In this paper, we study the task of idiomatic sentence paraphrasing (ISP), which aims to paraphrase a sentence with an IE by replacing the IE with its literal paraphrase.

Machine Translation Sentence +1

Multimodal and Multilingual Embeddings for Large-Scale Speech Mining

1 code implementation NeurIPS 2021 Paul-Ambroise Duquenne, Hongyu Gong, Holger Schwenk

Using a similarity metric in that multimodal embedding space, we perform mining of audio in German, French, Spanish and English from Librivox against billions of sentences from Common Crawl.

Speech-to-Speech Translation Translation

Direct Simultaneous Speech-to-Speech Translation with Variational Monotonic Multihead Attention

no code implementations 15 Oct 2021 Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Philipp Koehn, Juan Pino

We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model. Furthermore, the generation of translation is independent from intermediate text representations.

Speech Synthesis Speech-to-Speech Translation +1

Contrastive Clustering to Mine Pseudo Parallel Data for Unsupervised Translation

no code implementations ICLR 2022 Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, Shafiq Joty

Modern unsupervised machine translation systems mostly train their models by generating synthetic parallel training data from large unlabeled monolingual corpora of different languages through various means, such as iterative back-translation.

Clustering Translation +1

FST: the FAIR Speech Translation System for the IWSLT21 Multilingual Shared Task

no code implementations ACL (IWSLT) 2021 Yun Tang, Hongyu Gong, Xian Li, Changhan Wang, Juan Pino, Holger Schwenk, Naman Goyal

In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task.

Transfer Learning Translation

Pay Better Attention to Attention: Head Selection in Multilingual and Multi-Domain Sequence Modeling

no code implementations NeurIPS 2021 Hongyu Gong, Yun Tang, Juan Pino, Xian Li

We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling.

speech-recognition Speech Recognition +2

LAWDR: Language-Agnostic Weighted Document Representations from Pre-trained Models

no code implementations 7 Jun 2021 Hongyu Gong, Vishrav Chaudhary, Yuqing Tang, Francisco Guzmán

Cross-lingual document representations enable language understanding in multilingual contexts and allow transfer learning from high-resource to low-resource languages at the document level.

Sentence Sentence Embeddings +1

Adaptive Sparse Transformer for Multilingual Translation

no code implementations 15 Apr 2021 Hongyu Gong, Xian Li, Dmitriy Genzel

Based on these insights, we propose an adaptive and sparse architecture for multilingual modeling, and train the model to learn shared and language-specific parameters to improve the positive transfer and mitigate the interference.

Machine Translation Transfer Learning +1

Robust Optimization for Multilingual Translation with Imbalanced Data

no code implementations NeurIPS 2021 Xian Li, Hongyu Gong

We show that the common training method, which upsamples low-resource languages, cannot robustly optimize population loss and risks either underfitting high-resource languages or overfitting low-resource ones.

Machine Translation Translation

From Solving a Problem Boldly to Cutting the Gordian Knot: Idiomatic Text Generation

no code implementations 13 Apr 2021 Jianing Zhou, Hongyu Gong, Srihari Nanniyur, Suma Bhat

We study a new application for text generation -- idiomatic sentence generation -- which aims to transfer literal phrases in sentences into their idiomatic counterparts.

Sentence Text Generation

Self-Supervised Euphemism Detection and Identification for Content Moderation

1 code implementation 31 Mar 2021 Wanzheng Zhu, Hongyu Gong, Rohan Bansal, Zachary Weinberg, Nicolas Christin, Giulia Fanti, Suma Bhat

It is usually apparent to a human moderator that a word is being used euphemistically, but they may not know what the secret meaning is, and therefore whether the message violates policy.

Sentence Word Embeddings

Enriching Word Embeddings with Temporal and Spatial Information

1 code implementation CONLL 2020 Hongyu Gong, Suma Bhat, Pramod Viswanath

The meaning of a word is closely linked to sociocultural factors that can change over time and location, resulting in corresponding meaning changes.

Word Embeddings

Recurrent Chunking Mechanisms for Long-Text Machine Reading Comprehension

1 code implementation ACL 2020 Hongyu Gong, Yelong Shen, Dian Yu, Jianshu Chen, Dong Yu

In this paper, we study machine reading comprehension (MRC) on long texts, where a model takes as inputs a lengthy document and a question and then extracts a text span from the document as an answer.

Chunking Machine Reading Comprehension +1

FUSE: Multi-Faceted Set Expansion by Coherent Clustering of Skip-grams

1 code implementation 10 Oct 2019 Wanzheng Zhu, Hongyu Gong, Jiaming Shen, Chao Zhang, Jingbo Shang, Suma Bhat, Jiawei Han

In this paper, we study the task of multi-faceted set expansion, which aims to capture all semantic facets in the seed set and return multiple sets of entities, one for each semantic facet.

Clustering Language Modelling

PaRe: A Paper-Reviewer Matching Approach Using a Common Topic Space

no code implementations IJCNLP 2019 Omer Anjum, Hongyu Gong, Suma Bhat, Wen-mei Hwu, JinJun Xiong

Finding the right reviewers to assess the quality of conference submissions is a time-consuming process for conference organizers.

Topic Models

Equipping Educational Applications with Domain Knowledge

no code implementations WS 2019 Tarek Sakakini, Hongyu Gong, Jong Yoon Lee, Robert Schloss, JinJun Xiong, Suma Bhat

One of the challenges of building natural language processing (NLP) applications for education is finding a large domain-specific corpus for the subject of interest (e.g., history or science).

Distractor Generation Language Modelling +1

WikiMatrix: Mining 135M Parallel Sentences in 1620 Language Pairs from Wikipedia

6 code implementations EACL 2021 Holger Schwenk, Vishrav Chaudhary, Shuo Sun, Hongyu Gong, Francisco Guzmán

We present an approach based on multilingual sentence embeddings to automatically extract parallel sentences from the content of Wikipedia articles in 85 languages, including several dialects or low-resource languages.

Sentence Sentence Embeddings

Document Similarity for Texts of Varying Lengths via Hidden Topics

1 code implementation ACL 2018 Hongyu Gong, Tarek Sakakini, Suma Bhat, JinJun Xiong

This is because of the lexical, contextual and the abstraction gaps between a long document of rich details and its concise summary of abstract information.

Text Matching

Context-Sensitive Malicious Spelling Error Correction

no code implementations 23 Jan 2019 Hongyu Gong, Yuchen Li, Suma Bhat, Pramod Viswanath

Misspelled words of the malicious kind work by changing specific keywords and are intended to thwart existing automated applications for cyber-environment control such as harassing content detection on the Internet and email spam detection.

Spam detection Spelling Correction +1

Preposition Sense Disambiguation and Representation

1 code implementation EMNLP 2018 Hongyu Gong, Jiaqi Mu, Suma Bhat, Pramod Viswanath

Prepositions are highly polysemous, and their variegated senses encode significant semantic information.

Embedding Syntax and Semantics of Prepositions via Tensor Decomposition

no code implementations NAACL 2018 Hongyu Gong, Suma Bhat, Pramod Viswanath

Prepositions are among the most frequent words in English and play complex roles in the syntax and semantics of sentences.

Tensor Decomposition

Prepositions in Context

no code implementations 5 Feb 2017 Hongyu Gong, Jiaqi Mu, Suma Bhat, Pramod Viswanath

Prepositions are highly polysemous, and their variegated senses encode significant semantic information.

Clustering

Geometry of Compositionality

1 code implementation 29 Nov 2016 Hongyu Gong, Suma Bhat, Pramod Viswanath

This paper proposes a simple test for compositionality (i.e., literal usage) of a word or phrase in a context-specific way.

Word Embeddings
