no code implementations • 9 Oct 2024 • Cindy Tseng, Yun Tang, Vijendra Raj Apsingekar
However, it is not straightforward to apply consistency regularization to transducer-based approaches, which are widely adopted for speech applications due to their competitive performance and streaming capability.
1 code implementation • 23 Feb 2024 • Yi Zhang, Yun Tang, Wenjie Ruan, Xiaowei Huang, Siddartha Khastgir, Paul Jennings, Xingyu Zhao
Text-to-Image (T2I) Diffusion Models (DMs) have shown impressive abilities in generating high-quality images based on simple text descriptions.
no code implementations • 17 Jul 2023 • Yun Tang, Antonio A. Bruto da Costa, Jason Zhang, Irvine Patrick, Siddartha Khastgir, Paul Jennings
Engineering knowledge-based (or expert) systems requires extensive manual effort and domain knowledge.
no code implementations • 4 May 2023 • Yun Tang, Anna Y. Sun, Hirofumi Inaguma, Xinyue Chen, Ning Dong, Xutai Ma, Paden D. Tomasello, Juan Pino
To leverage the strengths of both modeling methods, we propose combining the Transducer and the Attention-based Encoder-Decoder (TAED) for speech-to-text tasks.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
1 code implementation • 10 Apr 2023 • Brian Yan, Jiatong Shi, Yun Tang, Hirofumi Inaguma, Yifan Peng, Siddharth Dalmia, Peter Polák, Patrick Fernandes, Dan Berrebbi, Tomoki Hayashi, Xiaohui Zhang, Zhaoheng Ni, Moto Hira, Soumi Maiti, Juan Pino, Shinji Watanabe
ESPnet-ST-v2 is a revamp of the open-source ESPnet-ST toolkit necessitated by the broadening interests of the spoken language translation community.
no code implementations • 10 Apr 2023 • Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe
It is well known that direct speech-to-speech translation (S2ST) models usually suffer from data scarcity because existing parallel material covering both source and target speech is limited.
1 code implementation • 15 Dec 2022 • Hirofumi Inaguma, Sravya Popuri, Ilia Kulikov, Peng-Jen Chen, Changhan Wang, Yu-An Chung, Yun Tang, Ann Lee, Shinji Watanabe, Juan Pino
We enhance the model performance by subword prediction in the first-pass decoder, advanced two-pass decoder architecture design and search strategy, and better training regularization.
no code implementations • 26 Oct 2022 • Xuan-Phi Nguyen, Sravya Popuri, Changhan Wang, Yun Tang, Ilia Kulikov, Hongyu Gong
Direct speech-to-speech translation (S2ST) is among the most challenging problems in the translation paradigm due to the significant scarcity of S2ST data.
no code implementations • 21 Oct 2022 • Marco Gaido, Yun Tang, Ilia Kulikov, Rongqing Huang, Hongyu Gong, Hirofumi Inaguma
In a sentence, certain words are critical to its semantics.
no code implementations • 18 Oct 2022 • Changhan Wang, Hirofumi Inaguma, Peng-Jen Chen, Ilia Kulikov, Yun Tang, Wei-Ning Hsu, Michael Auli, Juan Pino
The amount of labeled data available to train models for speech tasks is limited for most languages; however, this data scarcity is exacerbated for speech translation, which requires labeled data covering two different languages.
no code implementations • ACL 2022 • Yun Tang, Hongyu Gong, Ning Dong, Changhan Wang, Wei-Ning Hsu, Jiatao Gu, Alexei Baevski, Xian Li, Abdelrahman Mohamed, Michael Auli, Juan Pino
Two pre-training configurations for speech translation and recognition, respectively, are presented to alleviate subtask interference.
no code implementations • 2 Dec 2021 • Ziyuan Zhong, Yun Tang, Yuan Zhou, Vania de Oliveira Neves, Yang Liu, Baishakhi Ray
To bridge this gap, in this work, we provide a generic formulation of scenario-based testing in high-fidelity simulation and conduct a literature review on the existing works.
no code implementations • 15 Oct 2021 • Danni Liu, Changhan Wang, Hongyu Gong, Xutai Ma, Yun Tang, Juan Pino
Speech-to-speech translation (S2ST) converts input speech to speech in another language.
Data Augmentation
Simultaneous Speech-to-Speech Translation
no code implementations • 15 Oct 2021 • Xutai Ma, Hongyu Gong, Danni Liu, Ann Lee, Yun Tang, Peng-Jen Chen, Wei-Ning Hsu, Phillip Koehn, Juan Pino
We present a direct simultaneous speech-to-speech translation (Simul-S2ST) model. Furthermore, the generation of translation is independent of intermediate text representations.
Simultaneous Speech-to-Speech Translation
Speech Synthesis
no code implementations • ICLR 2022 • Xuan-Phi Nguyen, Hongyu Gong, Yun Tang, Changhan Wang, Philipp Koehn, Shafiq Joty
Modern unsupervised machine translation systems mostly train their models by generating synthetic parallel training data from large unlabeled monolingual corpora of different languages through various means, such as iterative back-translation.
no code implementations • ACL 2021 • Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation through efficient transfer learning from a pretrained speech encoder and text decoder.
no code implementations • ACL (IWSLT) 2021 • Yun Tang, Hongyu Gong, Xian Li, Changhan Wang, Juan Pino, Holger Schwenk, Naman Goyal
In this paper, we describe our end-to-end multilingual speech translation system submitted to the IWSLT 2021 evaluation campaign on the Multilingual Speech Translation shared task.
no code implementations • ACL 2021 • Yun Tang, Juan Pino, Xian Li, Changhan Wang, Dmitriy Genzel
Pretraining and multitask learning are widely used to improve speech-to-text translation performance.
1 code implementation • ACL 2022 • Ann Lee, Peng-Jen Chen, Changhan Wang, Jiatao Gu, Sravya Popuri, Xutai Ma, Adam Polyak, Yossi Adi, Qing He, Yun Tang, Juan Pino, Wei-Ning Hsu
When target text transcripts are available, we design a joint speech and text training framework that enables the model to generate dual modality output (speech and text) simultaneously in the same inference pass.
no code implementations • NeurIPS 2021 • Hongyu Gong, Yun Tang, Juan Pino, Xian Li
We further propose attention sharing strategies to facilitate parameter sharing and specialization in multilingual and multi-domain sequence modeling.
no code implementations • 24 Oct 2020 • Xian Li, Changhan Wang, Yun Tang, Chau Tran, Yuqing Tang, Juan Pino, Alexei Baevski, Alexis Conneau, Michael Auli
We present a simple yet effective approach to build multilingual speech-to-text (ST) translation by efficient transfer learning from a pretrained speech encoder and text decoder.
no code implementations • 21 Oct 2020 • Yun Tang, Juan Pino, Changhan Wang, Xutai Ma, Dmitriy Genzel
We demonstrate that representing text input as phoneme sequences can reduce the difference between speech and text inputs and enhance knowledge transfer from text corpora to speech-to-text tasks.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
5 code implementations • Asian Chapter of the Association for Computational Linguistics 2020 • Changhan Wang, Yun Tang, Xutai Ma, Anne Wu, Sravya Popuri, Dmytro Okhonko, Juan Pino
We introduce fairseq S2T, a fairseq extension for speech-to-text (S2T) modeling tasks such as end-to-end speech recognition and speech-to-text translation.
Ranked #8 on Speech-to-Text Translation on MuST-C EN->DE
no code implementations • 3 Jun 2020 • Juan Pino, Qiantong Xu, Xutai Ma, Mohammad Javad Dousti, Yun Tang
One of the main challenges for end-to-end speech translation is data scarcity.
no code implementations • ACL 2020 • Yun Tang, Jing Huang, Guangtao Wang, Xiaodong He, Bo-Wen Zhou
Translational distance-based knowledge graph embedding has shown progressive improvements on the link prediction task, from TransE to the latest state-of-the-art RotatE.
Ranked #19 on Link Prediction on FB15k-237
no code implementations • CONLL 2019 • Kevin Huang, Yun Tang, Jing Huang, Xiaodong He, Bo-Wen Zhou
We test the relation module on the SQuAD 2.0 dataset using both the BiDAF and BERT models as baseline readers.
no code implementations • 23 Oct 2019 • Kevin Huang, Yun Tang, Jing Huang, Xiaodong He, Bo-Wen Zhou
In this paper, we aim to improve an MRC model's ability to determine whether a question has an answer in a given context (e.g., the recently proposed SQuAD 2.0 task).
1 code implementation • 29 Aug 2019 • Shuaichen Chang, PengFei Liu, Yun Tang, Jing Huang, Xiaodong He, Bo-Wen Zhou
Recent years have seen great success in the use of neural seq2seq models on the text-to-SQL task.
no code implementations • ACL 2019 • Ming Tu, Guangtao Wang, Jing Huang, Yun Tang, Xiaodong He, Bo-Wen Zhou
We introduce a heterogeneous graph with different types of nodes and edges, which we call the Heterogeneous Document-Entity (HDE) graph.
no code implementations • 16 Apr 2019 • Kong Aik Lee, Ville Hautamaki, Tomi Kinnunen, Hitoshi Yamamoto, Koji Okabe, Ville Vestman, Jing Huang, Guohong Ding, Hanwu Sun, Anthony Larcher, Rohan Kumar Das, Haizhou Li, Mickael Rouvier, Pierre-Michel Bousquet, Wei Rao, Qing Wang, Chunlei Zhang, Fahimeh Bahmaninezhad, Hector Delgado, Jose Patino, Qiongqiong Wang, Ling Guo, Takafumi Koshinaka, Jiacen Zhang, Koichi Shinoda, Trung Ngo Trong, Md Sahidullah, Fan Lu, Yun Tang, Ming Tu, Kah Kuan Teh, Huy Dat Tran, Kuruvachan K. George, Ivan Kukanov, Florent Desnous, Jichen Yang, Emre Yilmaz, Longting Xu, Jean-Francois Bonastre, Cheng-Lin Xu, Zhi Hao Lim, Eng Siong Chng, Shivesh Ranjan, John H. L. Hansen, Massimiliano Todisco, Nicholas Evans
The I4U consortium was established to facilitate a joint entry to NIST speaker recognition evaluations (SRE).
no code implementations • 21 Feb 2019 • Yun Tang, Guohong Ding, Jing Huang, Xiaodong He, Bo-Wen Zhou
This paper aims to improve the widely used deep speaker embedding x-vector model.
1 code implementation • 11 Nov 2018 • Chao Shang, Yun Tang, Jing Huang, Jinbo Bi, Xiaodong He, Bo-Wen Zhou
The recent graph convolutional network (GCN) provides another way of learning graph node embedding by successfully utilizing graph connectivity structure.
Ranked #30 on Link Prediction on FB15k-237