Search Results for author: Jiatong Shi

Found 41 papers, 16 papers with code

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

1 code implementation • 25 Apr 2023 • Rongjie Huang, Mingze Li, Dongchao Yang, Jiatong Shi, Xuankai Chang, Zhenhui Ye, Yuning Wu, Zhiqing Hong, Jiawei Huang, Jinglin Liu, Yi Ren, Zhou Zhao, Shinji Watanabe

In this work, we propose a multi-modal AI system named AudioGPT, which complements LLMs (i.e., ChatGPT) with 1) foundation models to process complex audio information and solve numerous understanding and generation tasks; and 2) an input/output interface (ASR, TTS) to support spoken dialogue.

EURO: ESPnet Unsupervised ASR Open-source Toolkit

1 code implementation • 30 Nov 2022 • Dongji Gao, Jiatong Shi, Shun-Po Chuang, Leibny Paola Garcia, Hung-Yi Lee, Shinji Watanabe, Sanjeev Khudanpur

This paper describes the ESPnet Unsupervised ASR Open-source Toolkit (EURO), an end-to-end open-source toolkit for unsupervised automatic speech recognition (UASR).

Automatic Speech Recognition (ASR) +1

Improving Massively Multilingual ASR With Auxiliary CTC Objectives

1 code implementation • 24 Feb 2023 • William Chen, Brian Yan, Jiatong Shi, Yifan Peng, Soumi Maiti, Shinji Watanabe

In this paper, we introduce our work on improving performance on FLEURS, a 102-language open ASR benchmark, by conditioning the entire model on language identity (LID).

Automatic Speech Recognition (ASR) +1
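As a rough illustration of the auxiliary-objective idea in the entry above, hybrid training typically interpolates the primary attention loss with a CTC term. The function and weight below are an illustrative sketch, not taken from the paper:

```python
def multitask_loss(attn_loss, ctc_loss, ctc_weight=0.3):
    """Interpolate a primary attention loss with an auxiliary CTC loss.

    ctc_weight=0.3 is a commonly used default in hybrid CTC/attention
    recipes, chosen here only for illustration.
    """
    return (1 - ctc_weight) * attn_loss + ctc_weight * ctc_loss

# With per-batch losses of 2.0 (attention) and 3.0 (CTC):
print(multitask_loss(2.0, 3.0))  # 0.7*2.0 + 0.3*3.0 = 2.3
```

In practice the two losses come from separate decoder heads over a shared encoder, so the CTC branch regularizes the encoder representations.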

Large-Scale End-to-End Multilingual Speech Recognition and Language Identification with Multi-Task Learning

1 code implementation • 25 Oct 2020 • Wenxin Hou, Yue Dong, Bairong Zhuang, Longfei Yang, Jiatong Shi, Takahiro Shinozaki

In this paper, we report a large-scale end-to-end language-independent multilingual model for joint automatic speech recognition (ASR) and language identification (LID).

Automatic Speech Recognition (ASR) +3
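One common recipe for joint ASR and LID in a single end-to-end model is to prepend a language token to each transcript target, so the model predicts the language before the words. This sketch is an assumption for illustration, not necessarily the paper's exact method:

```python
def add_lid_token(transcript, lang):
    """Prepend a language-ID token so one seq2seq model learns ASR + LID.

    The token format <lang> is illustrative; any reserved symbol not in
    the regular vocabulary works.
    """
    return f"<{lang}> {transcript}"

print(add_lid_token("hello world", "en"))  # <en> hello world
print(add_lid_token("bonjour", "fr"))      # <fr> bonjour
```

At inference time, the first decoded token then serves as the language prediction, and the remainder as the transcription.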

The Singing Voice Conversion Challenge 2023

1 code implementation • 26 Jun 2023 • Wen-Chin Huang, Lester Phillip Violeta, Songxiang Liu, Jiatong Shi, Tomoki Toda

A new database was constructed for two tasks, namely in-domain and cross-domain SVC.

Voice Conversion

Dynamic-SUPERB: Towards A Dynamic, Collaborative, and Comprehensive Instruction-Tuning Benchmark for Speech

1 code implementation • 18 Sep 2023 • Chien-yu Huang, Ke-Han Lu, Shih-Heng Wang, Chi-Yuan Hsiao, Chun-Yi Kuan, Haibin Wu, Siddhant Arora, Kai-Wei Chang, Jiatong Shi, Yifan Peng, Roshan Sharma, Shinji Watanabe, Bhiksha Ramakrishnan, Shady Shehata, Hung-Yi Lee

To achieve comprehensive coverage of diverse speech tasks and harness instruction tuning, we invite the community to collaborate and contribute, facilitating the dynamic growth of the benchmark.

Sequence-to-sequence Singing Voice Synthesis with Perceptual Entropy Loss

1 code implementation • 22 Oct 2020 • Jiatong Shi, Shuai Guo, Nan Huo, Yuekai Zhang, Qin Jin

Neural network (NN) based singing voice synthesis (SVS) systems require sufficient data to train well and are prone to over-fitting due to data scarcity.

Singing Voice Synthesis

Cross-lingual Transfer for Speech Processing using Acoustic Language Similarity

1 code implementation • 2 Nov 2021 • Peter Wu, Jiatong Shi, Yifan Zhong, Shinji Watanabe, Alan W Black

We demonstrate the effectiveness of our approach in language family classification, speech recognition, and speech synthesis tasks.

Cross-Lingual Transfer · Speech Recognition +2

Improving RNN Transducer With Target Speaker Extraction and Neural Uncertainty Estimation

no code implementations • 26 Nov 2020 • Jiatong Shi, Chunlei Zhang, Chao Weng, Shinji Watanabe, Meng Yu, Dong Yu

Target-speaker speech recognition aims to recognize target-speaker speech from noisy environments with background noise and interfering speakers.

Speech Enhancement · Speech Extraction +1

End-to-End Automatic Speech Recognition: Its Impact on the Workflow in Documenting Yoloxóchitl Mixtec

no code implementations • NAACL (AmericasNLP) 2021 • Jonathan D. Amith, Jiatong Shi, Rey Castillo García

This paper describes three open access Yoloxóchitl Mixtec corpora and presents the results and implications of end-to-end automatic speech recognition for endangered language documentation.

Automatic Speech Recognition (ASR) +1

SingAug: Data Augmentation for Singing Voice Synthesis with Cycle-consistent Training Strategy

no code implementations • 31 Mar 2022 • Shuai Guo, Jiatong Shi, Tao Qian, Shinji Watanabe, Qin Jin

Deep learning based singing voice synthesis (SVS) systems have been shown to flexibly generate singing of higher quality than conventional statistical parametric methods.

Data Augmentation · Singing Voice Synthesis

Blockwise Streaming Transformer for Spoken Language Understanding and Simultaneous Speech Translation

no code implementations • 19 Apr 2022 • Keqi Deng, Shinji Watanabe, Jiatong Shi, Siddhant Arora

Although Transformers have gained success in several speech processing tasks like spoken language understanding (SLU) and speech translation (ST), achieving online processing while keeping competitive performance is still essential for real-world interaction.

Automatic Speech Recognition (ASR) +3

Findings of the IWSLT 2022 Evaluation Campaign

no code implementations • IWSLT (ACL) 2022 • Antonios Anastasopoulos, Loïc Barrault, Luisa Bentivogli, Marcely Zanon Boito, Ondřej Bojar, Roldano Cattoni, Anna Currey, Georgiana Dinu, Kevin Duh, Maha Elbayad, Clara Emmanuel, Yannick Estève, Marcello Federico, Christian Federmann, Souhir Gahbiche, Hongyu Gong, Roman Grundkiewicz, Barry Haddow, Benjamin Hsu, Dávid Javorský, Vĕra Kloudová, Surafel Lakew, Xutai Ma, Prashant Mathur, Paul McNamee, Kenton Murray, Maria Nǎdejde, Satoshi Nakamura, Matteo Negri, Jan Niehues, Xing Niu, John Ortega, Juan Pino, Elizabeth Salesky, Jiatong Shi, Matthias Sperber, Sebastian Stüker, Katsuhito Sudoh, Marco Turchi, Yogesh Virkar, Alexander Waibel, Changhan Wang, Shinji Watanabe

The evaluation campaign of the 19th International Conference on Spoken Language Translation featured eight shared tasks: (i) Simultaneous speech translation, (ii) Offline speech translation, (iii) Speech to speech translation, (iv) Low-resource speech translation, (v) Multilingual speech translation, (vi) Dialect speech translation, (vii) Formality control for speech translation, (viii) Isometric speech translation.

Speech-to-Speech Translation · Translation

CMU’s IWSLT 2022 Dialect Speech Translation System

no code implementations • IWSLT (ACL) 2022 • Brian Yan, Patrick Fernandes, Siddharth Dalmia, Jiatong Shi, Yifan Peng, Dan Berrebbi, Xinyi Wang, Graham Neubig, Shinji Watanabe

We use additional paired Modern Standard Arabic data (MSA) to directly improve the speech recognition (ASR) and machine translation (MT) components of our cascaded systems.

Knowledge Distillation · Machine Translation +3

VQ-T: RNN Transducers using Vector-Quantized Prediction Network States

no code implementations • 3 Aug 2022 • Jiatong Shi, George Saon, David Haws, Shinji Watanabe, Brian Kingsbury

Beam search, which is the dominant ASR decoding algorithm for end-to-end models, generates tree-structured hypotheses.

Language Modelling
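The snippet above notes that beam search produces tree-structured hypotheses: partial hypotheses sharing a prefix share a path in the tree. A minimal, self-contained sketch over a toy per-step distribution (not the paper's decoder) makes this concrete:

```python
import math

def beam_search(step_logprobs, beam_size=2):
    """Toy beam search: keep the beam_size best token sequences per step.

    step_logprobs is a list of {token: log_prob} dicts, one per step;
    hypotheses that share a prefix correspond to shared tree paths.
    """
    beams = [((), 0.0)]  # (token sequence, cumulative log-probability)
    for logprobs in step_logprobs:
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in logprobs.items()
        ]
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_size]
    return beams

steps = [
    {"a": math.log(0.6), "b": math.log(0.4)},
    {"a": math.log(0.5), "b": math.log(0.5)},
]
# Both surviving beams start with "a", i.e. they share a tree prefix.
print([seq for seq, _ in beam_search(steps)])
```

Exploiting this shared-prefix structure (e.g. caching prediction-network states per tree node) is what makes tree-structured decoding attractive for RNN transducers.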

On Compressing Sequences for Self-Supervised Speech Models

no code implementations • 13 Oct 2022 • Yen Meng, Hsuan-Jui Chen, Jiatong Shi, Shinji Watanabe, Paola Garcia, Hung-Yi Lee, Hao Tang

Subsampling while training self-supervised models not only improves the overall performance on downstream tasks under certain frame rates, but also brings significant speed-up in inference.

Self-Supervised Learning
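To illustrate the kind of sequence compression discussed above, one simple fixed-rate scheme averages consecutive frames to halve the frame rate. This is only a hypothetical sketch of the general idea; the paper's actual subsampling methods may differ:

```python
def subsample(frames, factor=2):
    """Lower the frame rate by averaging each block of `factor` frames.

    frames: list of feature vectors (lists of floats).
    Returns roughly len(frames)/factor averaged vectors.
    """
    out = []
    for i in range(0, len(frames) - factor + 1, factor):
        window = frames[i:i + factor]
        out.append([sum(col) / factor for col in zip(*window)])
    return out

frames = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
print(subsample(frames))  # [[2.0, 3.0], [6.0, 7.0]]
```

Halving the sequence length roughly quarters the cost of self-attention, which is where the inference speed-up comes from.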

Bridging Speech and Textual Pre-trained Models with Unsupervised ASR

no code implementations • 6 Nov 2022 • Jiatong Shi, Chan-Jan Hsu, Holam Chung, Dongji Gao, Paola Garcia, Shinji Watanabe, Ann Lee, Hung-Yi Lee

To be specific, we propose to use unsupervised automatic speech recognition (ASR) as a connector that bridges different modalities used in speech and textual pre-trained models.

Automatic Speech Recognition (ASR) +3

4D ASR: Joint modeling of CTC, Attention, Transducer, and Mask-Predict decoders

no code implementations • 21 Dec 2022 • Yui Sudo, Muhammad Shakeel, Brian Yan, Jiatong Shi, Shinji Watanabe

The network architecture of end-to-end (E2E) automatic speech recognition (ASR) can be classified into several models, including connectionist temporal classification (CTC), recurrent neural network transducer (RNN-T), attention mechanism, and non-autoregressive mask-predict models.

Automatic Speech Recognition (ASR) +1

Enhancing Speech-to-Speech Translation with Multiple TTS Targets

no code implementations • 10 Apr 2023 • Jiatong Shi, Yun Tang, Ann Lee, Hirofumi Inaguma, Changhan Wang, Juan Pino, Shinji Watanabe

Direct speech-to-speech translation (S2ST) models are known to suffer from data scarcity because of the limited parallel material available for both source and target speech.

Speech-to-Speech Translation · Speech-to-Text Translation +1

ML-SUPERB: Multilingual Speech Universal PERformance Benchmark

no code implementations • 18 May 2023 • Jiatong Shi, Dan Berrebbi, William Chen, Ho-Lam Chung, En-Pei Hu, Wei Ping Huang, Xuankai Chang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe

The Speech processing Universal PERformance Benchmark (SUPERB) is a leaderboard that benchmarks the performance of self-supervised learning (SSL) models on various speech processing tasks.

Automatic Speech Recognition · Language Identification +3

Joint Prediction and Denoising for Large-scale Multilingual Self-supervised Learning

no code implementations • 26 Sep 2023 • William Chen, Jiatong Shi, Brian Yan, Dan Berrebbi, Wangyou Zhang, Yifan Peng, Xuankai Chang, Soumi Maiti, Shinji Watanabe

We show that further efficiency can be achieved with a vanilla HuBERT Base model, which can maintain 94% of XLS-R's performance with only 3% of the data, 4 GPUs, and limited trials.

Denoising · Self-Supervised Learning

Findings of the 2023 ML-SUPERB Challenge: Pre-Training and Evaluation over More Languages and Beyond

no code implementations • 9 Oct 2023 • Jiatong Shi, William Chen, Dan Berrebbi, Hsiu-Hsuan Wang, Wei-Ping Huang, En-Pei Hu, Ho-Lam Chuang, Xuankai Chang, Yuxun Tang, Shang-Wen Li, Abdelrahman Mohamed, Hung-Yi Lee, Shinji Watanabe

The 2023 Multilingual Speech Universal Performance Benchmark (ML-SUPERB) Challenge expands upon the acclaimed SUPERB framework, emphasizing self-supervised models in multilingual speech recognition and language identification.

Language Identification · Speech Recognition +1

Wav2Gloss: Generating Interlinear Glossed Text from Speech

no code implementations • 19 Mar 2024 • Taiqi He, Kwanghee Choi, Lindia Tjuatja, Nathaniel R. Robinson, Jiatong Shi, Shinji Watanabe, Graham Neubig, David R. Mortensen, Lori Levin

Thousands of the world's languages are in danger of extinction, a tremendous threat to cultural identities and human language diversity.
