We study speech-to-speech translation (S2ST), which translates speech from one language into another, and focus on building systems to support languages without standard text writing systems. We use English-Taiwanese Hokkien as a case study and present an end-to-end solution spanning training data collection, modeling choices, and benchmark dataset release. First, we present our efforts on creating human-annotated data, automatically mining data from large unlabeled speech datasets, and adopting pseudo-labeling to produce weakly supervised data. On the modeling side, we take advantage of recent advances in applying self-supervised discrete representations as prediction targets in S2ST, and we show the effectiveness of leveraging additional text supervision from Mandarin, a language similar to Hokkien, in model training. Finally, we release an S2ST benchmark set to facilitate future research in this field.


Results from the Paper


 Ranked #1 on Speech-to-Speech Translation on TAT (using extra training data)

| Task | Dataset | Model | ASR-BLEU (Dev) | Global Rank (Dev) | ASR-BLEU (Test) | Global Rank (Test) |
|---|---|---|---|---|---|---|
| Speech-to-Speech Translation | TAT | Hokkien→En (Two-pass decoding) | 13.6 | #1 | 12.5 | #1 |
| Speech-to-Speech Translation | TAT | En→Hokkien (Single-pass decoding) | 6.6 | #8 | 6.0 | #8 |
| Speech-to-Speech Translation | TAT | En→Hokkien (Two-stage) | 7.1 | #7 | 6.6 | #7 |
| Speech-to-Speech Translation | TAT | En→Hokkien (Three-stage) | 7.5 | #6 | 6.8 | #6 |
| Speech-to-Speech Translation | TAT | En→Hokkien (Two-pass decoding) | 7.8 | #5 | 7.3 | #5 |
| Speech-to-Speech Translation | TAT | Hokkien→En (Single-pass decoding) | 8.8 | #4 | 8.1 | #4 |
| Speech-to-Speech Translation | TAT | Hokkien→En (Three-stage) | 12.5 | #2 | 8.8 | #3 |
| Speech-to-Speech Translation | TAT | Hokkien→En (Two-stage) | 12.5 | #2 | 10.5 | #2 |
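
The results above are reported in ASR-BLEU, which is generally computed by transcribing the translated speech with an ASR system and scoring the transcripts against reference text with BLEU. The following is a minimal sketch of that idea, not the paper's evaluation pipeline: `corpus_bleu` is a simplified, unsmoothed BLEU with whitespace tokenization, and `asr_bleu`, `transcribe`, and `fake_transcribe` are hypothetical names introduced here for illustration. A real evaluation would plug in a trained ASR model and a standard BLEU scorer, so scores from this sketch are not comparable to the numbers in the table.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence


def ngrams(tokens: Sequence[str], n: int) -> Counter:
    """Count all n-grams of order n in a token sequence."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def corpus_bleu(hypotheses: List[str], references: List[str], max_n: int = 4) -> float:
    """Unsmoothed corpus-level BLEU on a 0-100 scale.

    Simplifications (assumptions of this sketch): one reference per
    hypothesis and plain whitespace tokenization.
    """
    matches = [0] * max_n
    totals = [0] * max_n
    hyp_len = ref_len = 0
    for hyp, ref in zip(hypotheses, references):
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            h_counts, r_counts = ngrams(h, n), ngrams(r, n)
            # Clipped matches: an n-gram is credited at most as many
            # times as it occurs in the reference.
            matches[n - 1] += sum((h_counts & r_counts).values())
            totals[n - 1] += sum(h_counts.values())
    # Without smoothing, any n-gram order with zero matches gives BLEU 0.
    if hyp_len == 0 or min(matches) == 0:
        return 0.0
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    brevity_penalty = 1.0 if hyp_len >= ref_len else math.exp(1 - ref_len / hyp_len)
    return 100.0 * brevity_penalty * math.exp(log_precision)


def asr_bleu(audio_outputs: List[str], references: List[str],
             transcribe: Callable[[str], str]) -> float:
    """Transcribe each generated utterance, then score transcripts with BLEU.

    `transcribe` is a hypothetical callable standing in for an ASR model;
    both sides are lowercased before scoring.
    """
    transcripts = [transcribe(audio).lower() for audio in audio_outputs]
    return corpus_bleu(transcripts, [ref.lower() for ref in references])


# Hypothetical stand-in for an ASR system: maps made-up audio IDs to text.
_FAKE_ASR_OUTPUT = {
    "utt1.wav": "The cat sat on the mat",
    "utt2.wav": "the cat sat on",
}


def fake_transcribe(audio_id: str) -> str:
    return _FAKE_ASR_OUTPUT[audio_id]


score = asr_bleu(["utt1.wav"], ["the cat sat on the mat"], fake_transcribe)
print(f"ASR-BLEU: {score:.1f}")
```

Because the score depends on the ASR system as well as the translation model, ASR-BLEU values are only comparable when the same ASR model and text normalization are used across systems.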
