NeurST: Neural Speech Translation Toolkit

NeurST is an open-source toolkit for neural speech translation. It focuses mainly on end-to-end speech translation and is designed to be easy to use, modify, and extend for advanced speech translation research and products. NeurST aims to facilitate speech translation research for NLP researchers and to build reliable benchmarks for the field. It provides step-by-step recipes for feature extraction, data preprocessing, distributed training, and evaluation. In this paper, we introduce the framework design of NeurST and report experimental results on several benchmark datasets, which can serve as reliable baselines for future research. The toolkit is publicly available at https://github.com/bytedance/neurst/ and we will continuously update comparisons of NeurST with other toolkits and studies at https://st-benchmark.github.io/.

ACL 2021 · PDF · Abstract

Results from the Paper


| Task | Dataset | Model | Metric Name | Metric Value | Global Rank |
|---|---|---|---|---|---|
| Speech-to-Text Translation | libri-trans | Transformer + ASR Pretrain + SpecAug | Case-insensitive tokenized BLEU | 18.7 | # 1 |
| | | | Case-insensitive sacreBLEU | 17.2 | # 1 |
| | | | Case-sensitive sacreBLEU | 16.3 | # 1 |
| | | | Case-sensitive tokenized BLEU | 17.8 | # 1 |
| Speech-to-Text Translation | libri-trans | Transformer + ASR Pretrain | Case-insensitive tokenized BLEU | 17.9 | # 2 |
| | | | Case-insensitive sacreBLEU | 16.5 | # 2 |
| | | | Case-sensitive sacreBLEU | 15.5 | # 2 |
| | | | Case-sensitive tokenized BLEU | 16.9 | # 2 |
| Speech-to-Text Translation | MuST-C EN->DE | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 22.8 | # 7 |
| Speech-to-Text Translation | MuST-C EN->ES | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 26.8 | # 5 |
| Speech-to-Text Translation | MuST-C EN->ES | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 27.4 | # 4 |
| Speech-to-Text Translation | MuST-C EN->FR | Transformer + ASR Pretrain + SpecAug | Case-sensitive sacreBLEU | 33.3 | # 2 |
| Speech-to-Text Translation | MuST-C EN->FR | Transformer + ASR Pretrain | Case-sensitive sacreBLEU | 32.3 | # 3 |
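The table distinguishes case-sensitive vs. case-insensitive scoring and tokenized BLEU vs. sacreBLEU (which applies its own standardized tokenization to detokenized output). To make the case-sensitivity distinction concrete, below is a minimal, illustrative sketch of corpus-level BLEU over pre-tokenized, single-reference text with an optional lowercasing switch; this is not NeurST's or sacreBLEU's implementation (no smoothing, whitespace tokenization assumed), and the function names are ours.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def corpus_bleu(hyps, refs, max_n=4, lowercase=False):
    """Unsmoothed corpus BLEU (0-100) for whitespace-tokenized text,
    one reference per hypothesis. lowercase=True gives the
    case-insensitive variant reported in the table."""
    match = [0] * max_n   # clipped n-gram matches per order
    total = [0] * max_n   # hypothesis n-gram counts per order
    hyp_len = ref_len = 0
    for hyp, ref in zip(hyps, refs):
        if lowercase:
            hyp, ref = hyp.lower(), ref.lower()
        h, r = hyp.split(), ref.split()
        hyp_len += len(h)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            hc, rc = ngrams(h, n), ngrams(r, n)
            # clip each hypothesis n-gram count by its reference count
            match[n - 1] += sum(min(c, rc[g]) for g, c in hc.items())
            total[n - 1] += max(len(h) - n + 1, 0)
    if min(match) == 0:          # any order with zero matches -> BLEU 0
        return 0.0
    log_prec = sum(math.log(m / t) for m, t in zip(match, total)) / max_n
    # brevity penalty for hypotheses shorter than the references
    bp = 1.0 if hyp_len > ref_len else math.exp(1 - ref_len / max(hyp_len, 1))
    return 100 * bp * math.exp(log_prec)
```

For example, `corpus_bleu(["The cat sat down"], ["the cat sat down"], lowercase=True)` scores higher than the case-sensitive call on the same pair, mirroring why the case-insensitive rows in the table are consistently above their case-sensitive counterparts.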
