Search Results for author: Tom Ko

Found 27 papers, 16 papers with code

MetaMix: Improved Meta-Learning with Interpolation-based Consistency Regularization

no code implementations29 Sep 2020 Yangbin Chen, Yun Ma, Tom Ko, Jian-Ping Wang, Qing Li

MetaMix can be integrated with any of the MAML-based algorithms and learn the decision boundaries generalizing better to new tasks.

Few-Shot Learning Transfer Learning
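The interpolation-based consistency regularization behind MetaMix can be illustrated with a mixup-style operation on labelled examples. The sketch below is a minimal, generic mixup (Beta-distributed interpolation of inputs and labels), not the paper's exact MetaMix rule inside the MAML inner/outer loop; all names are illustrative.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=2.0, rng=None):
    """Interpolate two labelled examples (mixup-style).

    `alpha` parametrizes the Beta distribution the mixing coefficient is
    drawn from. Illustrative sketch only; MetaMix applies interpolation
    within meta-learning episodes rather than to arbitrary pairs.
    """
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)          # mixing coefficient in (0, 1)
    x = lam * x1 + (1.0 - lam) * x2       # interpolated input
    y = lam * y1 + (1.0 - lam) * y2       # interpolated (soft) label
    return x, y, lam

# Toy usage: mix two one-hot labelled feature vectors.
x_a, y_a = np.ones(4), np.array([1.0, 0.0])
x_b, y_b = np.zeros(4), np.array([0.0, 1.0])
x_mix, y_mix, lam = mixup(x_a, y_a, x_b, y_b)
```

Training on such interpolated pairs encourages decision boundaries that behave linearly between examples, which is the consistency effect the abstract refers to.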

AutoSpeech 2020: The Second Automated Machine Learning Challenge for Speech Classification

no code implementations25 Oct 2020 Jingsong Wang, Tom Ko, Zhen Xu, Xiawei Guo, Souxiang Liu, Wei-Wei Tu, Lei Xie

The AutoSpeech challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to speech processing tasks.

AutoML BIG-bench Machine Learning +1

Auto-KWS 2021 Challenge: Task, Datasets, and Baselines

1 code implementation31 Mar 2021 Jingsong Wang, Yuxuan He, Chunyu Zhao, Qijie Shao, Wei-Wei Tu, Tom Ko, Hung-Yi Lee, Lei Xie

Auto-KWS 2021 challenge calls for automated machine learning (AutoML) solutions to automate the process of applying machine learning to a customized keyword spotting task.

AutoML BIG-bench Machine Learning +1

Exploring Machine Speech Chain for Domain Adaptation and Few-Shot Speaker Adaptation

no code implementations8 Apr 2021 Fengpeng Yue, Yan Deng, Lei He, Tom Ko

Machine Speech Chain, which integrates both end-to-end (E2E) automatic speech recognition (ASR) and text-to-speech (TTS) into one circle for joint training, has been proven to be effective in data augmentation by leveraging large amounts of unpaired data.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3

CL4AC: A Contrastive Loss for Audio Captioning

2 code implementations21 Jul 2021 Xubo Liu, Qiushi Huang, Xinhao Mei, Tom Ko, H Lilian Tang, Mark D. Plumbley, Wenwu Wang

Automated audio captioning (AAC) is a cross-modal translation task that aims to use natural language to describe the content of an audio clip.

Audio captioning Translation
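A contrastive loss for a cross-modal task like audio captioning typically pulls paired audio/text embeddings together and pushes mismatched pairs apart. The following is a symmetric InfoNCE-style sketch in NumPy to illustrate that general idea; CL4AC's actual objective and architecture may differ, and all names here are assumptions.

```python
import numpy as np

def contrastive_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE-style loss over N paired embeddings.

    Row i of `audio_emb` is assumed to match row i of `text_emb`
    (the positives sit on the diagonal of the similarity matrix).
    """
    # L2-normalize so the dot product is cosine similarity.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = a @ t.T / temperature        # (N, N) similarity logits
    labels = np.arange(len(a))

    def xent(lg):
        # Numerically stable cross-entropy against the diagonal targets.
        lg = lg - lg.max(axis=1, keepdims=True)
        logp = lg - np.log(np.exp(lg).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the audio-to-text and text-to-audio directions.
    return 0.5 * (xent(logits) + xent(logits.T))

# Aligned pairs should score a lower loss than mismatched pairs.
rng = np.random.default_rng(0)
emb = rng.normal(size=(8, 16))
aligned = contrastive_loss(emb, emb)
shuffled = contrastive_loss(emb, emb[::-1])
```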

Multi-View Self-Attention Based Transformer for Speaker Recognition

no code implementations11 Oct 2021 Rui Wang, Junyi Ao, Long Zhou, Shujie Liu, Zhihua Wei, Tom Ko, Qing Li, Yu Zhang

In this work, we propose a novel multi-view self-attention mechanism and present an empirical study of different Transformer variants with or without the proposed attention mechanism for speaker recognition.

Speaker Recognition

SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing

3 code implementations ACL 2022 Junyi Ao, Rui Wang, Long Zhou, Chengyi Wang, Shuo Ren, Yu Wu, Shujie Liu, Tom Ko, Qing Li, Yu Zhang, Zhihua Wei, Yao Qian, Jinyu Li, Furu Wei

Motivated by the success of T5 (Text-To-Text Transfer Transformer) in pre-trained natural language processing models, we propose a unified-modal SpeechT5 framework that explores the encoder-decoder pre-training for self-supervised speech/text representation learning.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +7

LightHuBERT: Lightweight and Configurable Speech Representation Learning with Once-for-All Hidden-Unit BERT

1 code implementation29 Mar 2022 Rui Wang, Qibing Bai, Junyi Ao, Long Zhou, Zhixiang Xiong, Zhihua Wei, Yu Zhang, Tom Ko, Haizhou Li

LightHuBERT outperforms the original HuBERT on ASR and five SUPERB tasks at the same model size as HuBERT, achieves performance comparable to the teacher model on most tasks with 29% fewer parameters, and obtains a $3.5\times$ compression ratio on three SUPERB tasks, e.g., automatic speaker verification, keyword spotting, and intent classification, with a slight accuracy loss.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +6

GigaST: A 10,000-hour Pseudo Speech Translation Corpus

1 code implementation8 Apr 2022 Rong Ye, Chengqi Zhao, Tom Ko, Chutong Meng, Tao Wang, Mingxuan Wang, Jun Cao

The training set is translated by a strong machine translation system, while the test set is translated by humans.

Machine Translation Translation

A Study of Modeling Rising Intonation in Cantonese Neural Speech Synthesis

no code implementations3 Aug 2022 Qibing Bai, Tom Ko, Yu Zhang

In human speech, the attitude of a speaker cannot be fully expressed only by the textual content.

Speech Synthesis

Personalized Dialogue Generation with Persona-Adaptive Attention

1 code implementation27 Oct 2022 Qiushi Huang, Yu Zhang, Tom Ko, Xubo Liu, Bo Wu, Wenwu Wang, Lilian Tang

Persona-based dialogue systems aim to generate consistent responses based on historical context and predefined persona.

Dialogue Generation

Leveraging per Image-Token Consistency for Vision-Language Pre-training

no code implementations CVPR 2023 Yunhao Gou, Tom Ko, Hansi Yang, James Kwok, Yu Zhang, Mingxuan Wang

(2) Under-utilization of the unmasked tokens: CMLM primarily focuses on the masked token but it cannot simultaneously leverage other tokens to learn vision-language associations.

Language Modelling Masked Language Modeling +1

WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research

3 code implementations30 Mar 2023 Xinhao Mei, Chutong Meng, Haohe Liu, Qiuqiang Kong, Tom Ko, Chengqi Zhao, Mark D. Plumbley, Yuexian Zou, Wenwu Wang

To address this data scarcity issue, we introduce WavCaps, the first large-scale weakly-labelled audio captioning dataset, comprising approximately 400k audio clips with paired captions.

 Ranked #1 on Zero-Shot Environment Sound Classification on ESC-50 (using extra training data)

Audio captioning Event Detection +6

DUB: Discrete Unit Back-translation for Speech Translation

1 code implementation19 May 2023 Dong Zhang, Rong Ye, Tom Ko, Mingxuan Wang, Yaqian Zhou

The key point is to bridge the modality gap between speech and text so that useful MT techniques can be applied to ST.

Machine Translation Speech-to-Text Translation +1

CTC-based Non-autoregressive Speech Translation

1 code implementation27 May 2023 Chen Xu, Xiaoqian Liu, Xiaowen Liu, Qingxuan Sun, Yuhao Zhang, Murun Yang, Qianqian Dong, Tom Ko, Mingxuan Wang, Tong Xiao, Anxiang Ma, Jingbo Zhu

Combining end-to-end speech translation (ST) and non-autoregressive (NAR) generation is promising in language and speech processing for their advantages of less error propagation and low latency.

Translation

MOSPC: MOS Prediction Based on Pairwise Comparison

no code implementations18 Jun 2023 Kexin Wang, Yunlong Zhao, Qianqian Dong, Tom Ko, Mingxuan Wang

Our framework also surpasses the strong baseline in ranking accuracy on each fine-grained segment.

Recent Advances in Direct Speech-to-text Translation

no code implementations20 Jun 2023 Chen Xu, Rong Ye, Qianqian Dong, Chengqi Zhao, Tom Ko, Mingxuan Wang, Tong Xiao, Jingbo Zhu

Recently, speech-to-text translation has attracted more and more attention and many studies have emerged rapidly.

Data Augmentation Knowledge Distillation +2

RepCodec: A Speech Representation Codec for Speech Tokenization

1 code implementation31 Aug 2023 Zhichao Huang, Chutong Meng, Tom Ko

To improve the performance of these discrete speech tokens, we present RepCodec, a novel speech representation codec for semantic speech tokenization.

Language Modelling Quantization
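The core of turning continuous speech representations into discrete tokens is a learned codebook lookup. The sketch below shows only the quantization step, nearest-neighbour assignment against a fixed codebook, as a minimal illustration; RepCodec itself trains an encoder, a vector quantizer, and a decoder end-to-end, and the names here are hypothetical.

```python
import numpy as np

def quantize(representations, codebook):
    """Assign each representation frame to its nearest codebook entry.

    representations: (T, D) continuous frames, e.g. from a speech encoder.
    codebook: (K, D) learned code vectors.
    Returns discrete token ids (T,) and the quantized frames (T, D).
    """
    # Squared L2 distance from every frame to every codebook entry.
    d = ((representations[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    ids = d.argmin(axis=1)            # one discrete token per frame
    quantized = codebook[ids]         # frames snapped to their codes
    return ids, quantized

# Toy usage: frames near known codebook rows recover those rows' ids.
rng = np.random.default_rng(1)
codebook = rng.normal(size=(32, 8))
frames = codebook[[3, 7, 7, 0]] + 0.01 * rng.normal(size=(4, 8))
ids, quantized = quantize(frames, codebook)
```

The resulting id sequence is what a downstream speech language model consumes in place of the continuous features.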

Speech Translation with Large Language Models: An Industrial Practice

no code implementations21 Dec 2023 Zhichao Huang, Rong Ye, Tom Ko, Qianqian Dong, Shanbo Cheng, Mingxuan Wang, Hang Li

Given the great success of large language models (LLMs) across various tasks, in this paper, we introduce LLM-ST, a novel and effective speech translation model constructed upon a pre-trained LLM.

Language Modelling Large Language Model +1
