Search Results for author: Zhifu Gao

Found 8 papers, 6 papers with code

An Embarrassingly Simple Approach for LLM with Strong ASR Capacity

no code implementations • 13 Feb 2024 • Ziyang Ma, Guanrou Yang, Yifan Yang, Zhifu Gao, JiaMing Wang, Zhihao Du, Fan Yu, Qian Chen, Siqi Zheng, Shiliang Zhang, Xie Chen

We found that delicate designs are not necessary: an embarrassingly simple composition of an off-the-shelf speech encoder, an LLM, and a linear projector as the only trainable component is competent for the ASR task.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

emotion2vec: Self-Supervised Pre-Training for Speech Emotion Representation

2 code implementations • 23 Dec 2023 • Ziyang Ma, Zhisheng Zheng, Jiaxin Ye, Jinchao Li, Zhifu Gao, Shiliang Zhang, Xie Chen

To the best of our knowledge, emotion2vec is the first universal representation model in various emotion-related tasks, filling a gap in the field.

Self-Supervised Learning Sentiment Analysis +1

SeACo-Paraformer: A Non-Autoregressive ASR System with Flexible and Effective Hotword Customization Ability

2 code implementations • 7 Aug 2023 • Xian Shi, Yexin Yang, Zerui Li, Yanni Chen, Zhifu Gao, Shiliang Zhang

It combines the accuracy of AED-based models, the efficiency of NAR models, and an explicit customization capacity with superior performance.

Accurate and Reliable Confidence Estimation Based on Non-Autoregressive End-to-End Speech Recognition System

no code implementations • 18 May 2023 • Xian Shi, Haoneng Luo, Zhifu Gao, Shiliang Zhang, Zhijie Yan

Estimating confidence scores for recognition results is a classic task in the ASR field and is of vital importance for many downstream tasks and training strategies.

speech-recognition Speech Recognition

FunASR: A Fundamental End-to-End Speech Recognition Toolkit

1 code implementation • 18 May 2023 • Zhifu Gao, Zerui Li, JiaMing Wang, Haoneng Luo, Xian Shi, Mengzhe Chen, Yabin Li, Lingyun Zuo, Zhihao Du, Zhangyu Xiao, Shiliang Zhang

FunASR offers models trained on large-scale industrial corpora and the ability to deploy them in applications.

Ranked #1 on Speech Recognition on WenetSpeech (using extra training data)

Action Detection Activity Detection +2

Paraformer: Fast and Accurate Parallel Transformer for Non-autoregressive End-to-End Speech Recognition

2 code implementations • 16 Jun 2022 • Zhifu Gao, Shiliang Zhang, Ian McLoughlin, Zhijie Yan

However, due to an independence assumption within the output tokens, performance of single-step NAR is inferior to that of AR models, especially with a large-scale corpus.

Language Modelling speech-recognition +1

Streaming Chunk-Aware Multihead Attention for Online End-to-End Speech Recognition

1 code implementation • 21 May 2020 • Shiliang Zhang, Zhifu Gao, Haoneng Luo, Ming Lei, Jie Gao, Zhijie Yan, Lei Xie

Recently, streaming end-to-end automatic speech recognition (E2E-ASR) has gained more and more attention.

Sound Audio and Speech Processing
