Search Results for author: Yonghong Yan

Found 30 papers, 4 papers with code

Alternative Pseudo-Labeling for Semi-Supervised Automatic Speech Recognition

no code implementations • 12 Aug 2023 • Han Zhu, Dongji Gao, Gaofeng Cheng, Daniel Povey, Pengyuan Zhang, Yonghong Yan

Firstly, a generalized CTC loss function is introduced to handle noisy pseudo-labels by accepting alternative tokens in the positions of incorrect tokens.

Automatic Speech Recognition speech-recognition +1


Online Hybrid CTC/Attention End-to-End Automatic Speech Recognition Architecture

no code implementations • 5 Jul 2023 • Haoran Miao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

However, how to deploy hybrid CTC/attention systems for online speech recognition is still a non-trivial problem.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Speech Corpora Divergence Based Unsupervised Data Selection for ASR

no code implementations • 26 Feb 2023 • Changfeng Gao, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

This study proposes an unsupervised target-aware data selection method based on speech corpora divergence (SCD), which can measure the similarity between two speech corpora.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Summary on the ISCSLP 2022 Chinese-English Code-Switching ASR Challenge

no code implementations • 12 Oct 2022 • Shuhao Deng, Chengfei Li, Jinfeng Bai, Qingqing Zhang, Wei-Qiang Zhang, Runyan Yang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Code-switching automatic speech recognition has become one of the most challenging and most valuable scenarios of automatic speech recognition, owing to code-switching between languages and its frequent occurrence in daily life.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


The Conversational Short-phrase Speaker Diarization (CSSD) Task: Dataset, Evaluation Metric and Baselines

1 code implementation • 17 Aug 2022 • Gaofeng Cheng, Yifan Chen, Runyan Yang, Qingxuan Li, Zehui Yang, Lingxuan Ye, Pengyuan Zhang, Qingqing Zhang, Lei Xie, Yanmin Qian, Kong Aik Lee, Yonghong Yan

In the metric aspect, we design the new conversational DER (CDER) evaluation metric, which calculates the SD accuracy at the utterance level.

Machine Translation speaker-diarization +1


Improving Streaming End-to-End ASR on Transformer-based Causal Models with Encoder States Revision Strategies

no code implementations • 6 Jul 2022 • Zehan Li, Haoran Miao, Keqi Deng, Gaofeng Cheng, Sanli Tian, Ta Li, Yonghong Yan

Firstly, we introduce a real-time encoder states revision strategy to modify previous states.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


Interrelate Training and Searching: A Unified Online Clustering Framework for Speaker Diarization

no code implementations • 28 Jun 2022 • Yifan Chen, Yifan Guo, Qingxuan Li, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

For online speaker diarization, samples arrive incrementally, and the overall distribution of the samples is invisible.

Clustering Online Clustering +2


Boosting Cross-Domain Speech Recognition with Self-Supervision

1 code implementation • 20 Jun 2022 • Han Zhu, Gaofeng Cheng, Jindong Wang, Wenxin Hou, Pengyuan Zhang, Yonghong Yan

The cross-domain performance of automatic speech recognition (ASR) could be severely hampered due to the mismatch between training and testing distributions.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +4


Decoupled Federated Learning for ASR with Non-IID Data

no code implementations • 18 Jun 2022 • Han Zhu, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

Secondly, to reduce the communication and computation costs, we propose decoupled federated learning (DecoupleFL).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2


Open Source MagicData-RAMC: A Rich Annotated Mandarin Conversational(RAMC) Speech Dataset

no code implementations • 31 Mar 2022 • Zehui Yang, Yifan Chen, Lei Luo, Runyan Yang, Lingxuan Ye, Gaofeng Cheng, Ji Xu, Yaohui Jin, Qingqing Zhang, Pengyuan Zhang, Lei Xie, Yonghong Yan

As a Mandarin speech dataset designed for dialog scenarios with high quality and rich annotations, MagicData-RAMC enriches the data diversity in the Mandarin speech community and allows extensive research on a series of speech-related tasks, including automatic speech recognition, speaker diarization, topic detection, keyword search, text-to-speech, etc.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


The Brain Tumor Sequence Registration (BraTS-Reg) Challenge: Establishing Correspondence Between Pre-Operative and Follow-up MRI Scans of Diffuse Glioma Patients

no code implementations • 13 Dec 2021 • Bhakti Baheti, Satrajit Chakrabarty, Hamed Akbari, Michel Bilello, Benedikt Wiestler, Julian Schwarting, Evan Calabrese, Jeffrey Rudie, Syed Abidi, Mina Mousa, Javier Villanueva-Meyer, Brandon K. K. Fields, Florian Kofler, Russell Takeshi Shinohara, Juan Eugenio Iglesias, Tony C. W. Mok, Albert C. S. Chung, Marek Wodzinski, Artur Jurgas, Niccolo Marini, Manfredo Atzori, Henning Muller, Christoph Grobroehmer, Hanna Siebert, Lasse Hansen, Mattias P. Heinrich, Luca Canalini, Jan Klein, Annika Gerken, Stefan Heldmann, Alessa Hering, Horst K. Hahn, Mingyuan Meng, Lei Bi, Dagan Feng, Jinman Kim, Ramy A. Zeineldin, Mohamed E. Karar, Franziska Mathis-Ullrich, Oliver Burgert, Javid Abderezaei, Aymeric Pionteck, Agamdeep Chopra, Mehmet Kurt, Kewei Yan, Yonghong Yan, Zhe Tang, Jianqiang Ma, Sahar Almahfouz Nasser, Nikhil Cherian Kurian, Mohit Meena, Saqib Shamsi, Amit Sethi, Nicholas J. Tustison, Brian B. Avants, Philip Cook, James C. Gee, Lin Tian, Hastings Greer, Marc Niethammer, Andrew Hoopes, Malte Hoffmann, Adrian V. Dalca, Stergios Christodoulidis, Theo Estiene, Maria Vakalopoulou, Nikos Paragios, Daniel S. Marcus, Christos Davatzikos, Aristeidis Sotiras, Bjoern Menze, Spyridon Bakas, Diana Waldmannstetter

Registration of longitudinal brain MRI scans containing pathologies is challenging due to dramatic changes in tissue appearance.

Descriptive


Decomposing Complex Questions Makes Multi-Hop QA Easier and More Interpretable

1 code implementation • Findings (EMNLP) 2021 • Ruiliu Fu, Han Wang, Xuejun Zhang, Jun Zhou, Yonghong Yan

The Relation Extractor decomposes the complex question, the Reader answers the sub-questions in turn, and the Comparator performs numerical comparison and summarizes the results to produce the final answer; the entire process itself constitutes a complete reasoning evidence path.

Relation


Reminding the Incremental Language Model via Data-Free Self-Distillation

no code implementations • 17 Oct 2021 • Han Wang, Ruiliu Fu, Chengzhang Li, Xuejun Zhang, Jun Zhou, Yonghong Yan

Incremental language learning with pseudo-data can alleviate catastrophic forgetting in neural networks.

Data Augmentation Language Modelling


Wav2vec-S: Semi-Supervised Pre-Training for Low-Resource ASR

no code implementations • 9 Oct 2021 • Han Zhu, Li Wang, Jindong Wang, Gaofeng Cheng, Pengyuan Zhang, Yonghong Yan

In this work, to build a better pre-trained model for low-resource ASR, we propose a pre-training approach called wav2vec-S, which uses task-specific semi-supervised pre-training to refine the self-supervised pre-trained model for the ASR task and thus more effectively utilizes the capacity of the pre-trained model to generate task-specific representations for ASR.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Improved Conformer-based End-to-End Speech Recognition Using Neural Architecture Search

no code implementations • 12 Apr 2021 • Yukun Liu, Ta Li, Pengyuan Zhang, Yonghong Yan

Recently, neural architecture search (NAS) has been successfully used in image classification, natural language processing, and automatic speech recognition (ASR) tasks to find state-of-the-art (SOTA) architectures that outperform human-designed ones.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3


Multi-Accent Adaptation based on Gate Mechanism

no code implementations • 5 Nov 2020 • Han Zhu, Li Wang, Pengyuan Zhang, Yonghong Yan

To jointly train the acoustic model and the accent classifier, we propose multi-task learning with a gate mechanism (MTL-G).

Multi-Task Learning speech-recognition +1


A Model Compression Method with Matrix Product Operators for Speech Enhancement

no code implementations • 10 Oct 2020 • Xingwei Sun, Ze-Feng Gao, Zhong-Yi Lu, Junfeng Li, Yonghong Yan

In this paper, we propose a model compression method based on matrix product operators (MPO) to substantially reduce the number of parameters in DNN models for speech enhancement.

Model Compression Speech Enhancement


Transformer-based Online CTC/attention End-to-End Speech Recognition Architecture

no code implementations • 15 Jan 2020 • Haoran Miao, Gaofeng Cheng, Changfeng Gao, Pengyuan Zhang, Yonghong Yan

To support online recognition, we integrate the state-reuse chunk-SAE and the MTA-based SAD into the online CTC/attention architecture.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1


Integrating the Data Augmentation Scheme with Various Classifiers for Acoustic Scene Modeling

no code implementations • 15 Jul 2019 • Hangting Chen, Zuozhen Liu, Zongming Liu, Pengyuan Zhang, Yonghong Yan

This technical report describes the IOA team's submission for Task 1A of the DCASE 2019 challenge.

Acoustic Scene Classification Data Augmentation +1


HCCL at SemEval-2018 Task 8: An End-to-End System for Sequence Labeling from Cybersecurity Reports

no code implementations • SEMEVAL 2018 • Mingming Fu, Xuemin Zhao, Yonghong Yan

This paper describes the HCCL team's systems that participated in SemEval 2018 Task 8: SecureNLP (Semantic Extraction from cybersecurity reports using NLP).

Feature Engineering Named Entity Recognition (NER) +1


Discriminating between Similar Languages on Imbalanced Conversational Texts

no code implementations • LREC 2018 • Junqing He, Xian Huang, Xuemin Zhao, Yan Zhang, Yonghong Yan

Language Identification


HCCL at SemEval-2017 Task 2: Combining Multilingual Word Embeddings and Transliteration Model for Semantic Similarity

no code implementations • SEMEVAL 2017 • Junqing He, Long Wu, Xuemin Zhao, Yonghong Yan

In this paper, we introduce an approach that combines word embeddings and machine translation for multilingual semantic word similarity, Task 2 of SemEval-2017.

Cross-Lingual Word Embeddings Multilingual Word Embeddings +9


Rank-1 Constrained Multichannel Wiener Filter for Speech Recognition in Noisy Environments

1 code implementation • 1 Jul 2017 • Ziteng Wang, Emmanuel Vincent, Romain Serizel, Yonghong Yan

Multichannel linear filters, such as the Multichannel Wiener Filter (MWF) and the Generalized Eigenvalue (GEV) beamformer, are popular signal processing techniques that can improve speech recognition performance.

speech-recognition Speech Recognition


Optimizing human-interpretable dialog management policy using Genetic Algorithm

no code implementations • 12 May 2016 • Hang Ren, Weiqun Xu, Yonghong Yan

Automatic optimization of spoken dialog management policies that are robust to environmental noise has long been the goal for both academia and industry.

Management reinforcement-learning +2


Noise Robust IOA/CAS Speech Separation and Recognition System For The Third 'CHIME' Challenge

no code implementations • 21 Sep 2015 • Xiaofei Wang, Chao Wu, Pengyuan Zhang, Ziteng Wang, Yong Liu, Xu Li, Qiang Fu, Yonghong Yan

This paper presents our contribution to the third 'CHiME' speech separation and recognition challenge, including both front-end signal processing and back-end speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +3