Search Results for author: Yifan Gong

Found 67 papers, 9 papers with code

End-to-End Attention based Text-Dependent Speaker Verification

no code implementations • 3 Jan 2017 • Shi-Xiong Zhang, Zhuo Chen, Yong Zhao, Jinyu Li, Yifan Gong

A new type of End-to-End system for text-dependent speaker verification is presented in this paper.

Text-Dependent Speaker Verification

Large-Scale Domain Adaptation via Teacher-Student Learning

no code implementations • 17 Aug 2017 • Jinyu Li, Michael L. Seltzer, Xi Wang, Rui Zhao, Yifan Gong

High accuracy speech recognition requires a large amount of transcribed data for supervised training.

Domain Adaptation speech-recognition +1

Unsupervised Adaptation with Domain Separation Networks for Robust Speech Recognition

no code implementations • 21 Nov 2017 • Zhong Meng, Zhuo Chen, Vadim Mazalov, Jinyu Li, Yifan Gong

Unsupervised domain adaptation of speech signals aims to adapt a well-trained source-domain acoustic model to unlabeled data from the target domain.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +5

Acoustic-To-Word Model Without OOV

no code implementations • 28 Nov 2017 • Jinyu Li, Guoli Ye, Rui Zhao, Jasha Droppo, Yifan Gong

However, this type of word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can model only a limited number of words in the output layer and maps all the remaining words into an OOV output node.

Advancing Acoustic-to-Word CTC Model

no code implementations • 15 Mar 2018 • Jinyu Li, Guoli Ye, Amit Das, Rui Zhao, Yifan Gong

However, the word-based CTC model suffers from the out-of-vocabulary (OOV) issue, as it can model only a limited number of words in the output layer and maps all the remaining words into an OOV output node.

Language Modelling

Advancing Connectionist Temporal Classification With Attention Modeling

no code implementations • 15 Mar 2018 • Amit Das, Jinyu Li, Rui Zhao, Yifan Gong

In this study, we propose advancing all-neural speech recognition by directly incorporating attention modeling within the Connectionist Temporal Classification (CTC) framework.

Classification General Classification +3

Adversarial Teacher-Student Learning for Unsupervised Domain Adaptation

no code implementations • 2 Apr 2018 • Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang Juang

In this method, a student acoustic model and a condition classifier are jointly optimized to minimize the Kullback-Leibler divergence between the output distributions of the teacher and student models, and simultaneously, to min-maximize the condition classification loss.

Transfer Learning Unsupervised Domain Adaptation

Speaker-Invariant Training via Adversarial Learning

no code implementations • 2 Apr 2018 • Zhong Meng, Jinyu Li, Zhuo Chen, Yong Zhao, Vadim Mazalov, Yifan Gong, Biing-Hwang Juang

We propose a novel adversarial multi-task learning scheme, aiming at actively curtailing the inter-talker feature variability while maximizing its senone discriminability so as to enhance the performance of a deep neural network (DNN) based ASR system.

General Classification Multi-Task Learning

Developing Far-Field Speaker System Via Teacher-Student Learning

no code implementations • 14 Apr 2018 • Jinyu Li, Rui Zhao, Zhuo Chen, Changliang Liu, Xiong Xiao, Guoli Ye, Yifan Gong

In this study, we develop the keyword spotting (KWS) and acoustic model (AM) components in a far-field speaker system.

Keyword Spotting Model Compression

Layer Trajectory LSTM

no code implementations • 28 Aug 2018 • Jinyu Li, Changliang Liu, Yifan Gong

In this paper, we propose a layer trajectory LSTM (ltLSTM) which builds a layer-LSTM using all the layer outputs from a standard multi-layer time-LSTM.

Cycle-Consistent Speech Enhancement

no code implementations • 6 Sep 2018 • Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang Juang

In this paper, we propose a cycle-consistent speech enhancement (CSE) in which an additional inverse mapping network is introduced to reconstruct the noisy features from the enhanced ones.

Multi-Task Learning Speech Enhancement

Adversarial Feature-Mapping for Speech Enhancement

no code implementations • 6 Sep 2018 • Zhong Meng, Jinyu Li, Yifan Gong, Biing-Hwang Juang

To achieve better performance on the ASR task, senone-aware (SA) AFM is further proposed, in which an acoustic model network is jointly trained with the feature-mapping and discriminator networks to optimize the senone classification loss in addition to the AFM losses.

Speech Enhancement

Advancing Acoustic-to-Word CTC Model with Attention and Mixed-Units

no code implementations • 31 Dec 2018 • Amit Das, Jinyu Li, Guoli Ye, Rui Zhao, Yifan Gong

In particular, we introduce Attention CTC, Self-Attention CTC, Hybrid CTC, and Mixed-unit CTC.

Language Modelling

Speaker Adaptation for End-to-End CTC Models

no code implementations • 4 Jan 2019 • Ke Li, Jinyu Li, Yong Zhao, Kshitiz Kumar, Yifan Gong

We propose two approaches for speaker adaptation in end-to-end (E2E) automatic speech recognition systems.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Conditional Teacher-Student Learning

no code implementations • 28 Apr 2019 • Zhong Meng, Jinyu Li, Yong Zhao, Yifan Gong

To overcome this problem, we propose a conditional T/S learning scheme, in which a "smart" student model selectively chooses to learn from either the teacher model or the ground truth labels conditioned on whether the teacher can correctly predict the ground truth.

Domain Adaptation Model Compression

Attentive Adversarial Learning for Domain-Invariant Training

no code implementations • 28 Apr 2019 • Zhong Meng, Jinyu Li, Yifan Gong

Adversarial domain-invariant training (ADIT) proves to be effective in suppressing the effects of domain variability in acoustic modeling and has led to improved performance in automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Adversarial Speaker Verification

no code implementations • 29 Apr 2019 • Zhong Meng, Yong Zhao, Jinyu Li, Yifan Gong

The use of deep networks to extract embeddings for speaker recognition has proven successful.

General Classification Speaker Recognition +1

Adversarial Speaker Adaptation

no code implementations • 29 Apr 2019 • Zhong Meng, Jinyu Li, Yifan Gong

We propose a novel adversarial speaker adaptation (ASA) scheme, in which adversarial learning is applied to regularize the distribution of deep hidden features in a speaker-dependent (SD) deep neural network (DNN) acoustic model to be close to that of a fixed speaker-independent (SI) DNN acoustic model during adaptation.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Encrypted Speech Recognition using Deep Polynomial Networks

no code implementations • 11 May 2019 • Shi-Xiong Zhang, Yifan Gong, Dong Yu

One good property of the DPN is that it can be trained on unencrypted speech features in the traditional way.

speech-recognition Speech Recognition

PyKaldi2: Yet another speech toolkit based on Kaldi and PyTorch

1 code implementation • 12 Jul 2019 • Liang Lu, Xiong Xiao, Zhuo Chen, Yifan Gong

While similar toolkits are available built on top of the two, a key feature of PyKaldi2 is sequence training with criteria such as MMI, sMBR and MPE.

speech-recognition Speech Recognition

Self-Teaching Networks

no code implementations • 9 Sep 2019 • Liang Lu, Eric Sun, Yifan Gong

Furthermore, the auxiliary loss also works as a regularizer, which improves the generalization capacity of the network.

speech-recognition Speech Recognition

Improving RNN Transducer Modeling for End-to-End Speech Recognition

1 code implementation • 26 Sep 2019 • Jinyu Li, Rui Zhao, Hu Hu, Yifan Gong

In this paper, we improve the RNN-T training in two aspects.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Speaker Adaptation for Attention-Based End-to-End Speech Recognition

no code implementations • 9 Nov 2019 • Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong

We propose three regularization-based speaker adaptation approaches to adapt the attention-based encoder-decoder (AED) model with very limited adaptation data from target speakers for end-to-end automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Advances in Online Audio-Visual Meeting Transcription

no code implementations • 10 Dec 2019 • Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou

This increases marginally to 1.6% when 50% of the attendees are unknown to the system.

speaker-diarization Speaker Diarization +2

Character-Aware Attention-Based End-to-End Speech Recognition

no code implementations • 6 Jan 2020 • Zhong Meng, Yashesh Gaur, Jinyu Li, Yifan Gong

However, as one input to the decoder recurrent neural network (RNN), each WSU embedding is learned independently through context and acoustic information in a purely data-driven fashion.

speech-recognition Speech Recognition

Domain Adaptation via Teacher-Student Learning for End-to-End Speech Recognition

no code implementations • 6 Jan 2020 • Zhong Meng, Jinyu Li, Yashesh Gaur, Yifan Gong

In this work, we extend the T/S learning to large-scale unsupervised domain adaptation of an attention-based end-to-end (E2E) model through two levels of knowledge transfer: teacher's token posteriors as soft labels and one-best predictions as decoder guidance.

speech-recognition Speech Recognition +2

BLK-REW: A Unified Block-based DNN Pruning Framework using Reweighted Regularization Method

no code implementations • 23 Jan 2020 • Xiaolong Ma, Zhengang Li, Yifan Gong, Tianyun Zhang, Wei Niu, Zheng Zhan, Pu Zhao, Jian Tang, Xue Lin, Bin Ren, Yanzhi Wang

Accelerating DNN execution on various resource-limited computing platforms has been a long-standing problem.

SS-Auto: A Single-Shot, Automatic Structured Weight Pruning Framework of DNNs with Ultra-High Efficiency

no code implementations • 23 Jan 2020 • Zhengang Li, Yifan Gong, Xiaolong Ma, Sijia Liu, Mengshu Sun, Zheng Zhan, Zhenglun Kong, Geng Yuan, Yanzhi Wang

Structured weight pruning is a representative model compression technique of DNNs for hardware efficiency and inference accelerations.

Model Compression

RTMobile: Beyond Real-Time Mobile Acceleration of RNNs for Speech Recognition

no code implementations • 19 Feb 2020 • Peiyan Dong, Siyue Wang, Wei Niu, Chengming Zhang, Sheng Lin, Zhengang Li, Yifan Gong, Bin Ren, Xue Lin, Yanzhi Wang, Dingwen Tao

Recurrent neural networks (RNNs) based automatic speech recognition has nowadays become prevalent on mobile devices such as smart phones.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

A Privacy-Preserving-Oriented DNN Pruning and Mobile Acceleration Framework

no code implementations • 13 Mar 2020 • Yifan Gong, Zheng Zhan, Zhengang Li, Wei Niu, Xiaolong Ma, Wenhao Wang, Bin Ren, Caiwen Ding, Xue Lin, Xiao-Lin Xu, Yanzhi Wang

Weight pruning of deep neural networks (DNNs) has been proposed to satisfy the limited storage and computing capability of mobile edge devices.

Model Compression Privacy Preserving

High-Accuracy and Low-Latency Speech Recognition with Two-Head Contextual Layer Trajectory LSTM Model

no code implementations • 17 Mar 2020 • Jinyu Li, Rui Zhao, Eric Sun, Jeremy H. M. Wong, Amit Das, Zhong Meng, Yifan Gong

While the community keeps promoting end-to-end models over conventional hybrid models, which usually are long short-term memory (LSTM) models trained with a cross entropy criterion followed by a sequence discriminative training criterion, we argue that such conventional hybrid models can still be significantly improved.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Minimum Latency Training Strategies for Streaming Sequence-to-Sequence ASR

no code implementations • 10 Apr 2020 • Hirofumi Inaguma, Yashesh Gaur, Liang Lu, Jinyu Li, Yifan Gong

This leads to an inevitable latency during inference.

Multi-Task Learning speech-recognition +1

L-Vector: Neural Label Embedding for Domain Adaptation

no code implementations • 25 Apr 2020 • Zhong Meng, Hu Hu, Jinyu Li, Changliang Liu, Yan Huang, Yifan Gong, Chin-Hui Lee

We propose a novel neural label embedding (NLE) scheme for the domain adaptation of a deep neural network (DNN) acoustic model with unpaired data samples from source and target domains.

Domain Adaptation

Exploring Pre-training with Alignments for RNN Transducer based End-to-End Speech Recognition

no code implementations • 1 May 2020 • Hu Hu, Rui Zhao, Jinyu Li, Liang Lu, Yifan Gong

Recently, the recurrent neural network transducer (RNN-T) architecture has become an emerging trend in end-to-end automatic speech recognition research because it is capable of online streaming speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Exploring Transformers for Large-Scale Speech Recognition

no code implementations • 19 May 2020 • Liang Lu, Changliang Liu, Jinyu Li, Yifan Gong

While recurrent neural networks still largely define state-of-the-art speech recognition systems, the Transformer network has been proven to be a competitive alternative, especially in the offline condition.

speech-recognition Speech Recognition

Developing RNN-T Models Surpassing High-Performance Hybrid Models with Customization Capability

no code implementations • 30 Jul 2020 • Jinyu Li, Rui Zhao, Zhong Meng, Yanqing Liu, Wenning Wei, Sarangarajan Parthasarathy, Vadim Mazalov, Zhenghao Wang, Lei He, Sheng Zhao, Yifan Gong

Because of its streaming nature, recurrent neural network transducer (RNN-T) is a very promising end-to-end (E2E) model that may replace the popular hybrid model for automatic speech recognition.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Speaker Separation Using Speaker Inventories and Estimated Speech

no code implementations • 20 Oct 2020 • Peidong Wang, Zhuo Chen, DeLiang Wang, Jinyu Li, Yifan Gong

We propose speaker separation using speaker inventories and estimated speech (SSUSIES), a framework leveraging speaker profiles and estimated speech for speaker separation.

Speaker Separation Speech Extraction +2

On Minimum Word Error Rate Training of the Hybrid Autoregressive Transducer

no code implementations • 23 Oct 2020 • Liang Lu, Zhong Meng, Naoyuki Kanda, Jinyu Li, Yifan Gong

Hybrid Autoregressive Transducer (HAT) is a recently proposed end-to-end acoustic model that extends the standard Recurrent Neural Network Transducer (RNN-T) for the purpose of the external language model (LM) fusion.

Language Modelling speech-recognition +1

Internal Language Model Estimation for Domain-Adaptive End-to-End Speech Recognition

no code implementations • 3 Nov 2020 • Zhong Meng, Sarangarajan Parthasarathy, Eric Sun, Yashesh Gaur, Naoyuki Kanda, Liang Lu, Xie Chen, Rui Zhao, Jinyu Li, Yifan Gong

External language model (LM) integration remains a challenging task for end-to-end (E2E) automatic speech recognition (ASR), which has no clear division between acoustic and language models.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Streaming end-to-end multi-talker speech recognition

no code implementations • 26 Nov 2020 • Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

End-to-end multi-talker speech recognition is an emerging research trend in the speech community due to its vast potential in applications such as conversation and meeting transcriptions.

speech-recognition Speech Recognition

Internal Language Model Training for Domain-Adaptive End-to-End Speech Recognition

no code implementations • 2 Feb 2021 • Zhong Meng, Naoyuki Kanda, Yashesh Gaur, Sarangarajan Parthasarathy, Eric Sun, Liang Lu, Xie Chen, Jinyu Li, Yifan Gong

The efficacy of external language model (LM) integration with existing end-to-end (E2E) automatic speech recognition (ASR) systems can be improved significantly using the internal language model estimation (ILME) method.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Streaming Multi-talker Speech Recognition with Joint Speaker Identification

no code implementations • 5 Apr 2021 • Liang Lu, Naoyuki Kanda, Jinyu Li, Yifan Gong

In multi-talker scenarios such as meetings and conversations, speech processing systems are usually required to transcribe the audio as well as identify the speakers for downstream applications.

Speaker Identification speech-recognition +2

On Addressing Practical Challenges for RNN-Transducer

no code implementations • 27 Apr 2021 • Rui Zhao, Jian Xue, Jinyu Li, Wenning Wei, Lei He, Yifan Gong

The first challenge is solved with a splicing data method which concatenates the speech segments extracted from the source domain data.

speech-recognition Speech Recognition

Minimum Word Error Rate Training with Language Model Fusion for End-to-End Speech Recognition

no code implementations • 4 Jun 2021 • Zhong Meng, Yu Wu, Naoyuki Kanda, Liang Lu, Xie Chen, Guoli Ye, Eric Sun, Jinyu Li, Yifan Gong

In this work, we perform LM fusion in the minimum WER (MWER) training of an E2E model to obviate the need for LM weights tuning during inference.

Language Modelling speech-recognition +1

Achieving on-Mobile Real-Time Super-Resolution with Neural Architecture and Pruning Search

no code implementations • ICCV 2021 • Zheng Zhan, Yifan Gong, Pu Zhao, Geng Yuan, Wei Niu, Yushu Wu, Tianyun Zhang, Malith Jayaweera, David Kaeli, Bin Ren, Xue Lin, Yanzhi Wang

Though recent years have witnessed remarkable progress in single image super-resolution (SISR) tasks with the prosperous development of deep neural networks (DNNs), the deep learning methods are confronted with the computation and memory consumption issues in practice, especially for resource-limited platforms such as mobile devices.

Image Super-Resolution Neural Architecture Search +1

Diarisation using location tracking with agglomerative clustering

no code implementations • 22 Sep 2021 • Jeremy H. M. Wong, Igor Abramovski, Xiong Xiao, Yifan Gong

Previous works have shown that spatial location information can be complementary to speaker embeddings for a speaker diarisation task.

Clustering

Joint speaker diarisation and tracking in switching state-space model

no code implementations • 23 Sep 2021 • Jeremy H. M. Wong, Yifan Gong

Speakers may move around while diarisation is being performed.

Internal Language Model Adaptation with Text-Only Data for End-to-End Speech Recognition

no code implementations • 6 Oct 2021 • Zhong Meng, Yashesh Gaur, Naoyuki Kanda, Jinyu Li, Xie Chen, Yu Wu, Yifan Gong

ILMA enables a fast text-only adaptation of the E2E model without increasing the run-time computational cost.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Have best of both worlds: two-pass hybrid and E2E cascading framework for speech recognition

no code implementations • 10 Oct 2021 • Guoli Ye, Vadim Mazalov, Jinyu Li, Yifan Gong

Hybrid and end-to-end (E2E) systems have their individual advantages, with different error patterns in the speech recognition results.

speech-recognition Speech Recognition

MEST: Accurate and Fast Memory-Economic Sparse Training Framework on the Edge

1 code implementation • NeurIPS 2021 • Geng Yuan, Xiaolong Ma, Wei Niu, Zhengang Li, Zhenglun Kong, Ning Liu, Yifan Gong, Zheng Zhan, Chaoyang He, Qing Jin, Siyue Wang, Minghai Qin, Bin Ren, Yanzhi Wang, Sijia Liu, Xue Lin

Systematic evaluations of accuracy, training speed, and memory footprint are conducted, in which the proposed MEST framework consistently outperforms representative SOTA works.

Automatic Mapping of the Best-Suited DNN Pruning Schemes for Real-Time Mobile Acceleration

no code implementations • 22 Nov 2021 • Yifan Gong, Geng Yuan, Zheng Zhan, Wei Niu, Zhengang Li, Pu Zhao, Yuxuan Cai, Sijia Liu, Bin Ren, Xue Lin, Xulong Tang, Yanzhi Wang

Weight pruning is an effective model compression technique to tackle the challenges of achieving real-time deep neural network (DNN) inference on mobile devices.

Model Compression

Endpoint Detection for Streaming End-to-End Multi-talker ASR

no code implementations • 24 Jan 2022 • Liang Lu, Jinyu Li, Yifan Gong

Our experimental results based on the 2-speaker LibrispeechMix dataset show that the SURT model can achieve promising EP detection without significant degradation of the recognition accuracy.

Sentence speech-recognition +2

Reverse Engineering of Imperceptible Adversarial Image Perturbations

2 code implementations • ICLR 2022 • Yifan Gong, Yuguang Yao, Yize Li, Yimeng Zhang, Xiaoming Liu, Xue Lin, Sijia Liu

However, carefully crafted, tiny adversarial perturbations are difficult to recover by optimizing a unilateral RED objective.

Data Augmentation Image Denoising

Compiler-Aware Neural Architecture Search for On-Mobile Real-time Super-Resolution

1 code implementation • 25 Jul 2022 • Yushu Wu, Yifan Gong, Pu Zhao, Yanyu Li, Zheng Zhan, Wei Niu, Hao Tang, Minghai Qin, Bin Ren, Yanzhi Wang

Instead of measuring the speed on mobile devices at each iteration during the search process, a speed model incorporated with compiler optimizations is leveraged to predict the inference latency of the SR block with various width configurations for faster convergence.

Neural Architecture Search SSIM +1

SparCL: Sparse Continual Learning on the Edge

1 code implementation • 20 Sep 2022 • Zifeng Wang, Zheng Zhan, Yifan Gong, Geng Yuan, Wei Niu, Tong Jian, Bin Ren, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy

SparCL achieves both training acceleration and accuracy preservation through the synergy of three aspects: weight sparsity, data efficiency, and gradient sparsity.

Continual Learning

Streaming, fast and accurate on-device Inverse Text Normalization for Automatic Speech Recognition

no code implementations • 7 Nov 2022 • Yashesh Gaur, Nick Kibre, Jian Xue, Kangyuan Shu, Yuhui Wang, Issac Alphanso, Jinyu Li, Yifan Gong

Automatic Speech Recognition (ASR) systems typically yield output in lexical form.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Self-Ensemble Protection: Training Checkpoints Are Good Data Protectors

1 code implementation • 22 Nov 2022 • Sizhe Chen, Geng Yuan, Xinwen Cheng, Yifan Gong, Minghai Qin, Yanzhi Wang, Xiaolin Huang

In this paper, we uncover them by model checkpoints' gradients, forming the proposed self-ensemble protection (SEP), which is very effective because (1) learning on examples ignored during normal training tends to yield DNNs ignoring normal examples; (2) checkpoints' cross-model gradients are close to orthogonal, meaning that they are as diverse as DNNs with different architectures.

All-in-One: A Highly Representative DNN Pruning Framework for Edge Devices with Dynamic Power Management

no code implementations • 9 Dec 2022 • Yifan Gong, Zheng Zhan, Pu Zhao, Yushu Wu, Chao Wu, Caiwen Ding, Weiwen Jiang, Minghai Qin, Yanzhi Wang

By re-configuring the model to the corresponding pruning ratio for a specific execution frequency (and voltage), we are able to achieve stable inference speed, i.e., keeping the difference in speed performance under various execution frequencies as small as possible.

Management

Building High-accuracy Multilingual ASR with Gated Language Experts and Curriculum Training

no code implementations • 1 Mar 2023 • Eric Sun, Jinyu Li, Yuxuan Hu, Yimeng Zhu, Long Zhou, Jian Xue, Peidong Wang, Linquan Liu, Shujie Liu, Edward Lin, Yifan Gong

We propose gated language experts and curriculum training to enhance multilingual transformer transducer models without requiring language identification (LID) input from users during inference.

Language Identification

Can Adversarial Examples Be Parsed to Reveal Victim Model Information?

1 code implementation • 13 Mar 2023 • Yuguang Yao, Jiancheng Liu, Yifan Gong, Xiaoming Liu, Yanzhi Wang, Xue Lin, Sijia Liu

We call this 'model parsing of adversarial attacks' - a task to uncover 'arcana' in terms of the concealed VM information in attacks.

Adversarial Attack

DualHSIC: HSIC-Bottleneck and Alignment for Continual Learning

no code implementations • 30 Apr 2023 • Zifeng Wang, Zheng Zhan, Yifan Gong, Yucai Shao, Stratis Ioannidis, Yanzhi Wang, Jennifer Dy

Rehearsal-based approaches are a mainstay of continual learning (CL).

Continual Learning

Adapting Large Language Model with Speech for Fully Formatted End-to-End Speech Recognition

1 code implementation • 17 Jul 2023 • Shaoshi Ling, Yuxuan Hu, Shuangbei Qian, Guoli Ye, Yao Qian, Yifan Gong, Ed Lin, Michael Zeng

Most end-to-end (E2E) speech recognition models are composed of encoder and decoder blocks that perform acoustic and language modeling functions.

Language Modelling Large Language Model +2

Pre-training End-to-end ASR Models with Augmented Speech Samples Queried by Text

no code implementations • 30 Jul 2023 • Eric Sun, Jinyu Li, Jian Xue, Yifan Gong

When mixing 20,000 hours of augmented speech data generated by our method with 12,500 hours of original transcribed speech data for Italian Transformer transducer model pre-training, we achieve an 8.7% relative word error rate reduction.

Automatic Speech Recognition Data Augmentation +2

Hybrid Attention-based Encoder-decoder Model for Efficient Language Model Adaptation

no code implementations • 14 Sep 2023 • Shaoshi Ling, Guoli Ye, Rui Zhao, Yifan Gong

The attention-based encoder-decoder (AED) speech recognition model has been widely successful in recent years.

Automatic Speech Recognition Language Modelling +2

Value FULCRA: Mapping Large Language Models to the Multidimensional Spectrum of Basic Human Values

no code implementations • 15 Nov 2023 • Jing Yao, Xiaoyuan Yi, Xiting Wang, Yifan Gong, Xing Xie

The rapid advancement of Large Language Models (LLMs) has attracted much attention to value alignment for their responsible development.

Fairness

E$^{2}$GAN: Efficient Training of Efficient GANs for Image-to-Image Translation

no code implementations • 11 Jan 2024 • Yifan Gong, Zheng Zhan, Qing Jin, Yanyu Li, Yerlan Idelbayev, Xian Liu, Andrey Zharkov, Kfir Aberman, Sergey Tulyakov, Yanzhi Wang, Jian Ren

One highly promising direction for enabling flexible real-time on-device image editing is utilizing data distillation by leveraging large-scale text-to-image diffusion models, such as Stable Diffusion, to generate paired datasets used for training generative adversarial networks (GANs).

Image-to-Image Translation

NOTSOFAR-1 Challenge: New Datasets, Baseline, and Tasks for Distant Meeting Transcription

no code implementations • 16 Jan 2024 • Alon Vinnikov, Amir Ivry, Aviv Hurvitz, Igor Abramovski, Sharon Koubi, Ilya Gurvich, Shai Pe'er, Xiong Xiao, Benjamin Martinez Elizalde, Naoyuki Kanda, Xiaofei Wang, Shalev Shaer, Stav Yagev, Yossi Asher, Sunit Sivasankaran, Yifan Gong, Min Tang, Huaming Wang, Eyal Krupka

The challenge focuses on distant speaker diarization and automatic speech recognition (DASR) in far-field meeting scenarios, with single-channel and known-geometry multi-channel tracks, and serves as a launch platform for two new datasets: First, a benchmarking dataset of 315 meetings, averaging 6 minutes each, capturing a broad spectrum of real-world acoustic conditions and conversational dynamics.

Automatic Speech Recognition Benchmarking +4
