Search Results for author: Xiangang Li

Found 30 papers, 7 papers with code

BEIKE NLP at SemEval-2022 Task 4: Prompt-Based Paragraph Classification for Patronizing and Condescending Language Detection

no code implementations SemEval (NAACL) 2022 Yong Deng, Chenxiao Dou, Liangyu Chen, Deqiang Miao, Xianghui Sun, Baochang Ma, Xiangang Li

PCL detection task is aimed at identifying and categorizing language that is patronizing or condescending towards vulnerable communities in the general media. Compared to other NLP tasks of paragraph classification, the negative language presented in the PCL detection task is usually more implicit and subtle to be recognized, making the performance of common text-classification approaches disappointed.

Classification Multi-Label Classification +2

Time Domain Adversarial Voice Conversion for ADD 2022

no code implementations19 Apr 2022 Cheng Wen, Tingwei Guo, Xingjun Tan, Rui Yan, Shuran Zhou, Chuandong Xie, Wei Zou, Xiangang Li

In this paper, we describe our speech generation system for the first Audio Deep Synthesis Detection Challenge (ADD 2022).

Voice Conversion

Audio Deep Fake Detection System with Neural Stitching for ADD 2022

no code implementations19 Apr 2022 Rui Yan, Cheng Wen, Shuran Zhou, Tingwei Guo, Wei Zou, Xiangang Li

This paper describes our best system and methodology for ADD 2022: The First Audio Deep Synthesis Detection Challenge\cite{Yi2022ADD}.

Voice Conversion

GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio

1 code implementation13 Jun 2021 Guoguo Chen, Shuzhou Chai, Guanbo Wang, Jiayu Du, Wei-Qiang Zhang, Chao Weng, Dan Su, Daniel Povey, Jan Trmal, Junbo Zhang, Mingjie Jin, Sanjeev Khudanpur, Shinji Watanabe, Shuaijiang Zhao, Wei Zou, Xiangang Li, Xuchen Yao, Yongqing Wang, Yujun Wang, Zhao You, Zhiyong Yan

This paper introduces GigaSpeech, an evolving, multi-domain English speech recognition corpus with 10, 000 hours of high quality labeled audio suitable for supervised training, and 40, 000 hours of total audio suitable for semi-supervised and unsupervised training.

speech-recognition Speech Recognition

Robust Speaker Extraction Network Based on Iterative Refined Adaptation

no code implementations4 Nov 2020 Chengyun Deng, Shiqian Ma, Yi Zhang, Yongtao Sha, HUI ZHANG, Hui Song, Xiangang Li

dataset confirm the superior performance of the proposed method over the network without IRA in terms of SI-SDR and PESQ improvement.

DiDiSpeech: A Large Scale Mandarin Speech Corpus

no code implementations19 Oct 2020 Tingwei Guo, Cheng Wen, Dongwei Jiang, Ne Luo, Ruixiong Zhang, Shuaijiang Zhao, Wubo Li, Cheng Gong, Wei Zou, Kun Han, Xiangang Li

This paper introduces a new open-sourced Mandarin speech corpus, called DiDiSpeech.

Audio and Speech Processing

On Loss Functions and Recurrency Training for GAN-based Speech Enhancement Systems

no code implementations29 Jul 2020 Zhuohuang Zhang, Chengyun Deng, Yi Shen, Donald S. Williamson, Yongtao Sha, Yi Zhang, Hui Song, Xiangang Li

Recent work has shown that it is feasible to use generative adversarial networks (GANs) for speech enhancement, however, these approaches have not been compared to state-of-the-art (SOTA) non GAN-based approaches.

Audio and Speech Processing Sound

A Further Study of Unsupervised Pre-training for Transformer Based Speech Recognition

1 code implementation20 May 2020 Dongwei Jiang, Wubo Li, Ruixiong Zhang, Miao Cao, Ne Luo, Yang Han, Wei Zou, Xiangang Li

In this paper, we conduct a further study on MPC and focus on three important aspects: the effect of pre-training data speaking style, its extension on streaming model, and how to better transfer learned knowledge from pre-training stage to downstream tasks.

speech-recognition Speech Recognition +2

Adversarial Multi-Binary Neural Network for Multi-class Classification

no code implementations25 Mar 2020 Haiyang Xu, Junwen Chen, Kun Han, Xiangang Li

Multi-class text classification is one of the key problems in machine learning and natural language processing.

Classification General Classification +4

Learning Syntactic and Dynamic Selective Encoding for Document Summarization

no code implementations25 Mar 2020 Haiyang Xu, Yahao He, Kun Han, Junwen Chen, Xiangang Li

Our approach has the following contributions: first, we incorporate syntactic information such as constituency parsing trees into the encoding sequence to learn both the semantic and syntactic information from the document, resulting in more accurate summary; second, we propose a dynamic gate network to select the salient information based on the context of the decoder state, which is essential to document summarization.

Constituency Parsing Document Summarization

Selective Attention Encoders by Syntactic Graph Convolutional Networks for Document Summarization

no code implementations18 Mar 2020 Haiyang Xu, Yun Wang, Kun Han, Baochang Ma, Junwen Chen, Xiangang Li

Abstractive text summarization is a challenging task, and one need to design a mechanism to effectively extract salient information from the source text and then generate a summary.

Abstractive Text Summarization Document Summarization

TCT: A Cross-supervised Learning Method for Multimodal Sequence Representation

no code implementations23 Oct 2019 Wubo Li, Wei Zou, Xiangang Li

Multimodalities provide promising performance than unimodality in most tasks.

Cross-task pre-training for on-device acoustic scene classification

no code implementations22 Oct 2019 Ruixiong Zhang, Wei Zou, Xiangang Li

To utilize the acoustic event information to improve the performance of ASC tasks, we present the cross-task pre-training mechanism which utilizes acoustic event information from the pre-trained AED model for ASC tasks.

Acoustic Scene Classification Classification +3

Learning Alignment for Multimodal Emotion Recognition from Speech

1 code implementation6 Sep 2019 Haiyang Xu, HUI ZHANG, Kun Han, Yun Wang, Yiping Peng, Xiangang Li

Further, emotion recognition will be beneficial from using audio-textual multimodal information, it is not trivial to build a system to learn from multimodality.

Multimodal Emotion Recognition Speech Emotion Recognition +2

Towards End-to-End Code-Switching Speech Recognition

no code implementations31 Oct 2018 Ne Luo, Dongwei Jiang, Shuaijiang Zhao, Caixia Gong, Wei Zou, Xiangang Li

Code-switching speech recognition has attracted an increasing interest recently, but the need for expert linguistic knowledge has always been a big issue.

Automatic Speech Recognition Language Identification +1

Implicit Discourse Relation Recognition using Neural Tensor Network with Interactive Attention and Sparse Learning

no code implementations COLING 2018 Fengyu Guo, Ruifang He, Di Jin, Jianwu Dang, Longbiao Wang, Xiangang Li

In this paper, we propose a novel neural Tensor network framework with Interactive Attention and Sparse Learning (TIASL) for implicit discourse relation recognition.

Sparse Learning Text Summarization

A comparable study of modeling units for end-to-end Mandarin speech recognition

no code implementations10 May 2018 Wei Zou, Dongwei Jiang, Shuaijiang Zhao, Xiangang Li

We find that all types of modeling units can achieve approximate character error rate (CER) in CTC model and the performance of Chinese character attention model is better than syllable attention model.

speech-recognition Speech Recognition

Deep Speaker: an End-to-End Neural Speaker Embedding System

15 code implementations5 May 2017 Chao Li, Xiaokong Ma, Bing Jiang, Xiangang Li, Xuewei Zhang, Xiao Liu, Ying Cao, Ajay Kannan, Zhenyao Zhu

We present Deep Speaker, a neural speaker embedding system that maps utterances to a hypersphere where speaker similarity is measured by cosine similarity.

Speaker Identification Speaker Recognition

Gram-CTC: Automatic Unit Selection and Target Decomposition for Sequence Labelling

no code implementations ICML 2017 Hairong Liu, Zhenyao Zhu, Xiangang Li, Sanjeev Satheesh

These methods suffer from two major drawbacks: 1) the set of basic units is fixed, such as the set of words, characters or phonemes in speech recognition, and 2) the decomposition of target sequences is fixed.

speech-recognition Speech Recognition

Long Short-Term Memory based Convolutional Recurrent Neural Networks for Large Vocabulary Speech Recognition

no code implementations11 Oct 2016 Xiangang Li, Xihong Wu

Long short-term memory (LSTM) recurrent neural networks (RNNs) have been shown to give state-of-the-art performance on many speech recognition tasks, as they are able to provide the learned dynamically changing contextual window of all sequence history.

speech-recognition Speech Recognition

Constructing Long Short-Term Memory based Deep Recurrent Neural Networks for Large Vocabulary Speech Recognition

no code implementations16 Oct 2014 Xiangang Li, Xihong Wu

Long short-term memory (LSTM) based acoustic modeling methods have recently been shown to give state-of-the-art performance on some speech recognition tasks.

speech-recognition Speech Recognition

Cannot find the paper you are looking for? You can Submit a new open access paper.