Search Results for author: Zhen-Hua Ling

Found 71 papers, 36 papers with code

Conversation- and Tree-Structure Losses for Dialogue Disentanglement

no code implementations · dialdoc (ACL) 2022 · Tianda Li, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu

When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately.


Sparsity-Driven EEG Channel Selection for Brain-Assisted Speech Enhancement

no code implementations · 22 Nov 2023 · Jie Zhang, Qing-Tian Xu, Zhen-Hua Ling

In this work, we therefore propose a novel end-to-end brain-assisted speech enhancement network (BASEN), which incorporates the listeners' EEG signals and adopts a temporal convolutional network together with a convolutional multi-layer cross attention module to fuse EEG-audio features.

EEG · Electroencephalogram (EEG) +1

APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

1 code implementation · 20 Nov 2023 · Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

APNet demonstrates the capability to generate synthesized speech of comparable quality to the HiFi-GAN vocoder but with a considerably improved inference speed.

Speech Synthesis

Is ChatGPT a Good Multi-Party Conversation Solver?

1 code implementation · 25 Oct 2023 · Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

Large Language Models (LLMs) have emerged as influential instruments within the realm of natural language processing; nevertheless, their capacity to handle multi-party conversations (MPCs) -- a scenario marked by the presence of multiple interlocutors involved in intricate information exchanges -- remains uncharted.

Zero-Shot Learning

Untying the Reversal Curse via Bidirectional Language Model Editing

1 code implementation · 16 Oct 2023 · Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu

A new evaluation metric of reversibility is introduced, and a benchmark dubbed Bidirectional Assessment for Knowledge Editing (BAKE) is constructed to evaluate the reversibility of edited models in recalling knowledge in the reverse direction of editing.

Language Modelling · Model Editing +1

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

no code implementations · 19 Sep 2023 · Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

Specifically, we guide an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model, thus transferring tongue-related knowledge.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

no code implementations · 18 Sep 2023 · Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a previously unseen target speaker, relying solely on a single face image of the target speaker.

Voice Conversion

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation

no code implementations · 24 May 2023 · Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +3

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

1 code implementation · 23 May 2023 · Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel.

Denoising · Speech Enhancement
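As a rough illustration of the parallel magnitude-and-phase idea, the sketch below (plain Python, not the MP-SENet architecture; the constant gain and shift are placeholders for its learned mappings) decomposes each complex spectral bin into magnitude and phase, processes the two spectra separately, and recombines them:

```python
import cmath

def denoise_mag_phase(spectrum, mag_gain=0.8, phase_shift=0.0):
    """Decompose each complex spectral bin into magnitude and phase,
    process the two spectra separately in parallel, then recombine.
    The constant gain/shift stand in for learned denoising mappings."""
    enhanced = []
    for c in spectrum:
        mag, phase = abs(c), cmath.phase(c)   # polar decomposition
        mag *= mag_gain                       # toy "denoised" magnitude
        phase += phase_shift                  # toy "denoised" phase
        enhanced.append(cmath.rect(mag, phase))
    return enhanced

noisy = [complex(1.0, 1.0), complex(0.0, 2.0)]   # stand-in STFT bins
enhanced = denoise_mag_phase(noisy)
```

Operating on the two spectra explicitly, rather than on real/imaginary parts, is what lets magnitude and phase each receive their own processing path.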

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation

no code implementations · 22 May 2023 · Jia-Chen Gu, Chao-Hong Tan, Caiyuan Chu, Zhen-Hua Ling, Chongyang Tao, Quan Liu, Cong Liu

Given an MPC with a few addressee labels missing, existing methods fail to build a consecutively connected conversation graph, but only a few separate conversation fragments instead.

SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction

no code implementations · 21 May 2023 · Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

The proposed encoder is capable of interactively capturing complementary information between features and contextual information, to derive language-agnostic representations for various IE tasks.

DiffuSIA: A Spiral Interaction Architecture for Encoder-Decoder Text Diffusion

no code implementations · 19 May 2023 · Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

In fact, the encoder-decoder architecture is naturally more flexible for its detachable encoder and decoder modules, which is extensible to multilingual and multimodal generation tasks for conditions and target texts.

Conditional Text Generation · Dialogue Generation +4

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

1 code implementation · 17 May 2023 · Jie Zhang, Qing-Tian Xu, Qiu-Shi Zhu, Zhen-Hua Ling

In this paper, we thus propose a novel time-domain brain-assisted SE network (BASEN) incorporating electroencephalography (EEG) signals recorded from the listener for extracting the target speaker from monaural speech mixtures.

EEG · Speech Enhancement
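The EEG-audio fusion relies on cross attention. A minimal single-head dot-product sketch (plain Python with toy vectors; BASEN's actual module is convolutional and multi-layer, and learned projections are omitted) shows how each audio frame can attend over EEG frames:

```python
import math

def cross_attention(queries, keys, values):
    """Single-head dot-product cross attention: each audio-frame query
    attends over EEG-frame keys and returns a weighted sum of the
    EEG values. Inputs are plain lists of equal-length vectors."""
    fused = []
    for q in queries:
        # Scaled dot-product scores against every key.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(len(q))
                  for k in keys]
        # Numerically stable softmax over the scores.
        m = max(scores)
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]
        # Attention-weighted combination of the value vectors.
        fused.append([sum(w * v[d] for w, v in zip(weights, values))
                      for d in range(len(values[0]))])
    return fused

audio = [[1.0, 0.0], [0.0, 1.0]]   # stand-in audio features
eeg   = [[1.0, 0.0], [0.0, 1.0]]   # stand-in EEG features
out = cross_attention(audio, eeg, eeg)
```

Each fused frame is a convex combination of the EEG features, weighted by how strongly the audio frame matches each EEG frame.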

GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding

1 code implementation · 16 May 2023 · Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

Addressing the issue of who says what to whom in multi-party conversations (MPCs) has recently attracted a lot of research attention.

Speaker Identification

Zero-shot personalized lip-to-speech synthesis with face image based voice control

no code implementations · 9 May 2023 · Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling

In this paper, we propose a zero-shot personalized Lip2Speech synthesis method, in which face images control speaker identities.

Lip to Speech Synthesis · Representation Learning +1

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis

1 code implementation · 26 Apr 2023 · Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals to a neural filter framework.

Speech Synthesis

Speech Reconstruction from Silent Tongue and Lip Articulation By Pseudo Target Generation and Domain Adversarial Training

no code implementations · 12 Apr 2023 · Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

This paper studies the task of speech reconstruction from ultrasound tongue images and optical lip videos recorded in a silent speaking mode, where people only activate their intra-oral and extra-oral articulators without producing sound.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +1

WIDER & CLOSER: Mixture of Short-channel Distillers for Zero-shot Cross-lingual Named Entity Recognition

1 code implementation · 7 Dec 2022 · Jun-Yu Ma, Beiduo Chen, Jia-Chen Gu, Zhen-Hua Ling, Wu Guo, Quan Liu, Zhigang Chen, Cong Liu

In this study, a mixture of short-channel distillers (MSD) method is proposed to fully exploit the rich hierarchical information in the teacher model and to transfer knowledge to the student model sufficiently and efficiently.

Cross-Lingual NER · Domain Adaptation +3

Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation

no code implementations · 6 Dec 2022 · Jing-Xuan Zhang, Genshun Wan, Zhen-Hua Ling, Jia Pan, Jianqing Gao, Cong Liu

AV2vec has a student and a teacher module, in which the student performs a masked latent feature regression task using the multimodal target features generated online by the teacher.

Language Modelling

Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations

no code implementations · 2 Jun 2022 · Chang Liu, Zhen-Hua Ling, Ling-Hui Chen

This paper proposes a multilingual speech synthesis method which combines unsupervised phonetic representations (UPR) and supervised phonetic representations (SPR) to avoid reliance on the pronunciation dictionaries of target languages.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +2

HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations

1 code implementation · ACL 2022 · Jia-Chen Gu, Chao-Hong Tan, Chongyang Tao, Zhen-Hua Ling, Huang Hu, Xiubo Geng, Daxin Jiang

To address these challenges, we present HeterMPC, a heterogeneous graph-based neural network for response generation in MPCs which models the semantics of utterances and interlocutors simultaneously with two types of nodes in a graph.

Response Generation

USTC-NELSLIP at SemEval-2022 Task 11: Gazetteer-Adapted Integration Network for Multilingual Complex Named Entity Recognition

1 code implementation · SemEval (NAACL) 2022 · Beiduo Chen, Jun-Yu Ma, Jiajun Qi, Wu Guo, Zhen-Hua Ling, Quan Liu

The proposed method is applied to several state-of-the-art Transformer-based NER models with a gazetteer built from Wikidata, and shows great generalization ability across them.

named-entity-recognition · Named Entity Recognition +1

Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models

1 code implementation · 26 Jan 2022 · Lu Dong, Zhi-Qiang Guo, Chao-Hong Tan, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling

Neural network models have achieved state-of-the-art performance on grapheme-to-phoneme (G2P) conversion.

Language Modelling

Detecting Speaker Personas from Conversational Texts

1 code implementation · EMNLP 2021 · Jia-Chen Gu, Zhen-Hua Ling, Yu Wu, Quan Liu, Zhigang Chen, Xiaodan Zhu

This is a many-to-many semantic matching task because both contexts and personas in SPD are composed of multiple sentences.

MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding

1 code implementation · ACL 2021 · Jia-Chen Gu, Chongyang Tao, Zhen-Hua Ling, Can Xu, Xiubo Geng, Daxin Jiang

Recently, various neural models for multi-party conversation (MPC) have achieved impressive improvements on a variety of tasks such as addressee recognition, speaker identification and response prediction.

Language Modelling · Speaker Identification

Partner Matters! An Empirical Study on Fusing Personas for Personalized Response Selection in Retrieval-Based Chatbots

1 code implementation · 19 May 2021 · Jia-Chen Gu, Hui Liu, Zhen-Hua Ling, Quan Liu, Zhigang Chen, Xiaodan Zhu

Empirical studies on the Persona-Chat dataset show that the partner personas neglected in previous studies can improve the accuracy of response selection in the IMN- and BERT-based models.


Emotion-Regularized Conditional Variational Autoencoder for Emotional Response Generation

no code implementations · 18 Apr 2021 · Yu-Ping Ruan, Zhen-Hua Ling

This paper presents an emotion-regularized conditional variational autoencoder (Emo-CVAE) model for generating emotional conversation responses.

Response Generation

Learning to Retrieve Entity-Aware Knowledge and Generate Responses with Copy Mechanism for Task-Oriented Dialogue Systems

1 code implementation · 22 Dec 2020 · Chao-Hong Tan, Xiaoyu Yang, Zi'ou Zheng, Tianda Li, Yufei Feng, Jia-Chen Gu, Quan Liu, Dan Liu, Zhen-Hua Ling, Xiaodan Zhu

Task-oriented conversational modeling with unstructured knowledge access, track 1 of the 9th Dialogue System Technology Challenge (DSTC 9), requires building a system that generates responses given the dialogue history and knowledge access.

Response Generation · Task-Oriented Dialogue Systems

Tracking Interaction States for Multi-Turn Text-to-SQL Semantic Parsing

1 code implementation · 9 Dec 2020 · Run-Ze Wang, Zhen-Hua Ling, Jing-Bo Zhou, Yu Hu

The dynamic schema-state and SQL-state representations are then utilized to decode the SQL query corresponding to the current utterance.

Semantic Parsing · Text-To-SQL

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer

no code implementations · 3 Sep 2020 · Jing-Xuan Zhang, Li-Juan Liu, Yan-Nian Chen, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling, Li-Rong Dai

In this paper, we present an ASR-TTS method for voice conversion, which uses the iFLYTEK ASR engine to transcribe the source speech into text and a Transformer TTS model with a WaveNet vocoder to synthesize the converted speech from the decoded text.

Automatic Speech Recognition · Automatic Speech Recognition (ASR) +4

Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots

1 code implementation · Findings of the Association for Computational Linguistics 2020 · Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Zhigang Chen, Xiaodan Zhu

The challenges of building knowledge-grounded retrieval-based chatbots lie in how to ground a conversation on its background knowledge and how to match response candidates with both context and knowledge simultaneously.


DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement

1 code implementation · 8 Apr 2020 · Tianda Li, Jia-Chen Gu, Xiaodan Zhu, Quan Liu, Zhen-Hua Ling, Zhiming Su, Si Wei

Disentanglement is a problem in which multiple conversations occur in the same channel simultaneously, and the listener should decide which utterance is part of the conversation he will respond to.

Conversation Disentanglement · Disentanglement

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

no code implementations · 19 Aug 2019 · Zhi-Xiu Ye, Qian Chen, Wen Wang, Zhen-Hua Ling

We also observe that fine-tuned models after the proposed pre-training approach maintain comparable performance on other NLP tasks, such as sentence classification and natural language inference tasks, compared to the original BERT models.

Common Sense Reasoning · Natural Language Inference +2

Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots

1 code implementation · IJCNLP 2019 · Jia-Chen Gu, Zhen-Hua Ling, Xiaodan Zhu, Quan Liu

Compared with previous persona fusion approaches which enhance the representation of a context by calculating its similarity with a given persona, the DIM model adopts a dual matching architecture, which performs interactive matching between responses and contexts and between responses and personas respectively for ranking response candidates.


Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

1 code implementation · 25 Jun 2019 · Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion is achieved by preserving the linguistic representations of source utterances while replacing the speaker representations with the target ones.

Audio and Speech Processing · Sound

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

no code implementations · 21 Jun 2019 · Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai

This paper presents a method of using autoregressive neural networks for the acoustic modeling of singing voice synthesis (SVS).

Singing Voice Synthesis

Condition-Transforming Variational AutoEncoder for Conversation Response Generation

no code implementations · 24 Apr 2019 · Yu-Ping Ruan, Zhen-Hua Ling, Quan Liu, Zhigang Chen, Nitin Indurkhya

This paper proposes a new model, called condition-transforming variational autoencoder (CTVAE), to improve the performance of conversation response generation using conditional variational autoencoders (CVAEs).

Response Generation

Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge

no code implementations · 22 Apr 2019 · Yu-Ping Ruan, Xiaodan Zhu, Zhen-Hua Ling, Zhan Shi, Quan Liu, Si Wei

Winograd Schema Challenge (WSC) was proposed as an AI-hard problem in testing computers' intelligence on common sense representation and reasoning.

Common Sense Reasoning

Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions

1 code implementation · NAACL 2019 · Zhi-Xiu Ye, Zhen-Hua Ling

This paper presents a neural relation extraction method to deal with the noisy training data generated by distant supervision.

Relation Extraction · Sentence Embeddings
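A minimal sketch of the intra-bag attention idea (plain Python with toy vectors; the paper additionally applies a similar attention across bags, which is omitted here): sentences in a bag are scored against a relation query, and the bag representation down-weights the noisy ones.

```python
import math

def bag_representation(sentence_vecs, relation_query):
    """Score each sentence vector against a relation query, softmax
    the scores, and return the attention-weighted bag vector, so
    noisy (off-relation) sentences get small weights."""
    scores = [sum(s * q for s, q in zip(vec, relation_query))
              for vec in sentence_vecs]
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]   # stable softmax
    z = sum(exps)
    weights = [e / z for e in exps]
    dim = len(sentence_vecs[0])
    bag = [sum(w * vec[d] for w, vec in zip(weights, sentence_vecs))
           for d in range(dim)]
    return bag, weights

bag, weights = bag_representation(
    [[1.0, 0.0], [0.0, 1.0]],   # one on-relation, one noisy sentence
    [1.0, 0.0])                 # hypothetical relation embedding
```

The on-relation sentence receives the larger weight, so the bag vector leans toward it rather than averaging the noise in equally.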

Promoting Diversity for End-to-End Conversation Response Generation

no code implementations · 27 Jan 2019 · Yu-Ping Ruan, Zhen-Hua Ling, Quan Liu, Jia-Chen Gu, Xiaodan Zhu

At this stage, two different models are proposed, i.e., a variational generative (VariGen) model and a retrieval-based (Retrieval) model.

Response Generation · Retrieval

Learning latent representations for style control and transfer in end-to-end speech synthesis

2 code implementations · 11 Dec 2018 · Ya-Jie Zhang, Shifeng Pan, Lei He, Zhen-Hua Ling

In this paper, we introduce the Variational Autoencoder (VAE) to an end-to-end speech synthesis model, to learn the latent representation of speaking styles in an unsupervised manner.

Speech Synthesis · Style Transfer

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

no code implementations · 18 Jul 2018 · Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis.

Acoustic Modelling · Speech Synthesis
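The core recurrence of forward attention can be sketched in a few lines (a simplified reading of the method; the network producing the attention energies and the transition-agent variant are omitted, and the inputs here are toy values):

```python
def forward_attention_step(prev_alpha, energies):
    """One step of a forward-attention-style recurrence: the new weight
    for position n mixes the previous weights of n and n-1 with the
    current attention energies, then renormalizes. Alignments can only
    stay put or move one step forward, enforcing monotonicity."""
    n = len(prev_alpha)
    alpha = []
    for i in range(n):
        left = prev_alpha[i - 1] if i > 0 else 0.0
        alpha.append((prev_alpha[i] + left) * energies[i])
    z = sum(alpha)
    return [a / z for a in alpha]

# Start fully aligned to the first phone; energies favor the second.
alpha = forward_attention_step([1.0, 0.0, 0.0], [0.2, 0.7, 0.1])
```

Because the third position had no probability mass at positions 2 or 3 in the previous step, it stays at zero: the alignment cannot skip ahead, which is exactly the constraint that stabilizes attention in speech synthesis.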

Hybrid semi-Markov CRF for Neural Sequence Labeling

1 code implementation · ACL 2018 · Zhi-Xiu Ye, Zhen-Hua Ling

This paper proposes hybrid semi-Markov conditional random fields (SCRFs) for neural sequence labeling in natural language processing.

named-entity-recognition · Named Entity Recognition +1
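The segment-level decoding behind semi-Markov CRFs can be sketched as a small dynamic program (labels and the word-level CRF half of the hybrid model are omitted; the scorer below is a hypothetical toy, not a trained model):

```python
def best_segmentation(n, max_len, seg_score):
    """Semi-Markov-style Viterbi sketch: choose the segmentation of
    positions 0..n-1 that maximizes the sum of segment scores, where
    seg_score(i, j) scores the whole segment [i, j). SCRFs score and
    label segments rather than single tokens."""
    best = [0.0] + [float("-inf")] * n   # best[j]: best score for prefix of length j
    back = [0] * (n + 1)
    for j in range(1, n + 1):
        for i in range(max(0, j - max_len), j):
            s = best[i] + seg_score(i, j)
            if s > best[j]:
                best[j], back[j] = s, i
    # Recover segment boundaries by walking the back pointers.
    segs, j = [], n
    while j > 0:
        segs.append((back[j], j))
        j = back[j]
    return best[n], segs[::-1]

# Toy scorer: reward the known entity span [1, 3); numbers are made up.
score, segs = best_segmentation(
    4, 3, lambda i, j: 2.0 if (i, j) == (1, 3) else 0.1)
```

The decoder recovers the rewarded span as one segment, illustrating why segment-level scoring suits entity spans of varying length.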

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

no code implementations · 23 Apr 2018 · Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling

As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient countermeasure (CM) to quantify the extent of processing artifacts.

Benchmarking · Speaker Verification +1

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

no code implementations · 12 Apr 2018 · Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling

We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems.

Voice Conversion

A Sequential Neural Encoder with Latent Structured Description for Modeling Sentences

no code implementations · 15 Nov 2017 · Yu-Ping Ruan, Qian Chen, Zhen-Hua Ling

The description layer utilizes modified LSTM units to process these chunk-level vectors in a recurrent manner and produces sequential encoding outputs.

Chunking · Natural Language Inference +2

Neural Natural Language Inference Models Enhanced with External Knowledge

1 code implementation · ACL 2018 · Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei

With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have been shown to achieve state-of-the-art performance.

Natural Language Inference

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference

2 code implementations · WS 2017 · Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, Diana Inkpen

The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task.

Natural Language Inference · Natural Language Understanding +1

Commonsense Knowledge Enhanced Embeddings for Solving Pronoun Disambiguation Problems in Winograd Schema Challenge

no code implementations · 13 Nov 2016 · Quan Liu, Hui Jiang, Zhen-Hua Ling, Xiaodan Zhu, Si Wei, Yu Hu

The PDP task we investigate in this paper is a complex coreference resolution task which requires the utilization of commonsense knowledge.

coreference-resolution · Test

Distraction-Based Neural Networks for Document Summarization

1 code implementation · 26 Oct 2016 · Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang

Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences.

Document Summarization

Part-of-Speech Relevance Weights for Learning Word Embeddings

no code implementations · 24 Mar 2016 · Quan Liu, Zhen-Hua Ling, Hui Jiang, Yu Hu

The model proposed in this paper jointly optimizes word vectors and the POS relevance matrices.

Learning Word Embeddings · POS +1

Integrate Document Ranking Information into Confidence Measure Calculation for Spoken Term Detection

no code implementations · 7 Sep 2015 · Quan Liu, Wu Guo, Zhen-Hua Ling

The confidence measure of each term occurrence is then re-estimated through linear interpolation with the calculated document ranking weight to improve its reliability by integrating document-level information.

Document Ranking
