Search Results for author: Zhen-Hua Ling

Found 77 papers, 40 papers with code

Conversation- and Tree-Structure Losses for Dialogue Disentanglement

no code implementations • dialdoc (ACL) 2022 • Tianda Li, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu

When multiple conversations occur simultaneously, a listener must decide which conversation each utterance is part of in order to interpret and respond to it appropriately.

Disentanglement

Multiscale Matching Driven by Cross-Modal Similarity Consistency for Audio-Text Retrieval

no code implementations • 15 Mar 2024 • Qian Wang, Jia-Chen Gu, Zhen-Hua Ling

Audio-text retrieval (ATR), which retrieves a relevant caption given an audio clip (A2T) and vice versa (T2A), has recently attracted much research attention.

AudioCaps • Contrastive Learning +2

Neighboring Perturbations of Knowledge Editing on Large Language Models

1 code implementation • 31 Jan 2024 • Jun-Yu Ma, Jia-Chen Gu, Ningyu Zhang, Zhen-Hua Ling

Despite their exceptional capabilities, large language models (LLMs) are prone to generating unintended text due to false or outdated knowledge.

knowledge editing

Corrective Retrieval Augmented Generation

1 code implementation • 29 Jan 2024 • Shi-Qi Yan, Jia-Chen Gu, Yun Zhu, Zhen-Hua Ling

Experiments on four datasets covering short- and long-form generation tasks show that CRAG can significantly improve the performance of RAG-based approaches.

Retrieval

Towards High-Quality and Efficient Speech Bandwidth Extension with Parallel Amplitude and Phase Prediction

no code implementations • 12 Jan 2024 • Ye-Xin Lu, Yang Ai, Hui-Peng Du, Zhen-Hua Ling

Speech bandwidth extension (BWE) refers to widening the frequency bandwidth of speech signals, enhancing speech quality so that it sounds brighter and fuller.

Bandwidth Extension • Generative Adversarial Network

Model Editing Can Hurt General Abilities of Large Language Models

1 code implementation • 9 Jan 2024 • Jia-Chen Gu, Hao-Xiang Xu, Jun-Yu Ma, Pan Lu, Zhen-Hua Ling, Kai-Wei Chang, Nanyun Peng

One critical challenge that has emerged is the presence of hallucinations in the output of large language models (LLMs) due to false or outdated knowledge.

Model Editing • Question Answering

MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding

no code implementations • 8 Dec 2023 • Hongjie Zhang, Yi Liu, Lu Dong, Yifei HUANG, Zhen-Hua Ling, Yali Wang, LiMin Wang, Yu Qiao

While several long-form VideoQA datasets have been introduced, the length of both the videos used to curate questions and the sub-clips of clues leveraged to answer those questions has not yet reached the criteria for genuine long-form video understanding.

Question Answering • Video Question Answering +1

Sparsity-Driven EEG Channel Selection for Brain-Assisted Speech Enhancement

no code implementations • 22 Nov 2023 • Jie Zhang, Qing-Tian Xu, Zhen-Hua Ling

In this work, we therefore propose a novel end-to-end brain-assisted speech enhancement network (BASEN), which incorporates the listeners' EEG signals and adopts a temporal convolutional network together with a convolutional multi-layer cross attention module to fuse EEG-audio features.

EEG • Electroencephalogram (EEG) +1

APNet2: High-quality and High-efficiency Neural Vocoder with Direct Prediction of Amplitude and Phase Spectra

1 code implementation • 20 Nov 2023 • Hui-Peng Du, Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

APNet demonstrates the capability to generate synthesized speech of comparable quality to the HiFi-GAN vocoder but with a considerably improved inference speed.

Speech Synthesis

Is ChatGPT a Good Multi-Party Conversation Solver?

1 code implementation • 25 Oct 2023 • Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

Large Language Models (LLMs) have emerged as influential instruments within the realm of natural language processing; nevertheless, their capacity to handle multi-party conversations (MPCs) -- a scenario marked by the presence of multiple interlocutors involved in intricate information exchanges -- remains uncharted.

Zero-Shot Learning

Untying the Reversal Curse via Bidirectional Language Model Editing

1 code implementation • 16 Oct 2023 • Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu

A new evaluation metric of reversibility is introduced, and a benchmark dubbed as Bidirectional Assessment for Knowledge Editing (BAKE) is constructed to evaluate the reversibility of edited models in recalling knowledge in the reverse direction of editing.

knowledge editing • Language Modelling +1

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement

no code implementations • 19 Sep 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

Specifically, we guide an audio-lip speech enhancement student model to learn from a pre-trained audio-lip-tongue speech enhancement teacher model, thus transferring tongue-related knowledge.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3

Face-Driven Zero-Shot Voice Conversion with Memory-based Face-Voice Alignment

no code implementations • 18 Sep 2023 • Zheng-Yan Sheng, Yang Ai, Yan-Nian Chen, Zhen-Hua Ling

This paper presents a novel task, zero-shot voice conversion based on face images (zero-shot FaceVC), which aims at converting the voice characteristics of an utterance from any source speaker to a previously unseen target speaker, relying solely on a single face image of the target speaker.

Voice Conversion

Incorporating Ultrasound Tongue Images for Audio-Visual Speech Enhancement through Knowledge Distillation

no code implementations • 24 May 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

Audio-visual speech enhancement (AV-SE) aims to enhance degraded speech along with extra visual information such as lip videos, and has been shown to be more effective than audio-only speech enhancement.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +3

MP-SENet: A Speech Enhancement Model with Parallel Denoising of Magnitude and Phase Spectra

1 code implementation • 23 May 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

This paper proposes MP-SENet, a novel Speech Enhancement Network which directly denoises Magnitude and Phase spectra in parallel.

Denoising • Speech Enhancement
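
To illustrate the parallel denoising idea, here is a minimal PyTorch sketch (not the released MP-SENet code; the layer sizes, the sigmoid magnitude mask and the input stacking are assumptions): a shared encoder feeds two heads, one predicting a magnitude mask and one predicting pseudo real/imaginary components that atan2 turns into a valid wrapped phase.

    import torch
    import torch.nn as nn

    class ParallelMagPhaseHeads(nn.Module):
        """Minimal sketch of parallel magnitude/phase denoising (hypothetical sizes)."""
        def __init__(self, channels=64):
            super().__init__()
            self.encoder = nn.Sequential(
                nn.Conv2d(2, channels, kernel_size=3, padding=1),   # stacked noisy magnitude + phase
                nn.PReLU(),
                nn.Conv2d(channels, channels, kernel_size=3, padding=1),
                nn.PReLU(),
            )
            self.mag_head = nn.Conv2d(channels, 1, kernel_size=1)    # bounded mask for the magnitude
            self.phase_head = nn.Conv2d(channels, 2, kernel_size=1)  # pseudo real/imaginary parts

        def forward(self, noisy_mag, noisy_phase):
            x = torch.stack([noisy_mag, noisy_phase], dim=1)          # (B, 2, T, F)
            h = self.encoder(x)
            mask = torch.sigmoid(self.mag_head(h)).squeeze(1)         # (B, T, F)
            enhanced_mag = mask * noisy_mag
            real, imag = self.phase_head(h).chunk(2, dim=1)
            enhanced_phase = torch.atan2(imag.squeeze(1), real.squeeze(1))  # wrapped phase in (-pi, pi]
            return enhanced_mag, enhanced_phase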

MADNet: Maximizing Addressee Deduction Expectation for Multi-Party Conversation Generation

1 code implementation • 22 May 2023 • Jia-Chen Gu, Chao-Hong Tan, Caiyuan Chu, Zhen-Hua Ling, Chongyang Tao, Quan Liu, Cong Liu

Given an MPC with a few addressee labels missing, existing methods fail to build a consecutively connected conversation graph and instead produce only a few separate conversation fragments.

SHINE: Syntax-augmented Hierarchical Interactive Encoder for Zero-shot Cross-lingual Information Extraction

no code implementations • 21 May 2023 • Jun-Yu Ma, Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

The proposed encoder is capable of interactively capturing complementary information between features and contextual information, to derive language-agnostic representations for various IE tasks.

DiffuSIA: A Spiral Interaction Architecture for Encoder-Decoder Text Diffusion

no code implementations • 19 May 2023 • Chao-Hong Tan, Jia-Chen Gu, Zhen-Hua Ling

In fact, the encoder-decoder architecture is naturally more flexible thanks to its detachable encoder and decoder modules, making it extensible to multilingual and multimodal generation tasks for both conditions and target texts.

Conditional Text Generation • Dialogue Generation +4

BASEN: Time-Domain Brain-Assisted Speech Enhancement Network with Convolutional Cross Attention in Multi-talker Conditions

1 code implementation • 17 May 2023 • Jie Zhang, Qing-Tian Xu, Qiu-Shi Zhu, Zhen-Hua Ling

In this paper, we thus propose a novel time-domain brain-assisted SE network (BASEN) incorporating electroencephalography (EEG) signals recorded from the listener for extracting the target speaker from monaural speech mixtures.

EEG • Speech Enhancement
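
A minimal sketch of the EEG-audio fusion step (feature dimensions are assumptions and the released BASEN code differs): audio frames attend to projected EEG frames through multi-head cross attention, and the attended EEG cues are added back to the audio stream.

    import torch
    import torch.nn as nn

    class EEGAudioCrossAttention(nn.Module):
        """Sketch: audio features query EEG features via cross attention (assumed dims)."""
        def __init__(self, audio_dim=256, eeg_dim=64, n_heads=4):
            super().__init__()
            self.eeg_proj = nn.Linear(eeg_dim, audio_dim)
            self.attn = nn.MultiheadAttention(audio_dim, n_heads, batch_first=True)
            self.norm = nn.LayerNorm(audio_dim)

        def forward(self, audio_feats, eeg_feats):
            # audio_feats: (B, T_audio, audio_dim), eeg_feats: (B, T_eeg, eeg_dim)
            eeg = self.eeg_proj(eeg_feats)
            fused, _ = self.attn(query=audio_feats, key=eeg, value=eeg)
            return self.norm(audio_feats + fused)  # residual fusion of EEG cues into the audio stream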

GIFT: Graph-Induced Fine-Tuning for Multi-Party Conversation Understanding

1 code implementation • 16 May 2023 • Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Cong Liu, Guoping Hu

Addressing the issue of who says what to whom in multi-party conversations (MPCs) has recently attracted a lot of research attention.

Speaker Identification

Zero-shot personalized lip-to-speech synthesis with face image based voice control

no code implementations • 9 May 2023 • Zheng-Yan Sheng, Yang Ai, Zhen-Hua Ling

In this paper, we propose a zero-shot personalized Lip2Speech synthesis method, in which face images control speaker identities.

Lip to Speech Synthesis • Representation Learning +1

Source-Filter-Based Generative Adversarial Neural Vocoder for High Fidelity Speech Synthesis

1 code implementation • 26 Apr 2023 • Ye-Xin Lu, Yang Ai, Zhen-Hua Ling

This paper proposes a source-filter-based generative adversarial neural vocoder named SF-GAN, which achieves high-fidelity waveform generation from input acoustic features by introducing F0-based source excitation signals to a neural filter framework.

Speech Synthesis
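
The F0-driven source excitation can be illustrated with a small sketch in the spirit of neural source-filter models (the amplitudes, noise level and naive frame-to-sample upsampling below are assumptions, not the exact SF-GAN recipe): voiced frames yield a sine wave driven by the cumulative F0 phase, unvoiced frames yield low-level noise, and the resulting excitation is fed to the neural filter.

    import numpy as np

    def sine_excitation(f0, sr=22050, hop=256):
        """Sketch: frame-level F0 (0 for unvoiced) -> sample-level source excitation."""
        f0_up = np.repeat(np.asarray(f0, dtype=np.float64), hop)   # naive upsampling to sample rate
        phase = 2.0 * np.pi * np.cumsum(f0_up) / sr                # integrated instantaneous frequency
        voiced = 0.1 * np.sin(phase)                               # harmonic excitation for voiced frames
        unvoiced = 0.003 * np.random.randn(len(f0_up))             # weak noise for unvoiced frames
        return np.where(f0_up > 0, voiced, unvoiced).astype(np.float32)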

Speech Reconstruction from Silent Tongue and Lip Articulation By Pseudo Target Generation and Domain Adversarial Training

no code implementations • 12 Apr 2023 • Rui-Chen Zheng, Yang Ai, Zhen-Hua Ling

This paper studies the task of speech reconstruction from ultrasound tongue images and optical lip videos recorded in a silent speaking mode, where people only activate their intra-oral and extra-oral articulators without producing sound.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +1

WIDER & CLOSER: Mixture of Short-channel Distillers for Zero-shot Cross-lingual Named Entity Recognition

1 code implementation • 7 Dec 2022 • Jun-Yu Ma, Beiduo Chen, Jia-Chen Gu, Zhen-Hua Ling, Wu Guo, Quan Liu, Zhigang Chen, Cong Liu

In this study, a mixture of short-channel distillers (MSD) method is proposed to fully exploit the rich hierarchical information in the teacher model and to transfer knowledge to the student model sufficiently and efficiently.

Cross-Lingual NER • Domain Adaptation +3

Self-Supervised Audio-Visual Speech Representations Learning By Multimodal Self-Distillation

no code implementations • 6 Dec 2022 • Jing-Xuan Zhang, Genshun Wan, Zhen-Hua Ling, Jia Pan, Jianqing Gao, Cong Liu

AV2vec has a student and a teacher module, in which the student performs a masked latent feature regression task using the multimodal target features generated online by the teacher.

Language Modelling

Pronunciation Dictionary-Free Multilingual Speech Synthesis by Combining Unsupervised and Supervised Phonetic Representations

no code implementations • 2 Jun 2022 • Chang Liu, Zhen-Hua Ling, Ling-Hui Chen

This paper proposes a multilingual speech synthesis method which combines unsupervised phonetic representations (UPR) and supervised phonetic representations (SPR) to avoid reliance on the pronunciation dictionaries of target languages.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +2

HeterMPC: A Heterogeneous Graph Neural Network for Response Generation in Multi-Party Conversations

1 code implementation • ACL 2022 • Jia-Chen Gu, Chao-Hong Tan, Chongyang Tao, Zhen-Hua Ling, Huang Hu, Xiubo Geng, Daxin Jiang

To address these challenges, we present HeterMPC, a heterogeneous graph-based neural network for response generation in MPCs which models the semantics of utterances and interlocutors simultaneously with two types of nodes in a graph.

Response Generation

USTC-NELSLIP at SemEval-2022 Task 11: Gazetteer-Adapted Integration Network for Multilingual Complex Named Entity Recognition

1 code implementation • SemEval (NAACL) 2022 • Beiduo Chen, Jun-Yu Ma, Jiajun Qi, Wu Guo, Zhen-Hua Ling, Quan Liu

The proposed method is applied to several state-of-the-art Transformer-based NER models with a gazetteer built from Wikidata, and shows great generalization ability across them.

named-entity-recognition • Named Entity Recognition +1

Neural Grapheme-to-Phoneme Conversion with Pre-trained Grapheme Models

1 code implementation • 26 Jan 2022 • Lu Dong, Zhi-Qiang Guo, Chao-Hong Tan, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling

Neural network models have achieved state-of-the-art performance on grapheme-to-phoneme (G2P) conversion.

Language Modelling

Detecting Speaker Personas from Conversational Texts

1 code implementation • EMNLP 2021 • Jia-Chen Gu, Zhen-Hua Ling, Yu Wu, Quan Liu, Zhigang Chen, Xiaodan Zhu

This is a many-to-many semantic matching task because both contexts and personas in SPD are composed of multiple sentences.

MPC-BERT: A Pre-Trained Language Model for Multi-Party Conversation Understanding

1 code implementation • ACL 2021 • Jia-Chen Gu, Chongyang Tao, Zhen-Hua Ling, Can Xu, Xiubo Geng, Daxin Jiang

Recently, various neural models for multi-party conversation (MPC) have achieved impressive improvements on a variety of tasks such as addressee recognition, speaker identification and response prediction.

Language Modelling • Speaker Identification

Partner Matters! An Empirical Study on Fusing Personas for Personalized Response Selection in Retrieval-Based Chatbots

1 code implementation • 19 May 2021 • Jia-Chen Gu, Hui Liu, Zhen-Hua Ling, Quan Liu, Zhigang Chen, Xiaodan Zhu

Empirical studies on the Persona-Chat dataset show that the partner personas neglected in previous studies can improve the accuracy of response selection in the IMN- and BERT-based models.

Retrieval

Emotion-Regularized Conditional Variational Autoencoder for Emotional Response Generation

no code implementations • 18 Apr 2021 • Yu-Ping Ruan, Zhen-Hua Ling

This paper presents an emotion-regularized conditional variational autoencoder (Emo-CVAE) model for generating emotional conversation responses.

Response Generation

Learning to Retrieve Entity-Aware Knowledge and Generate Responses with Copy Mechanism for Task-Oriented Dialogue Systems

1 code implementation • 22 Dec 2020 • Chao-Hong Tan, Xiaoyu Yang, Zi'ou Zheng, Tianda Li, Yufei Feng, Jia-Chen Gu, Quan Liu, Dan Liu, Zhen-Hua Ling, Xiaodan Zhu

Task-oriented conversational modeling with unstructured knowledge access, as track 1 of the 9th Dialogue System Technology Challenges (DSTC 9), requires building a system that generates a response given the dialogue history and access to unstructured knowledge.

Response Generation • Task-Oriented Dialogue Systems

Tracking Interaction States for Multi-Turn Text-to-SQL Semantic Parsing

1 code implementation • 9 Dec 2020 • Run-Ze Wang, Zhen-Hua Ling, Jing-Bo Zhou, Yu Hu

The dynamic schema-state and SQL-state representations are then utilized to decode the SQL query corresponding to the current utterance.

Semantic Parsing • Text-To-SQL

Voice Conversion by Cascading Automatic Speech Recognition and Text-to-Speech Synthesis with Prosody Transfer

no code implementations • 3 Sep 2020 • Jing-Xuan Zhang, Li-Juan Liu, Yan-Nian Chen, Ya-Jun Hu, Yuan Jiang, Zhen-Hua Ling, Li-Rong Dai

In this paper, we present an ASR-TTS method for voice conversion, which uses the iFLYTEK ASR engine to transcribe the source speech into text and a Transformer TTS model with a WaveNet vocoder to synthesize the converted speech from the decoded text.

Automatic Speech Recognition • Automatic Speech Recognition (ASR) +4

Filtering before Iteratively Referring for Knowledge-Grounded Response Selection in Retrieval-Based Chatbots

1 code implementation • Findings of the Association for Computational Linguistics 2020 • Jia-Chen Gu, Zhen-Hua Ling, Quan Liu, Zhigang Chen, Xiaodan Zhu

The challenges of building knowledge-grounded retrieval-based chatbots lie in how to ground a conversation on its background knowledge and how to match response candidates with both context and knowledge simultaneously.

Retrieval

DialBERT: A Hierarchical Pre-Trained Model for Conversation Disentanglement

1 code implementation • 8 Apr 2020 • Tianda Li, Jia-Chen Gu, Xiaodan Zhu, Quan Liu, Zhen-Hua Ling, Zhiming Su, Si Wei

Disentanglement is a problem in which multiple conversations occur in the same channel simultaneously, and the listener must decide which utterances belong to the conversation they intend to respond to.

Conversation Disentanglement • Disentanglement

Align, Mask and Select: A Simple Method for Incorporating Commonsense Knowledge into Language Representation Models

no code implementations • 19 Aug 2019 • Zhi-Xiu Ye, Qian Chen, Wen Wang, Zhen-Hua Ling

We also observe that fine-tuned models after the proposed pre-training approach maintain comparable performance on other NLP tasks, such as sentence classification and natural language inference tasks, compared to the original BERT models.

Common Sense Reasoning • Natural Language Inference +3

Dually Interactive Matching Network for Personalized Response Selection in Retrieval-Based Chatbots

1 code implementation • IJCNLP 2019 • Jia-Chen Gu, Zhen-Hua Ling, Xiaodan Zhu, Quan Liu

Compared with previous persona fusion approaches which enhance the representation of a context by calculating its similarity with a given persona, the DIM model adopts a dual matching architecture, which performs interactive matching between responses and contexts and between responses and personas respectively for ranking response candidates.

Retrieval
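
A heavily simplified sketch of the dual matching idea (the actual DIM model performs word-level interactive matching over full sequences; here the encoders are collapsed into sentence vectors and the bilinear scorers are hypothetical): one branch scores a response against the context, the other against the persona, and the two scores are combined to rank candidates.

    import torch
    import torch.nn as nn

    class DualMatcher(nn.Module):
        """Toy sketch: combine response-context and response-persona matching scores."""
        def __init__(self, dim=300):
            super().__init__()
            self.context_scorer = nn.Bilinear(dim, dim, 1)
            self.persona_scorer = nn.Bilinear(dim, dim, 1)

        def forward(self, response_vec, context_vec, persona_vec):
            # each input: (B, dim) sentence-level encodings (encoders omitted in this sketch)
            s_context = self.context_scorer(response_vec, context_vec)
            s_persona = self.persona_scorer(response_vec, persona_vec)
            return (s_context + s_persona).squeeze(-1)  # higher score = better response candidate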

Non-Parallel Sequence-to-Sequence Voice Conversion with Disentangled Linguistic and Speaker Representations

1 code implementation • 25 Jun 2019 • Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

In this method, disentangled linguistic and speaker representations are extracted from acoustic features, and voice conversion is achieved by preserving the linguistic representations of source utterances while replacing the speaker representations with the target ones.

Audio and Speech Processing • Sound

Singing Voice Synthesis Using Deep Autoregressive Neural Networks for Acoustic Modeling

no code implementations • 21 Jun 2019 • Yuan-Hao Yi, Yang Ai, Zhen-Hua Ling, Li-Rong Dai

This paper presents a method of using autoregressive neural networks for the acoustic modeling of singing voice synthesis (SVS).

Singing Voice Synthesis

Condition-Transforming Variational AutoEncoder for Conversation Response Generation

no code implementations • 24 Apr 2019 • Yu-Ping Ruan, Zhen-Hua Ling, Quan Liu, Zhigang Chen, Nitin Indurkhya

This paper proposes a new model, called condition-transforming variational autoencoder (CTVAE), to improve the performance of conversation response generation using conditional variational autoencoders (CVAEs).

Response Generation

Exploring Unsupervised Pretraining and Sentence Structure Modelling for Winograd Schema Challenge

no code implementations • 22 Apr 2019 • Yu-Ping Ruan, Xiaodan Zhu, Zhen-Hua Ling, Zhan Shi, Quan Liu, Si Wei

The Winograd Schema Challenge (WSC) was proposed as an AI-hard problem for testing computers' ability to represent and reason about common sense.

Common Sense Reasoning • Sentence

Distant Supervision Relation Extraction with Intra-Bag and Inter-Bag Attentions

1 code implementation • NAACL 2019 • Zhi-Xiu Ye, Zhen-Hua Ling

This paper presents a neural relation extraction method to deal with the noisy training data generated by distant supervision.

Relation • Relation Extraction +2
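
A compact sketch of the two attention levels (simplified relative to the paper, which uses relation-aware attention; the similarity-based inter-bag weighting below is just one plausible choice): intra-bag attention weights the sentences inside a bag, and inter-bag attention down-weights noisier bags that share the same relation label.

    import torch
    import torch.nn.functional as F

    def intra_bag_attention(sentence_reprs, relation_query):
        # sentence_reprs: (S, D) encoded sentences in one bag; relation_query: (D,)
        weights = F.softmax(sentence_reprs @ relation_query, dim=0)   # (S,)
        return weights @ sentence_reprs                               # (D,) bag representation

    def inter_bag_attention(bag_reprs):
        # bag_reprs: (G, D) bags labelled with the same relation
        similarity = bag_reprs @ bag_reprs.t()                        # (G, G) pairwise similarities
        weights = F.softmax(similarity.sum(dim=1), dim=0)             # noisier bags get lower weight
        return weights @ bag_reprs                                    # (D,) bag-group representation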

Promoting Diversity for End-to-End Conversation Response Generation

no code implementations • 27 Jan 2019 • Yu-Ping Ruan, Zhen-Hua Ling, Quan Liu, Jia-Chen Gu, Xiaodan Zhu

At this stage, two different models are proposed, i.e., a variational generative (VariGen) model and a retrieval-based (Retrieval) model.

Response Generation • Retrieval

Learning latent representations for style control and transfer in end-to-end speech synthesis

2 code implementations • 11 Dec 2018 • Ya-Jie Zhang, Shifeng Pan, Lei He, Zhen-Hua Ling

In this paper, we introduce the Variational Autoencoder (VAE) to an end-to-end speech synthesis model, to learn the latent representation of speaking styles in an unsupervised manner.

Speech Synthesis • Style Transfer
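
A minimal sketch of the unsupervised style latent (the dimensions are assumptions, and in the paper the sampled code conditions a full end-to-end TTS decoder): a reference encoder maps a mel-spectrogram to a mean and log-variance, the style code is drawn with the reparameterization trick, and a KL term is added to the training loss.

    import torch
    import torch.nn as nn

    class StyleVAE(nn.Module):
        """Sketch of a VAE reference encoder for speaking style (not the paper's code)."""
        def __init__(self, mel_dim=80, hidden=128, z_dim=16):
            super().__init__()
            self.ref_encoder = nn.GRU(mel_dim, hidden, batch_first=True)
            self.to_mu = nn.Linear(hidden, z_dim)
            self.to_logvar = nn.Linear(hidden, z_dim)

        def forward(self, ref_mel):
            # ref_mel: (B, T, mel_dim) reference utterance
            _, h = self.ref_encoder(ref_mel)                           # h: (1, B, hidden)
            h = h.squeeze(0)
            mu, logvar = self.to_mu(h), self.to_logvar(h)
            z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)    # reparameterization trick
            kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
            return z, kl                                               # z conditions the decoder; kl joins the loss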

Forward Attention in Sequence-to-sequence Acoustic Modelling for Speech Synthesis

no code implementations • 18 Jul 2018 • Jing-Xuan Zhang, Zhen-Hua Ling, Li-Rong Dai

This paper proposes a forward attention method for the sequence-to-sequence acoustic modeling of speech synthesis.

Acoustic Modelling • Speech Synthesis
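
The heart of forward attention is a recursion that only lets the attended phone position stay put or advance by one between consecutive decoder steps. A sketch of that single update, assuming y_t are the ordinary content-based attention probabilities at decoder step t:

    import torch
    import torch.nn.functional as F

    def forward_attention_step(alpha_prev, y_t, eps=1e-8):
        """One forward-attention update (sketch).

        alpha_prev: (B, N) normalized forward weights at decoder step t-1
        y_t:        (B, N) content-based attention probabilities at step t
        """
        shifted = F.pad(alpha_prev, (1, 0))[:, :-1]            # alpha_{t-1}(n-1), zero at position 0
        alpha = (alpha_prev + shifted) * y_t                   # stay at n or move forward from n-1
        return alpha / (alpha.sum(dim=-1, keepdim=True) + eps)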

Hybrid semi-Markov CRF for Neural Sequence Labeling

1 code implementation • ACL 2018 • Zhi-Xiu Ye, Zhen-Hua Ling

This paper proposes hybrid semi-Markov conditional random fields (SCRFs) for neural sequence labeling in natural language processing.

named-entity-recognition • Named Entity Recognition +1

A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment

no code implementations • 23 Apr 2018 • Tomi Kinnunen, Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Zhen-Hua Ling

As a supplement to subjective results for the 2018 Voice Conversion Challenge (VCC'18) data, we configure a standard constant-Q cepstral coefficient countermeasure (CM) to quantify the extent of processing artifacts.

Benchmarking • Speaker Verification +1

The Voice Conversion Challenge 2018: Promoting Development of Parallel and Nonparallel Methods

no code implementations • 12 Apr 2018 • Jaime Lorenzo-Trueba, Junichi Yamagishi, Tomoki Toda, Daisuke Saito, Fernando Villavicencio, Tomi Kinnunen, Zhen-Hua Ling

We present the Voice Conversion Challenge 2018, designed as a follow up to the 2016 edition with the aim of providing a common framework for evaluating and comparing different state-of-the-art voice conversion (VC) systems.

Voice Conversion

A Sequential Neural Encoder with Latent Structured Description for Modeling Sentences

no code implementations • 15 Nov 2017 • Yu-Ping Ruan, Qian Chen, Zhen-Hua Ling

The description layer utilizes modified LSTM units to process these chunk-level vectors in a recurrent manner and produces sequential encoding outputs.

Chunking • Natural Language Inference +3

Neural Natural Language Inference Models Enhanced with External Knowledge

1 code implementation • ACL 2018 • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Diana Inkpen, Si Wei

With the availability of large annotated data, it has recently become feasible to train complex models such as neural-network-based inference models, which have been shown to achieve state-of-the-art performance.

Natural Language Inference

Recurrent Neural Network-Based Sentence Encoder with Gated Attention for Natural Language Inference

2 code implementations • WS 2017 • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang, Diana Inkpen

The RepEval 2017 Shared Task aims to evaluate natural language understanding models for sentence representation, in which a sentence is represented as a fixed-length vector with neural networks and the quality of the representation is tested with a natural language inference task.

Natural Language Inference • Natural Language Understanding +1

Distraction-Based Neural Networks for Document Summarization

1 code implementation • 26 Oct 2016 • Qian Chen, Xiaodan Zhu, Zhen-Hua Ling, Si Wei, Hui Jiang

Distributed representation learned with neural networks has recently been shown to be effective in modeling natural languages at fine granularities such as words, phrases, and even sentences.

Document Summarization

Part-of-Speech Relevance Weights for Learning Word Embeddings

no code implementations • 24 Mar 2016 • Quan Liu, Zhen-Hua Ling, Hui Jiang, Yu Hu

The model proposed in this paper jointly optimizes word vectors and the POS relevance matrices.

Learning Word Embeddings • POS +2

Integrate Document Ranking Information into Confidence Measure Calculation for Spoken Term Detection

no code implementations • 7 Sep 2015 • Quan Liu, Wu Guo, Zhen-Hua Ling

The confidence measure of each term occurrence is then re-estimated through linear interpolation with the calculated document ranking weight to improve its reliability by integrating document-level information.

Document Ranking
