Search Results for author: Muqiao Yang

Found 18 papers, 8 papers with code

Evaluating and Improving Continual Learning in Spoken Language Understanding

no code implementations 16 Feb 2024 Muqiao Yang, Xiang Li, Umberto Cappellazzo, Shinji Watanabe, Bhiksha Raj

In this work, we propose an evaluation methodology that provides a unified evaluation on stability, plasticity, and generalizability in continual learning.

Continual Learning, Spoken Language Understanding

Continual Contrastive Spoken Language Understanding

no code implementations 4 Oct 2023 Umberto Cappellazzo, Enrico Fini, Muqiao Yang, Daniele Falavigna, Alessio Brutti, Bhiksha Raj

In this paper, we investigate the problem of learning sequence-to-sequence models for spoken language understanding in a class-incremental learning (CIL) setting and we propose COCONUT, a CIL method that relies on the combination of experience replay and contrastive learning.

Class Incremental Learning, Contrastive Learning

uSee: Unified Speech Enhancement and Editing with Conditional Diffusion Models

no code implementations 2 Oct 2023 Muqiao Yang, Chunlei Zhang, Yong Xu, Zhongweiyang Xu, Heming Wang, Bhiksha Raj, Dong Yu

Speech enhancement aims to improve the quality of speech signals in terms of quality and intelligibility, and speech editing refers to the process of editing the speech according to specific user needs.

Denoising, Self-Supervised Learning

Rethinking Voice-Face Correlation: A Geometry View

no code implementations 26 Jul 2023 Xiang Li, Yandong Wen, Muqiao Yang, Jinglu Wang, Rita Singh, Bhiksha Raj

Previous works on voice-face matching and voice-guided face synthesis demonstrate strong correlations between voice and face, but mainly rely on coarse semantic cues such as gender, age, and emotion.

3D Face Reconstruction, Face Generation

Sequence-Level Knowledge Distillation for Class-Incremental End-to-End Spoken Language Understanding

1 code implementation 23 May 2023 Umberto Cappellazzo, Muqiao Yang, Daniele Falavigna, Alessio Brutti

The ability to learn new concepts sequentially is a major weakness for modern neural networks, which hinders their use in non-stationary environments.

Continual Learning, Knowledge Distillation

Backdoor Attacks with Input-unique Triggers in NLP

no code implementations 25 Mar 2023 Xukun Zhou, Jiwei Li, Tianwei Zhang, Lingjuan Lyu, Muqiao Yang, Jun He

Backdoor attack aims at inducing neural models to make incorrect predictions for poison data while keeping predictions on the clean dataset unchanged, which creates a considerable threat to current natural language processing (NLP) systems.

Backdoor Attack, Language Modelling

PAAPLoss: A Phonetic-Aligned Acoustic Parameter Loss for Speech Enhancement

2 code implementations 16 Feb 2023 Muqiao Yang, Joseph Konan, David Bick, Yunyang Zeng, Shuo Han, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We can add this criterion as an auxiliary loss to any model that produces speech, to optimize speech outputs to match the values of clean speech in these features.

Speech Enhancement, Time Series

Simulating realistic speech overlaps improves multi-talker ASR

no code implementations 27 Oct 2022 Muqiao Yang, Naoyuki Kanda, Xiaofei Wang, Jian Wu, Sunit Sivasankaran, Zhuo Chen, Jinyu Li, Takuya Yoshioka

Multi-talker automatic speech recognition (ASR) has been studied to generate transcriptions of natural conversation including overlapping speech of multiple speakers.

Automatic Speech Recognition (ASR)

Online Continual Learning of End-to-End Speech Recognition Models

no code implementations 11 Jul 2022 Muqiao Yang, Ian Lane, Shinji Watanabe

Continual Learning, also known as Lifelong Learning, aims to continually learn from new data as it becomes available.

Automatic Speech Recognition (ASR)

Improving Speech Enhancement through Fine-Grained Speech Characteristics

1 code implementation 1 Jul 2022 Muqiao Yang, Joseph Konan, David Bick, Anurag Kumar, Shinji Watanabe, Bhiksha Raj

We first identify key acoustic parameters that have been found to correlate well with voice quality (e.g., jitter, shimmer, and spectral flux) and then propose objective functions which are aimed at reducing the difference between clean speech and enhanced speech with respect to these features.

Speech Enhancement

M2Lens: Visualizing and Explaining Multimodal Models for Sentiment Analysis

no code implementations 17 Jul 2021 Xingbo Wang, Jianben He, Zhihua Jin, Muqiao Yang, Yong Wang, Huamin Qu

Much research focuses on modeling the complex intra- and inter-modal interactions between different communication channels.

Multimodal Sentiment Analysis

Signal Transformer: Complex-valued Attention and Meta-Learning for Signal Recognition

no code implementations 5 Jun 2021 Yihong Dong, Ying Peng, Muqiao Yang, Songtao Lu, Qingjiang Shi

Deep neural networks have been shown as a class of useful tools for addressing signal recognition issues in recent years, especially for identifying the nonlinear feature structures of signals.

Meta-Learning, Time Series

Self-supervised Representation Learning with Relative Predictive Coding

1 code implementation ICLR 2021 Yao-Hung Hubert Tsai, Martin Q. Ma, Muqiao Yang, Han Zhao, Louis-Philippe Morency, Ruslan Salakhutdinov

This paper introduces Relative Predictive Coding (RPC), a new contrastive representation learning objective that maintains a good balance among training stability, minibatch size sensitivity, and downstream task performance.

Representation Learning, Self-Supervised Learning

Complex Transformer: A Framework for Modeling Complex-Valued Sequence

1 code implementation 22 Oct 2019 Muqiao Yang, Martin Q. Ma, Dongyu Li, Yao-Hung Hubert Tsai, Ruslan Salakhutdinov

While deep learning has received a surge of interest in a variety of fields in recent years, major deep learning models barely use complex numbers.

Music Transcription
