Search Results for author: Xilin Jiang

Found 8 papers, 4 papers with code

Listen, Chat, and Edit: Text-Guided Soundscape Modification for Enhanced Auditory Experience

no code implementations • 6 Feb 2024 • Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

In daily life, we encounter a variety of sounds, both desirable and undesirable, with limited control over their presence and volume.

Language Modelling · Large Language Model

Exploring Self-Supervised Contrastive Learning of Spatial Sound Event Representation

no code implementations • 27 Sep 2023 • Xilin Jiang, Cong Han, Yinghao Aaron Li, Nima Mesgarani

In this study, we present a simple multi-channel framework for contrastive learning (MC-SimCLR) to encode 'what' and 'where' of spatial audios.

Contrastive Learning · Data Augmentation
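
MC-SimCLR follows the SimCLR recipe of pulling two augmented views of the same recording together in embedding space. Below is a minimal sketch of the standard NT-Xent contrastive objective that recipe relies on; the encoder and the channel/spatial augmentations are placeholders, not the paper's implementation.

```python
import torch
import torch.nn.functional as F

def nt_xent_loss(z1, z2, temperature=0.1):
    """Standard SimCLR (NT-Xent) loss for two batches of paired embeddings.

    z1, z2: (batch, dim) projections of two augmented views of the same clips.
    """
    batch = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2B, D), unit norm
    sim = z @ z.t() / temperature                         # cosine-similarity logits
    sim.fill_diagonal_(float("-inf"))                     # exclude self-similarity
    idx = torch.arange(batch, device=z.device)
    targets = torch.cat([idx + batch, idx])               # positive = the other view
    return F.cross_entropy(sim, targets)

# Hypothetical usage; `encoder` and `augment` stand in for the paper's
# multi-channel feature extractor and data augmentations.
# z1 = encoder(augment(multichannel_audio))
# z2 = encoder(augment(multichannel_audio))
# loss = nt_xent_loss(z1, z2)
```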

HiFTNet: A Fast High-Quality Neural Vocoder with Harmonic-plus-Noise Filter and Inverse Short Time Fourier Transform

no code implementations • 18 Sep 2023 • Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Subjective evaluations on LJSpeech show that our model significantly outperforms both iSTFTNet and HiFi-GAN, achieving ground-truth-level performance.

Speech Synthesis
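
HiFTNet, like the iSTFTNet it is compared against, ends in an inverse short-time Fourier transform rather than learned upsampling all the way to the waveform. The sketch below shows only that final reconstruction step; the network that predicts the magnitude and phase frames is assumed, and the STFT parameters are illustrative rather than the paper's configuration.

```python
import torch

def istft_head(magnitude, phase, n_fft=16, hop_length=4):
    """Turn predicted magnitude/phase frames into a waveform via inverse STFT.

    magnitude, phase: (batch, n_fft // 2 + 1, frames) tensors produced by the
    vocoder's convolutional trunk (assumed here, not implemented).
    """
    spec = torch.polar(magnitude, phase)            # complex spectrogram
    window = torch.hann_window(n_fft)
    return torch.istft(spec, n_fft=n_fft, hop_length=hop_length, window=window)

# Illustrative shapes only: 9 frequency bins (n_fft=16), 100 frames.
mag = torch.rand(1, 9, 100)
pha = torch.rand(1, 9, 100) * 3.14
wave = istft_head(mag, pha)                         # (1, samples)
```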

DeCoR: Defy Knowledge Forgetting by Predicting Earlier Audio Codes

no code implementations • 29 May 2023 • Xilin Jiang, Yinghao Aaron Li, Nima Mesgarani

Lifelong audio feature extraction involves learning new sound classes incrementally, which is essential for adapting to new data distributions over time.

Acoustic Scene Classification · Continual Learning · +3
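
DeCoR addresses the setting where an audio model keeps learning new sound classes without forgetting earlier ones. For orientation only, the sketch below shows a generic distillation-style anti-forgetting penalty (in the spirit of Learning without Forgetting), not the paper's earlier-audio-code prediction; the encoder and data loader are placeholders.

```python
import torch
import torch.nn.functional as F

def anti_forgetting_loss(new_model, old_model, batch):
    """Penalize drift of the new model's features from a frozen earlier snapshot.

    `new_model` / `old_model` map audio batches to embeddings; `old_model` is a
    frozen copy of the encoder taken before training on the new sound classes
    (assumed setup, not the paper's DeCoR mechanism).
    """
    with torch.no_grad():
        old_feat = old_model(batch)
    new_feat = new_model(batch)
    return F.mse_loss(new_feat, old_feat)

# Hypothetical training step on a new task (task_loss and the loader are placeholders):
# for x, y in new_task_loader:
#     loss = task_loss(model, x, y) + 0.5 * anti_forgetting_loss(model, old_model, x)
```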

Phoneme-Level BERT for Enhanced Prosody of Text-to-Speech with Grapheme Predictions

2 code implementations • 20 Jan 2023 • Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani

Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns.
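
PL-BERT applies this idea at the phoneme level. The sketch below shows only the generic step of extracting contextual embeddings from an off-the-shelf BERT to condition a TTS acoustic model; the phoneme-level pre-training and grapheme prediction head described in the title are not reproduced, and the checkpoint name is just the standard Hugging Face one.

```python
import torch
from transformers import AutoTokenizer, AutoModel

# Off-the-shelf word-level BERT; the paper instead pre-trains on phoneme sequences.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
bert = AutoModel.from_pretrained("bert-base-uncased").eval()

inputs = tokenizer("the quick brown fox", return_tensors="pt")
with torch.no_grad():
    hidden = bert(**inputs).last_hidden_state   # (1, tokens, 768) contextual features

# A TTS acoustic model (placeholder) could consume these embeddings as extra
# conditioning to produce more natural prosody:
# mel = acoustic_model(phonemes, prosody_context=hidden)
```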

Compute and memory efficient universal sound source separation

3 code implementations • 3 Mar 2021 • Efthymios Tzinis, Zhepei Wang, Xilin Jiang, Paris Smaragdis

Recent progress in audio source separation led by deep learning has enabled many neural network models to provide robust solutions to this fundamental estimation problem.

Audio Source Separation · Efficient Neural Network · +1
