no code implementations • Findings (NAACL) 2022 • Bowen Yang, Cong Han, Yu Li, Lei Zuo, Zhou Yu
In this paper, we propose a simple yet effective architecture comprising a pre-trained language model (PLM) and an item metadata encoder to better integrate recommendation and dialog generation.
no code implementations • 3 Apr 2023 • Cong Han, Yujie Zhong, Dengjie Li, Kai Han, Lin Ma
Recently, the zero-shot semantic segmentation problem has attracted increasing attention, and the best performing methods are based on two-stream networks: one stream for proposal mask generation and the other for segment classification using a pre-trained visual-language model.
no code implementations • 13 Mar 2023 • Cong Han, Nima Mesgarani
Binaural speech separation in real-world scenarios often involves moving speakers.
no code implementations • 11 Feb 2023 • Cong Han, Vishal Choudhari, Yinghao Aaron Li, Nima Mesgarani
Auditory attention decoding (AAD) is a technique used to identify and amplify the talker that a listener is focused on in a noisy environment.
no code implementations • 20 Jan 2023 • Yinghao Aaron Li, Cong Han, Xilin Jiang, Nima Mesgarani
Large-scale pre-trained language models have been shown to be helpful in improving the naturalness of text-to-speech (TTS) models by enabling them to produce more naturalistic prosodic patterns.
1 code implementation • 29 Dec 2022 • Yinghao Aaron Li, Cong Han, Nima Mesgarani
Here, we propose a novel approach to learning disentangled speech representation by transfer learning from style-based text-to-speech (TTS) models.
1 code implementation • 17 Oct 2022 • Yuhong Li, Jiajie Li, Cong Han, Pan Li, JinJun Xiong, Deming Chen
Efficient proxies are not extensible to multi-modality downstream tasks.
1 code implementation • 30 May 2022 • Yinghao Aaron Li, Cong Han, Nima Mesgarani
Text-to-Speech (TTS) has recently seen great progress in synthesizing high-quality speech owing to the rapid development of parallel TTS systems, but producing speech with naturalistic prosodic variations, speaking styles and emotional tones remains challenging.
no code implementations • 17 Feb 2022 • Cong Han, E. Merve Kaya, Kyle Hoefer, Malcolm Slaney, Simon Carlile
This work describes a speech denoising system for machine ears that aims to improve speech intelligibility and the overall listening experience in noisy environments.
1 code implementation • 15 Dec 2021 • Bowen Yang, Cong Han, Yu Li, Lei Zuo, Zhou Yu
The encoder learns to map item metadata to embeddings that can reflect the semantic information in the dialog context.
no code implementations • 23 Feb 2021 • Chenda Li, Zhuo Chen, Yi Luo, Cong Han, Tianyan Zhou, Keisuke Kinoshita, Marc Delcroix, Shinji Watanabe, Yanmin Qian
A transformer-based dual-path system is proposed, which integrates transformer layers for global modeling.
no code implementations • 17 Dec 2020 • Cong Han, Yi Luo, Chenda Li, Tianyan Zhou, Keisuke Kinoshita, Shinji Watanabe, Marc Delcroix, Hakan Erdogan, John R. Hershey, Nima Mesgarani, Zhuo Chen
Leveraging additional speaker information to facilitate speech separation has received increasing attention in recent years.
1 code implementation • 14 Dec 2020 • Yi Luo, Cong Han, Nima Mesgarani
A context codec module, containing a context encoder and a context decoder, is designed as a learnable downsampling and upsampling module to decrease the length of a sequential feature processed by the separation module.
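The downsample, process, upsample pattern described in this excerpt can be illustrated with a minimal sketch. It is not the authors' implementation: fixed average pooling and value repetition stand in for the learnable context encoder and decoder, the separation module is replaced by an identity placeholder, and the names `context_encode`, `context_decode`, `separate`, and `factor` are hypothetical.

```python
from typing import List


def context_encode(frames: List[float], factor: int) -> List[float]:
    """Shorten a sequence by averaging each block of `factor` frames.

    Assumption: a fixed mean stands in for the learnable encoder.
    The tail is zero-padded so the length divides evenly.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    padded = frames + [0.0] * (-len(frames) % factor)
    return [
        sum(padded[i:i + factor]) / factor
        for i in range(0, len(padded), factor)
    ]


def separate(contexts: List[float]) -> List[float]:
    """Identity placeholder for the separation module (hypothetical)."""
    return list(contexts)


def context_decode(contexts: List[float], factor: int, length: int) -> List[float]:
    """Restore the original length by repeating each context value.

    Assumption: repetition stands in for the learnable decoder.
    """
    upsampled = [c for c in contexts for _ in range(factor)]
    return upsampled[:length]


frames = [1.0, 3.0, 5.0, 7.0, 9.0]
contexts = context_encode(frames, factor=2)   # 3 values instead of 5
restored = context_decode(separate(contexts), factor=2, length=len(frames))
```

The point of the pattern is that the separation module runs on the shorter `contexts` sequence, so its cost scales with the reduced length rather than the original one.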
no code implementations • 4 Dec 2020 • Bin Li, Xiao Yang, Daren Sun, Zhi Ji, Zhen Jiang, Cong Han, Dong Hao
Auto-bidding plays an important role in online advertising and has become a crucial tool for advertisers and advertising platforms to meet their performance objectives and optimize the efficiency of ad delivery.
1 code implementation • 29 Sep 2019 • Yi Luo, Enea Ceolini, Cong Han, Shih-Chii Liu, Nima Mesgarani
Beamforming has been extensively investigated for multi-channel audio processing tasks.