no code implementations • 10 May 2022 • Tingle Li, Yichen Liu, Andrew Owens, Hang Zhao
Our model learns to manipulate the texture of a scene to match a sound, a problem we term audio-driven image stylization.
no code implementations • NeurIPS 2021 • Chenxu Hu, Qiao Tian, Tingle Li, Yuping Wang, Yuxuan Wang, Hang Zhao
Neural Dubber is a multi-modal text-to-speech (TTS) model that utilizes the lip movement in the video to control the prosody of the generated speech.
no code implementations • 29 Sep 2021 • Chenzhuang Du, Jiaye Teng, Tingle Li, Yichen Liu, Yue Wang, Yang Yuan, Hang Zhao
We name this problem of multi-modal training Modality Laziness.
no code implementations • 21 Jun 2021 • Chenzhuang Du, Tingle Li, Yichen Liu, Zixin Wen, Tianyu Hua, Yue Wang, Hang Zhao
We name this problem Modality Failure, and hypothesize that the imbalance of modalities and the implicit bias of common objectives in fusion methods prevent the encoders of each modality from learning sufficient features.
Ranked #33 on Semantic Segmentation on NYU Depth v2
1 code implementation • 12 Sep 2019 • Tingle Li, Jia-Wei Chen, Haowen Hou, Ming Li
Convolutional neural network (CNN) or long short-term memory (LSTM) based models that take spectrograms or waveforms as input are commonly used for deep-learning-based audio source separation.
Ranked #20 on Music Source Separation on MUSDB18