no code implementations • 10 Apr 2024 • Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma
We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre.
1 code implementation • 13 Mar 2024 • Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan
To address this issue, we propose a novel cascade diffusion framework, which decomposes the challenging dense views generation task into two tractable stages, namely anchor views generation and anchor views interpolation.
no code implementations • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan
In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.
2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan
In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.
Ranked #57 on Visual Question Answering on MM-Vet
1 code implementation • 20 Dec 2023 • Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan
The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.
no code implementations • 2 Jul 2023 • Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, JunBo Wang, Haoyi Zhu, Cewu Lu
A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.
1 code implementation • 25 Jun 2023 • Zhenyu Tang, Shaoting Zhang, Xiaosong Wang
Deep learning models often require large amounts of data for training, leading to increased costs.
no code implementations • 10 Dec 2022 • Rohith Aralikatti, Zhenyu Tang, Dinesh Manocha
We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets.
2 code implementations • 18 May 2022 • Anton Ratnarajah, Zhenyu Tang, Rohith Chandrashekar Aralikatti, Dinesh Manocha
We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error.
2 code implementations • 7 Oct 2021 • Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu
We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.
Automatic Speech Recognition (ASR)
no code implementations • 19 Jul 2021 • Rohith Aralikatti, Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha
We present a novel approach that improves the performance of reverberant speech separation.
no code implementations • 25 Jun 2021 • Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar
Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model: 0.8 dB for monaural inputs and 0.3 dB for binaural inputs, while reaching a real-time factor of 0.65.
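For readers unfamiliar with the metric, the real-time factor quoted above is conventionally the ratio of processing time to audio duration, so a value below 1 means the system keeps up with incoming audio. The sketch below illustrates that common definition only; the function name and the example timings are hypothetical and are not taken from the paper.

```python
def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Return processing time divided by audio duration (common RTF definition)."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    if processing_seconds < 0:
        raise ValueError("processing_seconds must be non-negative")
    return processing_seconds / audio_seconds


def is_real_time(rtf: float) -> bool:
    """A system can run online when it processes audio faster than it arrives."""
    return rtf < 1.0


# Hypothetical timings: 6.5 s of compute for 10 s of audio.
rtf = real_time_factor(6.5, 10.0)
print(f"RTF = {rtf:.2f}, real-time capable: {is_real_time(rtf)}")
```

Under this definition, a real-time factor of 0.65 leaves roughly a third of each audio frame's duration as headroom.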
no code implementations • 21 Apr 2021 • Zhenyu Tang, Dinesh Manocha
We use a deep learning-based estimator to non-intrusively compute the sub-band reverberation time of an environment from its speech samples.
Automatic Speech Recognition (ASR)
no code implementations • 1 Jan 2021 • Hsien-Yu Meng, Zhenyu Tang, Dinesh Manocha
Acoustic properties of objects corresponding to scattering characteristics are frequently used for 3D audio content creation, environmental acoustic effects, localization and acoustic scene analysis, etc.
no code implementations • 8 May 2020 • Kelei He, Wei Zhao, Xingzhi Xie, Wen Ji, Mingxia Liu, Zhenyu Tang, Feng Shi, Yang Gao, Jun Liu, Junfeng Zhang, Dinggang Shen
Considering that only a few infection regions in a CT image are related to the severity assessment, we first represent each input image by a bag that contains a set of 2D image patches (with each cropped from a specific slice).
1 code implementation • 6 Apr 2020 • Feng Shi, Jun Wang, Jun Shi, Ziyan Wu, Qian Wang, Zhenyu Tang, Kelei He, Yinghuan Shi, Dinggang Shen
In this review paper, we thus cover the entire pipeline of medical imaging and analysis techniques involved with COVID-19, including image acquisition, segmentation, diagnosis, and follow-up.
no code implementations • 26 Mar 2020 • Zhenyu Tang, Wei Zhao, Xingzhi Xie, Zheng Zhong, Feng Shi, Jun Liu, Dinggang Shen
Purpose: To use machine learning to automatically assess the severity (non-severe or severe) of COVID-19 from chest CT images, and to explore severity-related features from the resulting assessment model.
no code implementations • 14 Nov 2019 • Zhenyu Tang, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, Dinesh Manocha
We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models.
Sound • Graphics • Multimedia • Audio and Speech Processing
no code implementations • 9 Jul 2019 • Zhenyu Tang, Lian-Wu Chen, Bo Wu, Dong Yu, Dinesh Manocha
We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks.
1 code implementation • 17 Apr 2019 • Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha
We present a novel learning-based approach to estimate the direction-of-arrival (DOA) of a sound source using a convolutional recurrent neural network (CRNN) trained via regression on synthetic data and Cartesian labels.
Ranked #1 on Direction of Arrival Estimation on SOFA (using extra training data)
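One likely motivation for regressing on Cartesian labels in the direction-of-arrival entry above is that angles wrap around (179° and -179° are only 2° apart), whereas a unit vector has no such discontinuity. The following is a minimal sketch of that label conversion, assuming azimuth is measured in the x-y plane and elevation from that plane; the axis convention and all function names are assumptions for illustration, not the paper's code.

```python
import math


def angles_to_cartesian(azimuth_deg: float, elevation_deg: float) -> tuple:
    """Map (azimuth, elevation) in degrees to a unit vector (x, y, z)."""
    az = math.radians(azimuth_deg)
    el = math.radians(elevation_deg)
    return (math.cos(el) * math.cos(az),
            math.cos(el) * math.sin(az),
            math.sin(el))


def cartesian_to_angles(x: float, y: float, z: float) -> tuple:
    """Map a (not necessarily unit) vector back to (azimuth, elevation) in degrees."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    azimuth = math.degrees(math.atan2(y, x))
    # Clamp to guard against rounding slightly outside [-1, 1].
    elevation = math.degrees(math.asin(max(-1.0, min(1.0, z / norm))))
    return azimuth, elevation


def angular_error_deg(u: tuple, v: tuple) -> float:
    """Angle in degrees between two direction vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("zero vector has no direction")
    cosine = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cosine))


# Two directions on either side of the +/-180 degree seam.
a = angles_to_cartesian(179.0, 0.0)
b = angles_to_cartesian(-179.0, 0.0)
print(angular_error_deg(a, b))  # small angle, even though the azimuths differ by 358
```

A network trained this way would output three values per frame, and `cartesian_to_angles` recovers the angles at evaluation time.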