Search Results for author: Zhenyu Tang

Found 20 papers, 8 papers with code

Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks

1 code implementation • 17 Apr 2019 • Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha

We present a novel learning-based approach to estimate the direction-of-arrival (DOA) of a sound source using a convolutional recurrent neural network (CRNN) trained via regression on synthetic data and Cartesian labels.

 Ranked #1 on Direction of Arrival Estimation on SOFA (using extra training data)

Direction of Arrival Estimation, General Classification, +1

Improving Reverberant Speech Training Using Diffuse Acoustic Simulation

no code implementations • 9 Jul 2019 • Zhenyu Tang, Lian-Wu Chen, Bo Wu, Dong Yu, Dinesh Manocha

We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks.

BIG-bench Machine Learning, Keyword Spotting, +2

Scene-Aware Audio Rendering via Deep Acoustic Analysis

no code implementations • 14 Nov 2019 • Zhenyu Tang, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, Dinesh Manocha

We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar-sounding sources with virtual models.

Sound, Graphics, Multimedia, Audio and Speech Processing

Severity Assessment of Coronavirus Disease 2019 (COVID-19) Using Quantitative Features from Chest CT Images

no code implementations • 26 Mar 2020 • Zhenyu Tang, Wei Zhao, Xingzhi Xie, Zheng Zhong, Feng Shi, Jun Liu, Dinggang Shen

Purpose: To use machine learning to automatically assess the severity (non-severe or severe) of COVID-19 from chest CT images, and to explore the severity-related features identified by the resulting assessment model.

Computed Tomography (CT)

Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19

1 code implementation • 6 Apr 2020 • Feng Shi, Jun Wang, Jun Shi, Ziyan Wu, Qian Wang, Zhenyu Tang, Kelei He, Yinghuan Shi, Dinggang Shen

In this review paper, we cover the entire pipeline of medical imaging and analysis techniques involved with COVID-19, including image acquisition, segmentation, diagnosis, and follow-up.

Computed Tomography (CT)

Synergistic Learning of Lung Lobe Segmentation and Hierarchical Multi-Instance Classification for Automated Severity Assessment of COVID-19 in CT Images

no code implementations • 8 May 2020 • Kelei He, Wei Zhao, Xingzhi Xie, Wen Ji, Mingxia Liu, Zhenyu Tang, Feng Shi, Yang Gao, Jun Liu, Junfeng Zhang, Dinggang Shen

Considering that only a few infection regions in a CT image are related to the severity assessment, we first represent each input image by a bag that contains a set of 2D image patches (with each cropped from a specific slice).

Segmentation

Fast 3D Acoustic Scattering via Discrete Laplacian Based Implicit Function Encoders

no code implementations • 1 Jan 2021 • Hsien-Yu Meng, Zhenyu Tang, Dinesh Manocha

Acoustic properties of objects corresponding to scattering characteristics are frequently used for 3D audio content creation, environmental acoustic effects, localization and acoustic scene analysis, etc.

Scene-aware Far-field Automatic Speech Recognition

no code implementations • 21 Apr 2021 • Zhenyu Tang, Dinesh Manocha

We use a deep learning-based estimator to non-intrusively compute the sub-band reverberation time of an environment from its speech samples.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +1

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

no code implementations • 25 Jun 2021 • Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model: 0.8 dB for monaural inputs and 0.3 dB for binaural inputs, while reaching a real-time factor of 0.65.

Blind Source Separation, Speaker Separation

FAST-RIR: Fast neural diffuse room impulse response generator

2 code implementations • 7 Oct 2021 • Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Automatic Speech Recognition, Automatic Speech Recognition (ASR), +2

MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes

2 code implementations • 18 May 2022 • Anton Ratnarajah, Zhenyu Tang, Rohith Chandrashekar Aralikatti, Dinesh Manocha

We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error.

2k Speech Dereverberation +1

Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation

no code implementations • 10 Dec 2022 • Rohith Aralikatti, Zhenyu Tang, Dinesh Manocha

We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets.

Speech Dereverberation

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

1 code implementation • 20 Dec 2023 • Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan

The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.

3D Generation, Image to 3D

LLMBind: A Unified Modality-Task Integration Framework

no code implementations • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.

Audio Generation, Image Segmentation, +3

Envision3D: One Image to 3D with Anchor Views Interpolation

1 code implementation • 13 Mar 2024 • Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan

To address this issue, we propose a novel cascade diffusion framework, which decomposes the challenging dense views generation task into two tractable stages, namely anchor views generation and anchor views interpolation.

Image to 3D

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

no code implementations • 10 Apr 2024 • Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre.

Attribute
