Search Results for author: Zhenyu Tang

Found 20 papers, 8 papers with code

Regression and Classification for Direction-of-Arrival Estimation with Convolutional Recurrent Neural Networks

1 code implementation • 17 Apr 2019 • Zhenyu Tang, John D. Kanu, Kevin Hogan, Dinesh Manocha

We present a novel learning-based approach to estimate the direction-of-arrival (DOA) of a sound source using a convolutional recurrent neural network (CRNN) trained via regression on synthetic data and Cartesian labels.

Ranked #1 on Direction of Arrival Estimation on SOFA (using extra training data)

Direction of Arrival Estimation General Classification +1

Improving Reverberant Speech Training Using Diffuse Acoustic Simulation

no code implementations • 9 Jul 2019 • Zhenyu Tang, Lian-Wu Chen, Bo Wu, Dong Yu, Dinesh Manocha

We present an efficient and realistic geometric acoustic simulation approach for generating and augmenting training data in speech-related machine learning tasks.

BIG-bench Machine Learning Keyword Spotting +2

Scene-Aware Audio Rendering via Deep Acoustic Analysis

no code implementations • 14 Nov 2019 • Zhenyu Tang, Nicholas J. Bryan, Dingzeyu Li, Timothy R. Langlois, Dinesh Manocha

We present a new method to capture the acoustic characteristics of real-world rooms using commodity devices, and use the captured characteristics to generate similar sounding sources with virtual models.

Sound Graphics Multimedia Audio and Speech Processing

Severity Assessment of Coronavirus Disease 2019 (COVID-19) Using Quantitative Features from Chest CT Images

no code implementations • 26 Mar 2020 • Zhenyu Tang, Wei Zhao, Xingzhi Xie, Zheng Zhong, Feng Shi, Jun Liu, Dinggang Shen

Purpose: To use machine learning to automatically assess the severity (non-severe or severe) of COVID-19 from chest CT images, and to explore the severity-related features in the resulting assessment model.

Computed Tomography (CT)

Review of Artificial Intelligence Techniques in Imaging Data Acquisition, Segmentation and Diagnosis for COVID-19

1 code implementation • 6 Apr 2020 • Feng Shi, Jun Wang, Jun Shi, Ziyan Wu, Qian Wang, Zhenyu Tang, Kelei He, Yinghuan Shi, Dinggang Shen

In this review paper, we thus cover the entire pipeline of medical imaging and analysis techniques involved with COVID-19, including image acquisition, segmentation, diagnosis, and follow-up.

Computed Tomography (CT)

Synergistic Learning of Lung Lobe Segmentation and Hierarchical Multi-Instance Classification for Automated Severity Assessment of COVID-19 in CT Images

no code implementations • 8 May 2020 • Kelei He, Wei Zhao, Xingzhi Xie, Wen Ji, Mingxia Liu, Zhenyu Tang, Feng Shi, Yang Gao, Jun Liu, Junfeng Zhang, Dinggang Shen

Considering that only a few infection regions in a CT image are related to the severity assessment, we first represent each input image by a bag that contains a set of 2D image patches (with each cropped from a specific slice).

Segmentation

Fast 3D Acoustic Scattering via Discrete Laplacian Based Implicit Function Encoders

no code implementations • 1 Jan 2021 • Hsien-Yu Meng, Zhenyu Tang, Dinesh Manocha

Acoustic properties of objects corresponding to scattering characteristics are frequently used for 3D audio content creation, environmental acoustic effects, localization and acoustic scene analysis, etc.

Scene-aware Far-field Automatic Speech Recognition

no code implementations • 21 Apr 2021 • Zhenyu Tang, Dinesh Manocha

We use a deep learning-based estimator to non-intrusively compute the sub-band reverberation time of an environment from its speech samples.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Online Self-Attentive Gated RNNs for Real-Time Speaker Separation

no code implementations • 25 Jun 2021 • Ori Kabeli, Yossi Adi, Zhenyu Tang, Buye Xu, Anurag Kumar

Our stateful implementation for online separation leads to a minor drop in performance compared to the offline model; 0.8 dB for monaural inputs and 0.3 dB for binaural inputs while reaching a real-time factor of 0.65.

blind source separation Speaker Separation

Improving Reverberant Speech Separation with Multi-stage Training and Curriculum Learning

no code implementations • 19 Jul 2021 • Rohith Aralikatti, Anton Ratnarajah, Zhenyu Tang, Dinesh Manocha

We present a novel approach that improves the performance of reverberant speech separation.

Speech Separation

FAST-RIR: Fast neural diffuse room impulse response generator

2 code implementations • 7 Oct 2021 • Anton Ratnarajah, Shi-Xiong Zhang, Meng Yu, Zhenyu Tang, Dinesh Manocha, Dong Yu

We present a neural-network-based fast diffuse room impulse response generator (FAST-RIR) for generating room impulse responses (RIRs) for a given acoustic environment.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

MESH2IR: Neural Acoustic Impulse Response Generator for Complex 3D Scenes

2 code implementations • 18 May 2022 • Anton Ratnarajah, Zhenyu Tang, Rohith Chandrashekar Aralikatti, Dinesh Manocha

We show that the acoustic metrics of the IRs predicted from our MESH2IR match the ground truth with less than 10% error.

2k Speech Dereverberation +1

Synthetic Wave-Geometric Impulse Responses for Improved Speech Dereverberation

no code implementations • 10 Dec 2022 • Rohith Aralikatti, Zhenyu Tang, Dinesh Manocha

We present a novel approach to improve the performance of learning-based speech dereverberation using accurate synthetic datasets.

Speech Dereverberation

Exploring Data Redundancy in Real-world Image Classification through Data Selection

1 code implementation • 25 Jun 2023 • Zhenyu Tang, Shaoting Zhang, Xiaosong Wang

Deep learning models often require large amounts of data for training, leading to increased costs.

Active Learning Continual Learning +3

RH20T: A Comprehensive Robotic Dataset for Learning Diverse Skills in One-Shot

no code implementations • 2 Jul 2023 • Hao-Shu Fang, Hongjie Fang, Zhenyu Tang, Jirong Liu, Chenxi Wang, JunBo Wang, Haoyi Zhu, Cewu Lu

A key challenge in robotic manipulation in open domains is how to acquire diverse and generalizable skills for robots.

Imitation Learning Motion Planning +2

Repaint123: Fast and High-quality One Image to 3D Generation with Progressive Controllable 2D Repainting

1 code implementation • 20 Dec 2023 • Junwu Zhang, Zhenyu Tang, Yatian Pang, Xinhua Cheng, Peng Jin, Yida Wei, Munan Ning, Li Yuan

The core idea is to combine the powerful image generation capability of the 2D diffusion model and the texture alignment ability of the repainting strategy for generating high-quality multi-view images with consistency.

3D Generation Image to 3D

MoE-LLaVA: Mixture of Experts for Large Vision-Language Models

2 code implementations • 29 Jan 2024 • Bin Lin, Zhenyu Tang, Yang Ye, Jiaxi Cui, Bin Zhu, Peng Jin, Jinfa Huang, Junwu Zhang, Munan Ning, Li Yuan

In this work, we propose a simple yet effective training strategy MoE-Tuning for LVLMs.

Ranked #52 on Visual Question Answering on MM-Vet

Hallucination Visual Question Answering

LLMBind: A Unified Modality-Task Integration Framework

no code implementations • 22 Feb 2024 • Bin Zhu, Munan Ning, Peng Jin, Bin Lin, Jinfa Huang, Qi Song, Junwu Zhang, Zhenyu Tang, Mingjun Pan, Xing Zhou, Li Yuan

In the multi-modal domain, the dependence of various models on specific input formats leads to user confusion and hinders progress.

Audio Generation Image Segmentation +3

Envision3D: One Image to 3D with Anchor Views Interpolation

1 code implementation • 13 Mar 2024 • Yatian Pang, Tanghui Jia, Yujun Shi, Zhenyu Tang, Junwu Zhang, Xinhua Cheng, Xing Zhou, Francis E. H. Tay, Li Yuan

To address this issue, we propose a novel cascade diffusion framework, which decomposes the challenging dense views generation task into two tractable stages, namely anchor views generation and anchor views interpolation.

Image to 3D

VoiceShop: A Unified Speech-to-Speech Framework for Identity-Preserving Zero-Shot Voice Editing

no code implementations • 10 Apr 2024 • Philip Anastassiou, Zhenyu Tang, Kainan Peng, Dongya Jia, Jiaxin Li, Ming Tu, Yuping Wang, Yuxuan Wang, Mingbo Ma

We present VoiceShop, a novel speech-to-speech framework that can modify multiple attributes of speech, such as age, gender, accent, and speech style, in a single forward pass while preserving the input speaker's timbre.

Attribute
