1 code implementation • 5 Jan 2023 • Chengyi Wang, Sanyuan Chen, Yu Wu, Ziqiang Zhang, Long Zhou, Shujie Liu, Zhuo Chen, Yanqing Liu, Huaming Wang, Jinyu Li, Lei He, Sheng Zhao, Furu Wei
In addition, we find Vall-E could preserve the speaker's emotion and acoustic environment of the acoustic prompt in synthesis.
no code implementations • 29 Dec 2022 • Jianwei Fei, Yunshu Dai, Huaming Wang, Zhihua Xia
Our goal is to reduce the features that are easy to learn in the training phase, so as to reduce the risk of overfitting on specific forgery types.
no code implementations • 27 Dec 2022 • Huaming Wang, Jianwei Fei, Yunshu Dai, Lingyun Leng, Zhihua Xia
To our knowledge, we are the first to conduct data augmentation in the fingerprint domain.
no code implementations • 14 Nov 2022 • Hira Dhamyal, Benjamin Elizalde, Soham Deshmukh, Huaming Wang, Bhiksha Raj, Rita Singh
We investigate how the model can learn to associate the audio with the descriptions, resulting in performance improvement of Speech Emotion Recognition and Speech Audio Retrieval.
no code implementations • 4 Nov 2022 • Sefik Emre Eskimez, Takuya Yoshioka, Alex Ju, Min Tang, Tanel Parnamaa, Huaming Wang
We dedicate the early layers to the AEC task while encouraging later layers for personalization by adding a bypass connection from the early layers to the mask prediction layer.
1 code implementation • 28 Sep 2022 • Soham Deshmukh, Benjamin Elizalde, Huaming Wang
In this work, we propose a new collection of web audio-text pairs and a new framework for retrieval.
no code implementations • 2 Apr 2022 • Manthan Thakker, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang
Our results show that E3Net provides better speech and transcription quality with a lower target speaker over-suppression (TSOS) rate than the baseline model.
no code implementations • 20 Oct 2021 • Hassan Taherian, Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Zhuo Chen, Xuedong Huang
Experimental results show that the proposed geometry agnostic model outperforms the model trained on a specific microphone array geometry in both speech quality and automatic speech recognition accuracy.
no code implementations • 18 Oct 2021 • Sefik Emre Eskimez, Takuya Yoshioka, Huaming Wang, Xiaofei Wang, Zhuo Chen, Xuedong Huang
Our results show that the proposed models can yield better speech recognition accuracy, speech intelligibility, and perceptual quality than the baseline models, and the multi-task training can alleviate the TSOS issue in addition to improving the speech recognition accuracy.
no code implementations • 5 Jun 2021 • Sefik Emre Eskimez, Xiaofei Wang, Min Tang, Hemin Yang, Zirun Zhu, Zhuo Chen, Huaming Wang, Takuya Yoshioka
Performance analysis is also carried out by changing the ASR model, the data used for the ASR-step, and the schedule of the two update steps.
no code implementations • 10 Dec 2019 • Takuya Yoshioka, Igor Abramovski, Cem Aksoylar, Zhuo Chen, Moshe David, Dimitrios Dimitriadis, Yifan Gong, Ilya Gurvich, Xuedong Huang, Yan Huang, Aviv Hurvitz, Li Jiang, Sharon Koubi, Eyal Krupka, Ido Leichter, Changliang Liu, Partha Parthasarathy, Alon Vinnikov, Lingfeng Wu, Xiong Xiao, Wayne Xiong, Huaming Wang, Zhenghao Wang, Jun Zhang, Yong Zhao, Tianyan Zhou
This increases marginally to 1. 6% when 50% of the attendees are unknown to the system.
no code implementations • CVPR 2018 • Jing Xu, Rui Zhao, Feng Zhu, Huaming Wang, Wanli Ouyang
AACN consists of two main components: Pose-guided Part Attention (PPA) and Attention-aware Feature Composition (AFC).