no code implementations • 8 Jan 2024 • Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei
Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted.
no code implementations • 17 Feb 2023 • Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun
The third, proposed by us, is a hybrid solution in which the model is trained with a small set of aligned data and then tuned with a sizeable unaligned dataset.
no code implementations • 9 Nov 2022 • Haichuan Yang, Zhaojun Yang, Li Wan, Biqiao Zhang, Yangyang Shi, Yiteng Huang, Ivaylo Enchev, Limin Tang, Raziel Alvarez, Ming Sun, Xin Lei, Raghuraman Krishnamoorthi, Vikas Chandra
This paper proposes a hardware-efficient architecture, Linearized Convolution Network (LiCo-Net) for keyword spotting.
no code implementations • 8 Dec 2021 • Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan
With our strong online data augmentation strategy, the proposed SSReg demonstrates the potential of self-supervised learning without negative pairs, and it significantly improves self-supervised speaker representation learning with a simple Siamese network architecture.
no code implementations • 21 Oct 2019 • Shengye Wang, Li Wan, Yang Yu, Ignacio Lopez Moreno
We compare the performance of a lattice-based ensemble model and a deep neural network model, each combining signals from multiple recognizers, against a baseline that uses only low-level acoustic signals.
2 code implementations • 12 Aug 2019 • Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno
In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
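A trained personal VAD model predicts, for every frame, one of three classes: non-speech, non-target-speaker speech, and target-speaker speech. As an illustrative sketch only (not the paper's trained network), the frame-level decision can be mimicked with an energy gate plus cosine similarity to an enrolled target-speaker embedding; the thresholds and helper name here are hypothetical:

```python
import numpy as np

def personal_vad(frame_embs, frame_energy, target_emb,
                 energy_thresh=0.1, sim_thresh=0.7):
    """Heuristic stand-in for frame-level personal VAD.

    Labels each frame as non-speech (0), non-target speech (1),
    or target speech (2). Thresholds are illustrative, not tuned.
    """
    target = target_emb / np.linalg.norm(target_emb)
    labels = []
    for emb, energy in zip(frame_embs, frame_energy):
        if energy < energy_thresh:
            labels.append(0)  # too quiet: treat as non-speech
        else:
            sim = np.dot(emb / np.linalg.norm(emb), target)
            labels.append(2 if sim >= sim_thresh else 1)
    return labels
```

The actual system replaces both the energy gate and the similarity threshold with a single network trained end-to-end on frame labels.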
1 code implementation • 29 Nov 2018 • Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno
In many scenarios of a language identification task, the user specifies a small set of languages they can speak, rather than choosing from a large set of all possible languages.
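One simple way to exploit such a user-declared language set, sketched here under the assumption of a generic classifier head (this is an illustration, not the paper's method), is to mask the model's output scores so that only the declared languages can be predicted:

```python
import numpy as np

def restrict_languages(logits, lang_ids, allowed):
    """Renormalize classifier scores over a user-declared language subset.

    logits: raw scores over all languages; lang_ids: their labels;
    allowed: the small set of languages the user can speak.
    """
    mask = np.full(logits.shape, -np.inf)
    for lang in allowed:
        mask[lang_ids.index(lang)] = 0.0
    restricted = logits + mask  # disallowed languages get -inf
    probs = np.exp(restricted - restricted.max())  # stable softmax
    return probs / probs.sum()
```

Languages outside the declared set receive exactly zero probability, so the prediction is always one the user actually speaks.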
1 code implementation • 30 Jan 2018 • Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno
We present a novel algorithm, called Links, designed to perform online clustering on unit vectors in a high-dimensional Euclidean space.
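To make the problem setting concrete, here is a minimal greedy baseline for online clustering of unit vectors by cosine similarity. This is deliberately simpler than Links itself (which maintains a richer subcluster structure); the threshold value is an arbitrary illustration:

```python
import numpy as np

def online_cluster(vectors, threshold=0.8):
    """Greedy online clustering: assign each arriving unit vector to the
    nearest running centroid if cosine similarity clears the threshold,
    otherwise open a new cluster. Returns one label per input vector."""
    centroids, labels = [], []  # centroids stored as unnormalized sums
    for v in vectors:
        v = v / np.linalg.norm(v)
        if centroids:
            sims = [np.dot(v, c / np.linalg.norm(c)) for c in centroids]
            best = int(np.argmax(sims))
            if sims[best] >= threshold:
                centroids[best] = centroids[best] + v
                labels.append(best)
                continue
        centroids.append(v.copy())
        labels.append(len(centroids) - 1)
    return labels
```

Each vector is processed exactly once as it arrives, which is the defining constraint of the online setting that Links addresses.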
4 code implementations • 28 Oct 2017 • Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno
For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.
Ranked #2 on Speaker Diarization on CALLHOME-109
28 code implementations • 28 Oct 2017 • Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno
In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function.
Ranked #1 on Speaker Verification on CALLHOME
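The softmax variant of the GE2E loss scores each utterance embedding against every speaker centroid (using a leave-one-out centroid for the utterance's own speaker) with a learned scale and bias, then applies softmax cross-entropy against the true speaker. A minimal NumPy sketch, with the scale `w` and bias `b` fixed to illustrative values rather than learned:

```python
import numpy as np

def ge2e_loss(embeddings, w=10.0, b=-5.0):
    """Softmax GE2E loss sketch.

    embeddings: array of shape (N speakers, M utterances, D), L2-normalized.
    w, b: scale and bias on cosine similarity (learned in the real model).
    """
    N, M, _ = embeddings.shape
    centroids = embeddings.mean(axis=1)  # per-speaker centroids
    loss = 0.0
    for j in range(N):
        for i in range(M):
            e = embeddings[j, i]
            # leave-one-out centroid for the utterance's own speaker
            c_self = (embeddings[j].sum(axis=0) - e) / (M - 1)
            sims = np.empty(N)
            for k in range(N):
                c = c_self if k == j else centroids[k]
                sims[k] = w * np.dot(e, c / np.linalg.norm(c)) + b
            # softmax cross-entropy: push e toward its own centroid
            loss += -sims[j] + np.log(np.sum(np.exp(sims)))
    return loss / (N * M)
```

Because every utterance in a batch is compared against every centroid, one batch yields N×M×N similarity terms, which is what makes GE2E training more efficient than constructing explicit TE2E tuples.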
2 code implementations • 28 Oct 2017 • F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan
Attention-based models have recently shown strong performance on a range of tasks, such as speech recognition, machine translation, and image captioning, thanks to their ability to summarize relevant information that spans the entire length of an input sequence.
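The summarization mechanism at the heart of such models can be sketched as attentive pooling: a learned scoring vector weights every frame, and the sequence is collapsed into a single weighted sum. This is a generic illustration of attention pooling, not the specific layer used in the paper; the scoring vector `w` here stands in for learned parameters:

```python
import numpy as np

def attention_pool(frames, w):
    """Collapse a (T, D) sequence into one D-dim summary vector.

    frames: per-frame features; w: (D,) scoring vector (learned in practice).
    """
    scores = frames @ w                      # one scalar score per frame
    alphas = np.exp(scores - scores.max())   # stable softmax over time
    alphas /= alphas.sum()
    return alphas @ frames                   # attention-weighted summary
```

Unlike last-frame or mean pooling, the weights let the model emphasize whichever frames carry the most speaker-discriminative information, wherever they fall in the sequence.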
no code implementations • CVPR 2015 • Li Wan, David Eigen, Rob Fergus
In this paper, we propose a new model that combines these two approaches, obtaining the advantages of each.
no code implementations • 19 Nov 2014 • Li Wan, David Eigen, Rob Fergus
In this paper, we propose a new model that combines these two approaches, obtaining the advantages of each.
1 code implementation • ICML 2013 (Proceedings of the 30th International Conference on Machine Learning, Volume 28) • Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus
When training with Dropout, a randomly selected subset of activations is set to zero within each layer.
Ranked #6 on Image Classification on MNIST
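The Dropout step described above is straightforward to write down. A minimal sketch using the common "inverted" formulation, where surviving activations are rescaled at training time so no change is needed at inference:

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Inverted dropout: zero each activation with probability p,
    scaling survivors by 1/(1-p) so the expected value is unchanged."""
    if not training or p == 0.0:
        return x  # inference: identity, thanks to the training-time rescale
    if rng is None:
        rng = np.random.default_rng()
    mask = rng.random(x.shape) >= p  # True = keep this activation
    return x * mask / (1.0 - p)
```

DropConnect, proposed in this paper, generalizes the idea by applying the random mask to the layer's weights rather than its activations.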