no code implementations • 17 Feb 2023 • Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun
The third, proposed by us, is a hybrid solution in which the model is first trained with a small set of aligned data and then fine-tuned on a sizeable unaligned dataset.
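A rough sketch of such a two-stage recipe, assuming a frame-classification model for the aligned stage and a CTC-style objective for the unaligned stage; the model, loaders, and loss choices here are hypothetical stand-ins, not the paper's exact setup.

```python
import torch

def train_hybrid(model, aligned_loader, unaligned_loader,
                 aligned_epochs=5, unaligned_epochs=20):
    """Stage 1: supervised training on a small frame-aligned set.
    Stage 2: tuning on a large unaligned set with CTC (sketch only)."""
    opt = torch.optim.Adam(model.parameters(), lr=1e-4)

    # Stage 1: per-frame labels are available (aligned data).
    for _ in range(aligned_epochs):
        for audio, frame_labels in aligned_loader:
            logits = model(audio)                                # (B, T, C)
            loss = torch.nn.functional.cross_entropy(
                logits.transpose(1, 2), frame_labels)
            opt.zero_grad(); loss.backward(); opt.step()

    # Stage 2: only sequence-level targets (unaligned data).
    ctc = torch.nn.CTCLoss()
    for _ in range(unaligned_epochs):
        for audio, targets, in_lens, tgt_lens in unaligned_loader:
            log_probs = model(audio).log_softmax(-1).transpose(0, 1)  # (T, B, C)
            loss = ctc(log_probs, targets, in_lens, tgt_lens)
            opt.zero_grad(); loss.backward(); opt.step()
```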
no code implementations • 9 Nov 2022 • Haichuan Yang, Zhaojun Yang, Li Wan, Biqiao Zhang, Yangyang Shi, Yiteng Huang, Ivaylo Enchev, Limin Tang, Raziel Alvarez, Ming Sun, Xin Lei, Raghuraman Krishnamoorthi, Vikas Chandra
This paper proposes a hardware-efficient architecture, Linearized Convolution Network (LiCo-Net), for keyword spotting.
no code implementations • 8 Dec 2021 • Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan
With our strong online data augmentation strategy, the proposed SSReg shows the potential of self-supervised learning without negative pairs, and it significantly improves self-supervised speaker representation learning with a simple Siamese network architecture.
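The negative-pair-free Siamese setup described above can be sketched as a SimSiam-style objective: predict one augmented view's embedding from the other, with a stop-gradient on the target branch. This is a minimal illustration assuming generic `encoder` and `predictor` modules, not the paper's exact SSReg formulation.

```python
import torch.nn.functional as F

def negative_free_loss(encoder, predictor, view1, view2):
    """Siamese loss without negative pairs (SimSiam-style sketch)."""
    z1, z2 = encoder(view1), encoder(view2)   # embeddings of two views
    p1, p2 = predictor(z1), predictor(z2)     # cross-branch predictions
    # .detach() is the stop-gradient that avoids representation collapse
    # in the absence of negative pairs.
    return -(F.cosine_similarity(p1, z2.detach(), dim=-1).mean() +
             F.cosine_similarity(p2, z1.detach(), dim=-1).mean()) / 2
```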
no code implementations • 21 Oct 2019 • Shengye Wang, Li Wan, Yang Yu, Ignacio Lopez Moreno
To combine signals from recognizers, we compare the performance of a lattice-based ensemble model and a deep neural network model against that of a baseline that uses only low-level acoustic signals.
2 code implementations • 12 Aug 2019 • Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno
In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
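A minimal sketch of the frame-level idea: condition a small recurrent classifier on a target-speaker embedding and emit per-frame logits. The class layout, dimensions, and module names here are illustrative assumptions, not the paper's exact architecture.

```python
import torch

class FrameVAD(torch.nn.Module):
    """Per-frame classifier over {non-speech, non-target speech,
    target speech}, conditioned on a target-speaker embedding."""
    def __init__(self, feat_dim=40, spk_dim=256, hidden=64):
        super().__init__()
        self.rnn = torch.nn.LSTM(feat_dim + spk_dim, hidden, batch_first=True)
        self.head = torch.nn.Linear(hidden, 3)

    def forward(self, frames, spk_emb):
        # frames: (B, T, feat_dim); spk_emb: (B, spk_dim)
        cond = spk_emb.unsqueeze(1).expand(-1, frames.size(1), -1)
        out, _ = self.rnn(torch.cat([frames, cond], dim=-1))
        return self.head(out)  # (B, T, 3) frame-level logits
```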
1 code implementation • 29 Nov 2018 • Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno
In many language identification scenarios, the user specifies a small set of languages that they can speak, rather than a large set of all possible languages.
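One simple way to exploit that prior is to mask the classifier's scores to the declared subset at inference time; a minimal sketch, illustrative only and not necessarily the paper's method:

```python
import torch

def constrained_langid(logits, allowed_ids):
    """Restrict a language-ID decision to the user's declared languages."""
    mask = torch.full_like(logits, float('-inf'))
    mask[..., allowed_ids] = 0.0          # keep only the allowed languages
    probs = torch.softmax(logits + mask, dim=-1)
    return probs.argmax(dim=-1), probs
```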
1 code implementation • 30 Jan 2018 • Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno
We present a novel algorithm, called Links, designed to perform online clustering on unit vectors in a high-dimensional Euclidean space.
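Links itself maintains a richer subcluster structure; as a much simpler stand-in for the same setting, the sketch below greedily assigns each arriving unit vector to its nearest centroid by cosine similarity, or opens a new cluster.

```python
import numpy as np

def online_cluster(vectors, threshold=0.7):
    """Greedy online clustering of unit vectors (a simple stand-in
    for the setting; not the Links algorithm itself)."""
    centroids, counts, labels = [], [], []
    for v in vectors:                      # vectors arrive one at a time
        v = np.asarray(v, dtype=float)
        if centroids:
            sims = np.array([c @ v for c in centroids])  # cosine (unit vectors)
            best = int(sims.argmax())
            if sims[best] >= threshold:
                counts[best] += 1          # running-mean centroid update,
                c = centroids[best] + (v - centroids[best]) / counts[best]
                centroids[best] = c / np.linalg.norm(c)  # back to unit sphere
                labels.append(best)
                continue
        centroids.append(v)
        counts.append(1)
        labels.append(len(centroids) - 1)
    return labels
```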
28 code implementations • 28 Oct 2017 • Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno
In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function.
Ranked #1 on Speaker Verification on CALLHOME
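In the softmax variant of GE2E, each utterance embedding is scored against every speaker centroid with a scaled cosine similarity, and a cross-entropy over centroids pulls it toward its own speaker. A compact sketch; for brevity it omits the paper's exclusion of the utterance itself from its own speaker's centroid.

```python
import torch
import torch.nn.functional as F

def ge2e_loss(embeddings, w=10.0, b=-5.0):
    """Softmax variant of the GE2E loss (sketch).
    embeddings: (N speakers, M utterances, D), assumed L2-normalized."""
    n, m, d = embeddings.shape
    centroids = F.normalize(embeddings.mean(dim=1), dim=-1)      # (N, D)
    # Scaled cosine similarity of every utterance to every centroid.
    sim = w * embeddings.reshape(n * m, d) @ centroids.t() + b   # (N*M, N)
    labels = torch.arange(n).repeat_interleave(m)                # true speakers
    return F.cross_entropy(sim, labels)
```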
2 code implementations • 28 Oct 2017 • F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan
Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, owing to their ability to summarize relevant information that spans the entire length of an input sequence.
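In speaker verification this typically takes the form of attention pooling: learned weights collapse variable-length frame features into a single utterance embedding. A minimal sketch, not necessarily the exact attention variant studied in the paper:

```python
import torch

class AttentionPooling(torch.nn.Module):
    """Collapse (B, T, dim) frame features into a (B, dim) utterance
    embedding using learned per-frame attention weights."""
    def __init__(self, dim):
        super().__init__()
        self.score = torch.nn.Linear(dim, 1)

    def forward(self, frames):
        weights = torch.softmax(self.score(frames), dim=1)  # (B, T, 1)
        return (weights * frames).sum(dim=1)                # (B, dim)
```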
4 code implementations • 28 Oct 2017 • Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno
For many years, i-vector-based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.
Ranked #1 on Speaker Diarization on CALLHOME-109
no code implementations • CVPR 2015 • Li Wan, David Eigen, Rob Fergus
In this paper, we propose a new model that combines deformable parts models and convolutional networks, obtaining the advantages of each.
1 code implementation • ICML'13: Proceedings of the 30th International Conference on Machine Learning, Volume 28, 2013 • Li Wan, Matthew Zeiler, Sixin Zhang, Yann LeCun, Rob Fergus
When training with Dropout, a randomly selected subset of activations is set to zero within each layer.
Ranked #7 on Image Classification on MNIST
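The paper's DropConnect generalizes this idea from activations to individual weights. A training-time sketch of the two masking schemes for a single linear layer; inference handling (e.g. the paper's Gaussian approximation for DropConnect) is omitted.

```python
import torch

def dropout_forward(x, weight, p=0.5):
    """Dropout: zero a random subset of the layer's activations."""
    h = x @ weight.t()                        # (B, out)
    mask = (torch.rand_like(h) >= p).float()
    return h * mask / (1 - p)                 # inverted-dropout scaling

def dropconnect_forward(x, weight, p=0.5):
    """DropConnect: zero a random subset of the weights instead."""
    mask = (torch.rand_like(weight) >= p).float()
    return x @ (weight * mask).t()
```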