no code implementations • 25 Feb 2023 • Pai Zhu, Hyun Jin Park, Alex Park, Angelo Scorza Scarpati, Ignacio Lopez Moreno
A Multilingual Keyword Spotting (KWS) system detects spokenkeywords over multiple locales.
no code implementations • 11 Nov 2022 • Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno
Due to the sparsity of the speaker changes in the training data, the conventional T-T based SCD model loss leads to sub-optimal detection accuracy.
1 code implementation • 25 Oct 2022 • Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno
While recent research advances in speaker diarization mostly focus on improving the quality of diarization results, there is also an increasing interest in improving the efficiency of diarization systems.
no code implementations • 11 Apr 2022 • Andrew Hard, Kurt Partridge, Neng Chen, Sean Augenstein, Aishanee Shah, Hyun Jin Park, Alex Park, Sara Ng, Jessica Nguyen, Ignacio Lopez Moreno, Rajiv Mathews, Françoise Beaufays
We trained a keyword spotting model using federated learning on real user devices and observed significant improvements when the model was deployed for inference on phones.
1 code implementation • 10 Mar 2022 • Jason Pelecanos, Quan Wang, Yiling Huang, Ignacio Lopez Moreno
This paper presents a novel study of parameter-free attentive scoring for speaker verification.
1 code implementation • 24 Feb 2022 • Quan Wang, Yang Yu, Jason Pelecanos, Yiling Huang, Ignacio Lopez Moreno
In this paper, we introduce a novel language identification system based on conformer layers.
1 code implementation • 23 Sep 2021 • Wei Xia, Han Lu, Quan Wang, Anshuman Tripathi, Yiling Huang, Ignacio Lopez Moreno, Hasim Sak
In this paper, we present a novel speaker diarization system for streaming on-device applications.
no code implementations • 3 Jun 2021 • Hyun-Jin Park, Pai Zhu, Ignacio Lopez Moreno, Niranjan Subrahmanya
We propose self-training with noisy student-teacher approach for streaming keyword spotting, that can utilize large-scale unlabeled data and aggressive data augmentation.
1 code implementation • 5 Apr 2021 • Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno
In this work we propose scoring these representations in a way that can capture uncertainty, enroll/test asymmetry and additional non-linear information.
1 code implementation • 5 Apr 2021 • Roza Chojnacka, Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno
To the best of our knowledge, this is the first study of speaker verification systems at the scale of 46 languages.
2 code implementations • 9 Sep 2020 • Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein
We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system.
1 code implementation • 23 Jul 2020 • Quan Wang, Ignacio Lopez Moreno
In this paper, we describe different version control strategies for speaker recognition systems that had been carefully studied at Google from years of engineering practice.
no code implementations • 21 May 2020 • Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio Lopez Moreno, Rajiv Mathews
We demonstrate that a production-quality keyword-spotting model can be trained on-device using federated learning and achieve comparable false accept and false reject rates to a centrally-trained model.
no code implementations • 21 Oct 2019 • Shengye Wang, Li Wan, Yang Yu, Ignacio Lopez Moreno
We compare the performance of a lattice-based ensemble model and a deep neural network model to combine signals from recognizers with that of a baseline that only uses low-level acoustic signals.
2 code implementations • 12 Aug 2019 • Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno
In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.
1 code implementation • 29 Nov 2018 • Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno
In many scenarios of a language identification task, the user will specify a small set of languages which he/she can speak instead of a large set of all possible languages.
5 code implementations • 11 Oct 2018 • Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno
In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.
11 code implementations • NeurIPS 2018 • Ye Jia, Yu Zhang, Ron J. Weiss, Quan Wang, Jonathan Shen, Fei Ren, Zhifeng Chen, Patrick Nguyen, Ruoming Pang, Ignacio Lopez Moreno, Yonghui Wu
Clone a voice in 5 seconds to generate arbitrary speech in real-time
1 code implementation • 30 Jan 2018 • Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno
We present a novel algorithm, called Links, designed to perform online clustering on unit vectors in a high-dimensional Euclidean space.
29 code implementations • 28 Oct 2017 • Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno
In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function.
Ranked #1 on
Speaker Verification
on CALLHOME
4 code implementations • 28 Oct 2017 • Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno
For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.
Ranked #2 on
Speaker Diarization
on CALLHOME-109
2 code implementations • 28 Oct 2017 • F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan
Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning due to their ability to summarize relevant information that expands through the entire length of an input sequence.