Search Results for author: Ignacio Lopez Moreno

Found 22 papers, 13 papers with code

Augmenting Transformer-Transducer Based Speaker Change Detection With Token-Level Training Loss

no code implementations • 11 Nov 2022 • Guanlong Zhao, Quan Wang, Han Lu, Yiling Huang, Ignacio Lopez Moreno

Due to the sparsity of speaker changes in the training data, the conventional Transformer-Transducer (T-T) based SCD model loss leads to sub-optimal detection accuracy.

Change Detection

Highly Efficient Real-Time Streaming and Fully On-Device Speaker Diarization with Multi-Stage Clustering

1 code implementation • 25 Oct 2022 • Quan Wang, Yiling Huang, Han Lu, Guanlong Zhao, Ignacio Lopez Moreno

This multi-stage clustering strategy is critical for streaming on-device speaker diarization systems, where the budgets of CPU, memory and battery are tight.

Speaker Diarization

Parameter-Free Attentive Scoring for Speaker Verification

1 code implementation • 10 Mar 2022 • Jason Pelecanos, Quan Wang, Yiling Huang, Ignacio Lopez Moreno

This paper presents a novel study of parameter-free attentive scoring for speaker verification.

Speaker Verification

Noisy student-teacher training for robust keyword spotting

no code implementations • 3 Jun 2021 • Hyun-Jin Park, Pai Zhu, Ignacio Lopez Moreno, Niranjan Subrahmanya

We propose self-training with a noisy student-teacher approach for streaming keyword spotting, which can utilize large-scale unlabeled data and aggressive data augmentation.

Data Augmentation Keyword Spotting

Dr-Vectors: Decision Residual Networks and an Improved Loss for Speaker Recognition

no code implementations • 5 Apr 2021 • Jason Pelecanos, Quan Wang, Ignacio Lopez Moreno

In this work we propose scoring these representations in a way that can capture uncertainty, enroll/test asymmetry and additional non-linear information.

Speaker Recognition

VoiceFilter-Lite: Streaming Targeted Voice Separation for On-Device Speech Recognition

1 code implementation • 9 Sep 2020 • Quan Wang, Ignacio Lopez Moreno, Mert Saglam, Kevin Wilson, Alan Chiao, Renjie Liu, Yanzhang He, Wei Li, Jason Pelecanos, Marily Nika, Alexander Gruenstein

We introduce VoiceFilter-Lite, a single-channel source separation model that runs on the device to preserve only the speech signals from a target user, as part of a streaming speech recognition system.

Speech Recognition
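The snippet above describes two mechanical pieces that can be sketched in isolation: conditioning the separation network on the target speaker's embedding (d-vector), and applying the predicted soft mask to the noisy spectrogram. The following is a minimal NumPy illustration of those two steps under assumed shapes, with the separation network itself omitted; it is not the authors' implementation.

```python
import numpy as np

def condition_on_dvector(noisy_frames, dvector):
    """VoiceFilter-style conditioning: the target speaker's d-vector is
    tiled across time and concatenated to every noisy spectrogram frame
    before being fed to the separation network (omitted here)."""
    t = noisy_frames.shape[0]
    return np.concatenate([noisy_frames, np.tile(dvector, (t, 1))], axis=1)

def apply_soft_mask(noisy_mag, mask):
    """Output stage: the predicted soft mask (values in [0, 1]) is
    multiplied element-wise with the noisy magnitude spectrogram,
    keeping only the target speaker's energy."""
    return np.clip(mask, 0.0, 1.0) * noisy_mag
```

For a (T, F) spectrogram and a D-dimensional d-vector, the conditioned input has shape (T, F + D); the masking step preserves the (T, F) shape so the result can be fed directly to a downstream recognizer.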

Version Control of Speaker Recognition Systems

no code implementations • 23 Jul 2020 • Quan Wang, Ignacio Lopez Moreno

This paper discusses one of the most challenging practical engineering problems in speaker recognition systems - the version control of models and user profiles.

Speaker Recognition

Training Keyword Spotting Models on Non-IID Data with Federated Learning

no code implementations • 21 May 2020 • Andrew Hard, Kurt Partridge, Cameron Nguyen, Niranjan Subrahmanya, Aishanee Shah, Pai Zhu, Ignacio Lopez Moreno, Rajiv Mathews

We demonstrate that a production-quality keyword-spotting model can be trained on-device using federated learning and achieve comparable false accept and false reject rates to a centrally-trained model.

Data Augmentation Federated Learning +1

Signal Combination for Language Identification

no code implementations • 21 Oct 2019 • Shengye Wang, Li Wan, Yang Yu, Ignacio Lopez Moreno

We compare the performance of a lattice-based ensemble model and a deep neural network model, which combine signals from multiple recognizers, with that of a baseline that uses only low-level acoustic signals.

Language Identification Speech Recognition +1

Personal VAD: Speaker-Conditioned Voice Activity Detection

2 code implementations • 12 Aug 2019 • Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.

Action Detection Activity Detection +4

Tuplemax Loss for Language Identification

1 code implementation • 29 Nov 2018 • Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno

In many scenarios of a language identification task, the user specifies a small set of languages they can speak, rather than the large set of all possible languages.

Language Identification
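The key idea behind tuplemax can be sketched as follows: instead of a single softmax over the whole tuple of candidate languages, the loss averages pairwise comparisons between the target language and each non-target in the tuple. This is a simplified NumPy sketch under that reading of the paper, not the authors' implementation.

```python
import numpy as np

def tuplemax_loss(logits, target):
    """Tuplemax loss over a tuple of candidate-language logits:
    the mean of pairwise (target vs. each non-target) binary
    cross-entropies, rather than one softmax over the whole tuple."""
    z_t = logits[target]
    others = np.delete(logits, target)
    # -log sigmoid(z_t - z_k) for each non-target k, computed stably.
    pairwise = np.logaddexp(0.0, others - z_t)
    return pairwise.mean()
```

Averaging pairwise terms means every non-target language contributes equally to the gradient, whereas in a plain softmax a single highly confusable language can dominate the normalizer.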

VoiceFilter: Targeted Voice Separation by Speaker-Conditioned Spectrogram Masking

4 code implementations • 11 Oct 2018 • Quan Wang, Hannah Muckenhirn, Kevin Wilson, Prashant Sridhar, Zelin Wu, John Hershey, Rif A. Saurous, Ron J. Weiss, Ye Jia, Ignacio Lopez Moreno

In this paper, we present a novel system that separates the voice of a target speaker from multi-speaker signals, by making use of a reference signal from the target speaker.

Speaker Recognition Speaker Separation +3

Links: A High-Dimensional Online Clustering Method

1 code implementation • 30 Jan 2018 • Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno

We present a novel algorithm, called Links, designed to perform online clustering on unit vectors in a high-dimensional Euclidean space.

Online Clustering
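Links itself maintains a hierarchy of subclusters with probabilistic merge decisions; as a much simpler illustration of the setting it addresses (streaming assignment of unit vectors by cosine similarity to running centroids), here is a greedy threshold-based sketch. The threshold value and the overall scheme are assumptions for illustration, not the Links algorithm.

```python
import numpy as np

def online_cluster(vectors, threshold=0.7):
    """Greedy online clustering of unit vectors: assign each incoming
    vector to the most similar existing centroid (cosine similarity)
    if it clears `threshold`, otherwise open a new cluster."""
    sums, counts, labels = [], [], []
    for v in vectors:
        v = v / np.linalg.norm(v)
        best, best_sim = -1, threshold
        for i, s in enumerate(sums):
            sim = float(np.dot(v, s / np.linalg.norm(s)))
            if sim >= best_sim:
                best, best_sim = i, sim
        if best >= 0:
            sums[best] += v          # update running centroid
            counts[best] += 1
        else:
            sums.append(v.copy())
            counts.append(1)
            best = len(sums) - 1
        labels.append(best)
    return labels
```

Because each vector is processed once and only compared against existing centroids, the method runs in a single streaming pass, which is the regime online clustering targets.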

Speaker Diarization with LSTM

4 code implementations • 28 Oct 2017 • Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.

Speaker Diarization +1

Attention-Based Models for Text-Dependent Speaker Verification

2 code implementations • 28 Oct 2017 • F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan

Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information distributed across the entire length of an input sequence.

Image Captioning Machine Translation +5
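The summarization mechanism the snippet above alludes to can be sketched as attentive pooling over frame-level features: score each frame against a query vector, softmax-normalize the scores, and take the weighted sum as the utterance-level embedding. In the paper the scoring parameters are learned; here `query` is a fixed stand-in, so this is an illustrative sketch rather than the paper's model.

```python
import numpy as np

def attentive_pooling(frames, query):
    """Attention-based pooling: score each frame (rows of `frames`)
    against `query`, softmax-normalize, and return the weighted sum
    as a single fixed-size utterance embedding."""
    scores = frames @ query                 # (T,) one score per frame
    scores -= scores.max()                  # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    return weights @ frames                 # (dim,) pooled embedding
```

Unlike mean pooling, frames that score highly against the query dominate the result, which lets the model emphasize the most speaker-discriminative frames.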

Generalized End-to-End Loss for Speaker Verification

28 code implementations • 28 Oct 2017 • Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function.

Domain Adaptation Speaker Verification
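The GE2E loss mentioned above (softmax variant) operates on a batch of N speakers × M utterances of L2-normalized embeddings: each utterance is scored against every speaker centroid with a learned scale and bias, and a softmax cross-entropy pulls it toward its own speaker. Below is a simplified NumPy sketch with the scale `w` and bias `b` fixed to constants rather than learned; it is an illustration of the loss, not the authors' implementation.

```python
import numpy as np

def ge2e_softmax_loss(embeddings, w=10.0, b=-5.0):
    """Simplified GE2E (softmax variant) loss on a batch of shape
    (num_speakers, num_utterances, dim) of L2-normalized embeddings;
    requires num_utterances >= 2."""
    n_spk, n_utt, _ = embeddings.shape
    centroids = embeddings.mean(axis=1)
    centroids /= np.linalg.norm(centroids, axis=1, keepdims=True)

    loss = 0.0
    for j in range(n_spk):
        for i in range(n_utt):
            e = embeddings[j, i]
            # GE2E detail: exclude the utterance itself from its own centroid.
            own = embeddings[j].sum(axis=0) - e
            own /= np.linalg.norm(own)
            sims = np.array([
                w * np.dot(e, own if k == j else centroids[k]) + b
                for k in range(n_spk)
            ])
            # Softmax cross-entropy against the true speaker j.
            loss += -sims[j] + np.log(np.exp(sims).sum())
    return loss / (n_spk * n_utt)
```

Because every utterance is compared against all centroids in the batch at once, each training step exercises many speaker pairs, which is what makes GE2E more efficient than the tuple-based TE2E loss it replaced.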
