Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

no code implementations • 17 Feb 2023 • Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun

The third, proposed by us, is a hybrid solution in which the model is trained with a small set of aligned data and then tuned with a sizeable unaligned dataset.

Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization

no code implementations • 8 Dec 2021 • Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan

With our strong online data augmentation strategy, the proposed SSReg shows the potential of self-supervised learning without negative pairs, and it can significantly improve self-supervised speaker representation learning with a simple Siamese network architecture.

Contrastive Learning, Data Augmentation, +3

Signal Combination for Language Identification

no code implementations • 21 Oct 2019 • Shengye Wang, Li Wan, Yang Yu, Ignacio Lopez Moreno

We compare two ways of combining signals from recognizers, a lattice-based ensemble model and a deep neural network model, against a baseline that uses only low-level acoustic signals.

Language Identification, Speech Recognition, +1

Personal VAD: Speaker-Conditioned Voice Activity Detection

2 code implementations • 12 Aug 2019 • Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.

Action Detection, Activity Detection, +4

Tuplemax Loss for Language Identification

1 code implementation • 29 Nov 2018 • Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno

In many language identification scenarios, the user specifies a small set of languages that they can speak, rather than a large set of all possible languages.

Language Identification

Links: A High-Dimensional Online Clustering Method

1 code implementation • 30 Jan 2018 • Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno

We present a novel algorithm, called Links, designed to perform online clustering on unit vectors in a high-dimensional Euclidean space.

Online Clustering

Generalized End-to-End Loss for Speaker Verification

28 code implementations • 28 Oct 2017 • Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function.

Domain Adaptation, Speaker Verification

Attention-Based Models for Text-Dependent Speaker Verification

2 code implementations • 28 Oct 2017 • F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan

Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information that spans the entire length of an input sequence.

Image Captioning, Machine Translation, +5

Speaker Diarization with LSTM

4 code implementations • 28 Oct 2017 • Quan Wang, Carlton Downey, Li Wan, Philip Andrew Mansfield, Ignacio Lopez Moreno

For many years, i-vector based audio embedding techniques were the dominant approach for speaker verification and speaker diarization applications.

Speaker Diarization, +1

