Search Results for author: Li Wan

Found 15 papers, 7 papers with code

FADI-AEC: Fast Score Based Diffusion Model Guided by Far-end Signal for Acoustic Echo Cancellation

no code implementations • 8 Jan 2024 • Yang Liu, Li Wan, Yun Li, Yiteng Huang, Ming Sun, James Luan, Yangyang Shi, Xin Lei

Despite the potential of diffusion models in speech enhancement, their deployment in Acoustic Echo Cancellation (AEC) has been restricted.

Acoustic echo cancellation Speech Enhancement

Handling the Alignment for Wake Word Detection: A Comparison Between Alignment-Based, Alignment-Free and Hybrid Approaches

no code implementations • 17 Feb 2023 • Vinicius Ribeiro, Yiteng Huang, Yuan Shangguan, Zhaojun Yang, Li Wan, Ming Sun

The third, proposed by us, is a hybrid solution in which the model is trained with a small set of aligned data and then tuned with a sizeable unaligned dataset.

LiCo-Net: Linearized Convolution Network for Hardware-efficient Keyword Spotting

no code implementations • 9 Nov 2022 • Haichuan Yang, Zhaojun Yang, Li Wan, Biqiao Zhang, Yangyang Shi, Yiteng Huang, Ivaylo Enchev, Limin Tang, Raziel Alvarez, Ming Sun, Xin Lei, Raghuraman Krishnamoorthi, Vikas Chandra

This paper proposes a hardware-efficient architecture, Linearized Convolution Network (LiCo-Net), for keyword spotting.

Keyword Spotting

Self-Supervised Speaker Verification with Simple Siamese Network and Self-Supervised Regularization

no code implementations • 8 Dec 2021 • Mufan Sang, Haoqi Li, Fang Liu, Andrew O. Arnold, Li Wan

With our strong online data augmentation strategy, the proposed SSReg shows the potential of self-supervised learning without using negative pairs, and it can significantly improve the performance of self-supervised speaker representation learning with a simple Siamese network architecture.

Contrastive Learning Data Augmentation +3

Signal Combination for Language Identification

no code implementations • 21 Oct 2019 • Shengye Wang, Li Wan, Yang Yu, Ignacio Lopez Moreno

We compare the performance of a lattice-based ensemble model and a deep neural network model to combine signals from recognizers with that of a baseline that only uses low-level acoustic signals.

Language Identification speech-recognition +1

Personal VAD: Speaker-Conditioned Voice Activity Detection

2 code implementations • 12 Aug 2019 • Shaojin Ding, Quan Wang, Shuo-Yiin Chang, Li Wan, Ignacio Lopez Moreno

In this paper, we propose "personal VAD", a system to detect the voice activity of a target speaker at the frame level.

Action Detection Activity Detection +4

Tuplemax Loss for Language Identification

1 code implementation • 29 Nov 2018 • Li Wan, Prashant Sridhar, Yang Yu, Quan Wang, Ignacio Lopez Moreno

In many scenarios of a language identification task, the user will specify a small set of languages which he/she can speak instead of a large set of all possible languages.

Language Identification

Links: A High-Dimensional Online Clustering Method

1 code implementation • 30 Jan 2018 • Philip Andrew Mansfield, Quan Wang, Carlton Downey, Li Wan, Ignacio Lopez Moreno

We present a novel algorithm, called Links, designed to perform online clustering on unit vectors in a high-dimensional Euclidean space.

Clustering Online Clustering +1

Generalized End-to-End Loss for Speaker Verification

28 code implementations • 28 Oct 2017 • Li Wan, Quan Wang, Alan Papir, Ignacio Lopez Moreno

In this paper, we propose a new loss function called generalized end-to-end (GE2E) loss, which makes the training of speaker verification models more efficient than our previous tuple-based end-to-end (TE2E) loss function.

Ranked #1 on Speaker Verification on CALLHOME

Domain Adaptation Speaker Verification

Attention-Based Models for Text-Dependent Speaker Verification

2 code implementations • 28 Oct 2017 • F A Rezaur Rahman Chowdhury, Quan Wang, Ignacio Lopez Moreno, Li Wan

Attention-based models have recently shown great performance on a range of tasks, such as speech recognition, machine translation, and image captioning, due to their ability to summarize relevant information that expands through the entire length of an input sequence.

Image Captioning Machine Translation +5