Search Results for author: Zhiyao Duan

Found 37 papers, 28 papers with code

One-class learning towards generalized voice spoofing detection

3 code implementations 27 Oct 2020 You Zhang, Fei Jiang, Zhiyao Duan

Human voices can be used to authenticate the identity of the speaker, but the automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks, such as impersonation, replay, text-to-speech, and voice conversion.

Speaker Verification Voice Anti-spoofing +1

Don't look back: an online beat tracking method using RNN and enhanced particle filtering

1 code implementation 5 Nov 2020 Mojtaba Heydari, Zhiyao Duan

Most preexisting online beat tracking (OBT) methods either apply offline approaches to a moving window containing past data to make predictions about future beat positions or must be primed with past data at startup to initialize.

Online Beat Tracking

BeatNet: CRNN and Particle Filtering for Online Joint Beat Downbeat and Meter Tracking

4 code implementations 8 Aug 2021 Mojtaba Heydari, Frank Cwitkowitz, Zhiyao Duan

The online estimation of rhythmic information, such as beat positions, downbeat positions, and meter, is critical for many real-time music applications.

Online Beat Tracking Online Downbeat Tracking

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

1 code implementation 9 May 2019 Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions.

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed

1 code implementation 23 Sep 2022 Meiying Chen, Zhiyao Duan

In this paper, we propose ControlVC, the first neural voice conversion system that achieves time-varying controls on pitch and speed.

Pitch control Speech Synthesis +1

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems

3 code implementations 3 Apr 2021 You Zhang, Ge Zhu, Fei Jiang, Zhiyao Duan

Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to discern spoofing attacks from bona fide speech trials.

Data Augmentation Multi-Task Learning +2

UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021

2 code implementations 26 Jul 2021 Xinhui Chen, You Zhang, Ge Zhu, Zhiyao Duan

Different from previous ASVspoof challenges, the LA task this year presents codec and transmission channel variability, while the new task DF presents general audio compression.

Audio Compression Face Swapping +2

Skipping the Frame-Level: Event-Based Piano Transcription With Neural Semi-CRFs

1 code implementation NeurIPS 2021 Yujia Yan, Frank Cwitkowitz, Zhiyao Duan

When formulating piano transcription in this way, we eliminate the need to rely on disjoint frame-level estimates for different stages of a note event.

Multi-Task Learning

SingFake: Singing Voice Deepfake Detection

1 code implementation 14 Sep 2023 Yongyi Zang, You Zhang, Mojtaba Heydari, Zhiyao Duan

These unique properties make singing voice deepfake detection a relevant but significantly different problem from synthetic speech detection.

Face Swapping Singing Voice Synthesis +1

Lip Movements Generation at a Glance

1 code implementation ECCV 2018 Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

In this paper, we consider the following task: given arbitrary audio speech and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech.

SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing

1 code implementation 4 Nov 2022 Siwen Ding, You Zhang, Zhiyao Duan

Our previous research on one-class learning has improved the generalization ability to unseen attacks by compacting the bona fide speech in the embedding space.

Speaker Verification Speech Synthesis +1

Y-Vector: Multiscale Waveform Encoder for Speaker Embedding

1 code implementation 24 Oct 2020 Ge Zhu, Fei Jiang, Zhiyao Duan

State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features.

Text-Independent Speaker Verification

Singing Beat Tracking With Self-supervised Front-end and Linear Transformers

1 code implementation 31 Aug 2022 Mojtaba Heydari, Zhiyao Duan

Tracking beats of singing voices without the presence of musical accompaniment can find many applications in music production, automatic song arrangement, and social media interaction.

Learning Sparse Analytic Filters for Piano Transcription

2 code implementations 23 Aug 2021 Frank Cwitkowitz, Mojtaba Heydari, Zhiyao Duan

In this work, several variations of a frontend filterbank learning module are investigated for piano transcription, a challenging low-level music information retrieval task.

Information Retrieval Music Information Retrieval +1

Music Source Separation with Generative Flow

1 code implementation 19 Apr 2022 Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan

Fully-supervised models for source separation are trained on parallel mixture-source data and are currently state-of-the-art.

Music Source Separation

HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields

2 code implementations 27 Oct 2022 You Zhang, Yuxiang Wang, Zhiyao Duan

In this work, we propose to use neural fields, a differentiable representation of functions through neural networks, to model HRTFs with arbitrary spatial sampling schemes.

Mitigating Cross-Database Differences for Learning Unified HRTF Representation

2 code implementations 27 Jul 2023 Yutong Wen, You Zhang, Zhiyao Duan

We further show that these normalized HRTFs can be used to learn a more unified HRTF representation across databases than the prior art.

A Data-Driven Methodology for Considering Feasibility and Pairwise Likelihood in Deep Learning Based Guitar Tablature Transcription Systems

2 code implementations 17 Apr 2022 Frank Cwitkowitz, Jonathan Driedger, Zhiyao Duan

This naturally enforces playability constraints for guitar, and yields tablature which is more consistent with the symbolic data used to estimate pairwise likelihoods.

Information Retrieval Music Information Retrieval +1

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification

1 code implementation 10 Feb 2022 You Zhang, Ge Zhu, Zhiyao Duan

We further propose fusion strategies for direct inference and fine-tuning to predict the SASV score based on the framework.

Speaker Verification

When Counterpoint Meets Chinese Folk Melodies

1 code implementation NeurIPS 2020 Nan Jiang, Sheng Jin, Zhiyao Duan, ChangShui Zhang

An interaction reward model is trained on the duets formed from outer parts of Bach chorales to model counterpoint interaction, while a style reward model is trained on monophonic melodies of Chinese folk songs to model melodic patterns.

Transcription free filler word detection with Neural semi-CRFs

1 code implementation 11 Mar 2023 Ge Zhu, Yujia Yan, Juan-Pablo Caceres, Zhiyao Duan

Non-linguistic filler words, such as "uh" or "um", are prevalent in spontaneous speech and serve as indicators for expressing hesitation or uncertainty.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1

Predicting Global Head-Related Transfer Functions From Scanned Head Geometry Using Deep Learning and Compact Representations

1 code implementation 28 Jul 2022 Yuxiang Wang, You Zhang, Zhiyao Duan, Mark Bocko

For the HRTF data, we use truncated spherical harmonic (SH) coefficients to represent the HRTF magnitudes and onsets.

A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

1 code implementation 8 Oct 2021 Ge Zhu, Frank Cwitkowitz, Zhiyao Duan

In this paper, we conduct a cross-dataset study on parametric and non-parametric raw-waveform based speaker embeddings through speaker verification experiments.

Speaker Verification

Toward Fully Self-Supervised Multi-Pitch Estimation

1 code implementation 23 Feb 2024 Frank Cwitkowitz, Zhiyao Duan

Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures.

Self-Supervised Learning

Generating Talking Face Landmarks from Speech

no code implementations 26 Mar 2018 Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan

In this paper, we present a system that can generate landmark points of a talking face from an acoustic speech in real time.

Deep Cross-Modal Audio-Visual Generation

no code implementations 26 Apr 2017 Lele Chen, Sudhanshu Srivastava, Zhiyao Duan, Chenliang Xu

Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments.

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

no code implementations 8 Feb 2020 Nan Jiang, Sheng Jin, Zhiyao Duan, Chang-Shui Zhang

We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state).

Music Generation reinforcement-learning +1

Themes Informed Audio-visual Correspondence Learning

no code implementations 14 Sep 2020 Runze Su, Fei Tao, Xudong Liu, Hao-Ran Wei, Xiaorong Mei, Zhiyao Duan, Lei Yuan, Ji Liu, Yuying Xie

Applications of short-term user-generated video (UGV), such as Snapchat and YouTube short-term videos, have boomed recently, raising many multimodal machine learning tasks.

Audiovisual Singing Voice Separation

no code implementations 1 Jul 2021 Bochen Li, Yuxuan Wang, Zhiyao Duan

Separating a song into vocal and accompaniment components is an active research topic, and recent years witnessed an increased performance from supervised training using deep learning techniques.

Rethinking Audio-visual Synchronization for Active Speaker Detection

no code implementations 21 Jun 2022 Abudukelimu Wuerkaixi, You Zhang, Zhiyao Duan, ChangShui Zhang

This clarification of definition is motivated by our extensive experiments, through which we discover that existing ASD methods fail in modeling the audio-visual synchronization and often classify unsynchronized videos as active speaking.

Audio-Visual Synchronization Contrastive Learning

SingNet: A Real-time Singing Voice Beat and Downbeat Tracking System

no code implementations 4 Jun 2023 Mojtaba Heydari, Ju-Chiang Wang, Zhiyao Duan

Singing voice beat and downbeat tracking has several applications in automatic music production, analysis, and manipulation.

SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

no code implementations 16 Sep 2023 Yongyi Zang, Yi Zhong, Frank Cwitkowitz, Zhiyao Duan

Guitar Tablature Transcription (GTT) is an important task with broad applications in music education, composition, and entertainment.

Specificity

Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription

no code implementations 15 Apr 2024 Yujia Yan, Zhiyao Duan

The neural semi-Markov Conditional Random Field (semi-CRF) framework has demonstrated promise for event-based piano transcription.
