Search Results for author: Zhiyao Duan

Found 37 papers, 28 papers with code

One-class learning towards generalized voice spoofing detection

3 code implementations • 27 Oct 2020 • You Zhang, Fei Jiang, Zhiyao Duan

Human voices can be used to authenticate a speaker's identity, but automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks such as impersonation, replay, text-to-speech, and voice conversion.

Speaker Verification, Voice Anti-spoofing

Don't look back: an online beat tracking method using RNN and enhanced particle filtering

1 code implementation • 5 Nov 2020 • Mojtaba Heydari, Zhiyao Duan

Most preexisting online beat tracking (OBT) methods either apply offline approaches to a moving window of past data to predict future beat positions, or must be primed with past data at startup.

Online Beat Tracking
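
The title above pairs an RNN with particle filtering. As a rough sketch of a generic bootstrap particle filter only (not the paper's enhanced variant), the code below tracks a beat period and phase causally from a per-frame beat activation of the kind an RNN might output. The function name `track_period`, the period range of 6 to 15 frames, the random-walk noise level, and the activation-based likelihood are all illustrative assumptions.

```python
import numpy as np


def track_period(activation, n_particles=1000, p_min=6.0, p_max=15.0,
                 period_noise=0.02, seed=0):
    """Causal bootstrap particle filter over (beat period, beat phase).

    activation: 1D sequence of per-frame beat probabilities in [0, 1].
    Returns the weighted-mean beat period (in frames) after every frame.
    """
    rng = np.random.default_rng(seed)
    period = rng.uniform(p_min, p_max, n_particles)
    phase = rng.uniform(0.0, 1.0, n_particles) * period
    estimates = []
    for a in activation:
        # Avoid all-zero weights when the activation is exactly 0 or 1.
        a = float(np.clip(a, 1e-3, 1.0 - 1e-3))
        # Transition: jitter the period, then advance the phase by one frame.
        period = np.clip(period + rng.normal(0.0, period_noise, n_particles),
                         p_min, p_max)
        phase = phase + 1.0
        is_beat = phase >= period  # a phase wrap means "this particle predicts a beat now"
        phase = np.where(is_beat, phase - period, phase)
        # Observation: predicted beats should coincide with high activation.
        weights = np.where(is_beat, a, 1.0 - a)
        weights = weights / weights.sum()
        estimates.append(float(np.sum(weights * period)))
        # Systematic resampling keeps the particle count fixed.
        positions = (rng.random() + np.arange(n_particles)) / n_particles
        idx = np.searchsorted(np.cumsum(weights), positions)
        idx = np.minimum(idx, n_particles - 1)  # guard against float round-off
        period, phase = period[idx], phase[idx]
    return np.array(estimates)
```

For example, with `act = np.full(300, 0.05)` and `act[::10] = 0.95`, the estimate should settle near a period of 10 frames. Each update consumes only the current activation, so the tracker runs causally without re-processing a window of past frames.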

BeatNet: CRNN and Particle Filtering for Online Joint Beat Downbeat and Meter Tracking

4 code implementations • 8 Aug 2021 • Mojtaba Heydari, Frank Cwitkowitz, Zhiyao Duan

The online estimation of rhythmic information, such as beat positions, downbeat positions, and meter, is critical for many real-time music applications.

Ranked #1 on Online Beat Tracking on Rock Corpus

Online Beat Tracking, Online Downbeat Tracking

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

1 code implementation • 9 May 2019 • Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions.

Audio-Visual Event Localization in Unconstrained Videos

2 code implementations • ECCV 2018 • Yapeng Tian, Jing Shi, Bochen Li, Zhiyao Duan, Chenliang Xu

In this paper, we introduce a novel problem of audio-visual event localization in unconstrained videos.

Audio-Visual Event Localization, Temporal Localization

Speech Driven Talking Face Generation from a Single Image and an Emotion Condition

1 code implementation • 8 Aug 2020 • Sefik Emre Eskimez, You Zhang, Zhiyao Duan

Visual emotion expression plays an important role in audiovisual speech communication.

Emotion Recognition, Talking Face Generation

ControlVC: Zero-Shot Voice Conversion with Time-Varying Controls on Pitch and Speed

1 code implementation • 23 Sep 2022 • Meiying Chen, Zhiyao Duan

In this paper, we propose ControlVC, the first neural voice conversion system that achieves time-varying controls on pitch and speed.

Pitch Control, Speech Synthesis

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems

3 code implementations • 3 Apr 2021 • You Zhang, Ge Zhu, Fei Jiang, Zhiyao Duan

Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to discern spoofing attacks from bona fide speech trials.

Data Augmentation, Multi-Task Learning

UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021

2 code implementations • 26 Jul 2021 • Xinhui Chen, You Zhang, Ge Zhu, Zhiyao Duan

Unlike previous ASVspoof challenges, the logical access (LA) task this year introduces codec and transmission channel variability, while the new deepfake (DF) task introduces general audio compression.

Audio Compression, Face Swapping

Skipping the Frame-Level: Event-Based Piano Transcription With Neural Semi-CRFs

1 code implementation • NeurIPS 2021 • Yujia Yan, Frank Cwitkowitz, Zhiyao Duan

By formulating piano transcription at the level of note events, we eliminate the need to rely on disjoint frame-level estimates for different stages of a note event.

Multi-Task Learning
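
The event-based formulation above scores whole note intervals instead of combining separate frame-level estimates for each stage of a note. As a hedged illustration of how interval scores can be decoded, the sketch below runs a semi-Markov Viterbi pass for a single pitch with two labels (note event vs. background). The score-matrix layout, the zero score for background frames, and the name `decode_intervals` are assumptions made here; the neural network that would produce the scores is omitted.

```python
import numpy as np


def decode_intervals(score, max_len=None):
    """Semi-Markov Viterbi over one pitch track.

    score: (T+1, T+1) array; score[s, e] is the log-potential of a note event
    spanning frames [s, e). Frames outside any event contribute 0.
    Returns (intervals, best_total) for the highest-scoring segmentation.
    """
    T = score.shape[0] - 1
    best = np.zeros(T + 1)       # best[e]: best total score over frames [0, e)
    back = [None] * (T + 1)      # back[e]: (segment start, is_event)
    for e in range(1, T + 1):
        # Option 1: frame e-1 is background (unit-length gap, score 0).
        best[e] = best[e - 1]
        back[e] = (e - 1, False)
        # Option 2: an event segment [s, e) ends here.
        lo = 0 if max_len is None else max(0, e - max_len)
        for s in range(lo, e):
            cand = best[s] + score[s, e]
            if cand > best[e]:
                best[e] = cand
                back[e] = (s, True)
    # Backtrack from the last frame to recover the chosen event intervals.
    intervals = []
    e = T
    while e > 0:
        s, is_event = back[e]
        if is_event:
            intervals.append((s, e))
        e = s
    return intervals[::-1], float(best[T])
```

For instance, if every interval scores -1 except `score[1, 3] = 2.0`, `score[4, 6] = 1.5`, and `score[1, 6] = 3.0` (with T = 6), the decoder returns `[(1, 3), (4, 6)]` with total 3.5, preferring two short events over the single long one.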

A Novel 1D State Space for Efficient Music Rhythmic Analysis

1 code implementation • 1 Nov 2021 • Mojtaba Heydari, Matthew McCallum, Andreas Ehmann, Zhiyao Duan

Inferring music time structures has a broad range of applications in music production, processing and analysis.

Ranked #1 on Online Beat Tracking on GTZAN

Inference Optimization, Online Downbeat Tracking

SingFake: Singing Voice Deepfake Detection

1 code implementation • 14 Sep 2023 • Yongyi Zang, You Zhang, Mojtaba Heydari, Zhiyao Duan

These unique properties of singing voices make singing voice deepfake detection a relevant but significantly different problem from synthetic speech detection.

Face Swapping, Singing Voice Synthesis

Lip Movements Generation at a Glance

1 code implementation • ECCV 2018 • Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

In this paper, we consider the following task: given arbitrary speech audio and a single lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech.

SAMO: Speaker Attractor Multi-Center One-Class Learning for Voice Anti-Spoofing

1 code implementation • 4 Nov 2022 • Siwen Ding, You Zhang, Zhiyao Duan

Our previous research on one-class learning has improved the generalization ability to unseen attacks by compacting the bona fide speech in the embedding space.

Speaker Verification, Speech Synthesis
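
The snippet above credits earlier one-class learning work with compacting bona fide speech in the embedding space. A minimal NumPy sketch of a single-center one-class margin loss in that spirit follows: bona fide embeddings are pushed above a cosine similarity of `m_real` to a direction `w`, and spoofed embeddings below `m_fake`. The margin values, the scale `alpha`, and the function name `one_class_loss` are illustrative assumptions, and SAMO's multiple speaker attractors are not modeled.

```python
import numpy as np


def one_class_loss(embeddings, labels, w, m_real=0.9, m_fake=0.2, alpha=20.0):
    """One-class margin loss over a batch of embeddings.

    embeddings: (N, D) array; labels: (N,) with 0 = bona fide, 1 = spoof;
    w: (D,) direction that bona fide speech should cluster around.
    """
    x = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    w_hat = w / np.linalg.norm(w)
    cos = x @ w_hat  # cosine similarity to the bona fide center
    # Bona fide: penalize cos < m_real. Spoof: penalize cos > m_fake.
    margin = np.where(labels == 0, m_real, m_fake)
    sign = np.where(labels == 0, 1.0, -1.0)
    # log(1 + exp(z)) computed stably via logaddexp.
    return float(np.mean(np.logaddexp(0.0, alpha * sign * (margin - cos))))
```

With `w = [1, 0]`, a bona fide embedding aligned with `w` and a spoofed embedding pointing the opposite way give a small loss, while swapping their labels makes the loss large. Only bona fide speech is pulled toward a center; spoofed speech is merely pushed away, which is the intuition behind better generalization to unseen attacks.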

Y-Vector: Multiscale Waveform Encoder for Speaker Embedding

1 code implementation • 24 Oct 2020 • Ge Zhu, Fei Jiang, Zhiyao Duan

State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features.

Text-Independent Speaker Verification

Singing Beat Tracking With Self-supervised Front-end and Linear Transformers

1 code implementation • 31 Aug 2022 • Mojtaba Heydari, Zhiyao Duan

Tracking the beats of singing voices without musical accompaniment has many applications in music production, automatic song arrangement, and social media interaction.

Learning Sparse Analytic Filters for Piano Transcription

2 code implementations • 23 Aug 2021 • Frank Cwitkowitz, Mojtaba Heydari, Zhiyao Duan

In this work, several variations of a frontend filterbank learning module are investigated for piano transcription, a challenging low-level music information retrieval task.

Information Retrieval, Music Information Retrieval

Music Source Separation with Generative Flow

1 code implementation • 19 Apr 2022 • Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan

Fully-supervised models for source separation are trained on parallel mixture-source data and are currently state-of-the-art.

Music Source Separation

HRTF Field: Unifying Measured HRTF Magnitude Representation with Neural Fields

2 code implementations • 27 Oct 2022 • You Zhang, Yuxiang Wang, Zhiyao Duan

In this work, we propose to use neural fields, a differentiable representation of functions through neural networks, to model HRTFs with arbitrary spatial sampling schemes.

Mitigating Cross-Database Differences for Learning Unified HRTF Representation

2 code implementations • 27 Jul 2023 • Yutong Wen, You Zhang, Zhiyao Duan

We further show that these normalized HRTFs can be used to learn a more unified HRTF representation across databases than the prior art.

A Data-Driven Methodology for Considering Feasibility and Pairwise Likelihood in Deep Learning Based Guitar Tablature Transcription Systems

2 code implementations • 17 Apr 2022 • Frank Cwitkowitz, Jonathan Driedger, Zhiyao Duan

This naturally enforces playability constraints for guitar, and yields tablature which is more consistent with the symbolic data used to estimate pairwise likelihoods.

Information Retrieval, Music Information Retrieval

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification

1 code implementation • 10 Feb 2022 • You Zhang, Ge Zhu, Zhiyao Duan

We further propose fusion strategies for direct inference and fine-tuning to predict the spoofing-aware speaker verification (SASV) score based on the framework.

Speaker Verification
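
One simple way to turn two subsystem scores into a single SASV score, assumed here purely for illustration, is to map the countermeasure (CM) output and the ASV similarity to probabilities and multiply them, treating the two decisions as independent. The calibration parameters `asv_scale` and `asv_bias` and the function name `sasv_product_score` are hypothetical (they would typically be fit on held-out data), and this sketch is not claimed to match the paper's exact formulation.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def sasv_product_score(cm_logit, asv_cosine, asv_scale=10.0, asv_bias=-5.0):
    """Fuse CM and ASV scores into one probability-like SASV score.

    cm_logit: CM log-odds that the trial is bona fide.
    asv_cosine: ASV cosine similarity between enrollment and test embeddings.
    asv_scale, asv_bias: hypothetical affine calibration of the ASV score.
    """
    p_bona_fide = sigmoid(cm_logit)
    p_target = sigmoid(asv_scale * asv_cosine + asv_bias)
    # Product rule: P(bona fide AND target speaker), assuming independence.
    return p_bona_fide * p_target
```

A spoofed trial with high speaker similarity still receives a low fused score, because the CM probability caps the product; likewise a bona fide non-target trial is suppressed by the ASV term.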

When Counterpoint Meets Chinese Folk Melodies

1 code implementation • NeurIPS 2020 • Nan Jiang, Sheng Jin, Zhiyao Duan, ChangShui Zhang

An interaction reward model is trained on the duets formed from outer parts of Bach chorales to model counterpoint interaction, while a style reward model is trained on monophonic melodies of Chinese folk songs to model melodic patterns.

Transcription free filler word detection with Neural semi-CRFs

1 code implementation • 11 Mar 2023 • Ge Zhu, Yujia Yan, Juan-Pablo Caceres, Zhiyao Duan

Non-linguistic filler words, such as "uh" or "um", are prevalent in spontaneous speech and serve as indicators for expressing hesitation or uncertainty.

Automatic Speech Recognition (ASR)

Predicting Global Head-Related Transfer Functions From Scanned Head Geometry Using Deep Learning and Compact Representations

1 code implementation • 28 Jul 2022 • Yuxiang Wang, You Zhang, Zhiyao Duan, Mark Bocko

For the HRTF data, we use truncated spherical harmonic (SH) coefficients to represent the HRTF magnitudes and onsets.
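
To make the truncated SH representation concrete, the NumPy sketch below fits real spherical-harmonic coefficients to magnitudes sampled at arbitrary directions by least squares, then evaluates them at any direction on the sphere. Truncating at degree 2 (9 coefficients per frequency bin) is an arbitrary choice for brevity, not the order used in the paper; the onset representation is omitted, and the function names are hypothetical.

```python
import numpy as np


def real_sh_basis_l2(dirs):
    """dirs: (K, 3) unit vectors. Returns (K, 9) orthonormal real SH up to degree 2."""
    x, y, z = dirs[:, 0], dirs[:, 1], dirs[:, 2]
    c0 = 0.5 * np.sqrt(1.0 / np.pi)
    c1 = np.sqrt(3.0 / (4.0 * np.pi))
    c2 = 0.5 * np.sqrt(15.0 / np.pi)
    c20 = 0.25 * np.sqrt(5.0 / np.pi)
    c22 = 0.25 * np.sqrt(15.0 / np.pi)
    return np.stack([
        np.full_like(x, c0),                              # degree 0
        c1 * y, c1 * z, c1 * x,                           # degree 1
        c2 * x * y, c2 * y * z, c20 * (3.0 * z**2 - 1.0),  # degree 2
        c2 * x * z, c22 * (x**2 - y**2),
    ], axis=1)


def fit_sh(dirs, magnitudes):
    """magnitudes: (K, F) values at K directions and F frequency bins.
    Returns (9, F) truncated SH coefficients via least squares."""
    Y = real_sh_basis_l2(dirs)
    coeffs, *_ = np.linalg.lstsq(Y, magnitudes, rcond=None)
    return coeffs


def eval_sh(dirs, coeffs):
    """Reconstruct magnitudes at arbitrary query directions."""
    return real_sh_basis_l2(dirs) @ coeffs
```

Because the basis is evaluated at whatever directions are supplied, the same coefficients can be queried on a measurement grid different from the one used for fitting, which is what makes such a compact representation convenient as a regression target.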

A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

1 code implementation • 8 Oct 2021 • Ge Zhu, Frank Cwitkowitz, Zhiyao Duan

In this paper, we conduct a cross-dataset study on parametric and non-parametric raw-waveform based speaker embeddings through speaker verification experiments.

Speaker Verification

Learning Arousal-Valence Representation from Categorical Emotion Labels of Speech

1 code implementation • 24 Nov 2023 • Enting Zhou, You Zhang, Zhiyao Duan

In this work, we propose to learn the arousal-valence (AV) representation from categorical emotion labels of speech.

Dimensionality Reduction, Emotion Classification

Toward Fully Self-Supervised Multi-Pitch Estimation

1 code implementation • 23 Feb 2024 • Frank Cwitkowitz, Zhiyao Duan

Multi-pitch estimation is a decades-long research problem involving the detection of pitch activity associated with concurrent musical events within multi-instrument mixtures.

Self-Supervised Learning

Generating Talking Face Landmarks from Speech

no code implementations • 26 Mar 2018 • Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan

In this paper, we present a system that can generate landmark points of a talking face from acoustic speech in real time.

Deep Cross-Modal Audio-Visual Generation

no code implementations • 26 Apr 2017 • Lele Chen, Sudhanshu Srivastava, Zhiyao Duan, Chenliang Xu

As the first work to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments.

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

no code implementations • 8 Feb 2020 • Nan Jiang, Sheng Jin, Zhiyao Duan, Chang-Shui Zhang

We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state).

Music Generation, Reinforcement Learning

Themes Informed Audio-visual Correspondence Learning

no code implementations • 14 Sep 2020 • Runze Su, Fei Tao, Xudong Liu, Hao-Ran Wei, Xiaorong Mei, Zhiyao Duan, Lei Yuan, Ji Liu, Yuying Xie

Applications of short user-generated videos (UGV), such as Snapchat and YouTube short videos, have boomed recently, giving rise to many multimodal machine learning tasks.

Audiovisual Singing Voice Separation

no code implementations • 1 Jul 2021 • Bochen Li, Yuxuan Wang, Zhiyao Duan

Separating a song into vocal and accompaniment components is an active research topic, and recent years have witnessed improved performance from supervised training using deep learning techniques.

Rethinking Audio-visual Synchronization for Active Speaker Detection

no code implementations • 21 Jun 2022 • Abudukelimu Wuerkaixi, You Zhang, Zhiyao Duan, ChangShui Zhang

This clarification of the definition is motivated by our extensive experiments, in which we discover that existing active speaker detection (ASD) methods fail to model audio-visual synchronization and often classify unsynchronized videos as active speaking.

Audio-Visual Synchronization, Contrastive Learning

SingNet: A Real-time Singing Voice Beat and Downbeat Tracking System

no code implementations • 4 Jun 2023 • Mojtaba Heydari, Ju-Chiang Wang, Zhiyao Duan

Singing voice beat and downbeat tracking has several applications in automatic music production, analysis, and manipulation.

SynthTab: Leveraging Synthesized Data for Guitar Tablature Transcription

no code implementations • 16 Sep 2023 • Yongyi Zang, Yi Zhong, Frank Cwitkowitz, Zhiyao Duan

Guitar Tablature Transcription (GTT) is an important task with broad applications in music education, composition, and entertainment.

Specificity

Scoring Intervals using Non-Hierarchical Transformer For Automatic Piano Transcription

no code implementations • 15 Apr 2024 • Yujia Yan, Zhiyao Duan

The neural semi-Markov Conditional Random Field (semi-CRF) framework has demonstrated promise for event-based piano transcription.
