Search Results for author: Zhiyao Duan

Found 25 papers, 19 papers with code

Predicting Global Head-Related Transfer Functions From Scanned Head Geometry Using Deep Learning and Compact Representations

1 code implementation 28 Jul 2022 Yuxiang Wang, You Zhang, Zhiyao Duan, Mark Bocko

For the HRTF data, we use truncated spherical harmonic (SH) coefficients to represent the HRTF magnitudes and onsets.

Rethinking Audio-visual Synchronization for Active Speaker Detection

no code implementations 21 Jun 2022 Abudukelimu Wuerkaixi, You Zhang, Zhiyao Duan, ChangShui Zhang

This clarification of definition is motivated by our extensive experiments, through which we discover that existing ASD methods fail in modeling the audio-visual synchronization and often classify unsynchronized videos as active speaking.

Audio-Visual Synchronization, Contrastive Learning

Music Source Separation with Generative Flow

1 code implementation 19 Apr 2022 Ge Zhu, Jordan Darefsky, Fei Jiang, Anton Selitskiy, Zhiyao Duan

Fully-supervised models for source separation are trained on parallel mixture-source data and are currently state-of-the-art.

Music Source Separation

A Data-Driven Methodology for Considering Feasibility and Pairwise Likelihood in Deep Learning Based Guitar Tablature Transcription Systems

1 code implementation 17 Apr 2022 Frank Cwitkowitz, Jonathan Driedger, Zhiyao Duan

This naturally enforces playability constraints for guitar, and yields tablature which is more consistent with the symbolic data used to estimate pairwise likelihoods.

Information Retrieval, Music Information Retrieval

A Probabilistic Fusion Framework for Spoofing Aware Speaker Verification

1 code implementation 10 Feb 2022 You Zhang, Ge Zhu, Zhiyao Duan

We further propose fusion strategies for direct inference and fine-tuning to predict the SASV score based on the framework.

Speaker Verification

Skipping the Frame-Level: Event-Based Piano Transcription With Neural Semi-CRFs

1 code implementation NeurIPS 2021 Yujia Yan, Frank Cwitkowitz, Zhiyao Duan

When formulating piano transcription in this way, we eliminate the need to rely on disjoint frame-level estimates for different stages of a note event.

Multi-Task Learning

A study of the robustness of raw waveform based speaker embeddings under mismatched conditions

1 code implementation 8 Oct 2021 Ge Zhu, Frank Cwitkowitz, Zhiyao Duan

In this paper, we conduct a cross-dataset study on parametric and non-parametric raw-waveform based speaker embeddings through speaker verification experiments.

Speaker Verification

Learning Sparse Analytic Filters for Piano Transcription

2 code implementations 23 Aug 2021 Frank Cwitkowitz, Mojtaba Heydari, Zhiyao Duan

In this work, several variations of a frontend filterbank learning module are investigated for piano transcription, a challenging low-level music information retrieval task.

Information Retrieval, Music Information Retrieval

BeatNet: CRNN and Particle Filtering for Online Joint Beat Downbeat and Meter Tracking

1 code implementation 8 Aug 2021 Mojtaba Heydari, Frank Cwitkowitz, Zhiyao Duan

The online estimation of rhythmic information, such as beat positions, downbeat positions, and meter, is critical for many real-time music applications.

Online Beat Tracking, Online Downbeat Tracking

UR Channel-Robust Synthetic Speech Detection System for ASVspoof 2021

2 code implementations 26 Jul 2021 Xinhui Chen, You Zhang, Ge Zhu, Zhiyao Duan

Different from previous ASVspoof challenges, the LA task this year presents codec and transmission channel variability, while the new task DF presents general audio compression.

Face Swapping, Synthetic Speech Detection +1

Audiovisual Singing Voice Separation

no code implementations 1 Jul 2021 Bochen Li, Yuxuan Wang, Zhiyao Duan

Separating a song into vocal and accompaniment components is an active research topic, and recent years witnessed an increased performance from supervised training using deep learning techniques.

An Empirical Study on Channel Effects for Synthetic Voice Spoofing Countermeasure Systems

3 code implementations 3 Apr 2021 You Zhang, Ge Zhu, Fei Jiang, Zhiyao Duan

Spoofing countermeasure (CM) systems are critical in speaker verification; they aim to discern spoofing attacks from bona fide speech trials.

Data Augmentation, Multi-Task Learning +2

When Counterpoint Meets Chinese Folk Melodies

1 code implementation NeurIPS 2020 Nan Jiang, Sheng Jin, Zhiyao Duan, ChangShui Zhang

An interaction reward model is trained on the duets formed from outer parts of Bach chorales to model counterpoint interaction, while a style reward model is trained on monophonic melodies of Chinese folk songs to model melodic patterns.

Don't look back: an online beat tracking method using RNN and enhanced particle filtering

1 code implementation 5 Nov 2020 Mojtaba Heydari, Zhiyao Duan

Most preexisting OBT methods either apply some offline approaches to a moving window containing past data to make predictions about future beat positions or must be primed with past data at startup to initialize.

Online Beat Tracking

One-class learning towards generalized voice spoofing detection

2 code implementations 27 Oct 2020 You Zhang, Fei Jiang, Zhiyao Duan

Human voices can be used to authenticate the identity of the speaker, but the automatic speaker verification (ASV) systems are vulnerable to voice spoofing attacks, such as impersonation, replay, text-to-speech, and voice conversion.

Speaker Verification, Voice Anti-spoofing +1

Y-Vector: Multiscale Waveform Encoder for Speaker Embedding

1 code implementation 24 Oct 2020 Ge Zhu, Fei Jiang, Zhiyao Duan

State-of-the-art text-independent speaker verification systems typically use cepstral features or filter bank energies as speech features.

Text-Independent Speaker Verification

Themes Informed Audio-visual Correspondence Learning

no code implementations 14 Sep 2020 Runze Su, Fei Tao, Xudong Liu, Hao-Ran Wei, Xiaorong Mei, Zhiyao Duan, Lei Yuan, Ji Liu, Yuying Xie

Applications of short-term user-generated video (UGV), such as Snapchat and YouTube short-term videos, have boomed recently, raising many multimodal machine learning tasks.

RL-Duet: Online Music Accompaniment Generation Using Deep Reinforcement Learning

no code implementations 8 Feb 2020 Nan Jiang, Sheng Jin, Zhiyao Duan, Chang-Shui Zhang

We cast this as a reinforcement learning problem, where the generation agent learns a policy to generate a musical note (action) based on previously generated context (state).

Music Generation, Reinforcement Learning

Hierarchical Cross-Modal Talking Face Generation with Dynamic Pixel-Wise Loss

1 code implementation 9 May 2019 Lele Chen, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

We devise a cascade GAN approach to generate talking face video, which is robust to different face shapes, view angles, facial characteristics, and noisy audio conditions.

Lip Movements Generation at a Glance

1 code implementation ECCV 2018 Lele Chen, Zhiheng Li, Ross K. Maddox, Zhiyao Duan, Chenliang Xu

In this paper, we consider the following task: given arbitrary audio speech and one lip image of an arbitrary target identity, generate synthesized lip movements of the target identity saying the speech.

Generating Talking Face Landmarks from Speech

no code implementations 26 Mar 2018 Sefik Emre Eskimez, Ross K. Maddox, Chenliang Xu, Zhiyao Duan

In this paper, we present a system that can generate landmark points of a talking face from acoustic speech in real time.

Deep Cross-Modal Audio-Visual Generation

no code implementations 26 Apr 2017 Lele Chen, Sudhanshu Srivastava, Zhiyao Duan, Chenliang Xu

Being the first to explore this new problem, we compose two new datasets with pairs of images and sounds of musical performances of different instruments.
