Search Results for author: Zeyu Jin

Found 12 papers, 4 papers with code

Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

no code implementations · 5 Oct 2021 · Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Pitch-shifting and time-stretching an audio signal are fundamental audio editing operations, with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis.

Audio-Visual Synchronization

Controllable deep melody generation via hierarchical music structure representation

no code implementations · 2 Sep 2021 · Shuqi Dai, Zeyu Jin, Celso Gomes, Roger B. Dannenberg

Recent advances in deep learning have expanded possibilities to generate music, but generating a customizable full piece of music with consistent long-term structure remains a challenge.

Music Generation

Context-Aware Prosody Correction for Text-Based Speech Editing

no code implementations · 16 Feb 2021 · Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript.

Denoising Speech Editing

CDPAM: Contrastive learning for perceptual audio similarity

1 code implementation · 9 Feb 2021 · Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein

The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception.

Contrastive Learning Speech Synthesis
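The CDPAM snippet above describes learning a perceptual audio similarity metric from human judgments via contrastive learning. As an illustrative sketch only (the function name, embedding shapes, and margin value are assumptions, not the authors' implementation), a pairwise contrastive objective over audio embeddings can look like:

```python
import numpy as np

def contrastive_loss(emb_a, emb_b, same_content, margin=1.0):
    """Pairwise contrastive loss on audio embeddings (hypothetical sketch).

    emb_a, emb_b : (N, D) arrays of embeddings for two audio clips per pair.
    same_content : (N,) boolean array; True marks a perceptually similar pair.
    Similar pairs are pulled together; dissimilar pairs are pushed until
    their distance exceeds `margin`.
    """
    dist = np.linalg.norm(emb_a - emb_b, axis=1)             # per-pair distance
    pos = same_content * dist**2                             # attract positives
    neg = (~same_content) * np.maximum(margin - dist, 0)**2  # repel negatives
    return float(np.mean(pos + neg))
```

In CDPAM itself the encoder is a deep network trained end-to-end on human similarity judgments; the sketch only shows the shape of a contrastive objective over fixed embeddings.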

Disentangled Multidimensional Metric Learning for Music Similarity

no code implementations9 Aug 2020 Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

For this task, it is typically necessary to define a similarity metric to compare one recording to another.

Metric Learning Video Editing
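Similarity metrics of this kind are commonly learned with a triplet objective. As a rough sketch under assumed names and shapes (not the paper's code):

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Triplet margin loss for learned music similarity (hypothetical sketch).

    anchor/positive/negative : (N, D) embedding arrays; each positive is a
    recording judged similar to its anchor, each negative dissimilar.
    The loss is zero once the negative is at least `margin` farther away
    than the positive.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor-negative distance
    return float(np.mean(np.maximum(d_pos - d_neg + margin, 0.0)))
```

The paper's disentangled variant reportedly goes further by dedicating subspaces of the embedding to different similarity dimensions; the sketch shows only the base triplet objective.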

Metric Learning vs Classification for Disentangled Music Representation Learning

no code implementations · 9 Aug 2020 · Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

For this, we (1) outline past work on the relationship between metric learning and classification, (2) extend this relationship to multi-label data by exploring three different learning approaches and their disentangled versions, and (3) evaluate all models on four tasks (training time, similarity retrieval, auto-tagging, and triplet prediction).

Classification General Classification +5

Controllable Neural Prosody Synthesis

no code implementations · 7 Aug 2020 · Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore

Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators.

Speech Synthesis

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

1 code implementation · 10 Jun 2020 · Jiaqi Su, Zeyu Jin, Adam Finkelstein

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion.

Denoising Speech Dereverberation

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

1 code implementation · 15 Apr 2020 · Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Recently, AutoVC, a conditional autoencoder (CAE)-based method, achieved state-of-the-art results by disentangling speaker identity and speech content using information-constraining bottlenecks; it achieves zero-shot conversion by swapping in a different speaker's identity embedding to synthesize a new voice.

Style Transfer Voice Conversion

A Differentiable Perceptual Audio Metric Learned from Just Noticeable Differences

1 code implementation · 13 Jan 2020 · Pranay Manocha, Adam Finkelstein, Zeyu Jin, Nicholas J. Bryan, Richard Zhang, Gautham J. Mysore

Assessment of many audio processing tasks relies on subjective evaluation, which is time-consuming and expensive.

Denoising

Text-based Editing of Talking-head Video

no code implementations · 4 Jun 2019 · Ohad Fried, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B. Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt, Maneesh Agrawala

To edit a video, the user only has to edit the transcript; an optimization strategy then chooses segments of the input corpus as base material.

Face Model Translation +1

Perceptually-motivated Environment-specific Speech Enhancement

no code implementations · ICASSP 2019 · Jiaqi Su, Adam Finkelstein, Zeyu Jin

This paper introduces a deep learning approach to enhance speech recordings made in a specific environment.

Speech Enhancement
