Search Results for author: Zeyu Jin

Found 16 papers, 7 papers with code

Efficient Spoken Language Recognition via Multilabel Classification

no code implementations · 2 Jun 2023 · Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon

Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal.
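The title frames SLR as multilabel rather than single-label classification. A minimal illustrative sketch of that framing (labels, logits, and threshold are invented here, not taken from the paper): each label gets an independent sigmoid score, so one clip can activate several related labels at once.

```python
import math

# Hypothetical sketch: spoken language recognition as multilabel
# classification. Each label gets an independent sigmoid score instead
# of competing in a single softmax, so a clip can activate both a
# language and a broader group label. Labels and logits are illustrative.

LABELS = ["english", "spanish", "catalan", "romance_group"]

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def predict_languages(logits, threshold=0.5):
    """Return every label whose independent sigmoid score passes the threshold."""
    scores = {label: sigmoid(z) for label, z in zip(LABELS, logits)}
    return sorted(label for label, s in scores.items() if s >= threshold)

# A clip scoring high on Spanish and on the Romance group, low elsewhere:
print(predict_languages([-2.0, 3.0, -1.0, 2.5]))  # → ['romance_group', 'spanish']
```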

Music Enhancement via Image Translation and Vocoding

no code implementations · 28 Apr 2022 · Nikhil Kandpal, Oriol Nieto, Zeyu Jin

Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ.

Image-to-Image Translation · Translation

Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

1 code implementation · 5 Oct 2021 · Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Modifying the pitch and timing of an audio signal are fundamental audio editing operations with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis.

Audio-Visual Synchronization

Controllable deep melody generation via hierarchical music structure representation

no code implementations · 2 Sep 2021 · Shuqi Dai, Zeyu Jin, Celso Gomes, Roger B. Dannenberg

Recent advances in deep learning have expanded possibilities to generate music, but generating a customizable full piece of music with consistent long-term structure remains a challenge.

Music Generation

Context-Aware Prosody Correction for Text-Based Speech Editing

no code implementations · 16 Feb 2021 · Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript.

CDPAM: Contrastive learning for perceptual audio similarity

1 code implementation · 9 Feb 2021 · Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein

The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception.
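The title names contrastive learning for perceptual similarity. As an illustrative sketch only (not the CDPAM implementation; the embeddings and margin are invented), a pairwise contrastive loss pulls perceptually identical pairs together and pushes different pairs at least a margin apart:

```python
import math

# Illustrative sketch, not the CDPAM code: a classic pairwise
# contrastive loss over audio embeddings. Identical-content pairs are
# pulled together; different-content pairs are pushed apart until they
# clear `margin`. Vectors here are toy 2-D embeddings.

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(emb_a, emb_b, same: bool, margin: float = 1.0) -> float:
    d = euclidean(emb_a, emb_b)
    if same:
        return d ** 2                      # identical content: any distance is penalized
    return max(0.0, margin - d) ** 2       # different content: penalized only inside margin

print(contrastive_loss([0.0, 0.0], [0.3, 0.4], same=True))   # nearby "same" pair
print(contrastive_loss([0.0, 0.0], [3.0, 4.0], same=False))  # well-separated "different" pair
```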

Contrastive Learning · Speech Enhancement +1

Disentangled Multidimensional Metric Learning for Music Similarity

no code implementations · 9 Aug 2020 · Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

For this task, it is typically necessary to define a similarity metric to compare one recording to another.

Metric Learning · Specificity +1

Metric Learning vs Classification for Disentangled Music Representation Learning

no code implementations · 9 Aug 2020 · Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

For this, we (1) outline past work on the relationship between metric learning and classification, (2) extend this relationship to multi-label data by exploring three different learning approaches and their disentangled versions, and (3) evaluate all models on four tasks (training time, similarity retrieval, auto-tagging, and triplet prediction).
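The two training signals being compared can be sketched side by side. This is a hedged toy illustration (the loss functions are textbook forms, not the paper's models): a triplet loss that orders an anchor's distances to a positive and a negative, versus a cross-entropy classification loss whose penultimate layer would be reused as an embedding.

```python
import math

# Toy contrast of the two learning signals, with illustrative inputs.
# Metric learning: directly shape the embedding space via triplets.
# Classification: train to predict tags; the embedding is a byproduct.

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Anchor should sit closer to the positive than the negative by `margin`."""
    return max(0.0, dist(anchor, positive) - dist(anchor, negative) + margin)

def cross_entropy(logits, target_idx):
    """Standard classification loss over tag logits (numerically stable form)."""
    m = max(logits)
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[target_idx]

print(triplet_loss([0.0, 0.0], [0.1, 0.0], [1.0, 0.0]))  # margin satisfied → 0.0
print(cross_entropy([0.0, 0.0], 0))                      # uniform logits → log(2)
```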

Classification · Disentanglement +6

Controllable Neural Prosody Synthesis

no code implementations · 7 Aug 2020 · Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore

Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators.

Speech Synthesis

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

1 code implementation · 10 Jun 2020 · Jiaqi Su, Zeyu Jin, Adam Finkelstein

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion.

Denoising · Speech Dereverberation

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

1 code implementation · 15 Apr 2020 · Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Recently, AutoVC, a conditional autoencoder (CAE)-based method, achieved state-of-the-art results by disentangling speaker identity from speech content using information-constraining bottlenecks; it performs zero-shot conversion by swapping in a different speaker's identity embedding to synthesize a new voice.
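The conversion recipe described above can be sketched schematically. This is a toy stand-in, not the AutoVC code: the "bottleneck" is represented by an encoder that simply drops the speaker field, and utterances are plain dicts, purely for illustration.

```python
# Toy sketch of the encoder/decoder swap the abstract describes (not
# the AutoVC implementation). The content encoder's bottleneck is
# assumed to strip speaker identity; the decoder is conditioned on a
# separate speaker embedding, so swapping that embedding converts the
# voice. All structures here are illustrative stand-ins.

def content_encoder(utterance: dict) -> str:
    # Information-constraining bottleneck: keep the content, drop the speaker.
    return utterance["content"]

def decoder(content: str, speaker_embedding: str) -> dict:
    # Re-synthesize the same content in the voice given by the embedding.
    return {"content": content, "speaker": speaker_embedding}

def convert(source_utterance: dict, target_speaker_embedding: str) -> dict:
    # Zero-shot conversion: swap in an unseen speaker's identity embedding.
    return decoder(content_encoder(source_utterance), target_speaker_embedding)

src = {"content": "hello world", "speaker": "spk_A"}
print(convert(src, "spk_B"))  # same words, new voice
```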

Style Transfer · Voice Conversion

Text-based Editing of Talking-head Video

1 code implementation · 4 Jun 2019 · Ohad Fried, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B. Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt, Maneesh Agrawala

To edit a video, the user has to only edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material.

Face Model · Talking Head Generation +2

Perceptually-motivated Environment-specific Speech Enhancement

no code implementations · ICASSP 2019 · Jiaqi Su, Adam Finkelstein, Zeyu Jin

This paper introduces a deep learning approach to enhance speech recordings made in a specific environment.

Speech Enhancement
