Search Results for author: Zeyu Jin

Found 20 papers, 8 papers with code

Improving Generalization of Speech Separation in Real-World Scenarios: Strategies in Simulation, Optimization, and Evaluation

no code implementations28 Aug 2024 Ke Chen, Jiaqi Su, Taylor Berg-Kirkpatrick, Shlomo Dubnov, Zeyu Jin

In this paper, we present a novel data simulation pipeline that produces diverse training data from a range of acoustic environments and content, and propose new training paradigms to improve the quality of a general speech separation model.

Speech Separation
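The paper's simulation pipeline is not public here, but a core ingredient of any such pipeline is mixing clean speech with interference at a controlled signal-to-noise ratio. The sketch below is a generic, hypothetical version of that step (the function name and signals are illustrative, not from the paper):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale `noise` so the mixture `clean + noise` has the requested SNR in dB."""
    # Match lengths by tiling/truncating the noise.
    reps = int(np.ceil(len(clean) / len(noise)))
    noise = np.tile(noise, reps)[: len(clean)]
    clean_power = np.mean(clean ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that brings the noise power to the target ratio relative to the speech.
    gain = np.sqrt(clean_power / (noise_power * 10 ** (snr_db / 10)))
    return clean + gain * noise

# Example: a sine tone standing in for speech, mixed with white noise at 10 dB SNR.
rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = rng.standard_normal(16000)
mixture = mix_at_snr(clean, noise, snr_db=10.0)
```

Sweeping `snr_db` over a random range per training example is one common way to get the acoustic diversity the abstract refers to.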

SpeechCraft: A Fine-grained Expressive Speech Dataset with Natural Language Description

1 code implementation24 Aug 2024 Zeyu Jin, Jia Jia, Qixin Wang, Kehan Li, Shuoyi Zhou, Songtao Zhou, Xiaoyu Qin, Zhiyong Wu

To tackle this challenge, we propose an automatic speech annotation system for expressiveness interpretation that annotates in-the-wild speech clips with expressive and vivid human language descriptions.

Descriptive Speech Synthesis +1

VDGD: Mitigating LVLM Hallucinations in Cognitive Prompts by Bridging the Visual Perception Gap

no code implementations24 May 2024 Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Utkarsh Tyagi, Oriol Nieto, Zeyu Jin, Dinesh Manocha

From our analysis, we show that: (1) The community's efforts have been primarily targeted towards reducing hallucinations related to visual recognition (VR) prompts (e.g., prompts that only require describing the image), thereby ignoring hallucinations for cognitive prompts (e.g., prompts that require additional skills like reasoning on contents of the image).

Hallucination

A Closer Look at the Limitations of Instruction Tuning

no code implementations3 Feb 2024 Sreyan Ghosh, Chandra Kiran Reddy Evuru, Sonal Kumar, Ramaneswaran S, Deepali Aneja, Zeyu Jin, Ramani Duraiswami, Dinesh Manocha

Our findings reveal that responses generated solely from pre-trained knowledge consistently outperform responses by models that learn any form of new knowledge from IT on open-source datasets.

Hallucination

Efficient Spoken Language Recognition via Multilabel Classification

no code implementations2 Jun 2023 Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon

Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal.

Classification
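Framing language recognition as *multilabel* classification (as in the title above) means each language gets an independent sigmoid score rather than competing in a single softmax, so a clip can be assigned zero, one, or several languages. A minimal sketch of that decision rule, with a hypothetical label set and logits:

```python
import numpy as np

LANGUAGES = ["en", "es", "fr", "de"]  # hypothetical label set, not from the paper

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def predict_languages(logits, threshold=0.5):
    """Independent sigmoid per language; return every label above threshold."""
    probs = sigmoid(np.asarray(logits, dtype=float))
    return [lang for lang, p in zip(LANGUAGES, probs) if p >= threshold]

# A clip with strong evidence for both English and Spanish (e.g., code-switching).
print(predict_languages([2.0, 1.5, -3.0, -1.0]))  # → ['en', 'es']
```

A softmax formulation would instead force exactly one of the four labels, which is the key modeling difference the multilabel framing avoids.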

Music Enhancement via Image Translation and Vocoding

no code implementations28 Apr 2022 Nikhil Kandpal, Oriol Nieto, Zeyu Jin

Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ.

Image-to-Image Translation Translation

Neural Pitch-Shifting and Time-Stretching with Controllable LPCNet

1 code implementation5 Oct 2021 Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Pitch-shifting and time-stretching are fundamental audio editing operations with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis.

Audio-Visual Synchronization

Controllable deep melody generation via hierarchical music structure representation

no code implementations2 Sep 2021 Shuqi Dai, Zeyu Jin, Celso Gomes, Roger B. Dannenberg

Recent advances in deep learning have expanded possibilities to generate music, but generating a customizable full piece of music with consistent long-term structure remains a challenge.

Music Generation

Context-Aware Prosody Correction for Text-Based Speech Editing

no code implementations16 Feb 2021 Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo

Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript.

Denoising

CDPAM: Contrastive learning for perceptual audio similarity

1 code implementation9 Feb 2021 Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein

The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception.

Contrastive Learning Speech Enhancement +2

Disentangled Multidimensional Metric Learning for Music Similarity

no code implementations9 Aug 2020 Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

For this task, it is typically necessary to define a similarity metric to compare one recording to another.

Metric Learning Specificity +1
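A standard way to learn such a similarity metric is the triplet loss: embeddings of similar recordings are pulled closer to an anchor than embeddings of dissimilar ones, by at least a margin. The sketch below is a generic numpy version of that loss, not the paper's specific disentangled formulation:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Hinge loss: the positive should sit closer to the anchor than the negative, by `margin`."""
    d_pos = np.sum((anchor - positive) ** 2)  # squared distance to the similar recording
    d_neg = np.sum((anchor - negative) ** 2)  # squared distance to the dissimilar one
    return max(0.0, d_pos - d_neg + margin)

# Toy 2-D embeddings: the positive is already well separated, so the loss is zero.
a, p, n = np.array([0.0, 0.0]), np.array([0.1, 0.0]), np.array([1.0, 0.0])
print(triplet_loss(a, p, n))  # → 0.0
```

The "disentangled" variants in these two papers extend this idea by computing such distances within separate subspaces of the embedding (e.g., genre, mood), one metric per dimension of similarity.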

Metric Learning vs Classification for Disentangled Music Representation Learning

no code implementations9 Aug 2020 Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam

For this, we (1) outline past work on the relationship between metric learning and classification, (2) extend this relationship to multi-label data by exploring three different learning approaches and their disentangled versions, and (3) evaluate all models on four tasks (training time, similarity retrieval, auto-tagging, and triplet prediction).

Classification Disentanglement +7

Controllable Neural Prosody Synthesis

no code implementations7 Aug 2020 Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore

Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators.

Speech Synthesis

HiFi-GAN: High-Fidelity Denoising and Dereverberation Based on Speech Deep Features in Adversarial Networks

1 code implementation10 Jun 2020 Jiaqi Su, Zeyu Jin, Adam Finkelstein

Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion.

Denoising Speech Dereverberation

F0-consistent many-to-many non-parallel voice conversion via conditional autoencoder

1 code implementation15 Apr 2020 Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore

Recently, AutoVC, a method based on conditional autoencoders (CAEs), achieved state-of-the-art results by disentangling speaker identity and speech content using information-constraining bottlenecks; it achieves zero-shot conversion by swapping in a different speaker's identity embedding to synthesize a new voice.

Style Transfer Voice Conversion
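The zero-shot conversion recipe described above reduces to: encode the source frames through a narrow content bottleneck, then decode them conditioned on the *target* speaker's embedding. The toy sketch below shows only that dataflow with random linear maps standing in for the learned networks (all shapes and names are hypothetical, not AutoVC's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random stand-ins for the trained networks (illustrative shapes only).
W_content = rng.standard_normal((8, 16))      # content encoder: 16-dim frame -> 8-dim bottleneck
W_decode = rng.standard_normal((16, 8 + 4))   # decoder: bottleneck + 4-dim speaker embedding -> frame

def content_encoder(frames):
    # The narrow bottleneck is what (after training) squeezes out speaker identity.
    return frames @ W_content.T

def decoder(content, speaker_emb):
    # Condition every frame on the same speaker embedding.
    z = np.concatenate([content, np.tile(speaker_emb, (len(content), 1))], axis=1)
    return z @ W_decode.T

source_frames = rng.standard_normal((10, 16))  # frames from the source speaker
target_emb = rng.standard_normal(4)            # identity embedding of an unseen target speaker

# Zero-shot conversion: keep the source content, swap in the target speaker's embedding.
converted = decoder(content_encoder(source_frames), target_emb)
```

The paper's contribution on top of this scheme is conditioning on F0 as well, so that pitch contours stay consistent through the conversion.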

Text-based Editing of Talking-head Video

1 code implementation4 Jun 2019 Ohad Fried, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B. Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt, Maneesh Agrawala

To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material.

Face Model Sentence +3

Perceptually-motivated Environment-specific Speech Enhancement

no code implementations ICASSP 2019 Jiaqi Su, Adam Finkelstein, Zeyu Jin

This paper introduces a deep learning approach to enhance speech recordings made in a specific environment.

Speech Enhancement
