no code implementations • 2 Jun 2023 • Oriol Nieto, Zeyu Jin, Franck Dernoncourt, Justin Salamon
Spoken language recognition (SLR) is the task of automatically identifying the language present in a speech signal.
no code implementations • 27 Jun 2022 • Pranay Manocha, Zeyu Jin, Adam Finkelstein
Many audio processing tasks require perceptual assessment.
no code implementations • 28 Apr 2022 • Nikhil Kandpal, Oriol Nieto, Zeyu Jin
Consumer-grade music recordings such as those captured by mobile devices typically contain distortions in the form of background noise, reverb, and microphone-induced EQ.
3 code implementations • 6 Mar 2022 • Joseph Turian, Jordie Shier, Humair Raj Khan, Bhiksha Raj, Björn W. Schuller, Christian J. Steinmetz, Colin Malloy, George Tzanetakis, Gissel Velarde, Kirk McNally, Max Henry, Nicolas Pinto, Camille Noufi, Christian Clough, Dorien Herremans, Eduardo Fonseca, Jesse Engel, Justin Salamon, Philippe Esling, Pranay Manocha, Shinji Watanabe, Zeyu Jin, Yonatan Bisk
The aim of the HEAR benchmark is to develop a general-purpose audio representation that provides a strong basis for learning in a wide variety of tasks and scenarios.
1 code implementation • 5 Oct 2021 • Max Morrison, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo
Modifying the pitch and timing of an audio signal are fundamental audio editing operations with applications in speech manipulation, audio-visual synchronization, and singing voice editing and synthesis.
no code implementations • 2 Sep 2021 • Shuqi Dai, Zeyu Jin, Celso Gomes, Roger B. Dannenberg
Recent advances in deep learning have expanded possibilities to generate music, but generating a customizable full piece of music with consistent long-term structure remains a challenge.
no code implementations • 16 Feb 2021 • Max Morrison, Lucas Rencker, Zeyu Jin, Nicholas J. Bryan, Juan-Pablo Caceres, Bryan Pardo
Text-based speech editors expedite the process of editing speech recordings by permitting editing via intuitive cut, copy, and paste operations on a speech transcript.
1 code implementation • 9 Feb 2021 • Pranay Manocha, Zeyu Jin, Richard Zhang, Adam Finkelstein
The DPAM approach of Manocha et al. learns a full-reference metric trained directly on human judgments, and thus correlates well with human perception.
no code implementations • 9 Aug 2020 • Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam
For this task, it is typically necessary to define a similarity metric to compare one recording to another.
no code implementations • 9 Aug 2020 • Jongpil Lee, Nicholas J. Bryan, Justin Salamon, Zeyu Jin, Juhan Nam
For this, we (1) outline past work on the relationship between metric learning and classification, (2) extend this relationship to multi-label data by exploring three different learning approaches and their disentangled versions, and (3) evaluate all models on four tasks (training time, similarity retrieval, auto-tagging, and triplet prediction).
no code implementations • 7 Aug 2020 • Max Morrison, Zeyu Jin, Justin Salamon, Nicholas J. Bryan, Gautham J. Mysore
Speech synthesis has recently seen significant improvements in fidelity, driven by the advent of neural vocoders and neural prosody generators.
1 code implementation • 10 Jun 2020 • Jiaqi Su, Zeyu Jin, Adam Finkelstein
Real-world audio recordings are often degraded by factors such as noise, reverberation, and equalization distortion.
1 code implementation • 15 Apr 2020 • Kaizhi Qian, Zeyu Jin, Mark Hasegawa-Johnson, Gautham J. Mysore
Recently, AutoVC, a method based on conditional autoencoders (CAEs), achieved state-of-the-art results by disentangling speaker identity and speech content using information-constraining bottlenecks; it achieves zero-shot conversion by swapping in a different speaker's identity embedding to synthesize a new voice.
1 code implementation • 13 Jan 2020 • Pranay Manocha, Adam Finkelstein, Zeyu Jin, Nicholas J. Bryan, Richard Zhang, Gautham J. Mysore
Assessment of many audio processing tasks relies on subjective evaluation, which is time-consuming and expensive.
1 code implementation • 4 Jun 2019 • Ohad Fried, Ayush Tewari, Michael Zollhöfer, Adam Finkelstein, Eli Shechtman, Dan B. Goldman, Kyle Genova, Zeyu Jin, Christian Theobalt, Maneesh Agrawala
To edit a video, the user only has to edit the transcript, and an optimization strategy then chooses segments of the input corpus as base material.
no code implementations • ICASSP 2019 • Jiaqi Su, Adam Finkelstein, Zeyu Jin
This paper introduces a deep learning approach to enhance speech recordings made in a specific environment.