no code implementations • 21 Mar 2023 • Tejas Jayashankar, JiLong Wu, Leda Sari, David Kant, Vimal Manohar, Qing He
A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer.
no code implementations • 1 Mar 2023 • Philipp Klumpp, Pooja Chitkara, Leda Sari, Prashant Serai, JiLong Wu, Irina-Elena Veliche, Rongqing Huang, Qing He
In this work, we improve an accent-conversion model (ACM) which transforms native US-English speech into accented pronunciation.
no code implementations • 4 Nov 2022 • Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland
Furthermore, this paper proposes a variant of MPPT that allows low-footprint streaming models to be trained effectively by computing the MPPT loss on masked and unmasked frames.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
no code implementations • 18 Nov 2021 • Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf
This paper presents initial Speech Recognition results on "Casual Conversations" -- a publicly released 846 hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+2
3 code implementations • CVPR 2022 • Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik
We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.
no code implementations • NAACL 2021 • Kiran Ramnath, Leda Sari, Mark Hasegawa-Johnson, Chang Yoo
Three sub-tasks are proposed: (1) speech-to-text based, (2) end-to-end, without speech-to-text as an intermediate component, and (3) cross-lingual, in which the question is spoken in a language different from that in which the KG is recorded.
no code implementations • 11 Feb 2021 • Leda Sari, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf
Although speaker verification has conventionally been an audio-only task, some practical applications provide both audio and visual streams of input.
no code implementations • 8 Aug 2020 • Leda Sari, Mark Hasegawa-Johnson
We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.
1 code implementation • 22 May 2020 • Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari
In scenarios where multiple speakers talk at the same time, it is important to be able to identify the talkers accurately.
no code implementations • 14 Feb 2020 • Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux
We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).
Automatic Speech Recognition
Automatic Speech Recognition (ASR)
+1