Search Results for author: Leda Sari

Found 10 papers, 2 papers with code

Self-Supervised Representations for Singing Voice Conversion

no code implementations 21 Mar 2023 Tejas Jayashankar, JiLong Wu, Leda Sari, David Kant, Vimal Manohar, Qing He

A singing voice conversion model converts a song in the voice of an arbitrary source singer to the voice of a target singer.

Disentanglement Voice Conversion

Biased Self-supervised learning for ASR

no code implementations 4 Nov 2022 Florian L. Kreyssig, Yangyang Shi, Jinxi Guo, Leda Sari, Abdelrahman Mohamed, Philip C. Woodland

Furthermore, this paper proposes a variant of MPPT that allows low-footprint streaming models to be trained effectively by computing the MPPT loss on masked and unmasked frames.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Towards Measuring Fairness in Speech Recognition: Casual Conversations Dataset Transcriptions

no code implementations 18 Nov 2021 Chunxi Liu, Michael Picheny, Leda Sari, Pooja Chitkara, Alex Xiao, Xiaohui Zhang, Mark Chou, Andres Alvarado, Caner Hazirbas, Yatharth Saraf

This paper presents initial Speech Recognition results on "Casual Conversations" -- a publicly released 846 hour corpus designed to help researchers evaluate their computer vision and audio models for accuracy across a diverse set of metadata, including age, gender, and skin tone.

Automatic Speech Recognition Automatic Speech Recognition (ASR) +2

Ego4D: Around the World in 3,000 Hours of Egocentric Video

3 code implementations CVPR 2022 Kristen Grauman, Andrew Westbury, Eugene Byrne, Zachary Chavis, Antonino Furnari, Rohit Girdhar, Jackson Hamburger, Hao Jiang, Miao Liu, Xingyu Liu, Miguel Martin, Tushar Nagarajan, Ilija Radosavovic, Santhosh Kumar Ramakrishnan, Fiona Ryan, Jayant Sharma, Michael Wray, Mengmeng Xu, Eric Zhongcong Xu, Chen Zhao, Siddhant Bansal, Dhruv Batra, Vincent Cartillier, Sean Crane, Tien Do, Morrie Doulaty, Akshay Erapalli, Christoph Feichtenhofer, Adriano Fragomeni, Qichen Fu, Abrham Gebreselasie, Cristina Gonzalez, James Hillis, Xuhua Huang, Yifei HUANG, Wenqi Jia, Weslie Khoo, Jachym Kolar, Satwik Kottur, Anurag Kumar, Federico Landini, Chao Li, Yanghao Li, Zhenqiang Li, Karttikeya Mangalam, Raghava Modhugu, Jonathan Munro, Tullie Murrell, Takumi Nishiyasu, Will Price, Paola Ruiz Puentes, Merey Ramazanova, Leda Sari, Kiran Somasundaram, Audrey Southerland, Yusuke Sugano, Ruijie Tao, Minh Vo, Yuchen Wang, Xindi Wu, Takuma Yagi, Ziwei Zhao, Yunyi Zhu, Pablo Arbelaez, David Crandall, Dima Damen, Giovanni Maria Farinella, Christian Fuegen, Bernard Ghanem, Vamsi Krishna Ithapu, C. V. Jawahar, Hanbyul Joo, Kris Kitani, Haizhou Li, Richard Newcombe, Aude Oliva, Hyun Soo Park, James M. Rehg, Yoichi Sato, Jianbo Shi, Mike Zheng Shou, Antonio Torralba, Lorenzo Torresani, Mingfei Yan, Jitendra Malik

We introduce Ego4D, a massive-scale egocentric video dataset and benchmark suite.

De-identification Ethics

Worldly Wise (WoW) - Cross-Lingual Knowledge Fusion for Fact-based Visual Spoken-Question Answering

no code implementations NAACL 2021 Kiran Ramnath, Leda Sari, Mark Hasegawa-Johnson, Chang Yoo

Three sub-tasks are proposed: (1) speech-to-text based, (2) end-to-end, without speech-to-text as an intermediate component, and (3) cross-lingual, in which the question is spoken in a language different from that in which the KG is recorded.

Knowledge Graphs Question Answering +1

A Multi-View Approach To Audio-Visual Speaker Verification

no code implementations 11 Feb 2021 Leda Sari, Kritika Singh, Jiatong Zhou, Lorenzo Torresani, Nayan Singhal, Yatharth Saraf

Although speaker verification has conventionally been an audio-only task, some practical applications provide both audio and visual streams of input.

Speaker Verification

Deep F-measure Maximization for End-to-End Speech Understanding

no code implementations 8 Aug 2020 Leda Sari, Mark Hasegawa-Johnson

We propose a differentiable approximation to the F-measure and train the network with this objective using standard backpropagation.

Fairness Intent Detection +1

Identify Speakers in Cocktail Parties with End-to-End Attention

1 code implementation 22 May 2020 Junzhe Zhu, Mark Hasegawa-Johnson, Leda Sari

In scenarios where multiple speakers talk at the same time, it is important to be able to identify the talkers accurately.

Speaker Identification Speech Separation

Unsupervised Speaker Adaptation using Attention-based Speaker Memory for End-to-End ASR

no code implementations 14 Feb 2020 Leda Sari, Niko Moritz, Takaaki Hori, Jonathan Le Roux

We propose an unsupervised speaker adaptation method inspired by the neural Turing machine for end-to-end (E2E) automatic speech recognition (ASR).

Automatic Speech Recognition Automatic Speech Recognition (ASR) +1
