1 code implementation • 20 Feb 2023 • Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman
This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022.
1 code implementation • 1 Feb 2023 • Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman
We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of egocentric videos.
no code implementations • 26 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung
First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.
1 code implementation • 1 Nov 2021 • Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen
We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance.
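As an illustration of the idea (not the paper's exact architecture), a transformer can attend over a window of per-action features and classify the centre action; the module names, dimensions, and window size below are placeholders:

```python
import torch
import torch.nn as nn

class TemporalContextClassifier(nn.Module):
    """Illustrative sketch: classify the centre action of a window by
    letting a transformer attend over the surrounding actions' features."""
    def __init__(self, feat_dim=768, n_classes=100, window=5):
        super().__init__()
        self.pos = nn.Parameter(torch.zeros(window, feat_dim))  # learned positions
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=8,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.head = nn.Linear(feat_dim, n_classes)

    def forward(self, feats):                  # feats: (B, window, feat_dim)
        x = self.encoder(feats + self.pos)     # attend across neighbouring actions
        centre = x[:, feats.size(1) // 2]      # take the centre action token
        return self.head(centre)
```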
no code implementations • 12 Dec 2020 • Arsha Nagrani, Joon Son Chung, Jaesung Huh, Andrew Brown, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman
We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020.
no code implementations • 23 Jul 2020 • Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung
Since the augmentation simulates acoustic characteristics, training the network to be invariant to the augmentation also encourages it to be invariant to channel information in general.
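A minimal sketch of this invariance objective, assuming a contrastive loss over paired augmentations plus a gradient-reversed augmentation classifier (`encoder` and `aug_classifier` are placeholder modules, not the paper's exact models):

```python
import torch
import torch.nn.functional as F

class GradReverse(torch.autograd.Function):
    """Reverses gradients so the encoder is trained to FOOL the
    augmentation classifier, i.e. to discard channel information."""
    @staticmethod
    def forward(ctx, x):
        return x.view_as(x)
    @staticmethod
    def backward(ctx, grad):
        return -grad

def augmentation_adversarial_loss(encoder, aug_classifier,
                                  seg_a, seg_b, aug_labels):
    # seg_a, seg_b: two differently augmented segments of the same utterances
    emb_a = F.normalize(encoder(seg_a), dim=-1)
    emb_b = F.normalize(encoder(seg_b), dim=-1)
    # contrastive term: matching segments should have similar embeddings
    sim = emb_a @ emb_b.t() / 0.07          # temperature-scaled cosine similarity
    targets = torch.arange(sim.size(0), device=sim.device)
    contrastive = F.cross_entropy(sim, targets)
    # adversarial term: the classifier predicts the augmentation type, while
    # the reversed gradient pushes the encoder to hide that information
    adv_logits = aug_classifier(GradReverse.apply(emb_a))
    adversarial = F.cross_entropy(adv_logits, aug_labels)
    return contrastive + adversarial
```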
no code implementations • 2 Jul 2020 • Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman
Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community.
no code implementations • 10 Feb 2020 • Jaesung Huh, Egil Martinsson, Adrian Kim, Jung-Woo Ha
Musical onset detection can be formulated as a time-to-event (TTE) or time-since-event (TSE) prediction task by defining music as a sequence of onset events.
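One illustrative way to construct such per-frame targets from a set of onset frames (a convention for this sketch, not the paper's code):

```python
import numpy as np

def tte_tse_targets(onset_frames, n_frames):
    """Per-frame time-to-event (frames until the next onset) and
    time-since-event (frames since the previous onset) targets."""
    tte = np.full(n_frames, np.inf)
    tse = np.full(n_frames, np.inf)
    next_onset = np.inf
    for t in range(n_frames - 1, -1, -1):      # backward pass for TTE
        if t in onset_frames:
            next_onset = t
        tte[t] = next_onset - t
    prev_onset = -np.inf
    for t in range(n_frames):                  # forward pass for TSE
        if t in onset_frames:
            prev_onset = t
        tse[t] = t - prev_onset
    return tte, tse

# e.g. tte_tse_targets({3, 7}, 10) gives tte[5] == 2 and tse[5] == 2
```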
1 code implementation • 24 Oct 2019 • Joon Son Chung, Jaesung Huh, Seongkyu Mun
Research in speaker recognition has recently seen significant progress due to the application of neural network models and the availability of new large-scale datasets.
7 code implementations • ICLR 2019 • Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee
Most deep learning-based models for speech enhancement have focused on estimating the magnitude spectrogram while reusing the phase of the noisy speech for reconstruction.
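The conventional magnitude-plus-noisy-phase pipeline that the paper improves on looks roughly like this sketch (`mask_net` is a placeholder masking network; STFT settings are assumptions):

```python
import torch

def enhance_with_noisy_phase(noisy_wav, mask_net, n_fft=512, hop=128):
    """Conventional magnitude-only enhancement: estimate a mask on the
    magnitude spectrogram and reuse the noisy phase for reconstruction."""
    window = torch.hann_window(n_fft)
    spec = torch.stft(noisy_wav, n_fft, hop_length=hop, window=window,
                      return_complex=True)
    magnitude, phase = spec.abs(), spec.angle()
    mask = mask_net(magnitude)                 # e.g. a sigmoid-output network
    enhanced_mag = mask * magnitude
    # recombine with the unmodified noisy phase -- the limitation the paper
    # addresses by estimating the complex spectrogram instead
    enhanced_spec = torch.polar(enhanced_mag, phase)
    return torch.istft(enhanced_spec, n_fft, hop_length=hop, window=window)
```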
no code implementations • 27 Sep 2018 • Egil Martinsson, Adrian Kim, Jaesung Huh, Jaegul Choo, Jung-Woo Ha
Predicting the time to the next event is an important task in various domains.