Search Results for author: Jaesung Huh

Found 15 papers, 7 papers with code

Neural Distribution Learning for generalized time-to-event prediction

no code implementations • 27 Sep 2018 • Egil Martinsson, Adrian Kim, Jaesung Huh, Jaegul Choo, Jung-Woo Ha

Predicting the time to the next event is an important task in various domains.

Probabilistic Programming, Time-to-Event Prediction

Phase-aware Speech Enhancement with Deep Complex U-Net

7 code implementations • ICLR 2019 • Hyeong-Seok Choi, Jang-Hyun Kim, Jaesung Huh, Adrian Kim, Jung-Woo Ha, Kyogu Lee

Most deep learning-based models for speech enhancement have mainly focused on estimating the magnitude of spectrogram while reusing the phase from noisy speech for reconstruction.

Speech Enhancement, valid

Delving into VoxCeleb: environment invariant speaker recognition

1 code implementation • 24 Oct 2019 • Joon Son Chung, Jaesung Huh, Seongkyu Mun

Research in speaker recognition has recently seen significant progress due to the application of neural network models and the availability of new large-scale datasets.

Speaker Identification, Speaker Recognition

Modeling Musical Onset Probabilities via Neural Distribution Learning

no code implementations • 10 Feb 2020 • Jaesung Huh, Egil Martinsson, Adrian Kim, Jung-Woo Ha

Musical onset detection can be formulated as a time-to-event (TTE) or time-since-event (TSE) prediction task by defining music as a sequence of onset events.

Spot the conversation: speaker diarisation in the wild

no code implementations • 2 Jul 2020 • Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman

Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community.

Speaker Verification

Augmentation adversarial training for self-supervised speaker recognition

no code implementations • 23 Jul 2020 • Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung

Since the augmentation simulates the acoustic characteristics, training the network to be invariant to augmentation also encourages the network to be invariant to the channel information in general.

Contrastive Learning, Speaker Recognition

VoxSRC 2020: The Second VoxCeleb Speaker Recognition Challenge

no code implementations • 12 Dec 2020 • Arsha Nagrani, Joon Son Chung, Jaesung Huh, Andrew Brown, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A Reynolds, Andrew Zisserman

We held the second installment of the VoxCeleb Speaker Recognition Challenge in conjunction with Interspeech 2020.

Speaker Recognition

With a Little Help from my Temporal Context: Multimodal Egocentric Action Recognition

1 code implementation • 1 Nov 2021 • Evangelos Kazakos, Jaesung Huh, Arsha Nagrani, Andrew Zisserman, Dima Damen

We capitalise on the action's temporal context and propose a method that learns to attend to surrounding actions in order to improve recognition performance.

Action Recognition, Language Modelling

In search of strong embedding extractors for speaker diarisation

no code implementations • 26 Oct 2022 • Jee-weon Jung, Hee-Soo Heo, Bong-Jin Lee, Jaesung Huh, Andrew Brown, Youngki Kwon, Shinji Watanabe, Joon Son Chung

First, the evaluation is not straightforward because the features required for better performance differ between speaker verification and diarisation.

Data Augmentation, Speaker Verification

Disentangled representation learning for multilingual speaker recognition

no code implementations • 1 Nov 2022 • Kihyun Nam, Youkyum Kim, Jaesung Huh, Hee Soo Heo, Jee-weon Jung, Joon Son Chung

The goal of this paper is to learn robust speaker representations for bilingual speaking scenarios.

Disentanglement, Metric Learning +1

Epic-Sounds: A Large-scale Dataset of Actions That Sound

1 code implementation • 1 Feb 2023 • Jaesung Huh, Jacob Chalk, Evangelos Kazakos, Dima Damen, Andrew Zisserman

We introduce EPIC-SOUNDS, a large-scale dataset of audio annotations capturing temporal extents and class labels within the audio stream of the egocentric videos.

Action Recognition

VoxSRC 2022: The Fourth VoxCeleb Speaker Recognition Challenge

1 code implementation • 20 Feb 2023 • Jaesung Huh, Andrew Brown, Jee-weon Jung, Joon Son Chung, Arsha Nagrani, Daniel Garcia-Romero, Andrew Zisserman

This paper summarises the findings from the VoxCeleb Speaker Recognition Challenge 2022 (VoxSRC-22), which was held in conjunction with INTERSPEECH 2022.

Speaker Diarization, Speaker Recognition +1

OxfordVGG Submission to the EGO4D AV Transcription Challenge

1 code implementation • 18 Jul 2023 • Jaesung Huh, Max Bain, Andrew Zisserman

This report presents the technical details of our submission on the EGO4D Audio-Visual (AV) Automatic Speech Recognition Challenge 2023 from the OxfordVGG team.

Automatic Speech Recognition, speech-recognition +1