Search Results for author: Joon Son Chung

Found 35 papers, 8 papers with code

Self-supervised curriculum learning for speaker verification

no code implementations 28 Mar 2022 Hee-Soo Heo, Jee-weon Jung, Jingu Kang, Youngki Kwon, You Jin Kim, Bong-Jin Lee, Joon Son Chung

Self-supervised learning is one of the emerging approaches to machine learning today, and has been successfully applied to vision, speech and natural language processing tasks.

Self-Supervised Learning, Speaker Recognition +1

Pushing the limits of raw waveform speaker recognition

no code implementations 16 Mar 2022 Jee-weon Jung, You Jin Kim, Hee-Soo Heo, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

Our best model achieves an equal error rate of 0.89%, which is competitive with the state-of-the-art models based on handcrafted features, and outperforms the best model based on raw waveform inputs by a large margin.

Self-Supervised Learning, Speaker Recognition +1
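
The equal error rate quoted above is the verification operating point at which the false acceptance and false rejection rates coincide. As a rough illustration, here is a minimal numpy sketch that sweeps thresholds over a toy trial list (the scores and labels are invented, not VoxCeleb trials):

```python
import numpy as np

def compute_eer(scores, labels):
    """Equal error rate: the threshold at which the false acceptance
    rate (FAR) and false rejection rate (FRR) coincide.

    scores: similarity scores, higher = more likely same speaker.
    labels: 1 for genuine (same-speaker) trials, 0 for impostor trials.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels, dtype=int)
    thresholds = np.unique(scores)              # sweep every observed score
    genuine, impostor = scores[labels == 1], scores[labels == 0]
    far = np.array([(impostor >= t).mean() for t in thresholds])
    frr = np.array([(genuine < t).mean() for t in thresholds])
    idx = int(np.argmin(np.abs(far - frr)))     # closest FAR/FRR crossing
    return (far[idx] + frr[idx]) / 2, thresholds[idx]

# Toy trial list with made-up scores.
eer, thr = compute_eer([0.9, 0.8, 0.3, 0.7, 0.2, 0.4], [1, 1, 0, 1, 0, 0])
print(f"EER = {eer:.2%} at threshold {thr:.2f}")
```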

Spell my name: keyword boosted speech recognition

no code implementations 6 Oct 2021 Namkyu Jung, Geonmin Kim, Joon Son Chung

Recognition of uncommon words such as names and technical terminology is important to understanding conversations in context.

Automatic Speech Recognition, Machine Translation +1

Look Who's Talking: Active Speaker Detection in the Wild

1 code implementation 17 Aug 2021 You Jin Kim, Hee-Soo Heo, Soyeon Choe, Soo-Whan Chung, Yoohwan Kwon, Bong-Jin Lee, Youngki Kwon, Joon Son Chung

Face tracks are extracted from the videos and active segments are annotated based on the timestamps of VoxConverse in a semi-automatic way.

Graph Attention Networks for Speaker Verification

no code implementations 22 Oct 2020 Jee-weon Jung, Hee-Soo Heo, Ha-Jin Yu, Joon Son Chung

The proposed framework inputs segment-wise speaker embeddings from an enrollment and a test utterance and directly outputs a similarity score.

Graph Attention, Speaker Verification
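
As a loose illustration of the idea described above (segment-wise embeddings in, similarity score out), here is a minimal single-layer graph-attention sketch in PyTorch over a fully connected graph of enrollment and test segments. The class name, dimensions and layer choices are all invented; this is not the authors' architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyGATScorer(nn.Module):
    """One graph-attention layer over the segment embeddings of an
    enrollment and a test utterance (fully connected graph), followed
    by mean pooling and a scalar similarity head."""

    def __init__(self, dim=192, hidden=64):
        super().__init__()
        self.proj = nn.Linear(dim, hidden)
        self.attn = nn.Linear(2 * hidden, 1)     # a^T [Wh_i || Wh_j]
        self.head = nn.Linear(hidden, 1)

    def forward(self, enroll, test):
        # enroll: (Ne, dim), test: (Nt, dim) segment embeddings.
        h = self.proj(torch.cat([enroll, test], dim=0))        # (N, hidden)
        n = h.size(0)
        # Attention logits for every ordered node pair (i, j).
        hi = h.unsqueeze(1).expand(n, n, -1)
        hj = h.unsqueeze(0).expand(n, n, -1)
        e = F.leaky_relu(self.attn(torch.cat([hi, hj], dim=-1)).squeeze(-1), 0.2)
        alpha = F.softmax(e, dim=-1)                           # (N, N) weights
        h = F.elu(alpha @ h)                                   # message passing
        return torch.sigmoid(self.head(h.mean(dim=0)))         # score in (0, 1)

scorer = TinyGATScorer()
print(scorer(torch.randn(10, 192), torch.randn(8, 192)).item())
```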

Augmentation adversarial training for self-supervised speaker recognition

no code implementations 23 Jul 2020 Jaesung Huh, Hee Soo Heo, Jingu Kang, Shinji Watanabe, Joon Son Chung

Since the augmentation simulates acoustic channel characteristics, training the network to be invariant to augmentation also encourages it to be invariant to channel information in general.

Contrastive Learning, Speaker Recognition
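
The invariance argument above is typically realised with a contrastive objective in which two differently augmented segments of the same utterance form a positive pair. A generic NT-Xent-style sketch follows; it omits the paper's adversarial component, and all names and sizes are illustrative:

```python
import torch
import torch.nn.functional as F

def contrastive_loss(emb_a, emb_b, temperature=0.07):
    """emb_a[i] and emb_b[i] come from two differently augmented segments
    of the same utterance; every other pairing in the batch is a negative."""
    a = F.normalize(emb_a, dim=-1)
    b = F.normalize(emb_b, dim=-1)
    logits = a @ b.t() / temperature            # (B, B) cosine similarities
    targets = torch.arange(a.size(0))           # the matching index is positive
    return F.cross_entropy(logits, targets)

# Toy batch: in practice the embeddings come from a speaker network
# fed two augmented views of each utterance.
loss = contrastive_loss(torch.randn(16, 192), torch.randn(16, 192))
print(loss.item())
```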

BSL-1K: Scaling up co-articulated sign language recognition using mouthing cues

1 code implementation ECCV 2020 Samuel Albanie, Gül Varol, Liliane Momeni, Triantafyllos Afouras, Joon Son Chung, Neil Fox, Andrew Zisserman

Recent progress in fine-grained gesture and action classification, and machine translation, points to the possibility of automated sign language recognition becoming a reality.

Action Classification, Keyword Spotting +2

Spot the conversation: speaker diarisation in the wild

no code implementations 2 Jul 2020 Joon Son Chung, Jaesung Huh, Arsha Nagrani, Triantafyllos Afouras, Andrew Zisserman

Finally, we use this pipeline to create a large-scale diarisation dataset called VoxConverse, collected from 'in the wild' videos, which we will release publicly to the research community.

Speaker Verification

FaceFilter: Audio-visual speech separation using still images

no code implementations 14 May 2020 Soo-Whan Chung, Soyeon Choe, Joon Son Chung, Hong-Goo Kang

The objective of this paper is to separate a target speaker's speech from a mixture of two speakers using a deep audio-visual speech separation network.

Speech Separation
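
A common recipe for this kind of conditioning, shown below as a rough sketch rather than the FaceFilter architecture, is to predict a time-frequency mask over the mixture spectrogram from the audio plus an identity embedding of the still face image. The class name and all layer sizes are invented:

```python
import torch
import torch.nn as nn

class MaskSeparator(nn.Module):
    """Predicts a time-frequency mask for the target speaker, conditioned
    on an identity embedding taken from a still face image."""

    def __init__(self, n_freq=257, id_dim=128, hidden=256):
        super().__init__()
        self.rnn = nn.GRU(n_freq + id_dim, hidden, batch_first=True)
        self.mask = nn.Sequential(nn.Linear(hidden, n_freq), nn.Sigmoid())

    def forward(self, mix_spec, face_emb):
        # mix_spec: (B, T, F) magnitude spectrogram of the two-speaker mixture.
        # face_emb: (B, id_dim) embedding of the target speaker's face image.
        cond = face_emb.unsqueeze(1).expand(-1, mix_spec.size(1), -1)
        h, _ = self.rnn(torch.cat([mix_spec, cond], dim=-1))
        return mix_spec * self.mask(h)          # masked target estimate

net = MaskSeparator()
est = net(torch.rand(2, 100, 257), torch.randn(2, 128))
print(est.shape)  # torch.Size([2, 100, 257])
```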

Disentangled Speech Embeddings using Cross-modal Self-supervision

no code implementations 20 Feb 2020 Arsha Nagrani, Joon Son Chung, Samuel Albanie, Andrew Zisserman

The objective of this paper is to learn representations of speaker identity without access to manually annotated data.

Self-Supervised Learning, Speaker Recognition

VoxSRC 2019: The first VoxCeleb Speaker Recognition Challenge

no code implementations 5 Dec 2019 Joon Son Chung, Arsha Nagrani, Ernesto Coto, Weidi Xie, Mitchell McLaren, Douglas A. Reynolds, Andrew Zisserman

The VoxCeleb Speaker Recognition Challenge 2019 aimed to assess how well current speaker recognition technology is able to identify speakers in unconstrained or 'in the wild' data.

Speaker Recognition

ASR is all you need: cross-modal distillation for lip reading

no code implementations 28 Nov 2019 Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

The goal of this work is to train strong models for visual speech recognition without requiring human annotated ground truth data.

Ranked #10 on Lipreading on LRS2 (using extra training data)

Automatic Speech Recognition, Frame +3
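
Cross-modal distillation of this kind is often implemented as a frame-level KL divergence between the audio teacher's posteriors and the visual student's. The sketch below shows that generic recipe; the paper's exact losses differ, and the shapes and temperature here are illustrative:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=2.0):
    """Frame-level KL between an audio ASR teacher's posteriors and a
    visual (lip reading) student's. Shapes: (B, frames, vocab)."""
    log_p_student = F.log_softmax(student_logits / T, dim=-1)
    p_teacher = F.softmax(teacher_logits / T, dim=-1)
    # batchmean averages over the (B * frames) distributions;
    # the T^2 factor keeps gradient magnitudes comparable across T.
    kl = F.kl_div(log_p_student.flatten(0, 1), p_teacher.flatten(0, 1),
                  reduction="batchmean")
    return kl * T * T

loss = distillation_loss(torch.randn(2, 50, 40), torch.randn(2, 50, 40))
print(loss.item())
```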

Delving into VoxCeleb: environment invariant speaker recognition

no code implementations 24 Oct 2019 Joon Son Chung, Jaesung Huh, Seongkyu Mun

Research in speaker recognition has recently seen significant progress due to the application of neural network models and the availability of new large-scale datasets.

Speaker Identification, Speaker Recognition

My lips are concealed: Audio-visual speech enhancement through obstructions

no code implementations 11 Jul 2019 Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

To this end we introduce a deep audio-visual speech enhancement network that is able to separate a speaker's voice by conditioning on the speaker's lip movements, a representation of their voice, or both.

Speech Enhancement

Who said that?: Audio-visual speaker diarisation of real-world meetings

no code implementations 24 Jun 2019 Joon Son Chung, Bong-Jin Lee, Icksang Han

The goal of this work is to determine 'who spoke when' in real-world meetings.
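
A minimal version of the 'who spoke when' step, assuming window-level speaker embeddings have already been extracted, is agglomerative clustering under cosine distance. The sketch below uses synthetic embeddings and an invented threshold; the paper's audio-visual pipeline is considerably more involved:

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import pdist

def diarise(embeddings, times, threshold=0.6):
    """Label each window with a speaker id by agglomerative clustering
    of window-level speaker embeddings under cosine distance.
    `threshold` is an invented stopping distance, not a tuned value."""
    dists = pdist(embeddings, metric="cosine")
    labels = fcluster(linkage(dists, method="average"),
                      t=threshold, criterion="distance")
    return [(start, end, f"spk{lab}") for (start, end), lab in zip(times, labels)]

# Synthetic embeddings: three windows each around two random directions.
rng = np.random.default_rng(0)
a, b = rng.normal(size=64), rng.normal(size=64)
embs = np.vstack([a + rng.normal(0, 0.1, (3, 64)),
                  b + rng.normal(0, 0.1, (3, 64))])
times = [(1.5 * i, 1.5 * i + 1.5) for i in range(6)]
for segment in diarise(embs, times):
    print(segment)
```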

Utterance-level Aggregation For Speaker Recognition In The Wild

8 code implementations 26 Feb 2019 Weidi Xie, Arsha Nagrani, Joon Son Chung, Andrew Zisserman

The objective of this paper is speaker recognition "in the wild", where utterances may be of variable length and may also contain irrelevant signals.

Frame, Speaker Recognition +1
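
The variable-length aggregation described above can be illustrated with a simple attentive-pooling layer that maps any number of frame-level features to one fixed-size utterance embedding. Note the paper itself proposes a GhostVLAD-style aggregation layer; this is a simpler stand-in with invented dimensions:

```python
import torch
import torch.nn as nn

class AttentivePool(nn.Module):
    """Aggregates a variable number of frame-level features into one
    fixed-size utterance embedding via learned attention weights."""

    def __init__(self, dim=256):
        super().__init__()
        self.score = nn.Sequential(nn.Linear(dim, dim), nn.Tanh(),
                                   nn.Linear(dim, 1))

    def forward(self, frames):                       # (B, T, dim), any T
        w = torch.softmax(self.score(frames), dim=1) # (B, T, 1) weights
        return (w * frames).sum(dim=1)               # (B, dim)

pool = AttentivePool()
for T in (50, 123, 400):                             # variable-length inputs
    print(pool(torch.randn(2, T, 256)).shape)        # always torch.Size([2, 256])
```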

Perfect match: Improved cross-modal embeddings for audio-visual synchronisation

no code implementations 21 Sep 2018 Soo-Whan Chung, Joon Son Chung, Hong-Goo Kang

This paper proposes a new strategy for learning powerful cross-modal embeddings for audio-to-video synchronization.

Cross-Modal Retrieval, Video Synchronization +1
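
The cross-modal synchronisation objective can be framed as a multi-way matching problem: one visual feature is scored against several candidate audio segments and trained to pick the temporally aligned one. A hedged sketch of that framing follows (batch size, dimensions and candidate count are invented):

```python
import torch
import torch.nn.functional as F

def sync_loss(video_emb, audio_cands):
    """video_emb: (B, D), one visual feature per clip; audio_cands:
    (B, N, D) candidate audio segments, with the temporally aligned
    segment at index 0. Synchronisation becomes N-way classification."""
    v = F.normalize(video_emb, dim=-1).unsqueeze(1)            # (B, 1, D)
    a = F.normalize(audio_cands, dim=-1)                       # (B, N, D)
    logits = (v * a).sum(-1)                                   # (B, N) similarities
    target = torch.zeros(video_emb.size(0), dtype=torch.long)  # index 0 is synced
    return F.cross_entropy(logits, target)

print(sync_loss(torch.randn(4, 512), torch.randn(4, 21, 512)).item())
```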

Deep Lip Reading: a comparison of models and an online application

no code implementations 15 Jun 2018 Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

The goal of this paper is to develop state-of-the-art models for lip reading -- visual speech recognition.

Lip Reading, Visual Speech Recognition

VoxCeleb2: Deep Speaker Recognition

2 code implementations 14 Jun 2018 Joon Son Chung, Arsha Nagrani, Andrew Zisserman

The objective of this paper is speaker recognition under noisy and unconstrained conditions.

Ranked #1 on Speaker Verification on VoxCeleb2 (using extra training data)

Speaker Recognition, Speaker Verification

The Conversation: Deep Audio-Visual Speech Enhancement

no code implementations 11 Apr 2018 Triantafyllos Afouras, Joon Son Chung, Andrew Zisserman

Our goal is to isolate individual speakers from multi-talker simultaneous speech in videos.

Speech Enhancement

VoxCeleb: a large-scale speaker identification dataset

8 code implementations Interspeech 2018 Arsha Nagrani, Joon Son Chung, Andrew Zisserman

Our second contribution is to apply and compare various state-of-the-art speaker identification techniques on our dataset to establish baseline performance.

Sound

You said that?

1 code implementation 8 May 2017 Joon Son Chung, Amir Jamaludin, Andrew Zisserman

To achieve this we propose an encoder-decoder CNN model that uses a joint embedding of the face and audio to generate synthesised talking face video frames.

Unconstrained Lip-synchronization
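
As a structural illustration of the encoder-decoder idea, the sketch below encodes a still face and a short audio window, concatenates the two embeddings, and decodes a single frame. The class name and every layer size here are invented; the published model is deeper:

```python
import torch
import torch.nn as nn

class TalkingFaceSketch(nn.Module):
    """Joint face+audio embedding decoded into one synthesised frame."""

    def __init__(self):
        super().__init__()
        self.face_enc = nn.Sequential(
            nn.Conv2d(3, 32, 4, 2, 1), nn.ReLU(),     # 112 -> 56
            nn.Conv2d(32, 64, 4, 2, 1), nn.ReLU(),    # 56 -> 28
            nn.Flatten(), nn.Linear(64 * 28 * 28, 256))
        self.audio_enc = nn.Sequential(
            nn.Flatten(), nn.Linear(13 * 35, 256), nn.ReLU())
        self.dec = nn.Sequential(
            nn.Linear(512, 64 * 14 * 14), nn.ReLU(),
            nn.Unflatten(1, (64, 14, 14)),
            nn.ConvTranspose2d(64, 32, 4, 2, 1), nn.ReLU(),   # 14 -> 28
            nn.ConvTranspose2d(32, 16, 4, 2, 1), nn.ReLU(),   # 28 -> 56
            nn.ConvTranspose2d(16, 3, 4, 2, 1), nn.Sigmoid()) # 56 -> 112

    def forward(self, face, audio):
        # face: (B, 3, 112, 112) identity image; audio: (B, 13, 35) MFCC window.
        z = torch.cat([self.face_enc(face), self.audio_enc(audio)], dim=-1)
        return self.dec(z)                            # (B, 3, 112, 112) frame

net = TalkingFaceSketch()
print(net(torch.rand(1, 3, 112, 112), torch.rand(1, 13, 35)).shape)
```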

Lip Reading Sentences in the Wild

no code implementations CVPR 2017 Joon Son Chung, Andrew Senior, Oriol Vinyals, Andrew Zisserman

The goal of this work is to recognise phrases and sentences being spoken by a talking face, with or without the audio.

Ranked #3 on Lipreading on GRID corpus (mixed-speech) (using extra training data)

Lipreading, Lip Reading +1

Signs in time: Encoding human motion as a temporal image

no code implementations 6 Aug 2016 Joon Son Chung, Andrew Zisserman

The goal of this work is to recognise and localise short temporal signals in image time series, where strong supervision is not available for training.

Time Series
