no code implementations • CVPR 2024 • Youngjoon Jang, Ji-Hoon Kim, Junseok Ahn, Doyeop Kwak, Hong-Sun Yang, Yoon-Cheol Ju, Il-Hwan Kim, Byeong-Yeol Kim, Joon Son Chung
The goal of this work is to simultaneously generate natural talking faces and speech outputs from text.
2 code implementations • 18 Jan 2024 • Tan Dat Nguyen, Ji-Hoon Kim, Youngjoon Jang, Jaehun Kim, Joon Son Chung
The goal of this paper is to generate realistic audio with a lightweight and fast diffusion-based vocoder, named FreGrad.
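FreGrad's specific lightweight, frequency-aware design is not reproduced here; below is a minimal sketch of the generic DDPM-style reverse loop that diffusion-based vocoders of this kind run at inference, denoising Gaussian noise into a waveform conditioned on a mel-spectrogram. The noise schedule, step count, and the `denoiser` network are illustrative assumptions, not FreGrad's actual architecture.

```python
# Minimal sketch of a generic DDPM-style reverse loop for a mel-conditioned
# vocoder. The schedule, step count, and `denoiser` are illustrative
# assumptions, not FreGrad's actual design.
import torch

T = 50                                   # assumed number of diffusion steps
betas = torch.linspace(1e-4, 0.05, T)    # assumed linear noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)

@torch.no_grad()
def sample_waveform(denoiser, mel, num_samples):
    """Iteratively denoise Gaussian noise into a waveform, conditioned on mel."""
    x = torch.randn(1, num_samples)      # start from pure noise
    for t in reversed(range(T)):
        eps = denoiser(x, mel, torch.tensor([t]))          # predicted noise
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])       # posterior mean
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x
```

Fewer steps and a cheaper `denoiser` are what make such a vocoder fast; the sampling loop itself is the same as in any DDPM.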
no code implementations • 30 Oct 2023 • Suyeon Lee, Chaeyoung Jung, Youngjoon Jang, Jaehun Kim, Joon Son Chung
To fuse the two modalities effectively within the diffusion process, we also propose a cross-attention-based feature fusion mechanism.
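A minimal sketch of what a cross-attention fusion block can look like in PyTorch. The choice of one modality as queries and the other as keys/values, the dimensions, and the residual-plus-LayerNorm design are assumptions for illustration, not the paper's exact module.

```python
# Minimal sketch of cross-attention-based fusion of two modality streams
# (here: audio queries attending over visual keys/values). Dimensions and
# the residual design are illustrative assumptions.
import torch
import torch.nn as nn

class CrossAttentionFusion(nn.Module):
    def __init__(self, dim=256, num_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, audio, visual):
        # audio: (B, Ta, dim) queries; visual: (B, Tv, dim) keys/values
        fused, _ = self.attn(query=audio, key=visual, value=visual)
        return self.norm(audio + fused)   # residual connection

fusion = CrossAttentionFusion()
out = fusion(torch.randn(2, 100, 256), torch.randn(2, 25, 256))
print(out.shape)  # torch.Size([2, 100, 256])
```

Cross-attention lets each time step of one stream gather the most relevant frames of the other, which is why it is a common choice for audio-visual conditioning.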
no code implementations • 21 Sep 2023 • Chaeyoung Jung, Suyeon Lee, Kihyun Nam, Kyeongha Rho, You Jin Kim, Youngjoon Jang, Joon Son Chung
The goal of this work is Active Speaker Detection (ASD), the task of determining whether a person is speaking in a series of video frames (a minimal framewise sketch follows below).
Tasks: Active Speaker Detection • Audio-Visual Active Speaker Detection +1
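A hedged sketch of the framewise formulation: per-frame audio and face-track features are fused and scored as speaking / not speaking. The feature dimensions, concatenation-based fusion, and GRU temporal model are illustrative assumptions, not this paper's architecture.

```python
# Minimal sketch of a framewise active speaker classifier: audio and
# face-track features are fused per frame and scored as speaking or not.
import torch
import torch.nn as nn

class FramewiseASD(nn.Module):
    def __init__(self, audio_dim=128, visual_dim=512, hidden=256):
        super().__init__()
        self.fuse = nn.Sequential(
            nn.Linear(audio_dim + visual_dim, hidden), nn.ReLU())
        self.temporal = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 1)   # per-frame speaking logit

    def forward(self, audio_feats, visual_feats):
        # audio_feats: (B, T, audio_dim); visual_feats: (B, T, visual_dim)
        x = self.fuse(torch.cat([audio_feats, visual_feats], dim=-1))
        x, _ = self.temporal(x)
        return self.head(x).squeeze(-1)    # (B, T) logits, one per frame

model = FramewiseASD()
logits = model(torch.randn(2, 30, 128), torch.randn(2, 30, 512))
```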
1 code implementation • 21 Sep 2023 • Junseok Ahn, Youngjoon Jang, Joon Son Chung
The objective of this work is the effective extraction of spatial and dynamic features for Continuous Sign Language Recognition (CSLR).
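As a rough illustration of the two feature types, the sketch below pairs a per-frame spatial encoder with a 1-D temporal convolution over the frame axis for motion dynamics. The backbone, the temporal-convolution choice, and the additive fusion are all assumptions for illustration, not this paper's method.

```python
# Minimal sketch of extracting spatial (per-frame) and dynamic (temporal)
# features from a sign video and combining them. All design choices here
# are illustrative assumptions.
import torch
import torch.nn as nn

class SpatialDynamicExtractor(nn.Module):
    def __init__(self, feat_dim=512):
        super().__init__()
        # assumed lightweight per-frame spatial encoder
        self.spatial = nn.Sequential(
            nn.Conv2d(3, 64, 7, stride=2, padding=3), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(64, feat_dim))
        # temporal conv over the frame axis captures motion dynamics
        self.dynamic = nn.Conv1d(feat_dim, feat_dim, kernel_size=5, padding=2)

    def forward(self, video):
        # video: (B, T, 3, H, W)
        b, t = video.shape[:2]
        f = self.spatial(video.flatten(0, 1)).view(b, t, -1)  # (B, T, D)
        d = self.dynamic(f.transpose(1, 2)).transpose(1, 2)   # (B, T, D)
        return f + d   # fuse spatial and dynamic streams
```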
no code implementations • 6 Apr 2023 • Youngjoon Jang, Kyeongha Rho, Jong-Bin Woo, Hyeongkeun Lee, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Joon Son Chung
The goal of this paper is to synthesise talking faces with controllable facial motions.
no code implementations • 21 Mar 2023 • Youngjoon Jang, Youngtaek Oh, Jae Won Cho, Myungchul Kim, Dong-Jin Kim, In So Kweon, Joon Son Chung
The goal of this work is to develop a self-sufficient framework for Continuous Sign Language Recognition (CSLR) that addresses key issues in sign language recognition.
no code implementations • 1 Nov 2022 • Jaemin Jung, Youkyum Kim, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Youngjoon Jang, Joon Son Chung
In particular, we make the following contributions: (1) we construct a large-scale keyword dataset from an existing speech corpus and propose a filtering method to remove data that degrade model training; (2) we propose a metric learning-based two-stage training strategy, and demonstrate that it improves performance on the user-defined keyword spotting task by enriching keyword representations; (3) to facilitate fair comparison in the user-defined KWS field, we propose a unified evaluation protocol and metrics.
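A minimal sketch of the metric-learning idea behind contribution (2): utterances of the same keyword are pulled together in embedding space and different keywords are pushed apart. The encoder shape, the use of a triplet loss, and the margin are illustrative assumptions, not the paper's exact two-stage recipe.

```python
# Minimal sketch of metric learning for keyword embeddings: same-keyword
# utterances are pulled together, different keywords pushed apart. The
# encoder, triplet loss, and margin are illustrative assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Linear(40, 128), nn.ReLU(), nn.Linear(128, 64))
triplet = nn.TripletMarginLoss(margin=0.5)

def embed(x):
    return F.normalize(encoder(x), dim=-1)   # unit-norm keyword embeddings

# anchor/positive share a keyword; negative is a different keyword
anchor, positive, negative = (torch.randn(8, 40) for _ in range(3))
loss = triplet(embed(anchor), embed(positive), embed(negative))
loss.backward()
```

Under this setup, a user-defined keyword can be enrolled once as an embedding and matched against incoming speech by cosine similarity, which is what makes open-vocabulary KWS possible without retraining.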
1 code implementation • 1 Nov 2022 • Youngjoon Jang, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, Joon Son Chung, In So Kweon
Most existing Continuous Sign Language Recognition (CSLR) benchmarks are filmed in studios with a fixed, monochromatic background.