no code implementations • 7 Aug 2024 • Youkyum Kim, Jaemin Jung, Jihwan Park, Byeong-Yeol Kim, Joon Son Chung
This paper proposes a novel user-defined keyword spotting framework that accurately detects audio keywords based on text enrollment.
1 code implementation • 27 Jul 2024 • Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim
Additionally, existing fusion methods overlook the detrimental impact of sensor noise induced by environmental changes on detection performance.
Ranked #15 on 3D Object Detection on nuScenes
1 code implementation • CVPR 2024 • Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim
Groupwise Query Specialization trains a specialized query by dividing queries and relations into disjoint groups and directing a query in a specific query group solely toward relations in the corresponding relation group.
Ranked #1 on Scene Graph Generation on Visual Genome
1 code implementation • ICCV 2023 • Dohwan Ko, Ji Soo Lee, Miso Choi, Jaewon Chu, Jihwan Park, Hyunwoo J. Kim
We hence propose a new benchmark, Open-vocabulary Video Question Answering (OVQA), to measure the generalizability of VideoQA models by considering rare and unseen answers.
Ranked #8 on Visual Question Answering (VQA) on MSRVTT-QA
no code implementations • 6 Apr 2023 • Youngjoon Jang, Kyeongha Rho, Jong-Bin Woo, Hyeongkeun Lee, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Joon Son Chung
The goal of this paper is to synthesise talking faces with controllable facial motions.
no code implementations • 29 Mar 2023 • Jinseok Park, Hyung Yong Kim, Jihwan Park, Byeong-Yeol Kim, Shukjae Choi, Yunkyu Lim
Language identification (LID) recognizes the language of a spoken utterance automatically.
Automatic Speech Recognition (ASR)
no code implementations • 1 Nov 2022 • Jaemin Jung, Youkyum Kim, Jihwan Park, Youshin Lim, Byeong-Yeol Kim, Youngjoon Jang, Joon Son Chung
In particular, we make the following contributions: (1) we construct a large-scale keyword dataset from an existing speech corpus and propose a filtering method to remove data that degrade model training; (2) we propose a metric learning-based two-stage training strategy, and demonstrate that the proposed method improves performance on the user-defined keyword spotting task by enriching their representations; (3) to facilitate fair comparison in the user-defined KWS field, we propose a unified evaluation protocol and metrics.
1 code implementation • CVPR 2022 • Jihwan Park, Seungjun Lee, Hwan Heo, Hyeong Kyu Choi, Hyunwoo J. Kim
Motivated by various inference paths for HOI detection, we propose cross-path consistency learning (CPC), which is a novel end-to-end learning strategy to improve HOI detection for transformers by leveraging augmented decoding paths.
Ranked #1 on Human-Object Interaction Detection on V-COCO (MAP metric)
1 code implementation • 29 Dec 2021 • Jinyoung Park, Sungdong Yoo, Jihwan Park, Hyunwoo J. Kim
To address the two common problems of graph convolution, in this paper, we propose Deformable Graph Convolutional Networks (Deformable GCNs) that adaptively perform convolution in multiple latent spaces and capture short/long-range dependencies between nodes.
Ranked #3 on Node Classification on Non-Homophilic (Heterophilic) Graphs on Cornell (48%/32%/20% fixed splits)
Node Classification on Non-Homophilic (Heterophilic) Graphs, Representation Learning