1 code implementation • 27 Jul 2024 • Juhan Cha, Minseok Joo, Jihwan Park, Sanghyeok Lee, Injae Kim, Hyunwoo J. Kim
Additionally, existing fusion methods overlook the detrimental impact that sensor noise induced by environmental changes has on detection performance.
Ranked #15 on 3D Object Detection on nuScenes
1 code implementation • 23 Jul 2024 • Sojin Lee, Dogyun Park, Inho Kong, Hyunwoo J. Kim
Recent studies on inverse problems have proposed posterior samplers that leverage pre-trained diffusion models as powerful priors.
Ranked #1 on Image Super-Resolution on ImageNet (using extra training data)
1 code implementation • CVPR 2024 • Jooyeon Kim, Eulrang Cho, Sehyung Kim, Hyunwoo J. Kim
Specifically, RALF consists of two modules: Retrieval-Augmented Losses (RAL) and Retrieval-Augmented visual Features (RAF).
Ranked #10 on Open Vocabulary Object Detection on MSCOCO (using extra training data)
1 code implementation • CVPR 2024 • Jinyoung Park, Juyeon Ko, Hyunwoo J. Kim
Recently, prompt learning approaches have been explored to efficiently and effectively adapt vision-language models to a variety of downstream tasks.
Ranked #2 on Prompt Engineering on UCF101
1 code implementation • CVPR 2024 • Jongha Kim, Jihwan Park, Jinyoung Park, Jinyoung Kim, Sehyung Kim, Hyunwoo J. Kim
Groupwise Query Specialization trains a specialized query by dividing queries and relations into disjoint groups and directing a query in a specific query group solely toward relations in the corresponding relation group.
Ranked #1 on Scene Graph Generation on Visual Genome
1 code implementation • CVPR 2024 • Joonmyung Choi, Sanghyeok Lee, Jaewon Chu, Minhyuk Choi, Hyunwoo J. Kim
To tackle these issues, we propose training-free token merging for lightweight video Transformer (vid-TLDR) that aims to enhance the efficiency of video Transformers by merging the background tokens without additional training.
Ranked #2 on Video Retrieval on SSv2-template retrieval (using extra training data)
1 code implementation • CVPR 2024 • Sanghyeok Lee, Joonmyung Choi, Hyunwoo J. Kim
Here, we argue that token fusion needs to consider diverse relations between tokens to minimize information loss.
Ranked #1 on Efficient ViTs on ImageNet-1K (With LV-ViT-S)
1 code implementation • 26 Feb 2024 • Juyeon Ko, Inho Kong, Dogyun Park, Hyunwoo J. Kim
This facilitates the generation of an image close to a clean image, enabling robust generation.
1 code implementation • 23 Jan 2024 • Dogyun Park, Sihyeon Kim, Sojin Lee, Hyunwoo J. Kim
Arguably, this architecture limits the expressive power of generative models and results in low-quality INR generation.
Ranked #2 on Video Generation on Sky Time-lapse
no code implementations • ECCV 2020 • Bumsoo Kim, Taeho Choi, Jaewoo Kang, Hyunwoo J. Kim
This is a major bottleneck in HOI detection inference time.
no code implementations • 16 Nov 2023 • Jinyoung Park, Ameen Patel, Omar Zia Khan, Hyunwoo J. Kim, Joo-Kyung Kim
To deal with them, we propose a GE-Reasoning method, which directs LLMs to generate proper sub-questions and corresponding answers.
1 code implementation • 7 Nov 2023 • Injae Kim, Minhyuk Choi, Hyunwoo J. Kim
Neural Radiance Field (NeRF) has enabled novel view synthesis with high fidelity given images and camera poses.
1 code implementation • NeurIPS 2023 • Hyeong Kyu Choi, Seunghun Lee, Jaewon Chu, Hyunwoo J. Kim
Multi-hop Knowledge Graph Question Answering (KGQA) is a task that involves retrieving nodes from a knowledge graph (KG) to answer natural language questions.
1 code implementation • 24 Oct 2023 • Dohwan Ko, Ji Soo Lee, Wooyoung Kang, Byungseok Roh, Hyunwoo J. Kim
We observe that the LLMs provide effective priors in exploiting $\textit{linguistic shortcuts}$ for temporal and causal reasoning in Video Question Answering (VideoQA).
Ranked #1 on Video Question Answering on TVQA
1 code implementation • ICCV 2023 • Eulrang Cho, Jooyeon Kim, Hyunwoo J. Kim
Pre-trained vision-language models (VLMs) have shown impressive performance on various downstream tasks by utilizing knowledge learned from large data.
1 code implementation • ICCV 2023 • Dongjun Lee, Seokwon Song, Jihee Suh, Joonmyung Choi, Sanghyeok Lee, Hyunwoo J. Kim
RPO leverages masked attention to prevent the internal representation shift in the pre-trained model.
Ranked #7 on Prompt Engineering on Caltech-101
1 code implementation • ICCV 2023 • Sihyeon Kim, Minseok Joo, Jaewon Lee, Juyeon Ko, Juhan Cha, Hyunwoo J. Kim
In this paper, we highlight the importance of part deformation consistency and propose a semantic-aware implicit template learning framework to enable semantically plausible deformation.
no code implementations • 23 Aug 2023 • Injae Kim, Jongha Kim, Joonmyung Choi, Hyunwoo J. Kim
However, those methods do not consider whether a concept is visually relevant or not, which is an important factor in computing meaningful concept scores.
1 code implementation • ICCV 2023 • Dohwan Ko, Ji Soo Lee, Miso Choi, Jaewon Chu, Jihwan Park, Hyunwoo J. Kim
We hence propose a new benchmark, Open-vocabulary Video Question Answering (OVQA), to measure the generalizability of VideoQA models by considering rare and unseen answers.
Ranked #8 on Visual Question Answering (VQA) on MSRVTT-QA
1 code implementation • CVPR 2023 • Jinyoung Park, Sanghyeok Lee, Sihyeon Kim, Yunyang Xiong, Hyunwoo J. Kim
In this paper, we present a Self-Positioning point-based Transformer (SPoTr), which is designed to capture both local and global shape contexts with reduced complexity.
Ranked #3 on 3D Part Segmentation on ShapeNet-Part
1 code implementation • CVPR 2023 • Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim
Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning the target task via auxiliary learning.
Ranked #2 on Video Captioning on YouCook2
1 code implementation • 20 Mar 2023 • Minkyu Jeon, Hyeonjin Park, Hyunwoo J. Kim, Michael Morley, Hyunghoon Cho
While prior works have explored image de-identification strategies based on synthetic averaging of images in other domains (e.g., facial images), existing techniques face difficulty in preserving both privacy and clinical utility in retinal images, as we demonstrate in our work.
no code implementations • 5 Mar 2023 • Jaewon Lee, Injae Kim, Hwan Heo, Hyunwoo J. Kim
We present a learning framework for reconstructing neural scene representations from a small number of unconstrained tourist photos.
no code implementations • 3 Feb 2023 • Hwan Heo, Taekyung Kim, Jiyoung Lee, Jaewon Lee, Soohyun Kim, Hyunwoo J. Kim, Jin-Hwa Kim
Multi-resolution hash encoding has recently been proposed to reduce the computational cost of neural renderings, such as NeRF.
no code implementations • 2 Feb 2023 • Hwan Heo, Youngjin Oh, Jaewon Lee, Hyunwoo J. Kim
Recent studies have proven that DNNs, unlike human vision, tend to exploit texture information rather than shape.
1 code implementation • 2 Dec 2022 • Jinyoung Park, Hyeong Kyu Choi, Juyeon Ko, Hyeonjin Park, Ji-Hoon Kim, Jisu Jeong, KyungMin Kim, Hyunwoo J. Kim
To address these issues, we propose Question Answering Transformer (QAT), which is designed to jointly reason over language and graphs with respect to entity relations in a unified manner.
1 code implementation • 15 Oct 2022 • Byeongkeun Ahn, Chiyoon Kim, Youngjoon Hong, Hyunwoo J. Kim
Normalizing flows model probability distributions by learning invertible transformations that transfer a simple distribution into complex distributions.
1 code implementation • 14 Oct 2022 • Hyeong Kyu Choi, Joonmyung Choi, Hyunwoo J. Kim
To this end, we propose TokenMixup, an efficient attention-guided token-level data augmentation method that aims to maximize the saliency of a mixed set of tokens.
Ranked #72 on Image Classification on CIFAR-10
1 code implementation • 13 Oct 2022 • Sanghyeok Lee, Minkyu Jeon, Injae Kim, Yunyang Xiong, Hyunwoo J. Kim
Mixup is a simple and widely used data augmentation technique that has proven effective in alleviating the problems of overfitting and data scarcity.
Ranked #41 on 3D Part Segmentation on ShapeNet-Part
no code implementations • 29 Jun 2022 • Jinyoung Park, Seongjun Yun, Hyeonjin Park, Jaewoo Kang, Jisu Jeong, Kyung-Min Kim, Jung-Woo Ha, Hyunwoo J. Kim
Transformer-based models have recently shown success in representation learning on graph-structured data beyond natural language processing and computer vision.
1 code implementation • NeurIPS 2021 • Seongjun Yun, Seoyoon Kim, Junhyun Lee, Jaewoo Kang, Hyunwoo J. Kim
Graph Neural Networks (GNNs) have been widely applied to various fields for learning over graph-structured data.
1 code implementation • CVPR 2022 • Jihwan Park, Seungjun Lee, Hwan Heo, Hyeong Kyu Choi, Hyunwoo J. Kim
Motivated by various inference paths for HOI detection, we propose cross-path consistency learning (CPC), which is a novel end-to-end learning strategy to improve HOI detection for transformers by leveraging augmented decoding paths.
Ranked #1 on Human-Object Interaction Detection on V-COCO (MAP metric)
1 code implementation • CVPR 2022 • Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim
In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW).
no code implementations • NeurIPS 2021 • Hyeonjin Park, Seunghun Lee, Sihyeon Kim, Jinyoung Park, Jisu Jeong, Kyung-Min Kim, Jung-Woo Ha, Hyunwoo J. Kim
We also propose a simple and effective semi-supervised learning strategy with generated samples from MH-Aug. Our extensive experiments demonstrate that MH-Aug can generate a sequence of samples according to the target distribution to significantly improve the performance of GNNs.
4 code implementations • 5 Jan 2022 • Chongkeun Paik, Hyunwoo J. Kim
In the second approach, although DeepSORT only processes a quarter of all frames due to hardware and time limitations, our model with DeepSORT (42.9%) outperforms FairMOT (71.4%) in terms of recall.
1 code implementation • 29 Dec 2021 • Jinyoung Park, Sungdong Yoo, Jihwan Park, Hyunwoo J. Kim
To address the two common problems of graph convolution, in this paper, we propose Deformable Graph Convolutional Networks (Deformable GCNs) that adaptively perform convolution in multiple latent spaces and capture short/long-range dependencies between nodes.
Ranked #3 on Node Classification on Non-Homophilic (Heterophilic) Graphs on Cornell (48%/32%/20% fixed splits)
1 code implementation • ICCV 2021 • Sihyeon Kim, Sanghyeok Lee, Dasol Hwang, Jaewon Lee, Seong Jae Hwang, Hyunwoo J. Kim
Although data augmentation is a standard approach to compensate for the scarcity of data, it has been less explored in the point cloud literature.
Ranked #11 on Point Cloud Classification on PointCloud-C
1 code implementation • 11 Jun 2021 • Seongjun Yun, Minbyul Jeong, Sungdong Yoo, Seunghun Lee, Sean S. Yi, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim
Despite the success of GNNs, most existing GNNs are designed to learn node representations on fixed and homogeneous graphs.
1 code implementation • CVPR 2021 • Bumsoo Kim, Junhyun Lee, Jaewoo Kang, Eun-Sol Kim, Hyunwoo J. Kim
Human-Object Interaction (HOI) detection is a task of identifying "a set of interactions" in an image, which involves i) the localization of the subject (i.e., humans) and target (i.e., objects) of interaction, and ii) the classification of the interaction labels.
Ranked #16 on Human-Object Interaction Detection on V-COCO
1 code implementation • 1 Mar 2021 • Dasol Hwang, Jinyoung Park, Sunyoung Kwon, Kyung-Min Kim, Jung-Woo Ha, Hyunwoo J. Kim
Our method is learning to learn a primary task with various auxiliary tasks to improve generalization performance.
1 code implementation • ECCV 2020 • Byungjoo Kim, Bryce Chudomelka, Jinyoung Park, Jaewoo Kang, Youngjoon Hong, Hyunwoo J. Kim
Motivated by the SSP property and a generalized Runge-Kutta method, we propose Strong Stability Preserving networks (SSP networks) which improve robustness against adversarial attacks.
1 code implementation • NeurIPS 2020 • Dasol Hwang, Jinyoung Park, Sunyoung Kwon, Kyung-Min Kim, Jung-Woo Ha, Hyunwoo J. Kim
Our proposed method is learning to learn a primary task by predicting meta-paths as auxiliary tasks.
no code implementations • 29 Nov 2019 • Wonwoong Cho, Kangyeol Kim, Eungyeup Kim, Hyunwoo J. Kim, Jaegul Choo
Disentangling content and style information of an image has played an important role in recent success in image translation.
1 code implementation • NeurIPS 2019 • Seongjun Yun, Minbyul Jeong, Raehyun Kim, Jaewoo Kang, Hyunwoo J. Kim
In this paper, we propose Graph Transformer Networks (GTNs) that are capable of generating new graph structures, which involve identifying useful connections between unconnected nodes on the original graph, while learning effective node representation on the new graphs in an end-to-end fashion.
no code implementations • 7 Apr 2019 • Yunyang Xiong, Hyunwoo J. Kim, Varsha Hedau
It boosts the representational power by modeling, in a high dimensional space, interdependency of channels between a depthwise convolution layer and a projection layer in the ANTBlocks.
1 code implementation • ECCV 2018 • Zihang Meng, Nagesh Adluru, Hyunwoo J. Kim, Glenn Fung, Vikas Singh
A sizable body of work on relative attributes provides compelling evidence that relating pairs of images along a continuum of strength pertaining to a visual attribute yields significant improvements in a wide variety of tasks in vision.
no code implementations • CVPR 2018 • Seong Jae Hwang, Sathya N. Ravi, Zirui Tao, Hyunwoo J. Kim, Maxwell D. Collins, Vikas Singh
Visual relationships provide higher-level information of objects and their relations in an image; this enables a semantic understanding of the scene and helps downstream applications.
no code implementations • 19 Apr 2018 • Seong Jae Hwang, Ronak Mehta, Hyunwoo J. Kim, Vikas Singh
There has recently been a concerted effort to derive mechanisms in vision and machine learning systems to offer uncertainty estimates of the predictions they make.
no code implementations • NeurIPS 2017 • Jinseok Nam, Eneldo Loza Mencía, Hyunwoo J. Kim, Johannes Fürnkranz
Multi-label classification is the task of predicting a set of labels for a given input instance.
no code implementations • 20 Nov 2017 • Ronak Mehta, Hyunwoo J. Kim, Shulei Wang, Sterling C. Johnson, Ming Yuan, Vikas Singh
Recent results in coupled or temporal graphical models offer schemes for estimating the relationship structure between features when the data come from related (but distinct) longitudinal sources.
no code implementations • CVPR 2017 • Hyunwoo J. Kim, Nagesh Adluru, Heemanshu Suri, Baba C. Vemuri, Sterling C. Johnson, Vikas Singh
Statistical machine learning models that operate on manifold-valued data are being extensively studied in vision, motivated by applications in activity recognition, feature tracking and medical imaging.
no code implementations • CVPR 2016 • Won Hwa Kim, Hyunwoo J. Kim, Nagesh Adluru, Vikas Singh
A major goal of imaging studies such as the (ongoing) Human Connectome Project (HCP) is to characterize the structural network map of the human brain and identify its associations with covariates such as genotype, risk factors, and so on that correspond to an individual.
no code implementations • ICCV 2015 • Hyunwoo J. Kim, Nagesh Adluru, Monami Banerjee, Baba C. Vemuri, Vikas Singh
Probability density functions (PDFs) are fundamental "objects" in mathematics with numerous applications in computer vision, machine learning and medical imaging.
no code implementations • CVPR 2014 • Hyunwoo J. Kim, Nagesh Adluru, Maxwell D. Collins, Moo K. Chung, Barbara B. Bendlin, Sterling C. Johnson, Richard J. Davidson, Vikas Singh
Linear regression is a parametric model which is ubiquitous in scientific analysis.