1 code implementation • 15 Apr 2024 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
This paper introduces VLAP, a novel approach that bridges pretrained vision models and large language models (LLMs) to make frozen LLMs understand the visual world.
1 code implementation • ICCV 2023 • Jinhyun Jang, Jungin Park, Jin Kim, Hyeongjun Kwon, Kwanghoon Sohn
Recent DETR-based video grounding models learn moment queries to predict moment timestamps directly, without hand-crafted components such as pre-defined proposals or non-maximum suppression.
no code implementations • CVPR 2023 • Minsu Kim, Seungryong Kim, Jungin Park, Seongheon Park, Kwanghoon Sohn
Modern mixture-based data augmentation can regularize models against overfitting the training data in various computer vision applications, but a data augmentation technique tailored to part-based Visible-Infrared person Re-IDentification (VI-ReID) models remains unexplored.
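The mixture-based augmentation mentioned above can be illustrated with a minimal mixup-style sketch (toy shapes and names are hypothetical; this is the generic technique, not the paper's VI-ReID-specific variant):

```python
import numpy as np

def mixup(x1, x2, y1, y2, alpha=0.2, rng=None):
    """Blend two samples and their one-hot labels with a Beta-distributed weight."""
    rng = rng or np.random.default_rng(0)
    lam = rng.beta(alpha, alpha)
    x = lam * x1 + (1.0 - lam) * x2
    y = lam * y1 + (1.0 - lam) * y2
    return x, y, lam

# Toy example: mix two 4-pixel "images" with one-hot labels.
x1, x2 = np.ones(4), np.zeros(4)
y1, y2 = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x, y, lam = mixup(x1, x2, y1, y2)
```

The blended label keeps the supervision consistent with the blended input, which is what gives the regularizing effect.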
1 code implementation • CVPR 2023 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
In this paper, we efficiently transfer the superior representation power of vision foundation models, such as ViT and Swin, to video understanding with only a few trainable parameters.
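The parameter-efficient idea here — keep the large pretrained backbone frozen and learn only a small set of extra parameters — can be sketched with a toy linear "backbone" and adapter (all shapes and the regression task are hypothetical stand-ins for ViT/Swin features):

```python
import numpy as np

rng = np.random.default_rng(0)

# Frozen "backbone": a fixed random projection standing in for a pretrained ViT.
W_backbone = rng.normal(size=(16, 8))      # never updated
# Trainable adapter: the only parameters we learn.
W_adapter = np.zeros((8, 2))

def forward(x):
    feats = x @ W_backbone                 # frozen features
    return feats @ W_adapter               # lightweight adapter output

# Toy regression task.
X = rng.normal(size=(64, 16))
Y = rng.normal(size=(64, 2))

def mse(pred, target):
    return float(np.mean((pred - target) ** 2))

loss_before = mse(forward(X), Y)
for _ in range(200):                       # gradient descent on the adapter only
    feats = X @ W_backbone
    pred = feats @ W_adapter
    grad = feats.T @ (2.0 * (pred - Y)) / len(X)
    W_adapter -= 0.01 * grad
loss_after = mse(forward(X), Y)
```

Only `W_adapter` receives gradients, so the trainable-parameter count stays tiny relative to the backbone.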
Ranked #1 on Action Classification on Diving-48
1 code implementation • 8 Nov 2022 • Tuan N. Tang, Jungin Park, Kwonyoung Kim, Kwanghoon Sohn
In addition, the evaluation for Online Detection of Action Start (ODAS) demonstrates the effectiveness and robustness of our method in the online setting.
no code implementations • 24 Oct 2022 • Dahye Kim, Jungin Park, Jiyoung Lee, Seongheon Park, Kwanghoon Sohn
Given an untrimmed video and a language query depicting a specific temporal moment in the video, video grounding aims to localize the time interval by understanding the text and video simultaneously.
no code implementations • 27 Jul 2022 • Kwonyoung Kim, Jungin Park, Jiyoung Lee, Dongbo Min, Kwanghoon Sohn
To mitigate this issue, we propose to incorporate an auxiliary point-selective network into a meta-learning framework, called PointFix, to provide a robust initialization of stereo models for online stereo adaptation.
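The meta-learning framing — find an initialization that adapts quickly with a few online gradient steps — can be sketched in a first-order MAML-style form on a toy 1-D problem (the quadratic losses and scalar parameter are hypothetical; PointFix's point-selective network operates on stereo features):

```python
import numpy as np

def inner_adapt(theta, target, lr=0.1, steps=3):
    """A few online gradient steps on a per-task quadratic loss (theta - target)^2."""
    for _ in range(steps):
        theta = theta - lr * 2.0 * (theta - target)
    return theta

def meta_train(tasks, theta=0.0, meta_lr=0.05, epochs=200):
    """First-order meta-update: move the initialization toward low post-adaptation loss."""
    for _ in range(epochs):
        for target in tasks:
            adapted = inner_adapt(theta, target)
            # First-order approximation: use the post-adaptation gradient directly,
            # treating the adapted parameters as a stop-gradient.
            theta = theta - meta_lr * 2.0 * (adapted - target)
    return theta

tasks = [-1.0, 0.5, 2.0]          # each "task" has a different optimum
theta0 = meta_train(tasks)
```

After meta-training, a few inner steps from `theta0` move closer to each task's optimum than `theta0` itself, which is the property online adaptation relies on.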
no code implementations • CVPR 2022 • Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn
This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic representation.
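The core idea — represent each clip as a distribution rather than a point, and contrast sampled embeddings — can be sketched with a diagonal Gaussian and an InfoNCE-style loss (a minimal sketch with hypothetical dimensions; not the paper's exact objective):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_embedding(mu, log_var, n=8):
    """Draw n embeddings from a diagonal Gaussian via the reparameterization trick."""
    eps = rng.normal(size=(n,) + mu.shape)
    return mu + np.exp(0.5 * log_var) * eps

def info_nce(anchor, positive, negatives, tau=0.1):
    """InfoNCE over one positive and a set of negatives, using cosine similarity."""
    def cos(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
    logits = np.array([cos(anchor, positive)] + [cos(anchor, n) for n in negatives]) / tau
    logits -= logits.max()                       # numerical stability
    return float(-np.log(np.exp(logits[0]) / np.exp(logits).sum()))

# Two views of the same video share a mean; a distractor has a different mean.
mu, mu_neg = np.ones(4), -np.ones(4)
log_var = np.full(4, -2.0)                       # small variance around each mean
z_anchor = sample_embedding(mu, log_var, n=1)[0]
z_pos = sample_embedding(mu, log_var, n=1)[0]
z_negs = list(sample_embedding(mu_neg, log_var, n=8))
loss = info_nce(z_anchor, z_pos, z_negs)
```

Because the positive is drawn from the same distribution as the anchor and the negatives from a distant one, the loss is driven close to zero; the variance term is what lets the representation express uncertainty.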
1 code implementation • CVPR 2022 • Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn
The rise of deep neural networks has led to several breakthroughs in semantic segmentation.
no code implementations • 31 Aug 2021 • Jin Kim, Jiyoung Lee, Jungin Park, Dongbo Min, Kwanghoon Sohn
Domain generalization aims to learn a prediction model on multi-domain source data such that the model can generalize to a target domain with unknown statistics.
no code implementations • CVPR 2021 • Jungin Park, Jiyoung Lee, Kwanghoon Sohn
As a result, our method learns question-conditioned visual representations of appearance and motion that show strong capability for video question answering.
1 code implementation • 15 Dec 2020 • Minsu Kim, Sunghun Joung, Seungryong Kim, Jungin Park, Ig-Jae Kim, Kwanghoon Sohn
Existing techniques for adapting semantic segmentation networks between source and target domains with deep convolutional neural networks (CNNs) treat all samples from the two domains in a global or category-aware manner.
no code implementations • ECCV 2020 • Jungin Park, Jiyoung Lee, Ig-Jae Kim, Kwanghoon Sohn
The goal of video summarization is to select keyframes that are visually diverse and can represent a whole story of an input video.
1 code implementation • ICCV 2019 • Jiyoung Lee, Seungryong Kim, Sunok Kim, Jungin Park, Kwanghoon Sohn
We present deep networks for context-aware emotion recognition, called CAER-Net, that exploit not only human facial expression but also context information in a joint and boosting manner.
Ranked #1 on Emotion Recognition in Context on CAER-Dynamic