no code implementations • 22 Apr 2024 • Jooeun Kim, Jinri Kim, Kwangeun Yeo, Eungi Kim, Kyoung-Woon On, Jonghwan Mun, Joonseok Lee
Cold-start item recommendation is a long-standing challenge in recommendation systems.
1 code implementation • CVPR 2024 • Junbum Cha, Wooyoung Kang, Jonghwan Mun, Byungseok Roh
In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities.
Ranked #2 on Science Question Answering on ScienceQA (using extra training data)
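To make the role of a visual projector concrete, here is a minimal PyTorch sketch of a generic MLP projector that maps frozen vision-encoder features into an LLM's token-embedding space. The dimensions, names, and two-layer design are assumptions for illustration, not the paper's specific module.

    # Generic visual projector sketch; NOT the paper's proposed design.
    # All dimensions and names are assumptions.
    import torch
    import torch.nn as nn

    class VisualProjector(nn.Module):
        def __init__(self, vision_dim=1024, llm_dim=4096):
            super().__init__()
            # Two-layer MLP mapping vision features into the LLM's
            # token-embedding space.
            self.proj = nn.Sequential(
                nn.Linear(vision_dim, llm_dim),
                nn.GELU(),
                nn.Linear(llm_dim, llm_dim),
            )

        def forward(self, vision_feats):  # (batch, num_patches, vision_dim)
            return self.proj(vision_feats)  # (batch, num_patches, llm_dim)

    # Projected features act as soft visual tokens that are prepended
    # to the text embeddings before being fed to the LLM.
    projector = VisualProjector()
    tokens = projector(torch.randn(2, 256, 1024))  # -> (2, 256, 4096)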
no code implementations • 4 Dec 2023 • Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo
Specifically, the proposed method, named Pseudo-Labeling for Arbitrary Concepts (PLAC), learns an arbitrary image-to-text mapping for pseudo-labeling arbitrary concepts.
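As a hedged sketch of the general pseudo-labeling idea, assuming region and concept features already live in a shared embedding space: score each region against arbitrary concept names and keep confident matches as pseudo-labels. The threshold and shapes are illustrative; this is not the PLAC training procedure itself.

    # Generic concept pseudo-labeling sketch; not the PLAC method.
    import torch
    import torch.nn.functional as F

    def pseudo_label(region_embs, concept_embs, concepts, thresh=0.3):
        """region_embs: (R, D) region features mapped into text space;
        concept_embs: (C, D) embeddings of arbitrary concept names."""
        sims = (F.normalize(region_embs, dim=-1)
                @ F.normalize(concept_embs, dim=-1).T)  # (R, C)
        scores, idx = sims.max(dim=-1)  # best concept per region
        # Keep only confident matches as pseudo-labels for training.
        return [(r, concepts[idx[r]]) for r in range(len(region_embs))
                if scores[r] > thresh]

    labels = pseudo_label(torch.randn(5, 512), torch.randn(10, 512),
                          [f"concept_{i}" for i in range(10)])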
no code implementations • 5 Sep 2023 • TaeHoon Kim, Pyunghwan Ahn, Sangyun Kim, Sihaeng Lee, Mark Marsden, Alessandra Sala, Seung Hwan Kim, Bohyung Han, Kyoung Mu Lee, Honglak Lee, Kyounghoon Bae, Xiangyu Wu, Yi Gao, Hailiang Zhang, Yang Yang, Weili Guo, Jianfeng Lu, Youngtaek Oh, Jae Won Cho, Dong-Jin Kim, In So Kweon, Junmo Kim, Wooyoung Kang, Won Young Jhoo, Byungseok Roh, Jonghwan Mun, Solgil Oh, Kenan Emir Ak, Gwang-Gook Lee, Yan Xu, Mingwei Shen, Kyomin Hwang, Wonsik Shin, Kamin Lee, Wonhark Park, Dongkwan Lee, Nojun Kwak, Yujin Wang, Yimu Wang, Tiancheng Gu, Xingchang Lv, Mingmao Sun
In this report, we introduce the NICE (New frontiers for zero-shot Image Captioning Evaluation) project and share the results and outcomes of its 2023 challenge.
1 code implementation • ICCV 2023 • Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh
Image captioning is one of the tasks that can directly take advantage of large-scale web-crawled data, which provides a captioning model with rich knowledge about the visual world.
1 code implementation • CVPR 2023 • Junbum Cha, Jonghwan Mun, Byungseok Roh
Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task.
Ranked #2 on Semantic Segmentation on CC3M-TagMask
Tasks: Contrastive Learning, Open Vocabulary Semantic Segmentation, +4 more
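For reference, a minimal PyTorch sketch of the standard image-text contrastive (InfoNCE) objective that such CL-based methods build on. This is the generic CLIP-style loss, not the paper's exact region-level formulation.

    # Standard symmetric image-text contrastive loss (InfoNCE).
    import torch
    import torch.nn.functional as F

    def contrastive_loss(img_embs, txt_embs, temperature=0.07):
        img = F.normalize(img_embs, dim=-1)
        txt = F.normalize(txt_embs, dim=-1)
        logits = img @ txt.T / temperature   # (B, B) similarity matrix
        targets = torch.arange(len(img))     # matched pairs on the diagonal
        # Symmetric cross-entropy over image->text and text->image.
        return (F.cross_entropy(logits, targets) +
                F.cross_entropy(logits.T, targets)) / 2

    loss = contrastive_loss(torch.randn(8, 512), torch.randn(8, 512))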
no code implementations • CVPR 2022 • Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim
Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image.
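Purely for illustration, one possible in-memory representation of such a triplet; the field names are hypothetical, not the paper's interface.

    # Illustrative HOI triplet container; field names are assumptions.
    from dataclasses import dataclass

    @dataclass
    class HOITriplet:
        human_box: tuple    # (x1, y1, x2, y2) of the person
        object_box: tuple   # (x1, y1, x2, y2) of the object
        object_class: str   # e.g. "bicycle"
        interaction: str    # e.g. "riding"

    detection = HOITriplet((10, 20, 110, 220), (50, 120, 200, 260),
                           "bicycle", "riding")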
1 code implementation • 14 Jan 2022 • Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim
Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a video, with a self-supervised learning framework in which we mainly focus on designing effective pretext tasks.
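As one hypothetical example of a pretext task in this spirit, consider training a head to predict whether a pair of shot features appears in its original temporal order. This is illustrative only and not necessarily one of the paper's pretext tasks.

    # Hypothetical shot-order prediction pretext task.
    import random
    import torch
    import torch.nn as nn

    class ShotOrderHead(nn.Module):
        def __init__(self, dim=256):
            super().__init__()
            self.clf = nn.Linear(2 * dim, 2)  # in-order vs. swapped

        def forward(self, a, b):
            return self.clf(torch.cat([a, b], dim=-1))

    def make_pair(shot_feats):
        """shot_feats: (num_shots, dim); returns a shot pair and its label."""
        i = random.randrange(len(shot_feats) - 1)
        if random.random() < 0.5:
            return shot_feats[i], shot_feats[i + 1], 1  # original order
        return shot_feats[i + 1], shot_feats[i], 0      # swapped

    feats = torch.randn(10, 256)
    a, b, label = make_pair(feats)
    logits = ShotOrderHead()(a, b)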
no code implementations • 13 Oct 2021 • Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim
The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning.
1 code implementation • CVPR 2020 • Jonghwan Mun, Minsu Cho, Bohyung Han
This paper addresses the problem of text-to-video temporal grounding, which aims to identify the time interval in a video semantically relevant to a text query.
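To make the task setup concrete, here is a naive scoring baseline: rank candidate intervals by the similarity between a query embedding and pooled clip features. This illustrates the problem formulation, not the paper's model; all shapes are assumptions.

    # Naive moment-scoring baseline for temporal grounding.
    import torch
    import torch.nn.functional as F

    def ground(query_emb, clip_feats, candidates):
        """clip_feats: (T, D) per-second features; candidates: [(s, e), ...]."""
        best, best_score = None, float("-inf")
        for s, e in candidates:
            moment = clip_feats[s:e].mean(dim=0)  # pool features in [s, e)
            score = F.cosine_similarity(query_emb, moment, dim=0).item()
            if score > best_score:
                best, best_score = (s, e), score
        return best

    interval = ground(torch.randn(256), torch.randn(60, 256),
                      [(0, 10), (10, 30), (30, 60)])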
no code implementations • 29 Nov 2019 • Minsoo Kang, Jonghwan Mun, Bohyung Han
We present a novel framework of knowledge distillation that is capable of learning powerful and efficient student models from ensemble teacher networks.
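A minimal sketch of the generic version of this setup: the student matches the averaged softened distribution of several teachers via KL divergence. This is standard Hinton-style distillation, not the paper's specific framework.

    # Generic ensemble knowledge distillation loss.
    import torch
    import torch.nn.functional as F

    def ensemble_kd_loss(student_logits, teacher_logits_list, T=4.0):
        teacher_probs = torch.stack(
            [F.softmax(t / T, dim=-1) for t in teacher_logits_list]).mean(0)
        student_log_probs = F.log_softmax(student_logits / T, dim=-1)
        # T^2 rescales gradients to the usual magnitude (Hinton et al.).
        return F.kl_div(student_log_probs, teacher_probs,
                        reduction="batchmean") * T * T

    loss = ensemble_kd_loss(torch.randn(8, 10),
                            [torch.randn(8, 10) for _ in range(3)])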
1 code implementation • CVPR 2019 • Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han
Dense video captioning is an extremely challenging task since accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning of individual events.
no code implementations • NeurIPS 2018 • Jonghwan Mun, Kimin Lee, Jinwoo Shin, Bohyung Han
The proposed framework is model-agnostic and applicable to any task other than VQA, e.g., image classification with a large number of labels but few per-class examples, which is known to be difficult under existing MCL schemes.
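For context, the standard MCL oracle loss that such schemes build on penalizes only the best-performing ensemble member per example, which drives members to specialize. A minimal PyTorch sketch of generic MCL, not the proposed variant:

    # Standard multiple choice learning (MCL) oracle loss.
    import torch
    import torch.nn.functional as F

    def mcl_oracle_loss(logits_per_model, targets):
        """logits_per_model: list of (B, C) tensors, one per member."""
        losses = torch.stack([F.cross_entropy(l, targets, reduction="none")
                              for l in logits_per_model])  # (M, B)
        return losses.min(dim=0).values.mean()  # best member per example

    loss = mcl_oracle_loss([torch.randn(8, 100) for _ in range(4)],
                           torch.randint(0, 100, (8,)))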
1 code implementation • CVPR 2019 • Hyeonwoo Noh, Tae-hoon Kim, Jonghwan Mun, Bohyung Han
Specifically, we employ linguistic knowledge sources such as a structured lexical database (e.g., WordNet) and visual descriptions for unsupervised task discovery, and transfer a learned task-conditional visual classifier as an answering unit in a visual question answering model.
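As a rough illustration of mining task structure from WordNet, the sketch below groups nouns under a shared hypernym using NLTK. This is a hypothetical simplification, not the paper's discovery procedure.

    # Hypothetical hypernym-based word grouping via WordNet.
    # Requires: pip install nltk; then nltk.download("wordnet").
    from collections import defaultdict
    from nltk.corpus import wordnet as wn

    def group_by_hypernym(words):
        groups = defaultdict(list)
        for w in words:
            synsets = wn.synsets(w, pos=wn.NOUN)
            if not synsets:
                continue
            hypers = synsets[0].hypernyms()  # direct hypernyms of first sense
            if hypers:
                groups[hypers[0].name()].append(w)
        return dict(groups)

    # e.g. "dog" and "cat" may fall under a shared animal-related hypernym.
    print(group_by_hypernym(["dog", "cat", "apple", "banana"]))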
no code implementations • NeurIPS 2017 • Hyeonwoo Noh, Tackgeun You, Jonghwan Mun, Bohyung Han
Overfitting is one of the most critical challenges in deep neural networks, and there are various types of regularization methods to improve generalization performance.
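A standard example of one such regularization method is dropout, which injects multiplicative noise by randomly zeroing activations during training to discourage co-adaptation. A minimal PyTorch sketch of this generic example, not necessarily the paper's proposal:

    # Dropout as a standard regularizer.
    import torch
    import torch.nn as nn

    net = nn.Sequential(
        nn.Linear(784, 256),
        nn.ReLU(),
        nn.Dropout(p=0.5),  # active in train mode, identity in eval mode
        nn.Linear(256, 10),
    )
    net.train()
    out = net(torch.randn(4, 784))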
1 code implementation • 12 Dec 2016 • Jonghwan Mun, Minsu Cho, Bohyung Han
Visual attention plays an important role in understanding images and has proven effective in generating natural language descriptions of them.
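A minimal sketch of soft visual attention in this setting: the decoder state attends over spatial image features via softmax weights. This is generic attention; the shapes are assumptions.

    # Generic soft visual attention over spatial features.
    import torch
    import torch.nn.functional as F

    def soft_attention(query, spatial_feats):
        """query: (D,) decoder state; spatial_feats: (N, D) region features."""
        scores = spatial_feats @ query       # (N,) relevance per region
        weights = F.softmax(scores, dim=0)   # attention distribution
        return weights @ spatial_feats       # (D,) attended context vector

    ctx = soft_attention(torch.randn(512), torch.randn(49, 512))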
no code implementations • ICCV 2017 • Jonghwan Mun, Paul Hongsuck Seo, Ilchae Jung, Bohyung Han
To address this objective, we automatically generate a customized synthetic VideoQA dataset using Super Mario Bros. gameplay videos so that it contains events with different levels of reasoning complexity.