Search Results for author: Jonghwan Mun

Found 17 papers, 8 papers with code

MarioQA: Answering Questions by Watching Gameplay Videos

no code implementations · ICCV 2017 · Jonghwan Mun, Paul Hongsuck Seo, Ilchae Jung, Bohyung Han

To address this objective, we automatically generate a customized synthetic VideoQA dataset using Super Mario Bros. gameplay videos so that it contains events with different levels of reasoning complexity.

Question Answering · Video Question Answering

Text-guided Attention Model for Image Captioning

1 code implementation · 12 Dec 2016 · Jonghwan Mun, Minsu Cho, Bohyung Han

Visual attention plays an important role in understanding images and has demonstrated its effectiveness in generating natural language descriptions of them.

Image Captioning
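The visual attention behind this line of work can be sketched as a simple soft-attention step: a query vector (standing in here for the guiding text or decoder state, an assumption for illustration) scores each image-region feature, and a softmax-weighted sum of those features gives the context vector. A generic sketch, not the paper's exact model:

```python
import numpy as np

def soft_attention(query, regions):
    # Score each region feature against the query, then take a
    # softmax-weighted sum of the region features.
    scores = regions @ query
    scores = scores - scores.max()                    # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()   # softmax over regions
    context = weights @ regions                       # attended context vector
    return context, weights

# Toy example: 3 one-hot region features; the query matches region 0.
regions = np.array([[1.0, 0.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0, 0.0],
                    [0.0, 0.0, 1.0, 0.0]])
query = np.array([5.0, 0.0, 0.0, 0.0])
context, weights = soft_attention(query, regions)
```

The attention weights form a distribution over regions, so the context vector stays in the same feature space as the regions themselves.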

Regularizing Deep Neural Networks by Noise: Its Interpretation and Optimization

no code implementations · NeurIPS 2017 · Hyeonwoo Noh, Tackgeun You, Jonghwan Mun, Bohyung Han

Overfitting is one of the most critical challenges in deep neural networks, and various regularization methods have been proposed to improve generalization performance.
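Dropout is the canonical example of the noise-injection regularizers this paper analyzes; a minimal inverted-dropout sketch (a generic illustration, not the paper's formulation) looks like:

```python
import numpy as np

rng = np.random.default_rng(0)

def dropout(x, p, training=True):
    # Inverted dropout: multiply activations by Bernoulli(1 - p) noise
    # and rescale, so expectations match between train and test time.
    if not training or p == 0.0:
        return x
    mask = rng.random(x.shape) >= p   # keep each unit with probability 1 - p
    return x * mask / (1.0 - p)
```

At test time (`training=False`) the input passes through unchanged, which is what makes the train-time rescaling necessary.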

Transfer Learning via Unsupervised Task Discovery for Visual Question Answering

1 code implementation · CVPR 2019 · Hyeonwoo Noh, Tae-hoon Kim, Jonghwan Mun, Bohyung Han

Specifically, we employ linguistic knowledge sources such as a structured lexical database (e.g., WordNet) and visual descriptions for unsupervised task discovery, and transfer a learned task-conditional visual classifier as an answering unit in a visual question answering model.

Question Answering · Transfer Learning +1

Learning to Specialize with Knowledge Distillation for Visual Question Answering

no code implementations · NeurIPS 2018 · Jonghwan Mun, Kimin Lee, Jinwoo Shin, Bohyung Han

The proposed framework is model-agnostic and applicable to tasks other than VQA, e.g., image classification with a large number of labels but few per-class examples, which is known to be difficult under existing MCL schemes.

General Classification · General Knowledge +5

Streamlined Dense Video Captioning

1 code implementation · CVPR 2019 · Jonghwan Mun, Linjie Yang, Zhou Ren, Ning Xu, Bohyung Han

Dense video captioning is an extremely challenging task since accurate and coherent description of events in a video requires holistic understanding of video contents as well as contextual reasoning of individual events.

Dense Video Captioning

Towards Oracle Knowledge Distillation with Neural Architecture Search

no code implementations · 29 Nov 2019 · Minsoo Kang, Jonghwan Mun, Bohyung Han

We present a novel framework of knowledge distillation that is capable of learning powerful and efficient student models from ensemble teacher networks.

Image Classification · Knowledge Distillation +1
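The underlying soft-target distillation objective is the standard KL divergence between temperature-softened teacher and student distributions (Hinton et al.); the sketch below is that generic loss, not this paper's exact oracle-distillation objective:

```python
import numpy as np

def softmax(z, T=1.0):
    # Temperature-softened softmax over a 1-D logit vector.
    z = np.asarray(z, dtype=float) / T
    z = z - z.max()   # numerical stability
    e = np.exp(z)
    return e / e.sum()

def distillation_loss(student_logits, teacher_logits, T=2.0):
    # KL(teacher || student) on softened distributions, scaled by T^2
    # to keep gradient magnitudes comparable across temperatures.
    p_t = softmax(teacher_logits, T)
    p_s = softmax(student_logits, T)
    return float(np.sum(p_t * (np.log(p_t) - np.log(p_s))) * T * T)
```

The loss vanishes when the student matches the teacher exactly and grows as their softened distributions diverge.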

Local-Global Video-Text Interactions for Temporal Grounding

1 code implementation · CVPR 2020 · Jonghwan Mun, Minsu Cho, Bohyung Han

This paper addresses the problem of text-to-video temporal grounding, which aims to identify the time interval in a video semantically relevant to a text query.
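Predictions for this task are typically scored by temporal intersection-over-union between the predicted and ground-truth intervals; a minimal sketch of that standard metric (not specific to this paper):

```python
def temporal_iou(pred, gt):
    # Intersection-over-union of two (start, end) time intervals.
    inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
    union = max(pred[1], gt[1]) - min(pred[0], gt[0])
    return inter / union if union > 0 else 0.0
```

Benchmarks usually report recall at IoU thresholds such as 0.5 or 0.7 computed with exactly this kind of overlap measure.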

Boundary-aware Pre-training for Video Scene Segmentation

no code implementations · 29 Sep 2021 · Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a video, with a self-supervised learning framework in which we mainly focus on designing effective pretext tasks.

Scene Segmentation · Self-Supervised Learning

Winning the ICCV'2021 VALUE Challenge: Task-aware Ensemble and Transfer Learning with Visual Concepts

no code implementations · 13 Oct 2021 · Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim

The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning.

Model Optimization · Representation Learning +2

Boundary-aware Self-supervised Learning for Video Scene Segmentation

1 code implementation · 14 Jan 2022 · Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim

Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a video, with a self-supervised learning framework in which we mainly focus on designing effective pretext tasks.

Scene Segmentation · Self-Supervised Learning

Learning to Generate Text-grounded Mask for Open-world Semantic Segmentation from Only Image-Text Pairs

1 code implementation · CVPR 2023 · Junbum Cha, Jonghwan Mun, Byungseok Roh

Existing open-world segmentation methods have shown impressive advances by employing contrastive learning (CL) to learn diverse visual concepts and transferring the learned image-level understanding to the segmentation task.

Contrastive Learning · Open Vocabulary Semantic Segmentation +4
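The image-text contrastive learning referred to here is commonly implemented as a symmetric InfoNCE loss over matched pairs. The following is a CLIP-style sketch of that generic objective, not the paper's exact loss:

```python
import numpy as np

def clip_style_loss(image_emb, text_emb, temperature=0.07):
    # Symmetric InfoNCE: each image should match its paired text
    # (the diagonal of the similarity matrix) and vice versa.
    img = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
    logits = img @ txt.T / temperature
    n = logits.shape[0]

    def xent(l):
        # Cross-entropy with targets on the diagonal.
        l = l - l.max(axis=1, keepdims=True)
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[np.arange(n), np.arange(n)].mean()

    return (xent(logits) + xent(logits.T)) / 2.0
```

Correctly aligned embeddings should score a lower loss than shuffled pairings, which is what drives the learned image-text alignment.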

Noise-aware Learning from Web-crawled Image-Text Data for Image Captioning

1 code implementation · ICCV 2023 · Wooyoung Kang, Jonghwan Mun, Sungjun Lee, Byungseok Roh

Image captioning is a straightforward task that can take advantage of large-scale web-crawled data, which provides rich knowledge about the visual world for a captioning model.

Image Captioning · Image Retrieval +1

Learning Pseudo-Labeler beyond Noun Concepts for Open-Vocabulary Object Detection

no code implementations · 4 Dec 2023 · Sunghun Kang, Junbum Cha, Jonghwan Mun, Byungseok Roh, Chang D. Yoo

Specifically, the proposed method aims to learn an arbitrary image-to-text mapping for pseudo-labeling of arbitrary concepts, named Pseudo-Labeling for Arbitrary Concepts (PLAC).

object-detection · Open Vocabulary Object Detection +2

Honeybee: Locality-enhanced Projector for Multimodal LLM

1 code implementation · 11 Dec 2023 · Junbum Cha, Wooyoung Kang, Jonghwan Mun, Byungseok Roh

In Multimodal Large Language Models (MLLMs), a visual projector plays a crucial role in bridging pre-trained vision encoders with LLMs, enabling profound visual understanding while harnessing the LLMs' robust capabilities.

Ranked #1 on Science Question Answering on ScienceQA (using extra training data)

Science Question Answering
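A visual projector of the kind discussed here can be as simple as a small MLP mapping vision-encoder tokens into the LLM embedding space; the sketch below is that common baseline with random placeholder weights, not Honeybee's locality-enhanced design:

```python
import numpy as np

def mlp_projector(vision_tokens, d_llm, hidden=None, seed=0):
    # Two-layer MLP with ReLU mapping (n_tokens, d_vis) vision features
    # to (n_tokens, d_llm); weights are random placeholders here.
    rng = np.random.default_rng(seed)
    n, d_vis = vision_tokens.shape
    hidden = hidden or d_llm
    w1 = rng.standard_normal((d_vis, hidden)) * 0.02
    w2 = rng.standard_normal((hidden, d_llm)) * 0.02
    h = np.maximum(vision_tokens @ w1, 0.0)   # hidden activation
    return h @ w2                             # tokens in LLM embedding space
```

Such a projector treats each visual token independently; locality-enhanced designs instead let nearby tokens interact (e.g., via convolution) before projection.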
