Search Results for author: Dahun Kim

Found 28 papers, 13 papers with code

Region-Aware Pretraining for Open-Vocabulary Object Detection with Vision Transformers

2 code implementations · CVPR 2023 · Dahun Kim, Anelia Angelova, Weicheng Kuo

We present Region-aware Open-vocabulary Vision Transformers (RO-ViT) - a contrastive image-text pretraining recipe to bridge the gap between image-level pretraining and open-vocabulary object detection.

Contrastive Learning · Object Detection · +4
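The image-text pretraining recipe behind RO-ViT builds on a CLIP-style symmetric contrastive objective. A minimal NumPy sketch of such a loss is shown below; this is a generic illustration, not the paper's exact objective, and the function name and temperature value are illustrative:

```python
import numpy as np

def contrastive_itc_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired embeddings.

    image_emb, text_emb: (B, D) arrays of L2-normalized embeddings,
    where matching image/text pairs share the same row index.
    """
    logits = image_emb @ text_emb.T / temperature  # (B, B) similarity matrix
    labels = np.arange(len(logits))                # diagonal entries are positives

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)       # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()        # cross-entropy on the diagonal

    # average the image-to-text and text-to-image directions
    return 0.5 * (xent(logits) + xent(logits.T))
```

With perfectly aligned embeddings the diagonal dominates and the loss is low; shuffling the pairing raises it, which is what drives image and text towers toward a shared embedding space.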

Detection-Oriented Image-Text Pretraining for Open-Vocabulary Detection

1 code implementation · 29 Sep 2023 · Dahun Kim, Anelia Angelova, Weicheng Kuo

We present a new open-vocabulary detection approach based on detection-oriented image-text pretraining to bridge the gap between image-level pretraining and open-vocabulary object detection.

Contrastive Learning · Object · +2

DeepLab2: A TensorFlow Library for Deep Labeling

4 code implementations · 17 Jun 2021 · Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen

DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision.

Video Panoptic Segmentation

1 code implementation · CVPR 2020 · Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

In this paper, we propose and explore a new video extension of this task, called video panoptic segmentation.

Ranked #7 on Video Panoptic Segmentation on Cityscapes-VPS (using extra training data)

Instance Segmentation · Segmentation · +5

Learning Open-World Object Proposals without Learning to Classify

3 code implementations · 15 Aug 2021 · Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo

In this paper, we identify the problem: the binary classifiers in existing proposal methods tend to overfit to the training categories.

Object · Object Detection · +4
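Since binary classification overfits to seen categories, this line of work scores proposals by localization quality (e.g., how well a box overlaps any ground-truth object) instead of a foreground/background label. A minimal sketch of such an IoU-based target follows; the function name and box convention (x1, y1, x2, y2) are illustrative, not taken from the paper's code:

```python
def localization_quality(box, gt_box):
    """IoU between a proposal and a matched ground-truth box, usable as a
    class-agnostic regression target in place of a binary fg/bg label."""
    x1 = max(box[0], gt_box[0])
    y1 = max(box[1], gt_box[1])
    x2 = min(box[2], gt_box[2])
    y2 = min(box[3], gt_box[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)   # overlap area (0 if disjoint)
    area = lambda b: (b[2] - b[0]) * (b[3] - b[1])
    union = area(box) + area(gt_box) - inter
    return inter / union if union > 0 else 0.0
```

Because the target depends only on geometry, not on category labels, proposals for unseen (open-world) categories still receive meaningful scores.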

Neural Human Performer: Learning Generalizable Radiance Fields for Human Performance Rendering

1 code implementation · NeurIPS 2021 · Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

To tackle this, we propose Neural Human Performer, a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture.

Generalizable Novel View Synthesis

Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence

1 code implementation · CVPR 2019 · Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon

Blind video decaptioning is a problem of automatically removing text overlays and inpainting the occluded parts in videos without any input masks.

Video Denoising · Video Inpainting · +1

LinkNet: Relational Embedding for Scene Graph

3 code implementations · NeurIPS 2018 · Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon

In this paper, we present a method that improves scene graph generation by explicitly modeling inter-dependency among the entire object instances.

Graph Generation · Scene Graph Generation

Many-to-Many Spoken Language Translation via Unified Speech and Text Representation Learning with Unit-to-Unit Translation

1 code implementation · 3 Aug 2023 · Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro

A single pre-trained model with UTUT can be employed for diverse multilingual speech- and text-related tasks, such as Speech-to-Speech Translation (STS), multilingual Text-to-Speech Synthesis (TTS), and Text-to-Speech Translation (TTST).

Representation Learning · Speech-to-Speech Translation · +4

Discriminative Feature Learning for Unsupervised Video Summarization

1 code implementation · 24 Nov 2018 · Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon

The proposed variance loss allows a network to predict output scores for each frame with high discrepancy which enables effective feature learning and significantly improves model performance.

Supervised Video Summarization · Unsupervised Video Summarization
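One way to read the variance loss described above is as a penalty on low dispersion among per-frame scores, which pushes the network toward discriminative (high-discrepancy) predictions. A toy reciprocal-of-variance sketch is given below; this is an illustrative formulation, and the paper's exact loss may differ:

```python
import numpy as np

def variance_loss(frame_scores, eps=1e-8):
    """Penalize low variance among per-frame importance scores.

    Constant scores (variance ~ 0) yield a very large loss, while spread-out
    scores yield a small one, encouraging the model to separate important
    frames from unimportant ones. Reciprocal-of-variance form; illustrative.
    """
    scores = np.asarray(frame_scores, dtype=float)
    return 1.0 / (scores.var() + eps)
```

A network that outputs the same score for every frame is maximally penalized, which is the failure mode this loss is meant to rule out.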

Learning Image Representations by Completing Damaged Jigsaw Puzzles

no code implementations · 6 Feb 2018 · Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon

The recovery of the aforementioned damage pushes the network to obtain robust and general-purpose representations.

Colorization · Representation Learning · +2

Two-Phase Learning for Weakly Supervised Object Localization

no code implementations · ICCV 2017 · Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon

Weakly supervised semantic segmentation and localization have a problem of focusing only on the most important parts of an image since they use only image-level annotations.

Object · Segmentation · +5

Self-Supervised Video Representation Learning with Space-Time Cubic Puzzles

no code implementations · 24 Nov 2018 · Dahun Kim, Donghyeon Cho, In So Kweon

Self-supervised tasks such as colorization, inpainting and jigsaw puzzles have been utilized for visual representation learning of still images when labeled images are limited or entirely absent.

Colorization · Representation Learning · +2

The Devil is in the Boundary: Exploiting Boundary Representation for Basis-based Instance Segmentation

no code implementations · 26 Nov 2020 · Myungchul Kim, Sanghyun Woo, Dahun Kim, In So Kweon

In this work, we propose Boundary Basis based Instance Segmentation (B2Inst) to learn a global boundary representation that can complement existing global-mask-based methods, which often lack high-frequency details.

Instance Segmentation · Scene Understanding · +2

Learning to Associate Every Segment for Video Panoptic Segmentation

no code implementations · CVPR 2021 · Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon

Temporal correspondence - linking pixels or objects across frames - is a fundamental supervisory signal for video models.

Ranked #6 on Video Panoptic Segmentation on Cityscapes-VPS (using extra training data)

Video Panoptic Segmentation

Video-kMaX: A Simple Unified Approach for Online and Near-Online Video Panoptic Segmentation

no code implementations · 10 Apr 2023 · Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen

The meta architecture of the proposed Video-kMaX consists of two components: within clip segmenter (for clip-level segmentation) and cross-clip associater (for association beyond clips).

Scene Understanding · Segmentation · +2

Neural Image-based Avatars: Generalizable Radiance Fields for Human Avatar Modeling

no code implementations · 10 Apr 2023 · Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs

We present a method that enables synthesizing novel views and novel poses of arbitrary human performers from sparse multi-view images.

RECLIP: Resource-efficient CLIP by Training with Small Images

no code implementations · 12 Apr 2023 · Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo

We present RECLIP (Resource-efficient CLIP), a simple method that minimizes computational resource footprint for CLIP (Contrastive Language Image Pretraining).

Contrastive Learning · Retrieval · +3

Contrastive Feature Masking Open-Vocabulary Vision Transformer

no code implementations · ICCV 2023 · Dahun Kim, Anelia Angelova, Weicheng Kuo

We present Contrastive Feature Masking Vision Transformer (CFM-ViT) - an image-text pretraining methodology that achieves simultaneous learning of image- and region-level representation for open-vocabulary object detection (OVD).

Contrastive Learning · Object Detection · +3

Mirasol3B: A Multimodal Autoregressive model for time-aligned and contextual modalities

no code implementations · 9 Nov 2023 · AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova

We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video), and an autoregressive component for the context modalities which are not necessarily aligned in time but are still sequential.

Action Classification · Audio Classification · +1
