2 code implementations • CVPR 2023 • Dahun Kim, Anelia Angelova, Weicheng Kuo
We present Region-aware Open-vocabulary Vision Transformers (RO-ViT) - a contrastive image-text pretraining recipe to bridge the gap between image-level pretraining and open-vocabulary object detection.
Ranked #5 on Zero-Shot Cross-Modal Retrieval on Flickr30k
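The contrastive image-text pretraining this entry builds on is the standard CLIP-style symmetric InfoNCE objective; a minimal numpy sketch of that generic objective (not RO-ViT's actual implementation, which modifies the image-level recipe with region-aware components) is:

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    img_emb, txt_emb: (batch, dim) arrays; row i of each is a matched pair.
    Plain-numpy illustration of the generic objective, not the paper's code.
    """
    # L2-normalize so the dot product is a cosine similarity.
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature   # (batch, batch) similarity matrix
    labels = np.arange(len(logits))      # matched pairs lie on the diagonal

    def cross_entropy(l, y):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        log_prob = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_prob[np.arange(len(y)), y].mean()

    # Average the image-to-text and text-to-image directions.
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))
```

Each image is pulled toward its own caption and pushed away from every other caption in the batch, and vice versa.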
1 code implementation • 29 Sep 2023 • Dahun Kim, Anelia Angelova, Weicheng Kuo
We present a new open-vocabulary detection approach based on detection-oriented image-text pretraining to bridge the gap between image-level pretraining and open-vocabulary object detection.
Ranked #1 on Open Vocabulary Object Detection on LVIS v1.0
4 code implementations • 17 Jun 2021 • Mark Weber, Huiyu Wang, Siyuan Qiao, Jun Xie, Maxwell D. Collins, Yukun Zhu, Liangzhe Yuan, Dahun Kim, Qihang Yu, Daniel Cremers, Laura Leal-Taixe, Alan L. Yuille, Florian Schroff, Hartwig Adam, Liang-Chieh Chen
DeepLab2 is a TensorFlow library for deep labeling, aiming to provide a state-of-the-art and easy-to-use TensorFlow codebase for general dense pixel prediction problems in computer vision.
2 code implementations • CVPR 2019 • Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon
Video inpainting aims to fill spatio-temporal holes with plausible content in a video.
Ranked #7 on Video Inpainting on DAVIS
1 code implementation • CVPR 2020 • Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon
In this paper, we propose and explore a new video extension of this task, called video panoptic segmentation.
Ranked #7 on Video Panoptic Segmentation on Cityscapes-VPS (using extra training data)
3 code implementations • 15 Aug 2021 • Dahun Kim, Tsung-Yi Lin, Anelia Angelova, In So Kweon, Weicheng Kuo
In this paper, we identify that the binary classifiers in existing proposal methods tend to overfit to the training categories.
Ranked #2 on Open World Object Detection on COCO VOC to non-VOC
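The remedy this observation suggests is to score proposals by localization quality rather than by a binary foreground/background classifier; the IoU helper and target construction below are a hedged illustration of that idea, not the paper's code:

```python
import numpy as np

def box_iou(box, gt_boxes):
    """IoU between one box and an array of ground-truth boxes (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], gt_boxes[:, 0])
    y1 = np.maximum(box[1], gt_boxes[:, 1])
    x2 = np.minimum(box[2], gt_boxes[:, 2])
    y2 = np.minimum(box[3], gt_boxes[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area = (box[2] - box[0]) * (box[3] - box[1])
    gt_area = (gt_boxes[:, 2] - gt_boxes[:, 0]) * (gt_boxes[:, 3] - gt_boxes[:, 1])
    return inter / (area + gt_area - inter)

def objectness_targets(proposals, gt_boxes):
    """Class-agnostic regression targets: each proposal's best IoU with any
    annotated box, regardless of category. Geometry-based scores like this
    avoid the failure mode above, where a binary classifier pushes proposals
    on unlabeled categories toward 0 as 'background'."""
    return np.array([box_iou(p, gt_boxes).max() for p in proposals])
```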
1 code implementation • NeurIPS 2021 • Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs
To tackle this, we propose Neural Human Performer, a novel approach that learns generalizable neural radiance fields based on a parametric human body model for robust performance capture.
Ranked #3 on Generalizable Novel View Synthesis on ZJU-MoCap
2 code implementations • CVPR 2022 • Qihang Yu, Huiyu Wang, Dahun Kim, Siyuan Qiao, Maxwell Collins, Yukun Zhu, Hartwig Adam, Alan Yuille, Liang-Chieh Chen
We propose Clustering Mask Transformer (CMT-DeepLab), a transformer-based framework for panoptic segmentation designed around clustering.
Ranked #6 on Panoptic Segmentation on COCO test-dev
1 code implementation • ECCV 2020 • Youngjoong Kwon, Stefano Petrangeli, Dahun Kim, Haoliang Wang, Eunbyung Park, Viswanathan Swaminathan, Henry Fuchs
Second, we introduce a novel loss to explicitly enforce consistency across generated views both in space and in time.
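A space-and-time consistency term of this flavor can be sketched generically: gather renderings of the same surface point across views and time steps and penalize disagreement. The tensor layout and mean-deviation form below are assumptions for illustration, not the paper's loss:

```python
import numpy as np

def consistency_loss(renderings):
    """renderings: (views, time, dim) features of one surface point rendered
    from several views at several time steps. The spatial term penalizes
    deviation from the per-timestep mean across views; the temporal term
    penalizes deviation from the per-view mean across time."""
    spatial = ((renderings - renderings.mean(axis=0, keepdims=True)) ** 2).mean()
    temporal = ((renderings - renderings.mean(axis=1, keepdims=True)) ** 2).mean()
    return spatial + temporal
```

The loss is zero exactly when every view and time step renders the point identically.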
1 code implementation • CVPR 2019 • Dahun Kim, Sanghyun Woo, Joon-Young Lee, In So Kweon
Blind video decaptioning is a problem of automatically removing text overlays and inpainting the occluded parts in videos without any input masks.
3 code implementations • NeurIPS 2018 • Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon
In this paper, we present a method that improves scene graph generation by explicitly modeling inter-dependency among all object instances.
1 code implementation • 3 Aug 2023 • Minsu Kim, Jeongsoo Choi, Dahun Kim, Yong Man Ro
A single pre-trained model with UTUT can be employed for diverse multilingual speech- and text-related tasks, such as Speech-to-Speech Translation (STS), multilingual Text-to-Speech Synthesis (TTS), and Text-to-Speech Translation (TTST).
1 code implementation • 24 Nov 2018 • Yunjae Jung, Donghyeon Cho, Dahun Kim, Sanghyun Woo, In So Kweon
The proposed variance loss allows a network to predict output scores for each frame with high discrepancy, which enables effective feature learning and significantly improves model performance.
Ranked #3 on Unsupervised Video Summarization on SumMe
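One simple way to realize a variance loss of this kind is to penalize low variance among per-frame scores; the reciprocal form below is an illustrative sketch, and the paper's exact formulation may differ:

```python
import numpy as np

def variance_loss(frame_scores, eps=1e-8):
    """Encourage high discrepancy among per-frame importance scores:
    minimizing 1/(var + eps) pushes the variance up, so the network
    cannot collapse to near-uniform scores. Illustrative form only."""
    scores = np.asarray(frame_scores, dtype=float)
    return 1.0 / (scores.var() + eps)
```

Uniform scores incur a very large loss, while well-separated scores incur a small one.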
no code implementations • 6 Feb 2018 • Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon
The recovery of the aforementioned damage pushes the network to obtain robust and general-purpose representations.
no code implementations • ICCV 2017 • Dahun Kim, Donghyeon Cho, Donggeun Yoo, In So Kweon
Weakly supervised semantic segmentation and localization suffer from focusing only on the most important parts of an image, since they use only image-level annotations.
no code implementations • 24 Nov 2018 • Dahun Kim, Donghyeon Cho, In So Kweon
Self-supervised tasks such as colorization, inpainting and jigsaw puzzles have been utilized for visual representation learning on still images when labeled images are scarce or entirely absent.
Ranked #42 on Self-Supervised Action Recognition on HMDB51
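The jigsaw-puzzle pretext task named above can be set up in a few lines: shuffle image tiles and ask a network to predict the permutation, so spatial structure must be learned without labels. This 2D sample generator is a minimal illustration (the video extension would permute space-time crops instead):

```python
import numpy as np

def make_jigsaw_sample(image, grid=2, rng=None):
    """Split an image (H, W, C) into grid x grid tiles, shuffle them, and
    return (shuffled_tiles, permutation). A network trained to predict the
    permutation from the tiles must learn spatial layout without labels."""
    if rng is None:
        rng = np.random.default_rng()
    h = image.shape[0] // grid
    w = image.shape[1] // grid
    tiles = [image[i*h:(i+1)*h, j*w:(j+1)*w]
             for i in range(grid) for j in range(grid)]
    perm = rng.permutation(len(tiles))
    return [tiles[k] for k in perm], perm
```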
no code implementations • 30 May 2019 • Sanghyun Woo, Dahun Kim, KwanYong Park, Joon-Young Lee, In So Kweon
Our video inpainting network consists of two stages.
no code implementations • 21 Aug 2019 • Kwanyong Park, Sanghyun Woo, Dahun Kim, Donghyeon Cho, In So Kweon
In this paper, we investigate the problem of unpaired video-to-video translation.
no code implementations • 3 Feb 2020 • Yunjae Jung, Dahun Kim, Sanghyun Woo, Kyung-Su Kim, Sungjin Kim, In So Kweon
In this paper, we propose to explicitly learn to imagine a storyline that bridges the visual gap.
Ranked #7 on Visual Storytelling on VIST
no code implementations • 26 Nov 2020 • Myungchul Kim, Sanghyun Woo, Dahun Kim, In So Kweon
In this work, we propose Boundary Basis based Instance Segmentation (B2Inst) to learn a global boundary representation that complements existing global-mask-based methods, which often lack high-frequency details.
no code implementations • CVPR 2021 • Sanghyun Woo, Dahun Kim, Joon-Young Lee, In So Kweon
Temporal correspondence - linking pixels or objects across frames - is a fundamental supervisory signal for video models.
Ranked #6 on Video Panoptic Segmentation on Cityscapes-VPS (using extra training data)
no code implementations • CVPR 2022 • Dahun Kim, Jun Xie, Huiyu Wang, Siyuan Qiao, Qihang Yu, Hong-Seok Kim, Hartwig Adam, In So Kweon, Liang-Chieh Chen
We present TubeFormer-DeepLab, the first attempt to tackle multiple core video segmentation tasks in a unified manner.
no code implementations • 29 Mar 2023 • Weicheng Kuo, AJ Piergiovanni, Dahun Kim, Xiyang Luo, Ben Caine, Wei Li, Abhijit Ogale, Luowei Zhou, Andrew Dai, Zhifeng Chen, Claire Cui, Anelia Angelova
We propose a novel paradigm of training with a decoder-only model for multimodal tasks, which is surprisingly effective in jointly learning these disparate vision-language tasks.
no code implementations • 10 Apr 2023 • Inkyu Shin, Dahun Kim, Qihang Yu, Jun Xie, Hong-Seok Kim, Bradley Green, In So Kweon, Kuk-Jin Yoon, Liang-Chieh Chen
The meta architecture of the proposed Video-kMaX consists of two components: a within-clip segmenter (for clip-level segmentation) and a cross-clip associater (for association beyond clips).
no code implementations • 10 Apr 2023 • Youngjoong Kwon, Dahun Kim, Duygu Ceylan, Henry Fuchs
We present a method that enables synthesizing novel views and novel poses of arbitrary human performers from sparse multi-view images.
no code implementations • 12 Apr 2023 • Runze Li, Dahun Kim, Bir Bhanu, Weicheng Kuo
We present RECLIP (Resource-efficient CLIP), a simple method that minimizes computational resource footprint for CLIP (Contrastive Language Image Pretraining).
no code implementations • ICCV 2023 • Dahun Kim, Anelia Angelova, Weicheng Kuo
We present Contrastive Feature Masking Vision Transformer (CFM-ViT) - an image-text pretraining methodology that achieves simultaneous learning of image- and region-level representation for open-vocabulary object detection (OVD).
Ranked #5 on Open Vocabulary Object Detection on LVIS v1.0
no code implementations • 9 Nov 2023 • AJ Piergiovanni, Isaac Noble, Dahun Kim, Michael S. Ryoo, Victor Gomes, Anelia Angelova
We propose a multimodal model, called Mirasol3B, consisting of an autoregressive component for the time-synchronized modalities (audio and video), and an autoregressive component for the context modalities which are not necessarily aligned in time but are still sequential.
Ranked #1 on Audio Classification on VGGSound