Search Results for author: Wonmin Byeon

Found 24 papers, 5 papers with code

RegionGPT: Towards Region Understanding Vision Language Model

no code implementations · 4 Mar 2024 · Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu

Vision language models (VLMs) have advanced rapidly through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding for two reasons: the limited spatial awareness of the vision encoder, and coarse-grained training data that lacks detailed, region-specific captions.

Language Modelling

MTVG : Multi-text Video Generation with Text-to-Video Models

no code implementations · 7 Dec 2023 · Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Hyeokmin Kwon, Sangpil Kim

Given the sequential nature of video, multi-text conditioning that incorporates successive events is necessary for next-step video generation.

Video Generation

The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

no code implementations · ICCV 2023 · Yujin Jeong, Wonjeong Ryoo, SeungHyun Lee, Dabin Seo, Wonmin Byeon, Sangpil Kim, Jinkyu Kim

Hence, we propose The Power of Sound (TPoS), a model that incorporates audio input containing both time-varying temporal semantics and magnitude.

Video Generation

Heterogeneous Continual Learning

no code implementations · CVPR 2023 · Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov

We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures.

Continual Learning · Knowledge Distillation +1

LISA: Localized Image Stylization with Audio via Implicit Neural Representation

no code implementations · 21 Nov 2022 · Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

We present a novel framework, Localized Image Stylization with Audio (LISA) which performs audio-driven localized image stylization.

Image Stylization · Object +1

Robust Sound-Guided Image Manipulation

no code implementations · 30 Aug 2022 · Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible results than state-of-the-art text- and sound-guided image manipulation methods, a finding further confirmed by our human evaluations.

Image Manipulation

Sound-Guided Semantic Video Generation

no code implementations · 20 Apr 2022 · Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Hyunjun Cho, Jihyun Bae, Jinkyu Kim, Sangpil Kim

The recent success in StyleGAN demonstrates that pre-trained StyleGAN latent space is useful for realistic video generation.

Video Editing · Video Generation

Physics Informed RNN-DCT Networks for Time-Dependent Partial Differential Equations

no code implementations · 24 Feb 2022 · Benjamin Wu, Oliver Hennigh, Jan Kautz, Sanjay Choudhry, Wonmin Byeon

This efficiently and flexibly produces a compressed representation which is used for additional conditioning of physics-informed models.
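
The compression step the title's "DCT" alludes to can be sketched in a few lines: transform a signal with an orthonormal DCT-II, zero out the high-frequency coefficients, and invert. This is an illustrative stand-in (the `dct_matrix` and `compress` helpers are hypothetical names), not the paper's actual architecture:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix: rows are cosine basis vectors, so D @ D.T = I
    # and the inverse transform is simply the transpose.
    k = np.arange(n)[:, None].astype(float)  # frequency index
    m = np.arange(n)[None, :].astype(float)  # sample index
    d = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    d[0] *= np.sqrt(1.0 / n)
    d[1:] *= np.sqrt(2.0 / n)
    return d

def compress(signal, keep):
    # Keep only the `keep` lowest-frequency DCT coefficients.
    d = dct_matrix(len(signal))
    coeff = d @ signal
    coeff[keep:] = 0.0
    return d.T @ coeff  # inverse transform = transpose (orthonormal basis)

# A smooth signal is well represented by a handful of low-frequency coefficients.
x = np.sin(np.linspace(0, 2 * np.pi, 64)) + 0.3 * np.sin(np.linspace(0, 10 * np.pi, 64))
x_hat = compress(x, keep=16)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # relative error after 4x truncation
```

Because the basis is orthonormal, keeping all coefficients reconstructs the signal exactly; truncation trades reconstruction error for a smaller representation.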

GroupViT: Semantic Segmentation Emerges from Text Supervision

2 code implementations · CVPR 2022 · Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang

With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.

Object Detection · Scene Understanding +3

Sound-Guided Semantic Image Manipulation

1 code implementation · CVPR 2022 · Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

Our audio encoder is trained to produce a latent representation from an audio input, which is forced to be aligned with image and text representations in the multi-modal embedding space.

Audio Classification · Image Classification +2
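
Forcing an audio embedding to align with image and text embeddings in a shared space is typically done with a CLIP-style symmetric contrastive (InfoNCE) objective: matched audio-image pairs in a batch should score higher than all mismatched pairs. The sketch below illustrates that generic objective in numpy; it is not necessarily the paper's exact loss:

```python
import numpy as np

def contrastive_loss(audio_emb, image_emb, temperature=0.07):
    # Symmetric InfoNCE: row i of audio_emb is the positive pair for row i of
    # image_emb; every other row in the batch is a negative.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature  # scaled cosine similarities
    labels = np.arange(len(a))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the audio->image and image->audio directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 32))
aligned = contrastive_loss(img + 0.01 * rng.normal(size=(8, 32)), img)
mismatched = contrastive_loss(rng.normal(size=(8, 32)), img)
print(aligned < mismatched)  # True: aligned embeddings give a lower loss
```

Minimizing this loss pulls each audio embedding toward its paired image (and, with an extra term, text) embedding while pushing it away from the rest of the batch.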

Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework

no code implementations · 29 Sep 2021 · Jiahao Su, Wonmin Byeon, Furong Huang

Some of these designs are not exactly orthogonal, while others consider only standard convolutional layers and propose specific classes of their realizations.

Scaling-up Diverse Orthogonal Convolutional Networks with a Paraunitary Framework

no code implementations · 16 Jun 2021 · Jiahao Su, Wonmin Byeon, Furong Huang

To address this problem, we propose a theoretical framework for orthogonal convolutional layers, which establishes the equivalence between various orthogonal convolutional layers in the spatial domain and the paraunitary systems in the spectral domain.
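
The spectral-domain equivalence can be made concrete in the simplest (scalar, single-channel) case: a 1-D circular convolution is an orthogonal linear map exactly when its frequency response has unit magnitude at every DFT frequency, which is the scalar instance of the paraunitary condition. A minimal numerical check (the helper name is illustrative, not from the paper):

```python
import numpy as np

def is_orthogonal_circular_conv(h, n, tol=1e-9):
    # Scalar paraunitary test: circular convolution with filter h on length-n
    # signals is orthogonal iff |H(w_k)| == 1 at every DFT frequency w_k.
    H = np.fft.fft(h, n)
    return np.allclose(np.abs(H), 1.0, atol=tol)

n = 8
delay = np.zeros(n); delay[3] = 1.0   # a pure circular shift: orthogonal
blur = np.array([0.5, 0.5])           # an averaging filter: loses information

print(is_orthogonal_circular_conv(delay, n))  # True
print(is_orthogonal_circular_conv(blur, n))   # False

# Cross-check in the spatial domain: the circulant matrix C of the shift
# satisfies C.T @ C == I, the usual definition of orthogonality.
C = np.stack([np.roll(delay, k) for k in range(n)], axis=1)
print(np.allclose(C.T @ C, np.eye(n)))        # True
```

The multichannel case generalizes this scalar test: the transfer matrix must be unitary at every frequency, which is what the paraunitary framework characterizes.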

Weakly-Supervised Physically Unconstrained Gaze Estimation

1 code implementation · CVPR 2021 · Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, Jan Kautz

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios.

Domain Generalization · Gaze Estimation

NVIDIA SimNet™: an AI-accelerated multi-physics simulation framework

no code implementations · 14 Dec 2020 · Oliver Hennigh, Susheela Narasimhan, Mohammad Amin Nabian, Akshay Subramaniam, Kaustubh Tangsali, Max Rietmann, Jose del Aguila Ferrandis, Wonmin Byeon, Zhiwei Fang, Sanjay Choudhry

We present real-world use cases that range from challenging forward multi-physics simulations with turbulence and complex 3D geometries, to industrial design optimization and inverse problems that are not addressed efficiently by the traditional solvers.

Convolutional Tensor-Train LSTM for Spatio-temporal Learning

2 code implementations · NeurIPS 2020 · Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Animashree Anandkumar

Learning from spatio-temporal data has numerous applications such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting.

Ranked #1 on Video Prediction on KTH (Cond metric)

Activity Recognition · Video Compression +1
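
The tensor-train idea at the core of this model is to factor a large multi-way weight tensor into a chain of small 3-way cores. The generic TT-SVD algorithm (sequential truncated SVDs, due to Oseledets) gives a compact illustration; this is a sketch of the general factorization, not the paper's convolutional variant:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    # TT-SVD: repeatedly reshape the remaining tensor into a matrix, take a
    # truncated SVD, and peel off one 3-way core (rank_in x mode_size x rank_out).
    shape = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(1, -1)
    for n in shape[:-1]:
        mat = mat.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(rank, n, r))
        mat = s[:r, None] * vt[:r]  # carry the remainder to the next step
        rank = r
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    # Contract the train of cores back into a full tensor.
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])

# A rank-1 (outer-product) tensor is captured exactly with TT-rank 1,
# using 3+4+5 numbers instead of 3*4*5.
t = np.einsum('i,j,k->ijk', np.arange(3.0), np.arange(4.0) + 1, np.arange(5.0) + 1)
cores = tt_svd(t, max_rank=1)
print(np.allclose(tt_reconstruct(cores), t))  # True
```

For weight tensors of higher-order recurrences, this turns storage and compute that grow exponentially in the order into costs linear in the order, which is what makes the higher-order LSTM tractable.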

Convolutional Tensor-Train LSTM for Long-Term Video Prediction

no code implementations · 25 Sep 2019 · Jiahao Su, Wonmin Byeon, Furong Huang, Jan Kautz, Animashree Anandkumar

Long-term video prediction is highly challenging since it entails simultaneously capturing spatial and temporal information across a long range of image frames. Standard recurrent models are ineffective since they are prone to error propagation and cannot effectively capture higher-order correlations.

Video Prediction

Long History Short-Term Memory for Long-Term Video Prediction

no code implementations · 25 Sep 2019 · Wonmin Byeon, Jan Kautz

While video prediction approaches have advanced considerably in recent years, learning to predict the long-term future remains challenging: an ambiguous future and error propagation over time yield blurry predictions.

Video Prediction

Data-Driven Forecasting of High-Dimensional Chaotic Systems with Long Short-Term Memory Networks

no code implementations · 21 Feb 2018 · Pantelis R. Vlachas, Wonmin Byeon, Zhong Y. Wan, Themistoklis P. Sapsis, Petros Koumoutsakos

We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks.

Gaussian Processes · Time Series +1
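
The forecasting recipe here has two parts: learn a one-step-ahead predictor from trajectory data, then forecast by feeding each prediction back in as the next input. The sketch below shows that autoregressive rollout on the chaotic logistic map, with a least-squares quadratic model standing in for the paper's LSTM; the helper names are illustrative:

```python
import numpy as np

def simulate_logistic(x0, steps, r=3.9):
    # The logistic map at r=3.9 is chaotic; it stands in for a high-dimensional
    # chaotic system here.
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return np.array(xs)

# Learn the one-step map x_{t+1} = f(x_t) from a training trajectory.
traj = simulate_logistic(0.2, 500)
X = np.stack([traj[:-1], traj[:-1] ** 2], axis=1)  # quadratic features of x_t
y = traj[1:]                                       # next-step targets
w, *_ = np.linalg.lstsq(X, y, rcond=None)          # fitted one-step model

def rollout(x0, steps):
    # Autoregressive forecasting: each prediction becomes the next input,
    # exactly how a trained LSTM forecaster is rolled out at test time.
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(w @ np.array([x, x ** 2]))
    return np.array(xs)

truth = simulate_logistic(0.6, 20)
pred = rollout(0.6, 20)
print(np.max(np.abs(pred - truth)))  # small: the quadratic map was recovered
```

Because the true dynamics are exactly quadratic, least squares recovers them and the rollout tracks the truth; with an imperfect model, chaos would amplify the one-step error exponentially, which is why long-horizon forecasting of such systems is hard.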

ContextVP: Fully Context-Aware Video Prediction

no code implementations · ECCV 2018 · Wonmin Byeon, Qin Wang, Rupesh Kumar Srivastava, Petros Koumoutsakos

Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions.

Video Prediction

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation

no code implementations · NeurIPS 2015 · Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber

In contrast, Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in a few sweeps through all pixels, especially when the RNN is a Long Short-Term Memory (LSTM).

Brain Image Segmentation · Image Segmentation +1

Scene Labeling With LSTM Recurrent Neural Networks

no code implementations · CVPR 2015 · Wonmin Byeon, Thomas M. Breuel, Federico Raue, Marcus Liwicki

This paper addresses the problem of pixel-level segmentation and classification of scene images with an entirely learning-based approach using Long Short-Term Memory (LSTM) recurrent neural networks, which are commonly used for sequence classification.

Classification · General Classification +4
