Search Results for author: Wonmin Byeon

Found 24 papers, 5 papers with code

RegionGPT: Towards Region Understanding Vision Language Model

no code implementations · 4 Mar 2024 · Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu

Vision language models (VLMs) have advanced rapidly through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding for two reasons: the limited spatial awareness of the vision encoder, and coarse-grained training data that lacks detailed, region-specific captions.

Language Modelling

MTVG : Multi-text Video Generation with Text-to-Video Models

no code implementations · 7 Dec 2023 · Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Hyeokmin Kwon, Sangpil Kim

Given the sequential nature of video, multi-text conditioning that incorporates successive events is necessary for next-step video generation.

Video Generation

The Power of Sound (TPoS): Audio Reactive Video Generation with Stable Diffusion

no code implementations · ICCV 2023 · Yujin Jeong, Wonjeong Ryoo, SeungHyun Lee, Dabin Seo, Wonmin Byeon, Sangpil Kim, Jinkyu Kim

Hence, we propose The Power of Sound (TPoS), a model that incorporates audio input containing both time-varying temporal semantics and magnitude.

Video Generation

Heterogeneous Continual Learning

no code implementations · CVPR 2023 · Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov

We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures.

Continual Learning · Knowledge Distillation +1

LISA: Localized Image Stylization with Audio via Implicit Neural Representation

no code implementations · 21 Nov 2022 · Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

We present a novel framework, Localized Image Stylization with Audio (LISA) which performs audio-driven localized image stylization.

Image Stylization · Object +1

Robust Sound-Guided Image Manipulation

no code implementations · 30 Aug 2022 · Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim

Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible results than state-of-the-art text- and sound-guided image manipulation methods, a finding further confirmed by our human evaluations.

Image Manipulation

Sound-Guided Semantic Video Generation

no code implementations · 20 Apr 2022 · Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Hyunjun Cho, Jihyun Bae, Jinkyu Kim, Sangpil Kim

The recent success in StyleGAN demonstrates that pre-trained StyleGAN latent space is useful for realistic video generation.

Video Editing · Video Generation

Physics Informed RNN-DCT Networks for Time-Dependent Partial Differential Equations

no code implementations · 24 Feb 2022 · Benjamin Wu, Oliver Hennigh, Jan Kautz, Sanjay Choudhry, Wonmin Byeon

This efficiently and flexibly produces a compressed representation which is used for additional conditioning of physics-informed models.
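
The compression step the title's "DCT" alludes to can be sketched in a few lines: transform a signal with an orthonormal DCT-II, zero out the high-frequency coefficients, and invert. This is an illustrative stand-in (the `dct_matrix` and `compress` helpers are hypothetical names), not the paper's actual architecture:

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II matrix: rows are cosine basis vectors, so D @ D.T = I
    # and the inverse transform is simply the transpose.
    k = np.arange(n)[:, None].astype(float)  # frequency index
    m = np.arange(n)[None, :].astype(float)  # sample index
    d = np.cos(np.pi * (2 * m + 1) * k / (2 * n))
    d[0] *= np.sqrt(1.0 / n)
    d[1:] *= np.sqrt(2.0 / n)
    return d

def compress(signal, keep):
    # Keep only the `keep` lowest-frequency DCT coefficients.
    d = dct_matrix(len(signal))
    coeff = d @ signal
    coeff[keep:] = 0.0
    return d.T @ coeff  # inverse transform = transpose (orthonormal basis)

# A smooth signal is well represented by a handful of low-frequency coefficients.
x = np.sin(np.linspace(0, 2 * np.pi, 64)) + 0.3 * np.sin(np.linspace(0, 10 * np.pi, 64))
x_hat = compress(x, keep=16)
print(np.linalg.norm(x - x_hat) / np.linalg.norm(x))  # relative error after 4x truncation
```

Because the basis is orthonormal, keeping all coefficients reconstructs the signal exactly; truncation trades reconstruction error for a smaller representation.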

GroupViT: Semantic Segmentation Emerges from Text Supervision

2 code implementations · CVPR 2022 · Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang

With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.

Object Detection · Scene Understanding +3

Sound-Guided Semantic Image Manipulation

1 code implementation · CVPR 2022 · Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim

Our audio encoder is trained to produce a latent representation from an audio input, which is forced to be aligned with image and text representations in the multi-modal embedding space.

Audio Classification · Image Classification +2
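
Forcing an audio embedding to align with image and text embeddings in a shared space is typically done with a CLIP-style symmetric contrastive (InfoNCE) objective: matched audio-image pairs in a batch should score higher than all mismatched pairs. The sketch below illustrates that generic objective in numpy; it is not necessarily the paper's exact loss:

```python
import numpy as np

def contrastive_loss(audio_emb, image_emb, temperature=0.07):
    # Symmetric InfoNCE: row i of audio_emb is the positive pair for row i of
    # image_emb; every other row in the batch is a negative.
    a = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
    v = image_emb / np.linalg.norm(image_emb, axis=1, keepdims=True)
    logits = a @ v.T / temperature  # scaled cosine similarities
    labels = np.arange(len(a))

    def xent(l):
        l = l - l.max(axis=1, keepdims=True)  # numerical stability
        logp = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -logp[labels, labels].mean()

    # Average the audio->image and image->audio directions.
    return 0.5 * (xent(logits) + xent(logits.T))

rng = np.random.default_rng(0)
img = rng.normal(size=(8, 32))
aligned = contrastive_loss(img + 0.01 * rng.normal(size=(8, 32)), img)
mismatched = contrastive_loss(rng.normal(size=(8, 32)), img)
print(aligned < mismatched)  # True: aligned embeddings give a lower loss
```

Minimizing this loss pulls each audio embedding toward its paired image (and, with an extra term, text) embedding while pushing it away from the rest of the batch.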

Scaling-up Diverse Orthogonal Convolutional Networks by a Paraunitary Framework

no code implementations · 29 Sep 2021 · Jiahao Su, Wonmin Byeon, Furong Huang

Some of these designs are not exactly orthogonal, while others consider only standard convolutional layers and propose specific classes of their realizations.

Scaling-up Diverse Orthogonal Convolutional Networks with a Paraunitary Framework

no code implementations · 16 Jun 2021 · Jiahao Su, Wonmin Byeon, Furong Huang

To address this problem, we propose a theoretical framework for orthogonal convolutional layers, which establishes the equivalence between various orthogonal convolutional layers in the spatial domain and the paraunitary systems in the spectral domain.
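
The spectral-domain equivalence can be made concrete in the simplest (scalar, single-channel) case: a 1-D circular convolution is an orthogonal linear map exactly when its frequency response has unit magnitude at every DFT frequency, which is the scalar instance of the paraunitary condition. A minimal numerical check (the helper name is illustrative, not from the paper):

```python
import numpy as np

def is_orthogonal_circular_conv(h, n, tol=1e-9):
    # Scalar paraunitary test: circular convolution with filter h on length-n
    # signals is orthogonal iff |H(w_k)| == 1 at every DFT frequency w_k.
    H = np.fft.fft(h, n)
    return np.allclose(np.abs(H), 1.0, atol=tol)

n = 8
delay = np.zeros(n); delay[3] = 1.0   # a pure circular shift: orthogonal
blur = np.array([0.5, 0.5])           # an averaging filter: loses information

print(is_orthogonal_circular_conv(delay, n))  # True
print(is_orthogonal_circular_conv(blur, n))   # False

# Cross-check in the spatial domain: the circulant matrix C of the shift
# satisfies C.T @ C == I, the usual definition of orthogonality.
C = np.stack([np.roll(delay, k) for k in range(n)], axis=1)
print(np.allclose(C.T @ C, np.eye(n)))        # True
```

The multichannel case generalizes this scalar test: the transfer matrix must be unitary at every frequency, which is what the paraunitary framework characterizes.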

Weakly-Supervised Physically Unconstrained Gaze Estimation

1 code implementation · CVPR 2021 · Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, Jan Kautz

A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios.

Domain Generalization · Gaze Estimation

NVIDIA SimNet™: an AI-accelerated multi-physics simulation framework

no code implementations · 14 Dec 2020 · Oliver Hennigh, Susheela Narasimhan, Mohammad Amin Nabian, Akshay Subramaniam, Kaustubh Tangsali, Max Rietmann, Jose del Aguila Ferrandis, Wonmin Byeon, Zhiwei Fang, Sanjay Choudhry

We present real-world use cases that range from challenging forward multi-physics simulations with turbulence and complex 3D geometries, to industrial design optimization and inverse problems that are not addressed efficiently by the traditional solvers.

Convolutional Tensor-Train LSTM for Spatio-temporal Learning

2 code implementations · NeurIPS 2020 · Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Animashree Anandkumar

Learning from spatio-temporal data has numerous applications such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting.

Ranked #1 on Video Prediction on KTH (Cond metric)

Activity Recognition · Video Compression +1
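
The tensor-train idea at the core of this model is to factor a large multi-way weight tensor into a chain of small 3-way cores. The generic TT-SVD algorithm (sequential truncated SVDs, due to Oseledets) gives a compact illustration; this is a sketch of the general factorization, not the paper's convolutional variant:

```python
import numpy as np

def tt_svd(tensor, max_rank):
    # TT-SVD: repeatedly reshape the remaining tensor into a matrix, take a
    # truncated SVD, and peel off one 3-way core (rank_in x mode_size x rank_out).
    shape = tensor.shape
    cores, rank = [], 1
    mat = tensor.reshape(1, -1)
    for n in shape[:-1]:
        mat = mat.reshape(rank * n, -1)
        u, s, vt = np.linalg.svd(mat, full_matrices=False)
        r = min(max_rank, len(s))
        cores.append(u[:, :r].reshape(rank, n, r))
        mat = s[:r, None] * vt[:r]  # carry the remainder to the next step
        rank = r
    cores.append(mat.reshape(rank, shape[-1], 1))
    return cores

def tt_reconstruct(cores):
    # Contract the train of cores back into a full tensor.
    out = cores[0]
    for core in cores[1:]:
        out = np.tensordot(out, core, axes=([-1], [0]))
    return out.reshape([c.shape[1] for c in cores])

# A rank-1 (outer-product) tensor is captured exactly with TT-rank 1,
# using 3+4+5 numbers instead of 3*4*5.
t = np.einsum('i,j,k->ijk', np.arange(3.0), np.arange(4.0) + 1, np.arange(5.0) + 1)
cores = tt_svd(t, max_rank=1)
print(np.allclose(tt_reconstruct(cores), t))  # True
```

For weight tensors of higher-order recurrences, this turns storage and compute that grow exponentially in the order into costs linear in the order, which is what makes the higher-order LSTM tractable.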

Convolutional Tensor-Train LSTM for Long-Term Video Prediction

no code implementations · 25 Sep 2019 · Jiahao Su, Wonmin Byeon, Furong Huang, Jan Kautz, Animashree Anandkumar

Long-term video prediction is highly challenging since it entails simultaneously capturing spatial and temporal information across a long range of image frames. Standard recurrent models are ineffective since they are prone to error propagation and cannot effectively capture higher-order correlations.

Video Prediction

Long History Short-Term Memory for Long-Term Video Prediction

no code implementations · 25 Sep 2019 · Wonmin Byeon, Jan Kautz

While video prediction approaches have advanced considerably in recent years, learning to predict the long-term future remains challenging: an ambiguous future and error propagation over time yield blurry predictions.

Video Prediction

Data-Driven Forecasting of High-Dimensional Chaotic Systems with Long Short-Term Memory Networks

no code implementations · 21 Feb 2018 · Pantelis R. Vlachas, Wonmin Byeon, Zhong Y. Wan, Themistoklis P. Sapsis, Petros Koumoutsakos

We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks.

Gaussian Processes · Time Series +1
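
The forecasting recipe here has two parts: learn a one-step-ahead predictor from trajectory data, then forecast by feeding each prediction back in as the next input. The sketch below shows that autoregressive rollout on the chaotic logistic map, with a least-squares quadratic model standing in for the paper's LSTM; the helper names are illustrative:

```python
import numpy as np

def simulate_logistic(x0, steps, r=3.9):
    # The logistic map at r=3.9 is chaotic; it stands in for a high-dimensional
    # chaotic system here.
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return np.array(xs)

# Learn the one-step map x_{t+1} = f(x_t) from a training trajectory.
traj = simulate_logistic(0.2, 500)
X = np.stack([traj[:-1], traj[:-1] ** 2], axis=1)  # quadratic features of x_t
y = traj[1:]                                       # next-step targets
w, *_ = np.linalg.lstsq(X, y, rcond=None)          # fitted one-step model

def rollout(x0, steps):
    # Autoregressive forecasting: each prediction becomes the next input,
    # exactly how a trained LSTM forecaster is rolled out at test time.
    xs = [x0]
    for _ in range(steps):
        x = xs[-1]
        xs.append(w @ np.array([x, x ** 2]))
    return np.array(xs)

truth = simulate_logistic(0.6, 20)
pred = rollout(0.6, 20)
print(np.max(np.abs(pred - truth)))  # small: the quadratic map was recovered
```

Because the true dynamics are exactly quadratic, least squares recovers them and the rollout tracks the truth; with an imperfect model, chaos would amplify the one-step error exponentially, which is why long-horizon forecasting of such systems is hard.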

ContextVP: Fully Context-Aware Video Prediction

no code implementations · ECCV 2018 · Wonmin Byeon, Qin Wang, Rupesh Kumar Srivastava, Petros Koumoutsakos

Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions.

Video Prediction

Parallel Multi-Dimensional LSTM, With Application to Fast Biomedical Volumetric Image Segmentation

no code implementations · NeurIPS 2015 · Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber

In contrast, Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in a few sweeps through all pixels, especially when the RNN is a Long Short-Term Memory (LSTM).

Brain Image Segmentation · Image Segmentation +1

Scene Labeling With LSTM Recurrent Neural Networks

no code implementations · CVPR 2015 · Wonmin Byeon, Thomas M. Breuel, Federico Raue, Marcus Liwicki

This paper addresses the problem of pixel-level segmentation and classification of scene images with an entirely learning-based approach using Long Short-Term Memory (LSTM) recurrent neural networks, which are commonly used for sequence classification.

Classification · General Classification +4
