no code implementations • 13 Feb 2025 • Adjovi Sim, Zhengkui Wang, Aik Beng Ng, Shalini De Mello, Simon See, Wonmin Byeon
Online continual learning for image classification enables models to adapt to new data while retaining knowledge of previously learned tasks.
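The snippet above does not describe the paper's specific method; as background, a common baseline for online continual learning is experience replay with a reservoir-sampled buffer. The sketch below is illustrative only, and all names and sizes are assumptions.

```python
import random

class ReservoirReplayBuffer:
    """Fixed-size replay buffer via reservoir sampling: a common online
    continual learning baseline (illustrative, not this paper's method)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.buffer = []       # stored (example, label) pairs
        self.num_seen = 0      # total stream examples observed so far

    def add(self, example, label):
        self.num_seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append((example, label))
        else:
            # Replace a stored item with probability capacity / num_seen,
            # keeping a uniform random sample of everything seen so far.
            idx = random.randrange(self.num_seen)
            if idx < self.capacity:
                self.buffer[idx] = (example, label)

    def sample(self, batch_size):
        return random.sample(self.buffer, min(batch_size, len(self.buffer)))

# Usage: interleave replayed samples with the incoming stream.
buf = ReservoirReplayBuffer(capacity=200)
for step in range(1000):
    x, y = float(step), step % 10   # stand-ins for an image and its label
    buf.add(x, y)
    replay_batch = buf.sample(32)   # train on the new example + replay_batch
```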
no code implementations • 21 Jan 2025 • Hongjun Wang, Wonmin Byeon, Jiarui Xu, Jinwei Gu, Ka Chun Cheung, Xiaolong Wang, Kai Han, Jan Kautz, Sifei Liu
We present the Generalized Spatial Propagation Network (GSPN), a new attention mechanism optimized for vision tasks that inherently captures 2D spatial structures.
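As a rough intuition for spatial propagation (not GSPN itself, which uses learned, normalized propagation matrices and multiple scan directions), the sketch below shows a single top-to-bottom linear scan in which each row's hidden state blends the current input with the row above; the weights here are random stand-ins.

```python
import numpy as np

def row_scan(feat, gate):
    """One top-to-bottom linear propagation sweep over a 2D feature map.

    feat: (H, W) input features; gate: (H, W) per-pixel mixing weights
    in [0, 1]. Information flows down the image in a single sweep; full
    2D coverage would combine sweeps in all four directions.
    """
    H, W = feat.shape
    hidden = np.zeros_like(feat)
    hidden[0] = feat[0]
    for i in range(1, H):
        hidden[i] = (1 - gate[i]) * feat[i] + gate[i] * hidden[i - 1]
    return hidden

rng = np.random.default_rng(0)
feat = rng.standard_normal((8, 8))
gate = rng.uniform(0, 1, size=(8, 8))
out = row_scan(feat, gate)
print(out.shape)  # (8, 8)
```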
no code implementations • 20 Nov 2024 • Xin Dong, Yonggan Fu, Shizhe Diao, Wonmin Byeon, Zijia Chen, Ameya Sunil Mahabaleshwarkar, Shih-Yang Liu, Matthijs Van Keirsbilck, Min-Hung Chen, Yoshi Suhara, Yingyan Lin, Jan Kautz, Pavlo Molchanov
We propose Hymba, a family of small language models featuring a hybrid-head parallel architecture that integrates transformer attention mechanisms with state space models (SSMs) for enhanced efficiency.
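A toy rendering of the hybrid-head idea follows: an attention head and a simple diagonal linear recurrence (a stand-in for an SSM) process the same input in parallel and their outputs are fused. This is a sketch of the general pattern, not Hymba's actual heads, gating, or meta tokens; every name and dimension is illustrative.

```python
import torch
import torch.nn as nn

class HybridParallelBlock(nn.Module):
    """Toy hybrid block: attention and an SSM-like recurrence run in
    parallel on the same input, and their outputs are concatenated and
    projected (illustrative sketch, not Hymba's architecture)."""

    def __init__(self, dim, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, n_heads, batch_first=True)
        self.log_decay = nn.Parameter(torch.zeros(dim))  # per-channel decay
        self.in_proj = nn.Linear(dim, dim)
        self.out_proj = nn.Linear(2 * dim, dim)

    def forward(self, x):                      # x: (batch, seq, dim)
        attn_out, _ = self.attn(x, x, x)
        # Simple SSM-style recurrence: h_t = a * h_{t-1} + u_t
        a = torch.sigmoid(self.log_decay)
        u = self.in_proj(x)
        h = torch.zeros_like(x[:, 0])
        ssm_out = []
        for t in range(x.shape[1]):
            h = a * h + u[:, t]
            ssm_out.append(h)
        ssm_out = torch.stack(ssm_out, dim=1)
        return self.out_proj(torch.cat([attn_out, ssm_out], dim=-1))

block = HybridParallelBlock(dim=32)
y = block(torch.randn(2, 16, 32))
print(y.shape)  # torch.Size([2, 16, 32])
```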
1 code implementation • 12 Jun 2024 • Roger Waleffe, Wonmin Byeon, Duncan Riach, Brandon Norick, Vijay Korthikanti, Tri Dao, Albert Gu, Ali Hatamizadeh, Sudhakar Singh, Deepak Narayanan, Garvit Kulshreshtha, Vartika Singh, Jared Casper, Jan Kautz, Mohammad Shoeybi, Bryan Catanzaro
On 23 additional long-context tasks, the hybrid model continues to closely match or exceed the Transformer on average.
no code implementations • CVPR 2024 • Qiushan Guo, Shalini De Mello, Hongxu Yin, Wonmin Byeon, Ka Chun Cheung, Yizhou Yu, Ping Luo, Sifei Liu
Vision language models (VLMs) have experienced rapid advancements through the integration of large language models (LLMs) with image-text pairs, yet they struggle with detailed regional visual understanding due to the limited spatial awareness of the vision encoder and the use of coarse-grained training data that lacks detailed, region-specific captions.
no code implementations • 7 Dec 2023 • Gyeongrok Oh, Jaehwan Jeong, Sieun Kim, Wonmin Byeon, Jinkyu Kim, Sungwoong Kim, Sangpil Kim
We introduce a novel diffusion-based video generation method, generating a video showing multiple events given multiple individual sentences from the user.
no code implementations • ICCV 2023 • Yujin Jeong, Wonjeong Ryoo, SeungHyun Lee, Dabin Seo, Wonmin Byeon, Sangpil Kim, Jinkyu Kim
Hence, we propose the Power of Sound (TPoS) model to incorporate audio input that carries both time-varying semantics and magnitude.
no code implementations • CVPR 2023 • Divyam Madaan, Hongxu Yin, Wonmin Byeon, Jan Kautz, Pavlo Molchanov
We propose a novel framework and a solution to tackle the continual learning (CL) problem with changing network architectures.
1 code implementation • CVPR 2023 • Jiarui Xu, Sifei Liu, Arash Vahdat, Wonmin Byeon, Xiaolong Wang, Shalini De Mello
Our approach outperforms the previous state of the art by significant margins on both open-vocabulary panoptic and semantic segmentation tasks.
Ranked #2 on Open-World Instance Segmentation on UVO (using extra training data). Related tasks: Open Vocabulary Panoptic Segmentation, Open Vocabulary Semantic Segmentation (+4 more).
no code implementations • 21 Nov 2022 • Seung Hyun Lee, Chanyoung Kim, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim
We present a novel framework, Localized Image Stylization with Audio (LISA) which performs audio-driven localized image stylization.
no code implementations • 30 Aug 2022 • Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Sang Ho Yoon, Jinkyu Kim, Sangpil Kim
Our extensive experiments show that our sound-guided image manipulation approach produces semantically and visually more plausible results than state-of-the-art text- and sound-guided image manipulation methods, a finding further confirmed by our human evaluations.
no code implementations • 20 Apr 2022 • Seung Hyun Lee, Gyeongrok Oh, Wonmin Byeon, Chanyoung Kim, Won Jeong Ryoo, Sang Ho Yoon, Hyunjun Cho, Jihyun Bae, Jinkyu Kim, Sangpil Kim
The recent success in StyleGAN demonstrates that pre-trained StyleGAN latent space is useful for realistic video generation.
no code implementations • 24 Feb 2022 • Benjamin Wu, Oliver Hennigh, Jan Kautz, Sanjay Choudhry, Wonmin Byeon
This efficiently and flexibly produces a compressed representation which is used for additional conditioning of physics-informed models.
5 code implementations • CVPR 2022 • Jiarui Xu, Shalini De Mello, Sifei Liu, Wonmin Byeon, Thomas Breuel, Jan Kautz, Xiaolong Wang
With only text supervision and without any pixel-level annotations, GroupViT learns to group together semantic regions and successfully transfers to the task of semantic segmentation in a zero-shot manner, i.e., without any further fine-tuning.
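The zero-shot transfer step can be pictured as follows: once an image is partitioned into groups and each group has an embedding in the joint image-text space, labels come from cosine similarity to class-name text embeddings. The sketch below uses random stand-ins for the trained encoders; it shows the assignment mechanism only, not GroupViT's grouping blocks.

```python
import torch
import torch.nn.functional as F

# Random stand-ins for embeddings produced by trained image/text encoders.
num_groups, num_classes, dim = 5, 3, 64
group_emb = F.normalize(torch.randn(num_groups, dim), dim=-1)   # per-group, from image
class_emb = F.normalize(torch.randn(num_classes, dim), dim=-1)  # from text prompts

similarity = group_emb @ class_emb.T          # (num_groups, num_classes)
group_labels = similarity.argmax(dim=-1)      # best-matching class per group

# Paint each pixel with the label of the group it belongs to.
group_map = torch.randint(0, num_groups, (32, 32))  # stand-in group assignment
seg_map = group_labels[group_map]                   # (32, 32) class labels
print(seg_map.shape)
```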
no code implementations • NeurIPS 2021 • Zhiding Yu, Rui Huang, Wonmin Byeon, Sifei Liu, Guilin Liu, Thomas Breuel, Anima Anandkumar, Jan Kautz
It is therefore interesting to study how these two tasks can be coupled to benefit each other.
1 code implementation • CVPR 2022 • Seung Hyun Lee, Wonseok Roh, Wonmin Byeon, Sang Ho Yoon, Chan Young Kim, Jinkyu Kim, Sangpil Kim
Our audio encoder is trained to produce a latent representation from an audio input, which is forced to be aligned with image and text representations in the multi-modal embedding space.
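Forcing one modality's embeddings to align with another's is commonly done with a symmetric InfoNCE (CLIP-style) contrastive loss, sketched below. This is a generic formulation under assumed names and a default temperature; the paper's exact training recipe may differ.

```python
import torch
import torch.nn.functional as F

def infonce_loss(audio_emb, text_emb, temperature=0.07):
    """Symmetric InfoNCE loss pulling matched audio-text pairs together
    in a shared embedding space (generic CLIP-style sketch)."""
    a = F.normalize(audio_emb, dim=-1)
    t = F.normalize(text_emb, dim=-1)
    logits = a @ t.T / temperature            # (batch, batch) similarities
    targets = torch.arange(len(a))            # matched pairs lie on the diagonal
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.T, targets))

loss = infonce_loss(torch.randn(8, 128), torch.randn(8, 128))
print(loss.item())
```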
no code implementations • 29 Sep 2021 • Jiahao Su, Wonmin Byeon, Furong Huang
Some of these designs are not exactly orthogonal, while others consider only standard convolutional layers and propose specific classes of realizations.
no code implementations • 16 Jun 2021 • Jiahao Su, Wonmin Byeon, Furong Huang
To address this problem, we propose a theoretical framework for orthogonal convolutional layers, which establishes the equivalence between various orthogonal convolutional layers in the spatial domain and the paraunitary systems in the spectral domain.
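The defining property of an orthogonal layer is that it preserves the L2 norm of its input. The probe below checks that property empirically on random inputs; it does not construct the paraunitary parameterization, which is the paper's actual contribution, and all names here are illustrative.

```python
import torch
import torch.nn as nn

def norm_preservation_gap(conv, in_shape, trials=100):
    """Empirically measure how far a conv layer is from norm-preserving:
    returns the worst relative deviation of ||conv(x)|| from ||x||."""
    gaps = []
    for _ in range(trials):
        x = torch.randn(1, *in_shape)
        y = conv(x)
        gaps.append((y.norm() / x.norm() - 1.0).abs().item())
    return max(gaps)

conv = nn.Conv2d(4, 4, kernel_size=3, padding=1, bias=False)
print(norm_preservation_gap(conv, (4, 16, 16)))  # ~0 only if conv is orthogonal
```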
2 code implementations • CVPR 2021 • Rakshit Kothari, Shalini De Mello, Umar Iqbal, Wonmin Byeon, Seonwook Park, Jan Kautz
A major challenge for physically unconstrained gaze estimation is acquiring training data with 3D gaze annotations for in-the-wild and outdoor scenarios.
Ranked #3 on Gaze Estimation on Gaze360
no code implementations • 14 Dec 2020 • Oliver Hennigh, Susheela Narasimhan, Mohammad Amin Nabian, Akshay Subramaniam, Kaustubh Tangsali, Max Rietmann, Jose del Aguila Ferrandis, Wonmin Byeon, Zhiwei Fang, Sanjay Choudhry
We present real-world use cases that range from challenging forward multi-physics simulations with turbulence and complex 3D geometries, to industrial design optimization and inverse problems that are not addressed efficiently by the traditional solvers.
no code implementations • 1 Dec 2020 • Yiran Zhong, Charles Loop, Wonmin Byeon, Stan Birchfield, Yuchao Dai, Kaihao Zhang, Alexey Kamenev, Thomas Breuel, Hongdong Li, Jan Kautz
A common way to speed up the computation is to downsample the feature volume, but this loses high-frequency details.
2 code implementations • NeurIPS 2020 • Jiahao Su, Wonmin Byeon, Jean Kossaifi, Furong Huang, Jan Kautz, Animashree Anandkumar
Learning from spatio-temporal data has numerous applications such as human-behavior analysis, object tracking, video compression, and physics simulation. However, existing methods still perform poorly on challenging video tasks such as long-term forecasting (a minimal ConvLSTM baseline is sketched below the leaderboard entry).
Ranked #1 on Video Prediction on KTH (Cond metric)
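For background on the entry above, the sketch below is a minimal first-order ConvLSTM cell, the standard spatio-temporal baseline that higher-order, tensor-factorized variants extend. All names and sizes are illustrative; this is not the paper's model.

```python
import torch
import torch.nn as nn

class ConvLSTMCell(nn.Module):
    """Minimal first-order ConvLSTM cell: LSTM gates computed by a single
    convolution over the concatenated input and hidden state
    (background sketch, not the paper's model)."""

    def __init__(self, in_ch, hid_ch, k=3):
        super().__init__()
        self.gates = nn.Conv2d(in_ch + hid_ch, 4 * hid_ch, k, padding=k // 2)

    def forward(self, x, state):
        h, c = state
        i, f, g, o = self.gates(torch.cat([x, h], dim=1)).chunk(4, dim=1)
        c = torch.sigmoid(f) * c + torch.sigmoid(i) * torch.tanh(g)
        h = torch.sigmoid(o) * torch.tanh(c)
        return h, c

cell = ConvLSTMCell(in_ch=1, hid_ch=8)
h = c = torch.zeros(2, 8, 16, 16)
for t in range(5):                      # unroll over a short frame sequence
    h, c = cell(torch.randn(2, 1, 16, 16), (h, c))
print(h.shape)  # torch.Size([2, 8, 16, 16])
```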
no code implementations • 25 Sep 2019 • Jiahao Su, Wonmin Byeon, Furong Huang, Jan Kautz, Animashree Anandkumar
Long-term video prediction is highly challenging since it entails simultaneously capturing spatial and temporal information across a long range of image frames. Standard recurrent models are ineffective since they are prone to error propagation and cannot effectively capture higher-order correlations.
no code implementations • 25 Sep 2019 • Wonmin Byeon, Jan Kautz
While video prediction approaches have advanced considerably in recent years, learning to predict the long-term future remains challenging: an ambiguous future and error propagation over time yield blurry predictions.
no code implementations • 21 Feb 2018 • Pantelis R. Vlachas, Wonmin Byeon, Zhong Y. Wan, Themistoklis P. Sapsis, Petros Koumoutsakos
We introduce a data-driven forecasting method for high-dimensional chaotic systems using long short-term memory (LSTM) recurrent neural networks.
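The general recipe can be illustrated on the Lorenz system: generate a chaotic trajectory, then fit an LSTM to predict the next state from a short history. The sketch below uses assumed hyperparameters (window size, hidden width, learning rate) and simple Euler integration; it illustrates the setup, not the paper's configuration.

```python
import torch
import torch.nn as nn

def lorenz(n=2000, dt=0.01):
    """Euler-integrated Lorenz-63 trajectory (classic chaotic system)."""
    x = torch.tensor([1.0, 1.0, 1.0])
    traj = []
    for _ in range(n):
        dx = torch.stack([10 * (x[1] - x[0]),
                          x[0] * (28 - x[2]) - x[1],
                          x[0] * x[1] - (8 / 3) * x[2]])
        x = x + dt * dx
        traj.append(x)
    return torch.stack(traj)

traj = lorenz()
window = 10
inputs = torch.stack([traj[i:i + window] for i in range(len(traj) - window)])
targets = traj[window:]                 # state one step after each window

lstm = nn.LSTM(3, 32, batch_first=True)
head = nn.Linear(32, 3)
opt = torch.optim.Adam(list(lstm.parameters()) + list(head.parameters()), lr=1e-3)
for epoch in range(5):
    out, _ = lstm(inputs)
    pred = head(out[:, -1])             # predict the next state from the window
    loss = nn.functional.mse_loss(pred, targets)
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(epoch, loss.item())
```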
no code implementations • ECCV 2018 • Wonmin Byeon, Qin Wang, Rupesh Kumar Srivastava, Petros Koumoutsakos
Video prediction models based on convolutional networks, recurrent networks, and their combinations often result in blurry predictions.
no code implementations • NeurIPS 2015 • Marijn F. Stollenga, Wonmin Byeon, Marcus Liwicki, Juergen Schmidhuber
In contrast, Multi-Dimensional Recurrent NNs (MD-RNNs) can perceive the entire spatio-temporal context of each pixel in a few sweeps through all pixels, especially when the RNN is a Long Short-Term Memory (LSTM).
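The core idea of a multi-dimensional sweep can be shown in a few lines: in a single top-left-to-bottom-right pass, each pixel's hidden state depends on its input plus the hidden states above and to the left, so every state sees its full upper-left context. The toy below uses scalar weights for clarity; real MD-LSTMs use learned gates and sweeps from multiple corners.

```python
import numpy as np

def md_rnn_sweep(image, w_in=0.5, w_up=0.3, w_left=0.3):
    """One diagonal sweep of a toy 2D multi-dimensional RNN
    (illustrative scalar weights, not a trained MD-LSTM)."""
    H, W = image.shape
    h = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            up = h[i - 1, j] if i > 0 else 0.0
            left = h[i, j - 1] if j > 0 else 0.0
            h[i, j] = np.tanh(w_in * image[i, j] + w_up * up + w_left * left)
    return h

img = np.random.default_rng(0).standard_normal((6, 6))
print(md_rnn_sweep(img).shape)  # (6, 6)
```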
no code implementations • CVPR 2015 • Wonmin Byeon, Thomas M. Breuel, Federico Raue, Marcus Liwicki
This paper addresses the problem of pixel-level segmentation and classification of scene images with an entirely learning-based approach using Long Short-Term Memory (LSTM) recurrent neural networks, which are commonly used for sequence classification.