no code implementations • 28 Dec 2023 • Jiasen Lu, Christopher Clark, Sangho Lee, Zichen Zhang, Savya Khosla, Ryan Marten, Derek Hoiem, Aniruddha Kembhavi
We present Unified-IO 2, the first autoregressive multimodal model that is capable of understanding and generating image, text, audio, and action.
1 code implementation • 22 Oct 2023 • Dayoon Ko, Sangho Lee, Gunhee Kim
Our ExFunTube is unique among existing datasets in that its videos cover a wide range of domains with various types of humor that necessitate a multimodal understanding of the content.
1 code implementation • 14 Jan 2022 • Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim
Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a video, with a self-supervised learning framework where we mainly focus on designing effective pretext tasks.
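The localization target can be illustrated with a minimal sketch: declare a scene boundary wherever consecutive shot representations diverge sharply. This is a toy heuristic for exposition only, not the paper's learned method, and the names `scene_boundaries` and `l2` are assumptions.

```python
def l2(u, v):
    # Euclidean distance between two shot embedding vectors.
    return sum((a - b) ** 2 for a, b in zip(u, v)) ** 0.5

def scene_boundaries(shot_embs, threshold=1.0):
    # A boundary is declared before shot i when its representation is
    # far from the previous shot's (illustrative heuristic only).
    return [i for i in range(1, len(shot_embs))
            if l2(shot_embs[i - 1], shot_embs[i]) > threshold]

# Four shots: the first two and last two are similar, so one boundary
# is expected at index 2 (the start of the second scene).
shots = [[0.0, 0.0], [0.1, 0.0], [3.0, 3.0], [3.1, 2.9]]
print(scene_boundaries(shots))  # → [2]
```

A self-supervised approach would instead learn the shot representations (e.g., via pretext tasks over neighboring shots) so that such divergences align with true scene changes.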
1 code implementation • 7 Dec 2021 • Yookoon Park, Sangho Lee, Gunhee Kim, David M. Blei
We argue that the deep encoder should maximize its nonlinear expressivity on the data for downstream predictors to take full advantage of its representation power.
no code implementations • 29 Sep 2021 • Jonghwan Mun, Minchul Shin, Gunsoo Han, Sangho Lee, Seongsu Ha, Joonseok Lee, Eun-Sol Kim
Inspired by this, we tackle video scene segmentation, the task of temporally localizing scene boundaries in a video, with a self-supervised learning framework where we mainly focus on designing effective pretext tasks.
1 code implementation • ICCV 2021 • Sangho Lee, Jiwan Chung, Youngjae Yu, Gunhee Kim, Thomas Breuel, Gal Chechik, Yale Song
We demonstrate that our approach finds videos with high audio-visual correspondence and show that self-supervised models trained on our data achieve competitive performance compared to models trained on existing manually curated datasets.
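The curation idea can be sketched as scoring each video by the agreement between its audio and visual embeddings and keeping the high-scoring ones. This is a simplified stand-in, assuming precomputed per-video embeddings; the names `cosine` and `select_correspondent` are illustrative, not from the paper.

```python
import math

def cosine(u, v):
    # Cosine similarity between two embedding vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def select_correspondent(videos, threshold=0.5):
    # Keep videos whose audio and visual embeddings agree — a simple
    # proxy for audio-visual correspondence.
    return [vid for vid, a_emb, v_emb in videos
            if cosine(a_emb, v_emb) >= threshold]

# Toy embeddings: the first video's audio and visuals align,
# the second's (e.g., an overdubbed clip) do not.
videos = [
    ("talking_head", [0.9, 0.1, 0.2], [0.8, 0.2, 0.1]),
    ("dubbed_clip", [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]),
]
print(select_correspondent(videos))  # → ['talking_head']
```

In practice the two embeddings would come from separate audio and visual encoders mapped into a shared space; the thresholded selection above is only the filtering step.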
no code implementations • ICLR 2021 • Youngjae Yu, Sangho Lee, Gunhee Kim, Yale Song
We show that our approach achieves competitive performance on self-supervised learning of video representations with a considerable improvement in speed over traditional methods.
no code implementations • ICLR 2021 • Sangho Lee, Youngjae Yu, Gunhee Kim, Thomas Breuel, Jan Kautz, Yale Song
The recent success of Transformers in the language domain has motivated adapting them to a multimodal setting, where a new visual model is trained in tandem with an already pretrained language model.
no code implementations • 20 Oct 2020 • Sangho Lee, KiYoon Yoo, Nojun Kwak
Federated learning (FL), which utilizes communication between the server (core) and local devices (edges) to indirectly learn from more data, is an emerging field in deep learning research.
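The server-edge interaction described above can be sketched as a minimal FedAvg-style round: each edge device updates a model on its private data, and the core server averages the resulting weights, so raw data never leaves the devices. This is a generic toy sketch, not the paper's method; the names `local_update` and `fedavg_round` and the scalar least-squares objective are assumptions for illustration.

```python
import random

def local_update(w, data, lr=0.1):
    # One gradient step on a toy squared-error objective: each edge
    # pulls w toward the mean of its local data (illustrative only).
    grad = sum(w - x for x in data) / len(data)
    return w - lr * grad

def fedavg_round(global_w, client_datasets, lr=0.1):
    # Edges train locally; the core server averages their weights.
    local_ws = [local_update(global_w, d, lr) for d in client_datasets]
    return sum(local_ws) / len(local_ws)

# Toy run: three edges whose local data have different means.
random.seed(0)
clients = [[random.gauss(mu, 0.1) for _ in range(20)] for mu in (1.0, 2.0, 3.0)]
w = 0.0
for _ in range(100):
    w = fedavg_round(w, clients)
# w converges toward the average of the client means (about 2.0)
# even though the server never sees any client's data.
```

Real FL systems average high-dimensional model parameters (often weighted by client dataset size) and run multiple local epochs per round; the scalar version above only shows the communication pattern.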