no code implementations • 6 Apr 2024 • Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On
In the process of this discovery, we identified two techniques for effective alignment: reward shift and underlying distribution matching.
1 code implementation • 14 Mar 2024 • Hyunji Lee, Doyoung Kim, Jihoon Jun, Sejune Joo, Joel Jang, Kyoung-Woon On, Minjoon Seo
In particular, the robustness of the parametric token space, which is established during pretraining, tends to enhance the stability of the nonparametric sequence embedding space, a new space established by another language model.
1 code implementation • 15 Nov 2023 • Hyunji Lee, Sejune Joo, Chaeeun Kim, Joel Jang, Doyoung Kim, Kyoung-Woon On, Minjoon Seo
Reliance on the inherent knowledge of Large Language Models (LLMs) can cause issues such as hallucinations, lack of control, and difficulties in integrating variable knowledge.
no code implementations • 10 Oct 2023 • DaeJin Jo, Daniel Wontae Nam, Gunsoo Han, Kyoung-Woon On, Taehwan Kwon, Seungeun Rho, Sungwoong Kim
A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web search, memory retrieval) with modular approaches.
no code implementations • 27 Jul 2023 • Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On
In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates.
no code implementations • 23 May 2023 • Eunbi Choi, Kyoung-Woon On, Gunsoo Han, Sungwoong Kim, Daniel Wontae Nam, DaeJin Jo, Seung Eun Rho, Taehwan Kwon, Minjoon Seo
Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach.
1 code implementation • CVPR 2023 • Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim
Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning the target task via auxiliary learning.
Ranked #2 on Video Captioning on YouCook2
1 code implementation • CVPR 2022 • Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim
In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW).
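For readers unfamiliar with Dynamic Time Warping, the following is a minimal sketch of the classic DTW recurrence on two 1-D sequences. It is not the variant used in VT-TWINS, which the snippet above does not specify; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration.

```python
from math import inf


def dtw_distance(a, b):
    """Classic DTW alignment cost between two 1-D numeric sequences.

    Illustrative only: uses |a_i - b_j| as the local cost (an assumption),
    and allows match, insertion, and deletion steps.
    """
    n, m = len(a), len(b)
    # cost[i][j] = minimal cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # advance in a only
                cost[i][j - 1],      # advance in b only
                cost[i - 1][j - 1],  # advance in both
            )
    return cost[n][m]
```

Because DTW may repeat elements of either sequence, `dtw_distance([1, 2, 3], [1, 2, 2, 3])` costs nothing even though the sequences differ in length, which is why this kind of alignment is useful for weakly synchronized data.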
no code implementations • CVPR 2022 • Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim
Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image.
no code implementations • 13 Oct 2021 • Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim
The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning.
no code implementations • 1 Jan 2021 • Il-Jae Kwon, Kyoung-Woon On, Dong-Geon Lee, Byoung-Tak Zhang
Most real-world graphs are dynamic and eventually face the cold start problem.
no code implementations • 1 Jan 2021 • Kyoung-Woon On, Eun-Sol Kim, Il-Jae Kwon, Sangwoong Yoon, Byoung-Tak Zhang
To further investigate the effectiveness of our proposed method, we evaluate our approach on a real-world problem, image retrieval with visual scene graphs.
no code implementations • WS 2020 • Woo Suk Choi, Kyoung-Woon On, Yu-Jung Heo, Byoung-Tak Zhang
In the experiment, the integrated scene graph is applied to image-caption retrieval as a downstream task.
1 code implementation • 7 May 2020 • Seong-Ho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang
Despite recent progress in computer vision and natural language processing, developing a machine that can understand video stories remains hard because of the intrinsic difficulty of video stories.
no code implementations • 17 Jan 2020 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.
no code implementations • 3 Jul 2019 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
However, most sequential data, such as videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, which are hard for conventional methods to capture.
no code implementations • 1 Apr 2019 • Yu-Jung Heo, Kyoung-Woon On, SeongHo Choi, Jaeseo Lim, Jinah Kim, Jeh-Kwang Ryu, Byung-Chull Bae, Byoung-Tak Zhang
Video understanding is emerging as a new paradigm for studying human-like AI.
no code implementations • 20 Jan 2019 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
While conventional methods for sequential learning focus on interaction between consecutive inputs, we suggest a new method which captures composite semantic flows with variable-length dependencies.
8 code implementations • 14 Oct 2016 • Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang
Bilinear models provide rich representations compared with linear models.
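One way to see the difference is that a linear score over two inputs is additive, while a bilinear score `x^T W y` depends on pairwise interactions between them. The sketch below is only an illustration of that general point, not the method of the paper; the function names, weights, and toy inputs are all hypothetical.

```python
def linear_score(x, y, wx, wy):
    """Additive model: wx . x + wy . y (no interaction between x and y)."""
    return sum(a * b for a, b in zip(wx, x)) + sum(a * b for a, b in zip(wy, y))


def bilinear_score(x, W, y):
    """Bilinear model: x^T W y, with W given as a list of rows."""
    return sum(
        x[i] * W[i][j] * y[j] for i in range(len(x)) for j in range(len(y))
    )


def interaction_gap(f, x1, x2, y1, y2):
    """f(x1,y1) + f(x2,y2) - f(x1,y2) - f(x2,y1); zero for any additive f."""
    return f(x1, y1) + f(x2, y2) - f(x1, y2) - f(x2, y1)


# Toy inputs (hypothetical): one-hot vectors and an identity interaction matrix.
x1, x2 = [1, 0], [0, 1]
y1, y2 = [1, 0], [0, 1]
W = [[1, 0], [0, 1]]

linear_gap = interaction_gap(
    lambda x, y: linear_score(x, y, [0.5, -1.0], [2.0, 3.0]), x1, x2, y1, y2
)
bilinear_gap = interaction_gap(lambda x, y: bilinear_score(x, W, y), x1, x2, y1, y2)
```

Here `linear_gap` is zero for any choice of weights, whereas `bilinear_gap` is nonzero: the bilinear model can reward matching pairs of inputs, which an additive model cannot express.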