no code implementations • 23 Jul 2024 • Eunseop Yoon, Hee Suk Yoon, SooHwan Eom, Gunsoo Han, Daniel Wontae Nam, DaeJin Jo, Kyoung-Woon On, Mark A. Hasegawa-Johnson, Sungwoong Kim, Chang D. Yoo
These human preference data, however, are labeled at the sequence level, creating a mismatch between the sequence-level preference labels and the tokens that the language model generates autoregressively.
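To make the mismatch concrete, here is a minimal sketch (plain PyTorch; tensor names and shapes are illustrative, not the paper's method) of the common practice of broadcasting one sequence-level reward to every generated token, so each token receives identical credit regardless of its contribution:

```python
import torch

# Hypothetical batch: per-token log-probs and one scalar reward per sequence.
logprobs = torch.randn(4, 16)    # (batch, seq_len)
seq_reward = torch.randn(4)      # one sequence-level preference reward each

# Uniform credit assignment: broadcast the sequence reward over all tokens.
token_reward = seq_reward.unsqueeze(-1).expand_as(logprobs)

# Policy-gradient-style objective under this uniform assignment.
loss = -(token_reward * logprobs).mean()
```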
no code implementations • 22 Apr 2024 • Jooeun Kim, Jinri Kim, Kwangeun Yeo, Eungi Kim, Kyoung-Woon On, Jonghwan Mun, Joonseok Lee
Cold-start item recommendation is a long-standing challenge in recommendation systems.
no code implementations • 6 Apr 2024 • Seungjae Jung, Gunsoo Han, Daniel Wontae Nam, Kyoung-Woon On
In the process of this discovery, we identified two techniques for effective alignment: reward shift and underlying distribution matching.
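As a heavily hedged sketch of the reward-shift idea only (subtracting a shared baseline from rewards before a binary-classifier-style loss; this is the general technique, not necessarily the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

# Hypothetical implicit rewards for chosen and rejected responses.
r_chosen, r_rejected = torch.randn(8), torch.randn(8)

# Reward shift: center rewards with a shared baseline estimate (assumption).
delta = torch.cat([r_chosen, r_rejected]).mean()

# Binary-classifier-style alignment loss on the shifted rewards.
loss = -(F.logsigmoid(r_chosen - delta).mean()
         + F.logsigmoid(-(r_rejected - delta)).mean())
```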
1 code implementation • 14 Mar 2024 • Hyunji Lee, Doyoung Kim, Jihoon Jun, Sejune Joo, Joel Jang, Kyoung-Woon On, Minjoon Seo
In particular, the robustness of the parametric token space, established during pretraining, tends to effectively enhance the stability of the nonparametric sequence embedding space, a new space built by another language model.
1 code implementation • 15 Nov 2023 • Hyunji Lee, Sejune Joo, Chaeeun Kim, Joel Jang, Doyoung Kim, Kyoung-Woon On, Minjoon Seo
To reduce issues like hallucinations and lack of control in Large Language Models (LLMs), a common method is to generate responses by grounding on external contexts given as input, known as knowledge-augmented models.
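As a minimal sketch of this grounding setup (the prompt format and function name are assumptions, not the paper's recipe), one prepends retrieved passages so the model answers conditioned on them:

```python
def build_grounded_prompt(question: str, contexts: list[str]) -> str:
    """Prepend retrieved passages so the model answers grounded on them."""
    context_block = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(contexts))
    return (
        "Answer using only the passages below.\n"
        f"{context_block}\n"
        f"Question: {question}\nAnswer:"
    )

# Usage: pass the resulting prompt to any LLM generation API.
prompt = build_grounded_prompt(
    "Who wrote Hamlet?",
    ["Hamlet is a tragedy written by William Shakespeare."],
)
```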
no code implementations • 10 Oct 2023 • DaeJin Jo, Daniel Wontae Nam, Gunsoo Han, Kyoung-Woon On, Taehwan Kwon, Seungeun Rho, Sungwoong Kim
A common practice in knowledge-grounded dialogue generation is to explicitly utilize intermediate steps (e.g., web-search, memory retrieval) with modular approaches.
no code implementations • 27 Jul 2023 • Jihyeon Lee, Dain Kim, Doohae Jung, Boseop Kim, Kyoung-Woon On
In-context learning, which offers substantial advantages over fine-tuning, is predominantly observed in decoder-only models, while encoder-decoder (i.e., seq2seq) models excel in methods that rely on weight updates.
no code implementations • 23 May 2023 • Eunbi Choi, Kyoung-Woon On, Gunsoo Han, Sungwoong Kim, Daniel Wontae Nam, DaeJin Jo, Seung Eun Rho, Taehwan Kwon, Minjoon Seo
Open-domain conversation systems integrate multiple conversation skills into a single system through a modular approach.
1 code implementation • CVPR 2023 • Dohwan Ko, Joonmyung Choi, Hyeong Kyu Choi, Kyoung-Woon On, Byungseok Roh, Hyunwoo J. Kim
Therefore, we propose MEta Loss TRansformer (MELTR), a plug-in module that automatically and non-linearly combines various loss functions to aid learning of the target task via auxiliary learning.
Ranked #2 on Video Captioning on YouCook2
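A minimal sketch of learning a non-linear combination of auxiliary losses (a small MLP stands in for the paper's transformer module; names and sizes are illustrative):

```python
import torch
import torch.nn as nn

class LossCombiner(nn.Module):
    """Maps a vector of per-task loss values to one scalar training loss."""
    def __init__(self, num_losses: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_losses, hidden), nn.ReLU(), nn.Linear(hidden, 1)
        )

    def forward(self, losses: torch.Tensor) -> torch.Tensor:
        return self.net(losses).squeeze(-1)

combiner = LossCombiner(num_losses=3)
aux_losses = torch.stack([torch.tensor(0.7), torch.tensor(1.2), torch.tensor(0.3)])
total_loss = combiner(aux_losses)  # differentiable w.r.t. combiner and losses
```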
1 code implementation • CVPR 2022 • Dohwan Ko, Joonmyung Choi, Juyeon Ko, Shinyeong Noh, Kyoung-Woon On, Eun-Sol Kim, Hyunwoo J. Kim
In this paper, we propose a novel multi-modal self-supervised framework Video-Text Temporally Weak Alignment-based Contrastive Learning (VT-TWINS) to capture significant information from noisy and weakly correlated data using a variant of Dynamic Time Warping (DTW).
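For reference, a minimal classical DTW alignment cost in NumPy; the paper uses a differentiable variant suited to weakly correlated sequences, and this sketch only shows the underlying recurrence:

```python
import numpy as np

def dtw_cost(x: np.ndarray, y: np.ndarray) -> float:
    """Classical DTW: x is (n, d), y is (m, d); returns the alignment cost."""
    n, m = len(x), len(y)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(x[i - 1] - y[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(D[n, m])
```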
no code implementations • CVPR 2022 • Bumsoo Kim, Jonghwan Mun, Kyoung-Woon On, Minchul Shin, Junhyun Lee, Eun-Sol Kim
Human-Object Interaction (HOI) detection is the task of identifying a set of <human, object, interaction> triplets from an image.
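A minimal sketch of that output structure (field names are illustrative, not the paper's API):

```python
from dataclasses import dataclass

@dataclass
class HOITriplet:
    """One detected <human, object, interaction> triplet."""
    human_box: tuple[float, float, float, float]   # (x1, y1, x2, y2)
    object_box: tuple[float, float, float, float]
    object_label: str                              # e.g. "bicycle"
    interaction: str                               # e.g. "riding"

detection = HOITriplet((10, 20, 110, 220), (80, 150, 300, 330), "bicycle", "riding")
```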
no code implementations • 13 Oct 2021 • Minchul Shin, Jonghwan Mun, Kyoung-Woon On, Woo-Young Kang, Gunsoo Han, Eun-Sol Kim
The VALUE (Video-And-Language Understanding Evaluation) benchmark is newly introduced to evaluate and analyze multi-modal representation learning algorithms on three video-and-language tasks: Retrieval, QA, and Captioning.
no code implementations • 1 Jan 2021 • Kyoung-Woon On, Eun-Sol Kim, Il-Jae Kwon, Sangwoong Yoon, Byoung-Tak Zhang
To further investigate the effectiveness of our proposed method, we evaluate our approach on a real-world problem, image retrieval with visual scene graphs.
no code implementations • 1 Jan 2021 • Il-Jae Kwon, Kyoung-Woon On, Dong-Geon Lee, Byoung-Tak Zhang
Most real-world graphs are dynamic and eventually face the cold-start problem.
no code implementations • WS 2020 • Woo Suk Choi, Kyoung-Woon On, Yu-Jung Heo, Byoung-Tak Zhang
In experiments, the integrated scene graph is applied to image-caption retrieval as a downstream task.
1 code implementation • 7 May 2020 • Seong-Ho Choi, Kyoung-Woon On, Yu-Jung Heo, Ahjeong Seo, Youwon Jang, Minsu Lee, Byoung-Tak Zhang
Despite recent progress in computer vision and natural language processing, developing a machine that can understand video stories remains difficult due to their intrinsic complexity.
no code implementations • 17 Jan 2020 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
Here, we propose Cut-Based Graph Learning Networks (CB-GLNs) for learning video data by discovering these complex structures of the video.
no code implementations • 3 Jul 2019 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
However, most sequential data, such as videos, have complex temporal dependencies that imply variable-length semantic flows and their compositions, which are hard to capture with conventional methods.
no code implementations • 1 Apr 2019 • Yu-Jung Heo, Kyoung-Woon On, SeongHo Choi, Jaeseo Lim, Jinah Kim, Jeh-Kwang Ryu, Byung-Chull Bae, Byoung-Tak Zhang
Video understanding is emerging as a new paradigm for studying human-like AI.
no code implementations • 20 Jan 2019 • Kyoung-Woon On, Eun-Sol Kim, Yu-Jung Heo, Byoung-Tak Zhang
While conventional methods for sequential learning focus on interactions between consecutive inputs, we suggest a new method that captures composite semantic flows with variable-length dependencies.
8 code implementations • 14 Oct 2016 • Jin-Hwa Kim, Kyoung-Woon On, Woosang Lim, Jeonghee Kim, Jung-Woo Ha, Byoung-Tak Zhang
Bilinear models provide rich representations compared with linear models.
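To illustrate the contrast, a minimal sketch of a full bilinear interaction versus a low-rank factorization via the element-wise (Hadamard) product; dimensions are illustrative, and the paper should be consulted for the exact formulation:

```python
import torch

x, y = torch.randn(128), torch.randn(128)   # two input feature vectors

# Full bilinear form: one (128 x 128) weight matrix per output unit.
W = torch.randn(128, 128)
z_full = x @ W @ y                           # a single bilinear output

# Low-rank factorization: project both inputs to rank d, then take the
# Hadamard product, avoiding the full weight tensor.
d = 32
U, V = torch.randn(128, d), torch.randn(128, d)
z_lowrank = (U.T @ x) * (V.T @ y)            # d-dimensional joint representation
```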