no code implementations • 8 Mar 2025 • Seil Kang, Jinyeong Kim, Junhyeok Kim, Seong Jae Hwang
However, we discover that a few attention heads in frozen LVLMs demonstrate strong visual grounding capabilities.
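A minimal sketch of the idea in that excerpt, not the paper's actual method: given the attention weights of one decoder layer from a frozen LVLM, select a single head and reshape its text-to-image attention into a spatial heatmap that can act as a coarse grounding map. The function name, head index, and patch-grid size below are illustrative assumptions.

```python
import torch

def head_grounding_map(attn, head_idx, query_idx, num_vis_tokens, grid_size):
    """Hypothetical helper (not from the paper).
    attn: [num_heads, seq_len, seq_len] attention weights of one decoder layer.
    head_idx: which head to inspect (assumed to be a strong 'grounding' head).
    query_idx: position of the text token whose grounding we want.
    num_vis_tokens: number of visual tokens, assumed to precede the text tokens.
    grid_size: (H, W) patch grid of the vision encoder, with H * W == num_vis_tokens.
    """
    # Attention from the chosen text query to the visual tokens only.
    vis_attn = attn[head_idx, query_idx, :num_vis_tokens]   # [num_vis_tokens]
    heatmap = vis_attn.reshape(grid_size)                   # [H, W]
    return heatmap / heatmap.max().clamp_min(1e-8)          # normalize to [0, 1]

# Toy example: random weights stand in for a real model's attention.
num_heads, num_vis, num_txt = 32, 576, 16                   # e.g. a 24x24 patch grid
seq_len = num_vis + num_txt
attn = torch.rand(num_heads, seq_len, seq_len).softmax(dim=-1)
heatmap = head_grounding_map(attn, head_idx=5, query_idx=num_vis + 3,
                             num_vis_tokens=num_vis, grid_size=(24, 24))
print(heatmap.shape)  # torch.Size([24, 24])
```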
no code implementations • 5 Mar 2025 • Seil Kang, Jinyeong Kim, Junhyeok Kim, Seong Jae Hwang
Large multimodal models (LMMs) "see" images by leveraging the attention mechanism between text and visual tokens in the transformer decoder.
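To make the mechanism in that sentence concrete, here is a toy single-head sketch (random tensors, not an actual LMM): text-token queries attend over a decoder sequence that contains both visual and text tokens, so part of each text token's attention distribution lands on the image.

```python
import torch
import torch.nn.functional as F

d_model, num_vis, num_txt = 64, 9, 4
vis_tokens = torch.randn(num_vis, d_model)    # projected image-patch embeddings
txt_tokens = torch.randn(num_txt, d_model)    # text-token embeddings
seq = torch.cat([vis_tokens, txt_tokens], dim=0)

# One attention head: queries from the text tokens, keys over the full sequence.
W_q, W_k = torch.randn(d_model, d_model), torch.randn(d_model, d_model)
q = txt_tokens @ W_q                                  # [num_txt, d_model]
k = seq @ W_k                                         # [num_vis + num_txt, d_model]
attn = F.softmax(q @ k.T / d_model ** 0.5, dim=-1)    # [num_txt, num_vis + num_txt]

# Fraction of each text token's attention mass that falls on the visual tokens,
# i.e. the text-to-image attention the excerpt above refers to.
print(attn[:, :num_vis].sum(dim=-1))
```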
no code implementations • 19 Mar 2024 • Seil Kang, Donghyun Kim, Junhyeok Kim, Hyo Kyung Lee, Seong Jae Hwang
(1) Previous methods solely use CXR reports, which are insufficient for comprehensive Visual Question Answering (VQA), especially when additional health-related data like medication history and prior diagnoses are needed.
1 code implementation • 16 Oct 2023 • Seungju Han, Junhyeok Kim, Jack Hessel, Liwei Jiang, Jiwan Chung, Yejin Son, Yejin Choi, Youngjae Yu
NORMLENS consists of 10K human judgments accompanied by free-form explanations covering 2K multimodal situations, and serves as a probe to address two questions: (1) to what extent can models align with average human judgment?
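A hypothetical record layout for such a benchmark entry, sketched only from the counts quoted above; the field names and label set are assumptions, not the released NORMLENS schema.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Judgment:
    label: str        # assumed label set, e.g. "okay" / "wrong" / "impossible"
    explanation: str  # free-form rationale written by the annotator

@dataclass
class Situation:
    image_id: str                              # one of the ~2K multimodal situations
    context: str                               # textual description of the action
    judgments: List[Judgment] = field(default_factory=list)  # several per situation

sample = Situation(
    image_id="img_0001",
    context="reading a book",
    judgments=[Judgment("wrong", "Not safe if the person is currently driving.")],
)
print(len(sample.judgments))
```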