no code implementations • 8 Mar 2025 • Seil Kang, Jinyeong Kim, Junhyeok Kim, Seong Jae Hwang
However, we discover that a few attention heads in frozen LVLMs demonstrate strong visual grounding capabilities.
5 Mar 2025 • Seil Kang, Jinyeong Kim, Junhyeok Kim, Seong Jae Hwang
Large multimodal models (LMMs) "see" images by leveraging the attention mechanism between text and visual tokens in the transformer decoder.
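The attention between text and visual tokens mentioned above can be illustrated with a minimal sketch: scaled dot-product attention weights of a single text query over a grid of visual tokens, reshaped into a spatial heatmap. This is a generic illustration with hypothetical tensor sizes, not the paper's actual model.

```python
import numpy as np

def text_to_visual_attention(q_text, K_visual):
    """Attention weights of one text token over visual tokens
    (generic scaled dot-product attention; illustrative only)."""
    d = q_text.shape[-1]
    scores = K_visual @ q_text / np.sqrt(d)     # one score per visual token
    scores -= scores.max()                      # numerical stability
    weights = np.exp(scores)
    return weights / weights.sum()              # softmax over visual tokens

# toy example: 2x2 grid of visual tokens with 8-dim keys (hypothetical sizes)
rng = np.random.default_rng(0)
K = rng.standard_normal((4, 8))
q = rng.standard_normal(8)
attn = text_to_visual_attention(q, K)
heatmap = attn.reshape(2, 2)  # reshape to the image grid for grounding
```

Inspecting such per-head heatmaps is one way a frozen decoder's attention can be read off as a coarse localization signal.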
1 Jul 2024 • Donghyun Kim, Seil Kang, Seong Jae Hwang
This work introduces FALCON (Frequency Adjoint Link with CONtinuous density mask), a single-image dehazing system that achieves state-of-the-art performance in both quality and speed.
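Frequency-domain processing of the kind FALCON's name alludes to can be sketched as follows: transform an image with a 2D FFT, apply a mask in the frequency domain, and invert. This is a generic low-pass example for illustration only, not FALCON's actual architecture or mask.

```python
import numpy as np

def frequency_filter(image, cutoff_ratio=0.1):
    """Low-pass filter a grayscale image in the frequency domain
    (illustrative sketch; not FALCON's method)."""
    F = np.fft.fftshift(np.fft.fft2(image))     # DC component at the center
    h, w = image.shape
    yy, xx = np.ogrid[:h, :w]
    dist = np.hypot(yy - h / 2, xx - w / 2)     # distance from the center
    mask = dist <= cutoff_ratio * min(h, w)     # keep only low frequencies
    return np.real(np.fft.ifft2(np.fft.ifftshift(F * mask)))
```

A learned, continuously-valued mask (rather than this hard circular cutoff) would let a network reweight frequency bands per image.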
19 Mar 2024 • Seil Kang, Donghyun Kim, Junhyeok Kim, Hyo Kyung Lee, Seong Jae Hwang
(1) Previous methods rely solely on CXR reports, which are insufficient for comprehensive Visual Question Answering (VQA), especially when additional health-related data such as medication history and prior diagnoses are needed.
5 Feb 2024 • Woojung Han, Seil Kang, Kyobin Choo, Seong Jae Hwang
This includes not only the masks generated by our model, but also the segmentation results obtained by using these masks as pseudo-labels.