no code implementations • 15 Mar 2024 • Dongmin Park, Zhaofang Qian, Guangxing Han, Ser-Nam Lim
To precisely measure this, we first present an evaluation benchmark by extending popular multi-modal benchmark datasets with prepended hallucinatory dialogues generated by our novel Adversarial Question Generator, which can automatically generate image-related yet adversarial dialogues by adopting adversarial attacks on LMMs.
no code implementations • 19 Dec 2023 • Shraman Pramanick, Guangxing Han, Rui Hou, Sayan Nag, Ser-Nam Lim, Nicolas Ballas, Qifan Wang, Rama Chellappa, Amjad Almahairi
In this work, we introduce VistaLLM, a powerful visual system that addresses coarse- and fine-grained VL tasks over single and multiple input images using a unified framework.
1 code implementation • CVPR 2023 • Han Lin, Guangxing Han, Jiawei Ma, Shiyuan Huang, Xudong Lin, Shih-Fu Chang
Vision Transformers (ViTs) emerge to achieve impressive performance on many data-abundant computer vision tasks by capturing long-range dependencies among local features.
1 code implementation • CVPR 2023 • Jiawei Ma, Yulei Niu, Jincheng Xu, Shiyuan Huang, Guangxing Han, Shih-Fu Chang
Generalized few-shot object detection aims to achieve precise detection on both base classes with abundant annotations and novel classes with limited training data.
1 code implementation • 28 Dec 2022 • Yuncong Yang, Jiawei Ma, Shiyuan Huang, Long Chen, Xudong Lin, Guangxing Han, Shih-Fu Chang
For long videos, given a paragraph of description where the sentences describe different segments of the video, by matching all sentence-clip pairs, the paragraph and the full video are aligned implicitly.
1 code implementation • 22 Oct 2022 • Long Chen, Yulei Niu, Brian Chen, Xudong Lin, Guangxing Han, Christopher Thomas, Hammad Ayyubi, Heng Ji, Shih-Fu Chang
Specifically, given an article and a relevant video, WSAG aims to localize all ``groundable'' sentences to the video, and these sentences are possibly at different semantic scales.
1 code implementation • 20 Jul 2022 • Zhen Wang, Long Chen, Wenbo Ma, Guangxing Han, Yulei Niu, Jian Shao, Jun Xiao
Given an image and a reference caption, the image caption editing task aims to correct the misalignment errors and generate a refined caption.
no code implementations • 16 Apr 2022 • Guangxing Han, Long Chen, Jiawei Ma, Shiyuan Huang, Rama Chellappa, Shih-Fu Chang
Our approach is motivated by the high-level conceptual similarity of (metric-based) meta-learning and prompt-based learning to learn generalizable few-shot and zero-shot object detection models respectively without fine-tuning.
1 code implementation • CVPR 2022 • Guangxing Han, Jiawei Ma, Shiyuan Huang, Long Chen, Shih-Fu Chang
Inspired by the recent work on vision transformers and vision-language transformers, we propose a novel Fully Cross-Transformer based model (FCT) for FSOD by incorporating cross-transformer into both the feature backbone and detection head.
no code implementations • 3 Feb 2022 • Nikolaos-Antonios Ypsilantis, Noa Garcia, Guangxing Han, Sarah Ibrahimi, Nanne van Noord, Giorgos Tolias
Testing is primarily performed on photos taken by museum guests depicting exhibits, which introduces a distribution shift between training and testing.
1 code implementation • ICCV 2021 • Guangxing Han, Yicheng He, Shiyuan Huang, Jiawei Ma, Shih-Fu Chang
Few-shot object detection (FSOD) aims to detect never-seen objects using few examples.
no code implementations • ICCV 2021 • Jiawei Ma, Hanchen Xie, Guangxing Han, Shih-Fu Chang, Aram Galstyan, Wael Abd-Almageed
In this paper, we focus on the design of training strategy to obtain an elemental representation such that the prototype of each novel class can be estimated from a few labeled samples.
2 code implementations • 15 Apr 2021 • Guangxing Han, Shiyuan Huang, Jiawei Ma, Yicheng He, Shih-Fu Chang
To improve the fine-grained few-shot proposal classification, we propose a novel attentive feature alignment method to address the spatial misalignment between the noisy proposals and few-shot classes, thus improving the performance of few-shot object detection.
1 code implementation • CVPR 2022 • Shiyuan Huang, Jiawei Ma, Guangxing Han, Shih-Fu Chang
In this paper, we instead propose task-adaptive negative class envision for FSOR to integrate threshold tuning into the learning process.
no code implementations • NAACL 2021 • Qingyun Wang, Manling Li, Xuan Wang, Nikolaus Parulian, Guangxing Han, Jiawei Ma, Jingxuan Tu, Ying Lin, Haoran Zhang, Weili Liu, Aabhas Chauhan, Yingjun Guan, Bangzheng Li, Ruisong Li, Xiangchen Song, Yi R. Fung, Heng Ji, Jiawei Han, Shih-Fu Chang, James Pustejovsky, Jasmine Rah, David Liem, Ahmed Elsayed, Martha Palmer, Clare Voss, Cynthia Schneider, Boyan Onyshkevych
To combat COVID-19, both clinicians and scientists need to digest vast amounts of relevant biomedical knowledge in scientific literature to understand the disease mechanism and related biological functions.