no code implementations • 20 Dec 2024 • Yangyang Guo, Ziwei Xu, Xilie Xu, Yongkang Wong, Liqiang Nie, Mohan Kankanhalli
This technical report introduces our top-ranked solution that employs two approaches, i.e., suffix injection and projected gradient descent (PGD), to address the TiFA workshop MLLM attack challenge.
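For reference, a minimal PGD sketch in PyTorch is given below. It illustrates the generic projected-gradient-descent attack only; the actual attack loss, step size, perturbation budget, and number of steps used in the report are assumptions, and the code assumes pixel values in [0, 1].

    # Minimal, illustrative PGD sketch (not the report's exact setup).
    # `model_loss(images)` is a hypothetical function returning the scalar
    # attack objective to maximize; epsilon/alpha/num_steps are assumed values.
    import torch

    def pgd_attack(images, model_loss, epsilon=8 / 255, alpha=2 / 255, num_steps=10):
        x_adv = images.clone().detach()
        for _ in range(num_steps):
            x_adv.requires_grad_(True)
            loss = model_loss(x_adv)
            grad = torch.autograd.grad(loss, x_adv)[0]
            # Gradient ascent step, then projection onto the L-infinity ball around the clean images.
            x_adv = x_adv.detach() + alpha * grad.sign()
            x_adv = torch.min(torch.max(x_adv, images - epsilon), images + epsilon)
            x_adv = x_adv.clamp(0.0, 1.0)
        return x_adv.detach()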
1 code implementation • 25 Nov 2024 • Wey Yeh Choong, Yangyang Guo, Mohan Kankanhalli
Through our benchmark, we aim to inspire further research on 1) holistic understanding of VLLM capabilities, particularly regarding hallucination, and 2) extensive development of advanced VLLMs to alleviate this problem.
no code implementations • 19 Nov 2024 • Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli
Additionally, we advocate a new evaluation protocol that can 1) holistically quantify the model debiasing and V-L alignment ability, and 2) evaluate the generalization of social bias removal models.
1 code implementation • 14 Nov 2024 • Yangyang Guo, Mohan Kankanhalli
In particular, we individually pre-train seven CLIP models on two large-scale image-text pair datasets, and two MoCo models on the ImageNet dataset, resulting in a total of 16 pre-trained models.
no code implementations • 13 Nov 2024 • Yangyang Guo, Fangkai Jiao, Liqiang Nie, Mohan Kankanhalli
The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks comes as no surprise.
1 code implementation • 8 Oct 2024 • Yangyang Guo, Yanjun Zhao, Sizhe Dang, Tian Zhou, Liang Sun, Yi Qian
It effectively enhances the representation ability of the model while preserving its robustness, avoiding the risk of overfitting present in the vanilla implementation.
no code implementations • 29 Aug 2024 • Yingying Ren, Qiuli Li, Yangyang Guo, Witold Pedrycz, Lining Xing, Anfeng Liu, Yanjie Song
In this paper, we aim to provide a task execution scheme that maximizes the profit of the networking task for satellite ground network planning considering feeding mode (SGNPFM).
no code implementations • 13 Aug 2024 • Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian Gan, Liqiang Nie
Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities.
no code implementations • 27 May 2024 • Zhenyang Li, Yangyang Guo, Kejie Wang, Xiaolin Chen, Liqiang Nie, Mohan Kankanhalli
Visual Commonsense Reasoning (VCR) calls for explanatory reasoning behind question answering over visual scenes.
1 code implementation • 29 Jan 2024 • Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli
In particular, this dataset leverages 30,000 carefully collected textual and visual prompts, ensuring the synthesis of images with both high fidelity and semantic consistency.
no code implementations • CVPR 2024 • Guangzhi Wang, Yangyang Guo, Ziwei Xu, Mohan Kankanhalli
Human-Object Interaction (HOI) Detection constitutes an important aspect of human-centric scene understanding which requires precise object detection and interaction recognition.
1 code implementation • 17 Oct 2023 • Yangyang Guo, Fangkai Jiao, Zhiqi Shen, Liqiang Nie, Mohan Kankanhalli
Teaching Visual Question Answering (VQA) models to refrain from answering unanswerable questions is necessary for building a trustworthy AI system.
1 code implementation • CVPR 2024 • Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli
This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.
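A generic sketch of a low-rank layer of the kind such fine-tuning operates on is shown below. It only illustrates factorizing a dense linear layer into two low-rank factors; the rank, which layers are factorized, and how the factors are initialized in the paper are not specified here.

    # Illustrative low-rank linear layer (a sketch, not the paper's exact design).
    # A dense weight of shape (out, in) is replaced by two factors of shapes
    # (out, r) and (r, in), with r chosen much smaller than in and out.
    import torch
    import torch.nn as nn

    class LowRankLinear(nn.Module):
        def __init__(self, in_features, out_features, rank):
            super().__init__()
            self.down = nn.Linear(in_features, rank, bias=False)  # (r, in)
            self.up = nn.Linear(rank, out_features, bias=True)    # (out, r)

        def forward(self, x):
            return self.up(self.down(x))

The parameter count drops from in_features * out_features to roughly rank * (in_features + out_features), and the factors themselves can be fine-tuned directly on downstream tasks.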
1 code implementation • 28 Sep 2023 • Yangyang Guo, Haoyu Zhang, Yongkang Wong, Liqiang Nie, Mohan Kankanhalli
Learning a versatile language-image model is computationally prohibitive under a limited computing budget.
no code implementations • 25 Aug 2023 • Yanjie Song, Yutong Wu, Yangyang Guo, Ran Yan, P. N. Suganthan, Yue Zhang, Witold Pedrycz, Swagatam Das, Rammohan Mallipeddi, Oladayo Solomon Ajani, Qiang Feng
Reinforcement learning (RL), integrated as a component in the evolutionary algorithm (EA) framework, has demonstrated superior performance in recent years.
1 code implementation • 27 Jul 2023 • Harry Cheng, Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Mohan Kankanhalli
Training an effective video action recognition model poses significant computational challenges, particularly under limited resource budgets.
no code implementations • 24 Jul 2023 • Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli
The existing deepfake detection methods have reached a bottleneck in generalizing to unseen forgeries and manipulation approaches.
no code implementations • 19 Jul 2023 • Guangzhi Wang, Yangyang Guo, Mohan Kankanhalli
Human-Object Interaction Detection is a crucial aspect of human-centric scene understanding, with important applications in various domains.
no code implementations • 8 Apr 2023 • Yangyang Guo, Hao Wang, Lei He, Witold Pedrycz, P. N. Suganthan, Yanjie Song
The RL-GP adopts ensemble population strategies.
no code implementations • 4 Feb 2023 • Zhenyang Li, Yangyang Guo, Kejie Wang, Fan Liu, Liqiang Nie, Mohan Kankanhalli
Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning.
1 code implementation • 6 Jul 2022 • Guangzhi Wang, Yangyang Guo, Yongkang Wong, Mohan Kankanhalli
To quantitatively study the object bias problem, we advocate a new protocol for evaluating model performance.
1 code implementation • 5 Jul 2022 • Guangzhi Wang, Yangyang Guo, Yongkang Wong, Mohan Kankanhalli
2) The insufficient number of distant interactions in benchmark datasets results in under-fitting on these instances.
1 code implementation • 30 Jun 2022 • Yangyang Guo, Liqiang Nie, Yongkang Wong, Yibing Liu, Zhiyong Cheng, Mohan Kankanhalli
On the other hand, multi-modal implicit knowledge for knowledge-based VQA remains largely unexplored.
no code implementations • 4 Mar 2022 • Harry Cheng, Yangyang Guo, Tianyi Wang, Qi Li, Xiaojun Chang, Liqiang Nie
To this end, a voice-face matching method is devised to measure the degree of matching between the two modalities.
1 code implementation • Findings (ACL) 2022 • Fangkai Jiao, Yangyang Guo, Xuemeng Song, Liqiang Nie
Logical reasoning is of vital importance to natural language understanding.
Ranked #3 on Reading Comprehension on ReClor
1 code implementation • 25 Feb 2022 • Zhenyang Li, Yangyang Guo, Kejie Wang, Yinwei Wei, Liqiang Nie, Mohan Kankanhalli
Given that our framework is model-agnostic, we apply it to the existing popular baselines and validate its effectiveness on the benchmark dataset.
1 code implementation • 25 Feb 2022 • Yangyang Guo, Liqiang Nie, Harry Cheng, Zhiyong Cheng, Mohan Kankanhalli, Alberto del Bimbo
Results on four datasets across the above three tasks show that our method yields remarkable performance improvements over the baselines, demonstrating its superiority in reducing the modality bias problem.
1 code implementation • 28 Jan 2022 • Yibing Liu, Haoliang Li, Yangyang Guo, Chenqi Kong, Jing Li, Shiqi Wang
Attention mechanisms are dominating the explainability of deep models.
no code implementations • 17 Aug 2021 • Xiangkun Yin, Yangyang Guo, Liqiang Nie, Zhiyong Cheng
In addition, we empirically show that collaborative filtering and semantic matching are complementary to each other in enhancing product search performance.
1 code implementation • 8 Jun 2021 • Han Liu, Yangyang Guo, Jianhua Yin, Zan Gao, Liqiang Nie
To be specific, in this model, positive and negative reviews are separately gathered and utilized to model the user-preferred and user-rejected aspects, respectively.
1 code implementation • Findings (ACL) 2021 • Fangkai Jiao, Yangyang Guo, Yilin Niu, Feng Ji, Feng-Lin Li, Liqiang Nie
Pre-trained Language Models (PLMs) have achieved great success on Machine Reading Comprehension (MRC) over the past few years.
1 code implementation • 5 May 2021 • Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto del Bimbo
Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models with an absolute performance gain of 15% on average, strongly verifying the potential of tackling the language prior problem in VQA from the perspective of answer feature space learning.
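The sketch below shows a standard margin cosine (CosFace-style) loss over answer embeddings, as one way such an objective can be written; the specific margin, scale, and answer-embedding construction used in the paper are assumptions.

    # Sketch of a margin cosine loss between fused question-image features and
    # answer embeddings; s (scale) and m (margin) are assumed hyper-parameters.
    import torch
    import torch.nn.functional as F

    def margin_cosine_loss(features, answer_embeds, labels, s=30.0, m=0.35):
        # Cosine similarity between each sample and every candidate answer.
        cos = F.normalize(features, dim=-1) @ F.normalize(answer_embeds, dim=-1).t()
        one_hot = F.one_hot(labels, num_classes=cos.size(1)).float()
        # Subtract the margin only from the target answer's cosine, then scale.
        logits = s * (cos - m * one_hot)
        return F.cross_entropy(logits, labels)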
no code implementations • 24 Feb 2021 • Shaobo Cui, Xintong Bao, Xinxing Zu, Yangyang Guo, Zhongzhou Zhao, Ji Zhang, Haiqing Chen
This pipeline approach, however, is undesirable for mining the most appropriate QA pairs from documents, since it ignores the connection between question generation and answer extraction, which may lead to incompatible QA pairs, i.e., the selected answer span being inappropriate for question generation.
1 code implementation • 22 Feb 2021 • Zhiyong Cheng, Fan Liu, Shenghan Mei, Yangyang Guo, Lei Zhu, Liqiang Nie
To demonstrate the effectiveness of our method, we design a light attention neural network to integrate both item-level and feature-level attention for neural ICF models.
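As a generic illustration of item-level attention in item-based collaborative filtering (not the paper's light attention network), the sketch below pools a user's historical item embeddings with learned attention weights; the module name and dimensions are hypothetical.

    # Generic item-level attention pooling over a user's interacted items.
    import torch
    import torch.nn as nn

    class ItemAttentionPooling(nn.Module):
        def __init__(self, dim):
            super().__init__()
            self.score = nn.Linear(dim, 1)

        def forward(self, hist_item_embeds):
            # hist_item_embeds: (batch, num_items, dim) embeddings of interacted items.
            weights = torch.softmax(self.score(hist_item_embeds), dim=1)  # (batch, num_items, 1)
            return (weights * hist_item_embeds).sum(dim=1)                # (batch, dim) user profile

The pooled profile can then be scored against a target item embedding, e.g. via an inner product, to produce a recommendation score.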
1 code implementation • 3 Feb 2021 • Yibing Liu, Yangyang Guo, Jianhua Yin, Xuemeng Song, Weifeng Liu, Liqiang Nie
However, recent studies have pointed out that the highlighted image regions from the visual attention are often irrelevant to the given question and answer, leading to model confusion for correct visual reasoning.
1 code implementation • 30 Oct 2020 • Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian, Min Zhang
Concretely, we design a novel interpretation scheme whereby the loss of mis-predicted frequent and sparse answers of the same question type is distinctly exhibited during the late training phase.
1 code implementation • 20 Jun 2020 • Yangyang Guo, Zhiyong Cheng, Jiazheng Jing, Yanpeng Lin, Liqiang Nie, Meng Wang
Traditional FMs adopt the inner product to model the second-order interactions between different attributes, which are represented via feature vectors.
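For context, the standard FM second-order term that this sentence refers to can be computed with the well-known O(kn) identity, sketched below; variable names and shapes are illustrative.

    # Second-order interaction term of a factorization machine:
    # 0.5 * sum_f [ (sum_i v_{i,f} x_i)^2 - sum_i v_{i,f}^2 x_i^2 ]
    import torch

    def fm_second_order(x, v):
        # x: (batch, n) feature values; v: (n, k) factor (embedding) matrix.
        xv = x @ v                      # (batch, k)
        x2v2 = (x ** 2) @ (v ** 2)      # (batch, k)
        return 0.5 * (xv ** 2 - x2v2).sum(dim=1)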
1 code implementation • 13 May 2019 • Yangyang Guo, Zhiyong Cheng, Liqiang Nie, Yibing Liu, Yinglong Wang, Mohan Kankanhalli
Benefiting from advances in computer vision, natural language processing and information retrieval techniques, visual question answering (VQA), which aims to answer questions about an image or a video, has received considerable attention over the past few years.