Search Results for author: Yangyang Guo

Found 38 papers, 23 papers with code

Technical Report for ICML 2024 TiFA Workshop MLLM Attack Challenge: Suffix Injection and Projected Gradient Descent Can Easily Fool An MLLM

no code implementations • 20 Dec 2024 • Yangyang Guo, Ziwei Xu, Xilie Xu, Yongkang Wong, Liqiang Nie, Mohan Kankanhalli

This technical report introduces our top-ranked solution that employs two approaches, i.e., suffix injection and projected gradient descent (PGD), to address the TiFA workshop MLLM attack challenge.

VidHal: Benchmarking Temporal Hallucinations in Vision LLMs

1 code implementation • 25 Nov 2024 • Wey Yeh Choong, Yangyang Guo, Mohan Kankanhalli

Through our benchmark, we aim to inspire further research on 1) holistic understanding of VLLM capabilities, particularly regarding hallucination, and 2) extensive development of advanced VLLMs to alleviate this problem.

Benchmarking, Hallucination

Joint Vision-Language Social Bias Removal for CLIP

no code implementations • 19 Nov 2024 • Haoyu Zhang, Yangyang Guo, Mohan Kankanhalli

Additionally, we advocate a new evaluation protocol that can 1) holistically quantify the model debiasing and V-L alignment ability, and 2) evaluate the generalization of social bias removal models.

Attribute

SCAN: Bootstrapping Contrastive Pre-training for Data Efficiency

1 code implementation • 14 Nov 2024 • Yangyang Guo, Mohan Kankanhalli

In particular, we individually pre-train seven CLIP models on two large-scale image-text pair datasets, and two MoCo models on the ImageNet dataset, resulting in a total of 16 pre-trained models.

The VLLM Safety Paradox: Dual Ease in Jailbreak Attack and Defense

no code implementations • 13 Nov 2024 • Yangyang Guo, Fangkai Jiao, Liqiang Nie, Mohan Kankanhalli

The vulnerability of Vision Large Language Models (VLLMs) to jailbreak attacks appears as no surprise.

Less is more: Embracing sparsity and interpolation with Esiformer for time series forecasting

1 code implementation • 8 Oct 2024 • Yangyang Guo, Yanjun Zhao, Sizhe Dang, Tian Zhou, Liang Sun, Yi Qian

It effectively enhances the model's representation ability and maintains excellent robustness, avoiding the risk of overfitting compared with the vanilla implementation.

Multivariate Time Series Forecasting, Time Series

A Distance Similarity-based Genetic Optimization Algorithm for Satellite Ground Network Planning Considering Feeding Mode

no code implementations • 29 Aug 2024 • Yingying Ren, Qiuli Li, Yangyang Guo, Witold Pedrycz, Lining Xing, Anfeng Liu, Yanjie Song

In this paper, we hope to provide a task execution scheme that maximizes the profit of the networking task for satellite ground network planning considering feeding mode (SGNPFM).

Scheduling

Social Debiasing for Fair Multi-modal LLMs

no code implementations • 13 Aug 2024 • Harry Cheng, Yangyang Guo, Qingpei Guo, Ming Yang, Tian Gan, Liqiang Nie

Multi-modal Large Language Models (MLLMs) have advanced significantly, offering powerful vision-language understanding capabilities.

counterfactual

Diffusion Facial Forgery Detection

1 code implementation • 29 Jan 2024 • Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli

In particular, this dataset leverages 30,000 carefully collected textual and visual prompts, ensuring the synthesis of images with both high fidelity and semantic consistency.

Image Generation

Bilateral Adaptation for Human-Object Interaction Detection with Occlusion-Robustness

no code implementations • CVPR 2024 • Guangzhi Wang, Yangyang Guo, Ziwei Xu, Mohan Kankanhalli

Human-Object Interaction (HOI) Detection constitutes an important aspect of human-centric scene understanding which requires precise object detection and interaction recognition.

Human-Object Interaction Detection, object-detection +2

UNK-VQA: A Dataset and a Probe into the Abstention Ability of Multi-modal Large Models

1 code implementation • 17 Oct 2023 • Yangyang Guo, Fangkai Jiao, Zhiqi Shen, Liqiang Nie, Mohan Kankanhalli

Teaching Visual Question Answering (VQA) models to refrain from answering unanswerable questions is necessary for building a trustworthy AI system.

Attribute, Question Answering +1

PELA: Learning Parameter-Efficient Models with Low-Rank Approximation

1 code implementation • CVPR 2024 • Yangyang Guo, Guangzhi Wang, Mohan Kankanhalli

This allows for direct and efficient utilization of the low-rank model for downstream fine-tuning tasks.

Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration

1 code implementation • 27 Jul 2023 • Harry Cheng, Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Mohan Kankanhalli

Training an effective video action recognition model poses significant computational challenges, particularly under limited resource budgets.

Action Recognition, Temporal Action Localization

Towards Generalizable Deepfake Detection by Primary Region Regularization

no code implementations • 24 Jul 2023 • Harry Cheng, Yangyang Guo, Tianyi Wang, Liqiang Nie, Mohan Kankanhalli

The existing deepfake detection methods have reached a bottleneck in generalizing to unseen forgeries and manipulation approaches.

DeepFake Detection, Face Swapping

Mining Conditional Part Semantics with Occluded Extrapolation for Human-Object Interaction Detection

no code implementations • 19 Jul 2023 • Guangzhi Wang, Yangyang Guo, Mohan Kankanhalli

Human-Object Interaction Detection is a crucial aspect of human-centric scene understanding, with important applications in various domains.

Human-Object Interaction Detection, Object +1

Learning to Agree on Vision Attention for Visual Commonsense Reasoning

no code implementations • 4 Feb 2023 • Zhenyang Li, Yangyang Guo, Kejie Wang, Fan Liu, Liqiang Nie, Mohan Kankanhalli

Visual Commonsense Reasoning (VCR) remains a significant yet challenging research problem in the realm of visual reasoning.

Visual Commonsense Reasoning

Distance Matters in Human-Object Interaction Detection

1 code implementation • 5 Jul 2022 • Guangzhi Wang, Yangyang Guo, Yongkang Wong, Mohan Kankanhalli

2) Insufficient number of distant interactions in benchmark datasets results in under-fitting on these instances.

Human-Object Interaction Detection, Object +1

A Unified End-to-End Retriever-Reader Framework for Knowledge-based VQA

1 code implementation • 30 Jun 2022 • Yangyang Guo, Liqiang Nie, Yongkang Wong, Yibing Liu, Zhiyong Cheng, Mohan Kankanhalli

On the other hand, pertaining to the implicit knowledge, the multi-modal implicit knowledge for knowledge-based VQA still remains largely unexplored.

Question Answering, Retrieval +1

Voice-Face Homogeneity Tells Deepfake

no code implementations • 4 Mar 2022 • Harry Cheng, Yangyang Guo, Tianyi Wang, Qi Li, Xiaojun Chang, Liqiang Nie

To this end, a voice-face matching method is devised to measure the matching degree of these two.

Joint Answering and Explanation for Visual Commonsense Reasoning

1 code implementation • 25 Feb 2022 • Zhenyang Li, Yangyang Guo, Kejie Wang, Yinwei Wei, Liqiang Nie, Mohan Kankanhalli

Given that our framework is model-agnostic, we apply it to the existing popular baselines and validate its effectiveness on the benchmark dataset.

Knowledge Distillation, Question Answering +2

On Modality Bias Recognition and Reduction

1 code implementation • 25 Feb 2022 • Yangyang Guo, Liqiang Nie, Harry Cheng, Zhiyong Cheng, Mohan Kankanhalli, Alberto del Bimbo

From the results on four datasets regarding the above three tasks, our method yields remarkable performance improvements compared with the baselines, demonstrating its superiority on reducing the modality bias problem.

Action Recognition, Multi-modal Classification +3

When Product Search Meets Collaborative Filtering: A Hierarchical Heterogeneous Graph Neural Network Approach

no code implementations • 17 Aug 2021 • Xiangkun Yin, Yangyang Guo, Liqiang Nie, Zhiyong Cheng

In addition, we empirically prove that collaborative filtering and semantic matching are complementary to each other in product search performance enhancement.

Collaborative Filtering, Graph Neural Network +2

Review Polarity-wise Recommender

1 code implementation • 8 Jun 2021 • Han Liu, Yangyang Guo, Jianhua Yin, Zan Gao, Liqiang Nie

To be specific, in this model, positive and negative reviews are separately gathered and utilized to model the user-preferred and user-rejected aspects, respectively.

Recommendation Systems

AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss

1 code implementation • 5 May 2021 • Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Feng Ji, Ji Zhang, Alberto del Bimbo

Experimental results demonstrate that our adapted margin cosine loss can greatly enhance the baseline models with an absolute performance gain of 15% on average, strongly verifying the potential of tackling the language prior problem in VQA from the angle of the answer feature space learning.

Question Answering, Visual Question Answering

OneStop QAMaker: Extract Question-Answer Pairs from Text in a One-Stop Approach

no code implementations • 24 Feb 2021 • Shaobo Cui, Xintong Bao, Xinxing Zu, Yangyang Guo, Zhongzhou Zhao, Ji Zhang, Haiqing Chen

This pipeline approach, however, is undesired in mining the most appropriate QA pairs from documents since it ignores the connection between question generation and answer extraction, which may lead to incompatible QA pair generation, i.e., the selected answer span is inappropriate for question generation.

Machine Reading Comprehension, Question Answering +2

Feature-level Attentive ICF for Recommendation

1 code implementation • 22 Feb 2021 • Zhiyong Cheng, Fan Liu, Shenghan Mei, Yangyang Guo, Lei Zhu, Liqiang Nie

To demonstrate the effectiveness of our method, we design a light attention neural network to integrate both item-level and feature-level attention for neural ICF models.

Collaborative Filtering, Recommendation Systems

Answer Questions with Right Image Regions: A Visual Attention Regularization Approach

1 code implementation • 3 Feb 2021 • Yibing Liu, Yangyang Guo, Jianhua Yin, Xuemeng Song, Weifeng Liu, Liqiang Nie

However, recent studies have pointed out that the highlighted image regions from the visual attention are often irrelevant to the given question and answer, leading to model confusion for correct visual reasoning.

Question Answering, Visual Grounding +2

Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View

1 code implementation • 30 Oct 2020 • Yangyang Guo, Liqiang Nie, Zhiyong Cheng, Qi Tian, Min Zhang

Concretely, we design a novel interpretation scheme whereby the loss of mis-predicted frequent and sparse answers of the same question type is distinctly exhibited during the late training phase.

Face Recognition, Image Classification +2

Enhancing Factorization Machines with Generalized Metric Learning

1 code implementation • 20 Jun 2020 • Yangyang Guo, Zhiyong Cheng, Jiazheng Jing, Yanpeng Lin, Liqiang Nie, Meng Wang

Traditional FMs adopt the inner product to model the second-order interactions between different attributes, which are represented via feature vectors.

Attribute, Metric Learning +1

Quantifying and Alleviating the Language Prior Problem in Visual Question Answering

1 code implementation • 13 May 2019 • Yangyang Guo, Zhiyong Cheng, Liqiang Nie, Yibing Liu, Yinglong Wang, Mohan Kankanhalli

Benefiting from advances in computer vision, natural language processing, and information retrieval techniques, visual question answering (VQA), which aims to answer questions about an image or a video, has received much attention over the past few years.

Information Retrieval, Question Answering +2
