no code implementations • 29 May 2025 • Haokun Chen, Yueqi Zhang, Yuan Bi, Yao Zhang, Tong Liu, Jinhe Bi, Jian Lan, Jindong Gu, Claudia Grosser, Denis Krompass, Nassir Navab, Volker Tresp
In this work, we introduce a comprehensive auditing framework for unlearning evaluation, comprising three benchmark datasets, six unlearning algorithms, and five prompt-based auditing methods.
1 code implementation • 19 May 2025 • Zekai Li, Xinhao Zhong, Samir Khaki, Zhiyuan Liang, Yuhao Zhou, Mingjia Shi, Ziqiao Wang, Xuanlei Zhao, Wangbo Zhao, Ziheng Qin, Mengxuan Wu, Pengfei Zhou, Haonan Wang, David Junhao Zhang, Jia-Wei Liu, Shaobo Wang, Dai Liu, Linfeng Zhang, Guang Li, Kun Wang, Zheng Zhu, Zhiheng Ma, Joey Tianyi Zhou, Jiancheng Lv, Yaochu Jin, Peihao Wang, Kaipeng Zhang, Lingjuan Lyu, Yiran Huang, Zeynep Akata, Zhiwei Deng, Xindi Wu, George Cazenavette, Yuzhang Shang, Justin Cui, Jindong Gu, Qian Zheng, Hao Ye, Shuo Wang, Xiaobo Wang, Yan Yan, Angela Yao, Mike Zheng Shou, Tianlong Chen, Hakan Bilen, Baharan Mirzasoleiman, Manolis Kellis, Konstantinos N. Plataniotis, Zhangyang Wang, Bo Zhao, Yang You, Kai Wang
In recent years, dataset distillation has provided a reliable solution for data compression, where models trained on the resulting smaller synthetic datasets achieve performance comparable to those trained on the original datasets.
no code implementations • 22 Apr 2025 • Kun Wang, Guibin Zhang, Zhenhong Zhou, Jiahao Wu, Miao Yu, Shiqian Zhao, Chenlong Yin, Jinhu Fu, Yibo Yan, Hanjun Luo, Liang Lin, Zhihao Xu, Haolang Lu, Xinye Cao, Xinyun Zhou, Weifei Jin, Fanci Meng, Junyuan Mao, Yu Wang, Hao Wu, Minghe Wang, Fan Zhang, Junfeng Fang, Wenjie Qu, Yue Liu, Chengwei Liu, Yifan Zhang, Qiankun Li, Chongye Guo, Yalan Qin, Zhaoxin Fan, Yi Ding, Donghai Hong, Jiaming Ji, Yingxin Lai, Zitong Yu, Xinfeng Li, Yifan Jiang, Yanhui Li, Xinyu Deng, Junlin Wu, Dongxia Wang, Yihao Huang, Yufei Guo, Jen-tse Huang, Qiufeng Wang, Wenxuan Wang, Dongrui Liu, Yanwei Yue, Wenke Huang, Guancheng Wan, Heng Chang, Tianlin Li, Yi Yu, Chenghao Li, Jiawei Li, Lei Bai, Jie Zhang, Qing Guo, Jingyi Wang, Tianlong Chen, Joey Tianyi Zhou, Xiaojun Jia, Weisong Sun, Cong Wu, Jing Chen, Xuming Hu, Yiming Li, Xiao Wang, Ningyu Zhang, Luu Anh Tuan, Guowen Xu, Jiaheng Zhang, Tianwei Zhang, Xingjun Ma, Jindong Gu, Xiang Wang, Bo An, Jun Sun, Mohit Bansal, Shirui Pan, Lingjuan Lyu, Yuval Elovici, Bhavya Kailkhura, Yaodong Yang, Hongwei Li, Wenyuan Xu, Yizhou Sun, Wei Wang, Qing Li, Ke Tang, Yu-Gang Jiang, Felix Juefei-Xu, Hui Xiong, XiaoFeng Wang, DaCheng Tao, Philip S. Yu, Qingsong Wen, Yang Liu
Currently, existing surveys on LLM safety primarily focus on specific stages of the LLM lifecycle, e.g., the deployment phase or the fine-tuning phase, and lack a comprehensive understanding of the entire "lifechain" of LLMs.
2 code implementations • 15 Apr 2025 • Divyansh Garg, Shaun VanWeelden, Diego Caples, Andis Draguns, Nikil Ravi, Pranav Putta, Naman Garg, Tomas Abraham, Michael Lara, Federico Lopez, James Liu, Atharva Gundawar, Prannay Hebbar, Youngchul Joo, Jindong Gu, Charles London, Christian Schroeder de Witt, Sumeet Motwani
We introduce REAL, a benchmark and framework for multi-turn agent evaluations on deterministic simulations of real-world websites.
no code implementations • 2 Apr 2025 • Souradip Chakraborty, Mohammadreza Pourreza, Ruoxi Sun, Yiwen Song, Nino Scherrer, Jindong Gu, Furong Huang, Amrit Singh Bedi, Ahmad Beirami, Hamid Palangi, Tomas Pfister
Hence, we propose Iterative Agent Decoding (IAD), which combines iterative refinement with dynamic candidate evaluation and selection guided by a verifier.
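For intuition, here is a minimal sketch of verifier-guided iterative decoding; `generate_candidates` and `verifier_score` are hypothetical stand-ins for the LLM sampler and the verifier, and the paper's actual IAD procedure may differ:

```python
def iterative_agent_decoding(task, generate_candidates, verifier_score,
                             n_candidates=4, n_rounds=3):
    # Sketch: sample candidates, score them with a verifier, and feed
    # the best candidate back as feedback for the next refinement round.
    best, best_score = None, float("-inf")
    feedback = None
    for _ in range(n_rounds):
        candidates = generate_candidates(task, feedback, n=n_candidates)
        scored = [(verifier_score(task, c), c) for c in candidates]
        score, winner = max(scored, key=lambda sc: sc[0])
        if score > best_score:
            best, best_score = winner, score
        feedback = {"candidate": winner, "score": score}
    return best
```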
no code implementations • 1 Apr 2025 • Wenjun Zeng, Dana Kurniawan, Ryan Mullins, Yuchi Liu, Tamoghna Saha, Dirichi Ike-Njoku, Jindong Gu, Yiwen Song, Cai Xu, Jingjing Zhou, Aparna Joshi, Shravan Dheep, Mani Malek, Hamid Palangi, Joon Baek, Rick Pereira, Karthik Narasimhan
We introduce ShieldGemma 2, a 4B parameter image content moderation model built on Gemma 3.
no code implementations • 14 Mar 2025 • Hao Cheng, Erjia Xiao, Yichi Wang, Kaidi Xu, Mengshu Sun, Jindong Gu, Renjing Xu
To better observe the performance changes and characteristics induced by this threat, we also introduce the TVPI Dataset.
no code implementations • 10 Mar 2025 • Fan Yin, Zifeng Wang, I-Hung Hsu, Jun Yan, Ke Jiang, Yanfei Chen, Jindong Gu, Long T. Le, Kai-Wei Chang, Chen-Yu Lee, Hamid Palangi, Tomas Pfister
To address this, we propose Magnet, a principled framework for synthesizing high-quality training trajectories to enhance the function calling capability of large language model agents in multi-turn conversations with humans.
no code implementations • 27 Feb 2025 • Chenhe Gu, Jindong Gu, Andong Hua, Yao Qin
Multimodal Large Language Models (MLLMs), built upon LLMs, have recently gained attention for their capabilities in image recognition and understanding.
no code implementations • 22 Feb 2025 • Mihir Parmar, Xin Liu, Palash Goyal, Yanfei Chen, Long Le, Swaroop Mishra, Hossein Mobahi, Jindong Gu, Zifeng Wang, Hootan Nakhost, Chitta Baral, Chen-Yu Lee, Tomas Pfister, Hamid Palangi
Recent agent frameworks and inference-time algorithms often struggle with complex planning problems due to limitations in verifying generated plans or reasoning, and the varying complexity of instances within a single task.
1 code implementation • 2 Feb 2025 • Xingjun Ma, Yifeng Gao, Yixu Wang, Ruofan Wang, Xin Wang, Ye Sun, Yifan Ding, Hengyuan Xu, Yunhao Chen, Yunhan Zhao, Hanxun Huang, Yige Li, Jiaming Zhang, Xiang Zheng, Yang Bai, Zuxuan Wu, Xipeng Qiu, Jingfeng Zhang, Yiming Li, Xudong Han, Haonan Li, Jun Sun, Cong Wang, Jindong Gu, Baoyuan Wu, Siheng Chen, Tianwei Zhang, Yang Liu, Mingming Gong, Tongliang Liu, Shirui Pan, Cihang Xie, Tianyu Pang, Yinpeng Dong, Ruoxi Jia, Yang Zhang, Shiqing Ma, Xiangyu Zhang, Neil Gong, Chaowei Xiao, Sarah Erfani, Tim Baldwin, Bo Li, Masashi Sugiyama, DaCheng Tao, James Bailey, Yu-Gang Jiang
The rapid advancement of large models, driven by their exceptional abilities in learning and generalization through large-scale pre-training, has reshaped the landscape of Artificial Intelligence (AI).
no code implementations • 23 Jan 2025 • Hao Cheng, Erjia Xiao, Jing Shao, Yichi Wang, Le Yang, Chao Shen, Philip Torr, Jindong Gu, Renjing Xu
Integrating various modality encoders further expands their capabilities, giving rise to Multimodal Large Language Models (MLLMs) that process not only text but also visual and auditory modality inputs.
no code implementations • 11 Jan 2025 • Tong Liu, Xiao Yu, Wenxuan Zhou, Jindong Gu, Volker Tresp
These algorithms implicitly treat the LLM as a reward model, and focus on training it to correct misranked preference pairs.
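As background for this framing, a standard DPO-style pairwise preference loss (not necessarily the exact objective analyzed in the paper) can be written as:

```python
import torch.nn.functional as F

def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    # The implicit reward of a response is beta * (log pi - log pi_ref);
    # the loss trains the model to rank the chosen response above the
    # rejected one, i.e. to correct misranked preference pairs.
    chosen_reward = beta * (logp_chosen - ref_logp_chosen)
    rejected_reward = beta * (logp_rejected - ref_logp_rejected)
    return -F.logsigmoid(chosen_reward - rejected_reward).mean()
```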
1 code implementation • 11 Jan 2025 • Shan Zhang, Aotian Chen, Yanpeng Sun, Jindong Gu, Yi-Yu Zheng, Piotr Koniusz, Kai Zou, Anton Van Den Hengel, Yuan Xue
Current multimodal large language models (MLLMs) often underperform on mathematical problem-solving tasks that require fine-grained visual understanding.
no code implementations • 13 Dec 2024 • Runtao Liu, Chen I Chieh, Jindong Gu, Jipeng Zhang, Renjie Pi, Qifeng Chen, Philip Torr, Ashkan Khakzar, Fabio Pizzati
Using a custom DPO strategy and this dataset, we train safety experts, in the form of low-rank adaptation (LoRA) matrices, that can guide the generation process away from specific safety-related concepts.
no code implementations • CVPR 2025 • Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Jize Zhang, Kaidi Xu, Jindong Gu, Renjing Xu
Current image generation models can effortlessly produce high-quality, highly realistic images, but this also increases the risk of misuse.
no code implementations • 6 Dec 2024 • Kuofeng Gao, Shu-Tao Xia, Ke Xu, Philip Torr, Jindong Gu
Recent advances, such as GPT-4o, have enabled Large Audio-Language Models (LALMs) to engage in back-and-forth audio dialogues with humans.
no code implementations • 25 Nov 2024 • KaiZhou Li, Jindong Gu, Xinchun Yu, Junjie Cao, Yansong Tang, Xiao-Ping Zhang
The security risks of AI-driven video editing have garnered significant attention.
1 code implementation • CVPR 2025 • Tanveer Hannan, Md Mohaiminul Islam, Jindong Gu, Thomas Seidl, Gedas Bertasius
We propose ReVisionLLM, a recursive vision-language model designed to locate events in hour-long videos.
no code implementations • CVPR 2025 • Haokun Chen, Hang Li, Yao Zhang, Jinhe Bi, Gengyuan Zhang, Yueqi Zhang, Philip Torr, Jindong Gu, Denis Krompass, Volker Tresp
However, directly applying pretrained LDM to heterogeneous OSFL results in significant distribution shifts in synthetic data, leading to performance degradation in classification models trained on such data.
no code implementations • 28 Sep 2024 • Haowei Zhang, Jianzhe Liu, Zhen Han, Shuo Chen, Bailan He, Volker Tresp, Zhiqiang Xu, Jindong Gu
The finetuning pipeline consists of our proposed dataset and a training objective for selective decomposition.
no code implementations • 27 Sep 2024 • Tong Liu, Zhixin Lai, Gengyuan Zhang, Philip Torr, Vera Demberg, Volker Tresp, Jindong Gu
This work introduces a novel type of jailbreak that triggers T2I models to generate images containing visual text, where the image and the text, although safe in isolation, combine to form unsafe content.
no code implementations • 20 Sep 2024 • Hao Cheng, Erjia Xiao, Chengyuan Yu, Zhao Yao, Jiahang Cao, Qiang Zhang, Jiaxu Wang, Mengshu Sun, Kaidi Xu, Jindong Gu, Renjing Xu
Recently, driven by advancements in Multimodal Large Language Models (MLLMs), Vision Language Action Models (VLAMs) are being proposed to achieve better performance in open-vocabulary scenarios for robotic manipulation tasks.
no code implementations • 25 Aug 2024 • Sensen Gao, Xiaojun Jia, Yihao Huang, Ranjie Duan, Jindong Gu, Yang Bai, Yang Liu, Qing Guo
Text-to-Image (T2I) models have achieved remarkable success in image generation and editing, yet these models still have many potential issues, particularly in generating inappropriate or Not-Safe-For-Work (NSFW) content.
1 code implementation • 29 Jul 2024 • Canyu Chen, Baixiang Huang, Zekun Li, Zhaorun Chen, Shiyang Lai, Xiongxiao Xu, Jia-Chen Gu, Jindong Gu, Huaxiu Yao, Chaowei Xiao, Xifeng Yan, William Yang Wang, Philip Torr, Dawn Song, Kai Shu
Then, we find that editing attacks can inject both types of misinformation into LLMs, and the effectiveness is particularly high for commonsense misinformation injection.
1 code implementation • 19 Jul 2024 • Dai Liu, Jindong Gu, Hu Cao, Carsten Trinitis, Martin Schulz
Dataset Distillation is used to create a concise, yet informative, synthetic dataset that can replace the original dataset for training purposes.
no code implementations • 28 Jun 2024 • Jinming Li, Yichen Zhu, Zhiyuan Xu, Jindong Gu, Minjie Zhu, Xin Liu, Ning Liu, Yaxin Peng, Feifei Feng, Jian Tang
It is fundamentally challenging for robots to serve as useful assistants in human environments because this requires addressing a spectrum of sub-problems across robotics, including perception, language understanding, reasoning, and planning.
no code implementations • CVPR 2025 • Gengyuan Zhang, Mang Ling Ada Fok, Jialu Ma, Yan Xia, Daniel Cremers, Philip Torr, Volker Tresp, Jindong Gu
Localizing events in videos based on semantic queries is a pivotal task in video understanding, with the growing significance of user-oriented applications like video search.
no code implementations • 7 Jun 2024 • Thomas Decker, Ananta R. Bhattarai, Jindong Gu, Volker Tresp, Florian Buettner
Using feature attributions for post-hoc explanations is a common practice to understand and verify the predictions of opaque machine learning models.
no code implementations • 5 Jun 2024 • Razieh Rezaei, Masoud Jalili Sabet, Jindong Gu, Daniel Rueckert, Philip Torr, Ashkan Khakzar
The learned visual prompt, added to any input image, would redirect the attention of the pre-trained vision transformer to its spatial location on the image.
1 code implementation • 31 May 2024 • Xiaojun Jia, Tianyu Pang, Chao Du, Yihao Huang, Jindong Gu, Yang Liu, Xiaochun Cao, Min Lin
Many red-teaming efforts aim to jailbreak LLMs; among these, the success of the Greedy Coordinate Gradient (GCG) attack has led to growing interest in the study of optimization-based jailbreaking techniques.
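For readers unfamiliar with GCG, below is a heavily simplified, single-position sketch of the attack's core step. It assumes a HuggingFace-style model that accepts `inputs_embeds` and exposes `.logits`, with `suffix_slice` a Python slice over the adversarial suffix tokens; the full attack evaluates large batches of candidates across positions:

```python
import torch
import torch.nn.functional as F

def gcg_step(model, embed_matrix, input_ids, suffix_slice, loss_fn, k=8):
    # Use gradients w.r.t. one-hot token indicators to propose the k
    # most loss-reducing substitutions at one random suffix position,
    # then keep the candidate with the lowest exact loss.
    one_hot = F.one_hot(input_ids, embed_matrix.size(0)).float()
    one_hot.requires_grad_(True)
    logits = model(inputs_embeds=(one_hot @ embed_matrix).unsqueeze(0)).logits
    loss_fn(logits).backward()
    grad = one_hot.grad[suffix_slice]                # (suffix_len, vocab)
    pos = torch.randint(grad.size(0), (1,)).item()   # pick one position
    candidates = (-grad[pos]).topk(k).indices        # promising tokens
    best_ids, best_loss = input_ids, float("inf")
    offset = suffix_slice.start + pos
    for tok in candidates:
        trial = input_ids.clone()
        trial[offset] = tok
        with torch.no_grad():
            cand_loss = loss_fn(model(input_ids=trial.unsqueeze(0)).logits)
        if cand_loss.item() < best_loss:
            best_ids, best_loss = trial, cand_loss.item()
    return best_ids
```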
no code implementations • 30 May 2024 • Hao Cheng, Erjia Xiao, Jiayan Yang, Jiahang Cao, Qiang Zhang, Le Yang, Jize Zhang, Kaidi Xu, Jindong Gu, Renjing Xu
Recently, Multimodal Large Language Models (MLLMs) achieve remarkable performance in numerous zero-shot tasks due to their outstanding cross-modal interaction and comprehension abilities.
1 code implementation • 16 May 2024 • Xianzheng Ma, Yash Bhalgat, Brandon Smart, Shuai Chen, Xinghui Li, Jian Ding, Jindong Gu, Dave Zhenyu Chen, Songyou Peng, Jia-Wang Bian, Philip H Torr, Marc Pollefeys, Matthias Nießner, Ian D Reid, Angel X. Chang, Iro Laina, Victor Adrian Prisacariu
Hence, with this paper, we aim to chart a course for future research that explores and expands the capabilities of 3D-LLMs in understanding and interacting with the complex 3D world.
no code implementations • 9 May 2024 • Yang Bai, Ge Pei, Jindong Gu, Yong Yang, Xingjun Ma
In this paper, we take a step further and show that certain special characters or their combinations with English letters are stronger memory triggers, leading to more severe data leakage.
no code implementations • 25 Apr 2024 • Kuofeng Gao, Jindong Gu, Yang Bai, Shu-Tao Xia, Philip Torr, Wei Liu, Zhifeng Li
For verbose videos, a frame feature diversity loss is proposed to increase the feature diversity among frames.
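One plausible instantiation of such a loss (the paper's exact formulation may differ) is the mean pairwise cosine similarity among frame features, which an attacker would minimize to push the features apart:

```python
import torch.nn.functional as F

def frame_diversity_loss(frame_feats):
    # Mean pairwise cosine similarity among frame features of shape
    # (num_frames, dim); minimizing this value pushes the frame
    # features apart, i.e. increases their diversity.
    f = F.normalize(frame_feats, dim=-1)
    sim = f @ f.t()
    n = sim.size(0)
    return (sim.sum() - sim.diagonal().sum()) / (n * (n - 1))
```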
1 code implementation • 11 Apr 2024 • Runtao Liu, Ashkan Khakzar, Jindong Gu, Qifeng Chen, Philip Torr, Fabio Pizzati
Hence, we propose Latent Guard, a framework designed to improve safety measures in text-to-image generation.
no code implementations • 8 Apr 2024 • Jindong Gu
To answer the question, this paper investigates the practical responsible requirements of both textual and visual generative models, outlining five key considerations: generating truthful content, avoiding toxic content, refusing harmful instructions, leaking no training-data-related content, and ensuring that generated content is identifiable.
1 code implementation • 4 Apr 2024 • Shuo Chen, Zhen Han, Bailan He, Zifeng Ding, Wenqian Yu, Philip Torr, Volker Tresp, Jindong Gu
Various jailbreak attacks have been proposed to red-team Large Language Models (LLMs) and revealed the vulnerable safeguards of LLMs.
1 code implementation • 3 Apr 2024 • Fengyuan Liu, Haochen Luo, Yiming Li, Philip Torr, Jindong Gu
In this work, we study the origin attribution of generated images in a practical setting where only a few images generated by a source model are available and the source model cannot be accessed.
no code implementations • 19 Mar 2024 • Anjun Hu, Jindong Gu, Francesco Pinto, Konstantinos Kamnitsas, Philip Torr
Foundation models pre-trained on web-scale vision-language data, such as CLIP, are widely used as cornerstones of powerful machine learning systems.
1 code implementation • 14 Mar 2024 • Haochen Luo, Jindong Gu, Fengyuan Liu, Philip Torr
Given that VLMs rely on prompts to adapt to different tasks, an intriguing question emerges: Can a single adversarial image mislead all predictions of VLMs when a thousand different prompts are given?
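A minimal sketch of the underlying idea, assuming a hypothetical differentiable wrapper `vlm_loss(image, prompt)` that scores the VLM's output against an attacker-chosen target: optimize a single L-infinity-bounded perturbation against the loss averaged over many prompts:

```python
import torch

def cross_prompt_attack(vlm_loss, image, prompts, eps=8/255,
                        alpha=1/255, steps=100):
    # Averaging the target loss over many prompts encourages one
    # perturbation that misleads the model regardless of the prompt.
    delta = torch.zeros_like(image, requires_grad=True)
    for _ in range(steps):
        loss = sum(vlm_loss(image + delta, p) for p in prompts) / len(prompts)
        loss.backward()
        with torch.no_grad():
            delta -= alpha * delta.grad.sign()  # signed gradient descent
            delta.clamp_(-eps, eps)             # enforce L-inf budget
            delta.grad.zero_()
    return (image + delta).clamp(0, 1).detach()
```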
1 code implementation • CVPR 2024 • Tianrui Lou, Xiaojun Jia, Jindong Gu, Li Liu, Siyuan Liang, Bangyan He, Xiaochun Cao
We find that concealing deformation perturbations in areas insensitive to human eyes can achieve a better trade-off between imperceptibility and adversarial strength, specifically in parts of the object surface that are complex and exhibit drastic curvature changes.
no code implementations • 29 Feb 2024 • Hao Cheng, Erjia Xiao, Jindong Gu, Le Yang, Jinhao Duan, Jize Zhang, Jiahang Cao, Kaidi Xu, Renjing Xu
Large Vision-Language Models (LVLMs) rely on vision encoders and Large Language Models (LLMs) to exhibit remarkable capabilities on various multi-modal tasks in the joint space of vision and language.
1 code implementation • 22 Feb 2024 • Zefeng Wang, Zhen Han, Shuo Chen, Fan Xue, Zifeng Ding, Xun Xiao, Volker Tresp, Philip Torr, Jindong Gu
Based on our findings, we further propose a novel attack method, termed as stop-reasoning attack, that attacks the model while bypassing the CoT reasoning process.
1 code implementation • 7 Feb 2024 • Chengxing Xie, Canyu Chen, Feiran Jia, Ziyu Ye, Shiyang Lai, Kai Shu, Jindong Gu, Adel Bibi, Ziniu Hu, David Jurgens, James Evans, Philip Torr, Bernard Ghanem, Guohao Li
In this paper, we focus on one critical and elemental behavior in human interactions, trust, and investigate whether LLM agents can simulate human trust behavior.
1 code implementation • 20 Jan 2024 • Kuofeng Gao, Yang Bai, Jindong Gu, Shu-Tao Xia, Philip Torr, Zhifeng Li, Wei Liu
Once attackers maliciously induce high energy consumption and latency (the energy-latency cost) during VLM inference, they can exhaust computational resources.
no code implementations • 31 Dec 2023 • Xinwei Liu, Xiaojun Jia, Jindong Gu, Yuan Xun, Siyuan Liang, Xiaochun Cao
However, in this paper, we propose the Few-shot Learning Backdoor Attack (FLBA) to show that few-shot learning (FSL) can still be vulnerable to backdoor attacks.
1 code implementation • 29 Dec 2023 • Xingqiao Li, Jindong Gu, Zhiyong Wang, Yancheng Yuan, Bo Du, Fengxiang He
To address this issue, this paper proposes an eXplainable Multimodal Mortality Predictor (X-MMP), an efficient, explainable AI solution for predicting in-hospital mortality from multimodal ICU data.
1 code implementation • CVPR 2024 • Andong Hua, Jindong Gu, Zhiyu Xue, Nicholas Carlini, Eric Wong, Yao Qin
Based on this, we propose Robust Linear Initialization (RoLI) for adversarial finetuning, which initializes the linear head with the weights obtained by adversarial linear probing to maximally inherit the robustness from pretraining.
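A hedged sketch of the adversarial linear probing stage, with `attack(model, x, y)` as a hypothetical PGD-style attack; the resulting head weights then initialize full adversarial finetuning:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def adversarial_linear_probe(backbone, head, loader, attack,
                             epochs=5, lr=1e-3):
    # Stage 1 of a RoLI-style recipe: freeze the backbone and train only
    # the linear head on adversarial examples, so the head inherits the
    # robustness available in the pretrained features.
    for p in backbone.parameters():
        p.requires_grad_(False)
    model = nn.Sequential(backbone, head)
    opt = torch.optim.Adam(head.parameters(), lr=lr)
    for _ in range(epochs):
        for x, y in loader:
            x_adv = attack(model, x, y)  # hypothetical PGD-style attack
            loss = F.cross_entropy(model(x_adv), y)
            opt.zero_grad()
            loss.backward()
            opt.step()
    return head  # use these weights to initialize adversarial finetuning
```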
no code implementations • 7 Dec 2023 • Dongchen Han, Xiaojun Jia, Yang Bai, Jindong Gu, Yang Liu, Xiaochun Cao
Investigating the generation of high-transferability adversarial examples is crucial for uncovering VLP models' vulnerabilities in practical scenarios.
no code implementations • 3 Dec 2023 • Xiaojun Jia, Jindong Gu, Yihao Huang, Simeng Qin, Qing Guo, Yang Liu, Xiaochun Cao
At the second stage, the pixels are divided into different branches based on their transferability, which is determined by the Kullback-Leibler divergence.
1 code implementation • 30 Nov 2023 • Avery Ma, Amir-Massoud Farahmand, Yangchen Pan, Philip Torr, Jindong Gu
During the alignment process, the parameters of the source model are fine-tuned to minimize an alignment loss.
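One common choice of alignment loss, offered here as an illustrative assumption rather than the paper's exact objective, is a KL divergence between the source model's predictions and those of a witness model:

```python
import torch.nn.functional as F

def alignment_loss(source_logits, witness_logits, temp=1.0):
    # KL divergence between the witness model's predictions and the
    # source model's predictions; fine-tuning the source model to
    # minimize it pulls the two models' outputs together.
    p_w = F.softmax(witness_logits / temp, dim=-1)
    log_p_s = F.log_softmax(source_logits / temp, dim=-1)
    return F.kl_div(log_p_s, p_w, reduction="batchmean")
```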
no code implementations • 29 Nov 2023 • Shuo Chen, Zhen Han, Bailan He, Jianzhe Liu, Mark Buckley, Yao Qin, Philip Torr, Volker Tresp, Jindong Gu
Experiments revealed that multimodal ICL is predominantly driven by the textual content, whereas the visual information in the demonstrations has little influence.
2 code implementations • 29 Nov 2023 • Xin Liu, Yichen Zhu, Jindong Gu, Yunshi Lan, Chao Yang, Yu Qiao
The security concerns surrounding Large Language Models (LLMs) have been extensively explored, yet the safety of Multimodal Large Language Models (MLLMs) remains understudied.
1 code implementation • CVPR 2024 • Hang Li, Chengzhi Shen, Philip Torr, Volker Tresp, Jindong Gu
A risk with these models is the potential generation of inappropriate content, such as biased or harmful images.
1 code implementation • 24 Nov 2023 • Shitong Sun, Jindong Gu, Shaogang Gong
In this paper, we perform the first robustness study and establish three new diversified benchmarks for systematic analysis of text-image composed retrieval against natural corruptions in both vision and text, and further probe textual understanding.
no code implementations • 21 Nov 2023 • Gengyuan Zhang, Jinhe Bi, Jindong Gu, Yanyu Chen, Volker Tresp
This raises a question: with such weak supervision, can video representation in video-language models gain the ability to distinguish even factual discrepancies in textual description and understand fine-grained events?
1 code implementation • 26 Oct 2023 • Jindong Gu, Xiaojun Jia, Pau de Jorge, Wenqian Yu, Xinwei Liu, Avery Ma, Yuan Xun, Anjun Hu, Ashkan Khakzar, Zhijiang Li, Xiaochun Cao, Philip Torr
This survey explores the landscape of the adversarial transferability of adversarial examples.
1 code implementation • 24 Oct 2023 • Xiaojun Jia, Jianshu Li, Jindong Gu, Yang Bai, Xiaochun Cao
In addition, we provide a theoretical analysis showing that model robustness can be improved by single-step adversarial training with sampled subnetworks.
1 code implementation • 15 Sep 2023 • Zhihao Hu, Yiran Xu, Mengnan Du, Jindong Gu, Xinmei Tian, Fengxiang He
Our adaptive reweighing method prioritizes samples closer to the decision boundary and assigns a higher weight to improve the generalizability of fair classifiers.
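A minimal sketch of one such weighting scheme, assuming the margin between the true-class logit and the best competing logit as the distance-to-boundary proxy (the paper's scheme may differ):

```python
import torch

def boundary_weights(logits, labels, tau=1.0):
    # Small margins (near-boundary samples) receive weights close to 1;
    # confidently classified samples are down-weighted. Weights are
    # normalized to mean 1 over the batch.
    true = logits.gather(1, labels.unsqueeze(1)).squeeze(1)
    other = logits.scatter(1, labels.unsqueeze(1), float("-inf")).amax(dim=1)
    margin = true - other
    w = torch.exp(-margin.abs() / tau)
    return w / w.sum() * len(w)
```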
no code implementations • 12 Sep 2023 • Jindong Gu, Fangyun Wei, Philip Torr, Han Hu
In this work, we first taxonomize the stochastic defense strategies against query-based black-box attacks (QBBA).
1 code implementation • 22 Aug 2023 • Xiaojun Jia, Yuefeng Chen, Xiaofeng Mao, Ranjie Duan, Jindong Gu, Rong Zhang, Hui Xue, Xiaochun Cao
In this paper, we conduct a comprehensive study of over 10 fast adversarial training methods in terms of adversarial robustness and training costs.
1 code implementation • ICCV 2023 • Gengyuan Zhang, Jisen Ren, Jindong Gu, Volker Tresp
In this study, we introduce the Multi-event Video-Text Retrieval (MeVTR) task, which addresses scenarios in which each video contains multiple different events, a niche setting within the conventional Video-Text Retrieval task.
1 code implementation • 21 Aug 2023 • Haokun Chen, Yao Zhang, Denis Krompass, Jindong Gu, Volker Tresp
FedDAT is the first approach that enables an efficient distributed finetuning of foundation models for a variety of heterogeneous Vision-Language tasks.
1 code implementation • 17 Aug 2023 • Xuanlong Yu, Gianni Franchi, Jindong Gu, Emanuel Aldea
In this work, we propose a generalized auxiliary uncertainty estimator (AuxUE) scheme for more robust uncertainty quantification on regression tasks.
2 code implementations • 16 Aug 2023 • Haokun Chen, Denis Krompass, Jindong Gu, Volker Tresp
This is mainly because their "training-after-tuning" framework is unsuitable for FL with limited client computation power.
2 code implementations • 24 Jul 2023 • Jindong Gu, Zhen Han, Shuo Chen, Ahmad Beirami, Bailan He, Gengyuan Zhang, Ruotong Liao, Yao Qin, Volker Tresp, Philip Torr
This paper aims to provide a comprehensive survey of cutting-edge research in prompt engineering on three types of vision-language models: multimodal-to-text generation models (e.g., Flamingo), image-text matching models (e.g., CLIP), and text-to-image generation models (e.g., Stable Diffusion).
no code implementations • 14 Jun 2023 • Wenqian Yu, Jindong Gu, Zhijiang Li, Philip Torr
Adversarial examples (AEs) with small adversarial perturbations can mislead deep neural networks (DNNs) into wrong predictions.
no code implementations • 17 Apr 2023 • Jindong Gu, Ahmad Beirami, Xuezhi Wang, Alex Beutel, Philip Torr, Yao Qin
With the advent of vision-language models (VLMs) that can perform in-context and prompt-based learning, how can we design prompting approaches that robustly generalize to distribution shift and can be used on novel classes outside the support set of the prompts?
1 code implementation • CVPR 2023 • Kuofeng Gao, Yang Bai, Jindong Gu, Yong Yang, Shu-Tao Xia
With the split clean data pool and polluted data pool, ASD successfully defends against backdoor attacks during training.
1 code implementation • 21 Mar 2023 • Haoheng Lan, Jindong Gu, Philip Torr, Hengshuang Zhao
In this work, we explore backdoor attacks on segmentation models that misclassify all pixels of a victim class by injecting a specific trigger on non-victim pixels during inference, which is dubbed the Influencer Backdoor Attack (IBA).
no code implementations • 3 Jan 2023 • Jindong Gu
The vulnerability of deep neural networks poses challenges to current visual classification models.
no code implementations • ICCV 2023 • Hang Li, Jindong Gu, Rajat Koner, Sahand Sharifzadeh, Volker Tresp
To study this question, we propose a reconstruction task where Flamingo generates a description for a given image and DALL-E uses this description as input to synthesize a new image.
no code implementations • 19 Nov 2022 • Yao Zhang, Haokun Chen, Ahmed Frikha, Yezi Yang, Denis Krompass, Gengyuan Zhang, Jindong Gu, Volker Tresp
Visual Question Answering (VQA) is a multidisciplinary research task.
2 code implementations • 25 Jul 2022 • Jindong Gu, Hengshuang Zhao, Volker Tresp, Philip Torr
Since SegPGD can create more effective adversarial examples, the adversarial training with our SegPGD can boost the robustness of segmentation models.
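As a sketch of the per-pixel weighting commonly described for SegPGD, assuming the schedule lambda = t / (2 * total_steps):

```python
import torch.nn.functional as F

def segpgd_loss(logits, target, t, total_steps):
    # Sketch of SegPGD-style weighting: at step t, correctly classified
    # pixels get weight (1 - lambda) and misclassified pixels weight
    # lambda, so early steps focus on still-correct pixels and the
    # balance shifts as the attack proceeds.
    # logits: (B, C, H, W); target: (B, H, W)
    lam = t / (2 * total_steps)
    per_pixel = F.cross_entropy(logits, target, reduction="none")  # (B, H, W)
    correct = (logits.argmax(dim=1) == target).float()
    weights = correct * (1 - lam) + (1 - correct) * lam
    return (weights * per_pixel).mean()
```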
no code implementations • 21 Jul 2022 • Boxi Wu, Jindong Gu, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu
Vision Transformer (ViT), as a powerful alternative to Convolutional Neural Network (CNN), has received much attention.
1 code implementation • 17 Jul 2022 • Xinwei Liu, Jian Liu, Yang Bai, Jindong Gu, Tao Chen, Xiaojun Jia, Xiaochun Cao
Inspired by the vulnerability of DNNs on adversarial perturbations, we propose a novel defence mechanism by adversarial machine learning for good.
1 code implementation • ICCV 2023 • Haokun Chen, Ahmed Frikha, Denis Krompass, Jindong Gu, Volker Tresp
Real-world applications usually involve a distribution shift across the datasets of the different clients, which hurts the generalization ability of the clients to unseen samples from their respective data distributions.
no code implementations • 17 Mar 2022 • Zhen Han, Ruotong Liao, Jindong Gu, Yao Zhang, Zifeng Ding, Yujia Gu, Heinz Köppl, Hinrich Schütze, Volker Tresp
Since conventional knowledge embedding models cannot take full advantage of the abundant textual information, there have been extensive research efforts in enhancing knowledge embedding using texts.
no code implementations • 22 Nov 2021 • Jindong Gu, Hengshuang Zhao, Volker Tresp, Philip Torr
The high transferability achieved by our method shows that, in contrast to the observations in previous work, adversarial examples on a segmentation model can be easy to transfer to other segmentation models.
no code implementations • 20 Nov 2021 • Jindong Gu, Volker Tresp, Yao Qin
However, when ViTs are attacked by an adversary, the attention mechanism can be easily fooled to focus more on the adversarially perturbed patches and cause a mistake.
no code implementations • 29 Sep 2021 • Jindong Gu, Volker Tresp, Yao Qin
Based on extensive qualitative and quantitative experiments, we discover that ViT's stronger robustness to naturally corrupted patches and higher vulnerability to adversarial patches are both caused by the attention mechanism.
1 code implementation • 21 Jun 2021 • Jindong Gu, Wei Liu, Yonglong Tian
While large self-supervised models have rivalled the performance of their supervised counterparts, small models still struggle.
no code implementations • 9 Jun 2021 • Boxi Wu, Heng Pan, Li Shen, Jindong Gu, Shuai Zhao, Zhifeng Li, Deng Cai, Xiaofei He, Wei Liu
In this work, we find that the adversarial attacks can also be vulnerable to small perturbations.
1 code implementation • 1 Jun 2021 • Zhiliang Wu, Yinchong Yang, Jindong Gu, Volker Tresp
We propose an uncertainty-aware deep kernel learning model that permits estimating predictive uncertainty via a pipeline of a Convolutional Neural Network and a sparse Gaussian Process.
no code implementations • CVPR 2021 • Jindong Gu, Volker Tresp, Han Hu
The examination reveals five major new/different components in CapsNet: a transformation process, a dynamic routing layer, a squashing function, a margin loss in place of the cross-entropy loss, and an additional class-conditional reconstruction loss for regularization.
1 code implementation • ICLR 2021 • Jindong Gu, Baoyuan Wu, Volker Tresp
As alternatives to CNNs, the recently proposed Capsule Networks (CapsNets) are shown to be more robust to white-box attacks than CNNs under popular attack protocols.
no code implementations • 3 Dec 2020 • Jindong Gu, Volker Tresp
In the proposed model, individual classification explanations can be created effectively and efficiently.
no code implementations • 19 Sep 2020 • Jindong Gu, Zhiliang Wu, Volker Tresp
Motivated by this conclusion, we propose an implementation of introspective learning by distilling knowledge from online self-explanations.
no code implementations • 30 Jan 2020 • Jindong Gu, Volker Tresp
The knowledge of a well-performed teacher is distilled to a student with a small architecture.
no code implementations • 21 Nov 2019 • Jindong Gu, Volker Tresp
What is the difference between DNNs trained with random labels and the ones trained with true labels?
no code implementations • CVPR 2020 • Jindong Gu, Volker Tresp
Our investigation reveals that the routing procedure contributes neither to the generalization ability nor to the affine robustness of the CapsNets.
no code implementations • 21 Oct 2019 • Jindong Gu, Volker Tresp
Deep neural networks (DNNs) with high expressiveness have achieved state-of-the-art performance in many tasks.
no code implementations • 21 Oct 2019 • Jindong Gu, Volker Tresp
In this work, we first show that prediction difference analysis (PDA) can suffer from saturated classifiers.
no code implementations • 2 Sep 2019 • Jindong Gu, Daniela Oelke
Bias is known to be an impediment to fair decisions in many domains, such as human resources, the public sector, and health care.
no code implementations • 22 Aug 2019 • Jindong Gu, Volker Tresp
The idea behind saliency methods is to explain the classification decisions of neural networks by creating so-called saliency maps.
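A minimal vanilla-gradient saliency map, the simplest member of this family, can be computed as follows:

```python
def gradient_saliency(model, image, class_idx):
    # The absolute gradient of the class score w.r.t. each input pixel
    # indicates how strongly that pixel influences the decision.
    # image: (1, C, H, W) -> returns an (H, W) saliency map.
    image = image.detach().clone().requires_grad_(True)
    score = model(image)[0, class_idx]
    score.backward()
    return image.grad.abs().amax(dim=1).squeeze(0)
```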
2 code implementations • 5 Dec 2018 • Jindong Gu, Yinchong Yang, Volker Tresp
The experiments and analysis conclude that the explanations generated by LRP are not class-discriminative.
no code implementations • ICLR 2018 • Jindong Gu, Matthias Schubert, Volker Tresp
In the adversarial process of training CorGAN, the Generator is supposed to generate outlier samples for the negative class, and the Discriminator, as a one-class classifier, is trained to distinguish data from the training dataset (i.e., the positive class) from data generated by the Generator (i.e., the negative class).