1 code implementation • 29 Jan 2024 • Junyang Wang, Haiyang Xu, Jiabo Ye, Ming Yan, Weizhou Shen, Ji Zhang, Fei Huang, Jitao Sang
To assess the performance of Mobile-Agent, we introduced Mobile-Eval, a benchmark for evaluating mobile device operations.
1 code implementation • 13 Nov 2023 • Junyang Wang, Yuhang Wang, Guohai Xu, Jing Zhang, Yukai Gu, Haitao Jia, Jiaqi Wang, Haiyang Xu, Ming Yan, Ji Zhang, Jitao Sang
Despite making significant progress in multi-modal tasks, current Multi-modal Large Language Models (MLLMs) encounter the significant challenge of hallucinations, which may lead to harmful consequences.
1 code implementation • 29 Aug 2023 • Junyang Wang, Yiyang Zhou, Guohai Xu, Pengcheng Shi, Chenlin Zhao, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Jihua Zhu, Jitao Sang, Haoyu Tang
In this paper, we propose Hallucination Evaluation based on Large Language Models (HaELM), an LLM-based hallucination evaluation framework.
no code implementations • 18 Aug 2023 • Pengcheng Shi, Jie Zhang, Haozhe Cheng, Junyang Wang, Yiyang Zhou, Chenlin Zhao, Jihua Zhu
Specifically, we propose a plug-and-play Overlap Bias Matching Module (OBMM) comprising two integral components: an overlap sampling module and a bias prediction module.
no code implementations • 13 Aug 2023 • Yi Zhang, Jitao Sang, Junyang Wang, Dongmei Jiang, YaoWei Wang
To this end, we propose Shortcut Debiasing, to first transfer the target task's learning of bias attributes from bias features to shortcut features, and then employ causal intervention to eliminate shortcut features during inference.
1 code implementation • 27 Apr 2023 • Qinghao Ye, Haiyang Xu, Guohai Xu, Jiabo Ye, Ming Yan, Yiyang Zhou, Junyang Wang, Anwen Hu, Pengcheng Shi, Yaya Shi, Chenliang Li, Yuanhong Xu, Hehong Chen, Junfeng Tian, Qi Qian, Ji Zhang, Fei Huang, Jingren Zhou
Our code, pre-trained model, instruction-tuned models, and evaluation set are available at https://github.com/X-PLUG/mPLUG-Owl.
1 code implementation • 26 Apr 2023 • Junyang Wang, Ming Yan, Yi Zhang, Jitao Sang
Although previous works have added generation capability to CLIP through additional language models, the modality gap between CLIP representations of different modalities, together with CLIP's inability to model the offset of this gap, prevents concepts from transferring across modalities.
1 code implementation • ICCV 2023 • Junyang Wang, Yuanhong Xu, Juhua Hu, Ming Yan, Jitao Sang, Qi Qian
Fine-tuning a visual pre-trained model can leverage the semantic information from large-scale pre-training data and mitigate the over-fitting problem on downstream vision tasks with limited training examples.
no code implementations • 14 Nov 2022 • Junyang Wang, Yi Zhang, Ming Yan, Ji Zhang, Jitao Sang
We further propose Anchor Augment to guide the generative model's attention to the fine-grained information in the representation of CLIP.
no code implementations • 2 Nov 2022 • Yi Zhang, Jitao Sang, Junyang Wang
To this end, we propose Proxy Debiasing, to first transfer the target task's learning of bias information from bias features to artificial proxy features, and then employ causal intervention to eliminate proxy features in inference.
no code implementations • 26 Oct 2022 • Junyang Wang, Yi Zhang, Jitao Sang
Although FairCLIP is applied to eliminate bias in image retrieval, it neutralizes the representation itself, which is shared by all CLIP downstream tasks.
1 code implementation • 3 Jul 2022 • Yi Zhang, Junyang Wang, Jitao Sang
Vision-Language Pre-training (VLP) models have achieved state-of-the-art performance in numerous cross-modal tasks.
no code implementations • 22 Apr 2021 • Junyang Wang, Jon Cockayne, Oksana Chkrebtii, T. J. Sullivan, Chris J. Oates
The numerical solution of differential equations can be formulated as an inference problem to which formal statistical approaches can be applied.