Search Results for author: Jianbing Zhang

Found 14 papers, 10 papers with code

The Devil is in the Few Shots: Iterative Visual Knowledge Completion for Few-shot Learning

1 code implementation • 15 Apr 2024 • Yaohui Li, Qifeng Zhou, Haoxing Chen, Jianbing Zhang, Xinyu Dai, Hao Zhou

Few-shot learning aims to further enhance the transfer capability of CLIP by giving few images in each class, aka 'few shots'.

Few-Shot Learning, Zero-Shot Learning

MixRED: A Mix-lingual Relation Extraction Dataset

1 code implementation • 23 Mar 2024 • Lingxing Kong, Yougang Chu, Zheng Ma, Jianbing Zhang, Liang He, Jiajun Chen

Relation extraction is a critical task in the field of natural language processing with numerous real-world applications.

Relation, Relation Extraction

Cobra Effect in Reference-Free Image Captioning Metrics

no code implementations • 18 Feb 2024 • Zheng Ma, Changxin Wang, Yawen Ouyang, Fei Zhao, Jianbing Zhang, ShuJian Huang, Jiajun Chen

If a certain metric has flaws, it will be exploited by the model and reflected in the generated sentences.

Image Captioning

EFUF: Efficient Fine-grained Unlearning Framework for Mitigating Hallucinations in Multimodal Large Language Models

no code implementations • 15 Feb 2024 • Shangyu Xing, Fei Zhao, Zhen Wu, Tuo An, WeiHao Chen, Chunhui Li, Jianbing Zhang, Xinyu Dai

Multimodal large language models (MLLMs) have attracted increasing attention in the past few years, but they may still generate descriptions that include objects not present in the corresponding images, a phenomenon known as object hallucination.


SeeClick: Harnessing GUI Grounding for Advanced Visual GUI Agents

1 code implementation • 17 Jan 2024 • Kanzhi Cheng, Qiushi Sun, Yougang Chu, Fangzhi Xu, Yantao Li, Jianbing Zhang, Zhiyong Wu

In our preliminary study, we have discovered a key challenge in developing visual GUI agents: GUI grounding -- the capacity to accurately locate screen elements based on instructions.

DRIN: Dynamic Relation Interactive Network for Multimodal Entity Linking

1 code implementation • 9 Oct 2023 • Shangyu Xing, Fei Zhao, Zhen Wu, Chunhui Li, Jianbing Zhang, Xinyu Dai

Multimodal Entity Linking (MEL) is a task that aims to link ambiguous mentions within multimodal contexts to referential entities in a multimodal knowledge base.

Entity Linking, Relation

Food-500 Cap: A Fine-Grained Food Caption Benchmark for Evaluating Vision-Language Models

1 code implementation • 6 Aug 2023 • Zheng Ma, Mianzhi Pan, Wenhan Wu, Kanzhi Cheng, Jianbing Zhang, ShuJian Huang, Jiajun Chen

Experiments on our proposed datasets demonstrate that popular VLMs underperform in the food domain compared with their performance in the general domain.

Beyond Generic: Enhancing Image Captioning with Real-World Knowledge using Vision-Language Pre-Training Model

1 code implementation • 2 Aug 2023 • Kanzhi Cheng, Wenpo Song, Zheng Ma, Wenhao Zhu, Zixuan Zhu, Jianbing Zhang

Considering that Vision-Language Pre-Training (VLP) models master massive such knowledge from large-scale web-harvested data, it is promising to utilize the generalizability of VLP models to incorporate knowledge into image descriptions.

Hallucination, Image Captioning, +2 more

ADS-Cap: A Framework for Accurate and Diverse Stylized Captioning with Unpaired Stylistic Corpora

1 code implementation • 2 Aug 2023 • Kanzhi Cheng, Zheng Ma, Shi Zong, Jianbing Zhang, Xinyu Dai, Jiajun Chen

Generating visually grounded image captions with specific linguistic styles using unpaired stylistic corpora is a challenging task, especially since we expect stylized captions with a wide variety of stylistic patterns.

Contrastive Learning, Image Captioning

Probing Cross-modal Semantics Alignment Capability from the Textual Perspective

no code implementations • 18 Oct 2022 • Zheng Ma, Shi Zong, Mianzhi Pan, Jianbing Zhang, ShuJian Huang, Xinyu Dai, Jiajun Chen

In recent years, vision and language pre-training (VLP) models have advanced the state-of-the-art results in a variety of cross-modal downstream tasks.

Image Captioning, Sentence
