Search Results for author: Chaoya Jiang

Found 10 papers, 3 papers with code

Hal-Eval: A Universal and Fine-grained Hallucination Evaluation Framework for Large Vision Language Models

no code implementations • 24 Feb 2024 • Chaoya Jiang, Wei Ye, Mengfan Dong, Hongrui Jia, Haiyang Xu, Ming Yan, Ji Zhang, Shikun Zhang

Large Vision Language Models exhibit remarkable capabilities but struggle with hallucinations: inconsistencies between images and their descriptions.

Hallucination · Hallucination Evaluation

Efficient Vision-and-Language Pre-training with Text-Relevant Image Patch Selection

no code implementations • 11 Jan 2024 • Wei Ye, Chaoya Jiang, Haiyang Xu, Chenhao Ye, Chenliang Li, Ming Yan, Shikun Zhang, Songfang Huang, Fei Huang

Vision Transformers (ViTs) have become increasingly popular in large-scale Vision and Language Pre-training (VLP) models.

TiMix: Text-aware Image Mixing for Effective Vision-Language Pre-training

1 code implementation • 14 Dec 2023 • Chaoya Jiang, Wei Ye, Haiyang Xu, Qinghao Ye, Ming Yan, Ji Zhang, Shikun Zhang

Self-supervised Multi-modal Contrastive Learning (SMCL) remarkably advances modern Vision-Language Pre-training (VLP) models by aligning visual and linguistic modalities.

Contrastive Learning · Data Augmentation
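
A minimal sketch of the idea behind text-aware image mixing for contrastive pre-training, assuming a CLIP-style dual encoder. The plain pixel-level mixup and all names here are illustrative assumptions, not the paper's actual implementation (TiMix mixes image patches guided by text):

    import torch
    import torch.nn.functional as F

    def timix_style_loss(image_encoder, text_encoder, images, texts, lam=0.6, tau=0.07):
        # Mix each image with a shuffled partner (simple pixel-level mixup here).
        perm = torch.randperm(images.size(0))
        mixed_images = lam * images + (1 - lam) * images[perm]
        v = F.normalize(image_encoder(mixed_images), dim=-1)  # (B, D)
        t = F.normalize(text_encoder(texts), dim=-1)          # (B, D)
        logits = v @ t.t() / tau                              # image-to-text similarities
        # Soft targets: mixed image i should match caption i with weight lam
        # and caption perm[i] with weight (1 - lam).
        targets = torch.zeros_like(logits)
        idx = torch.arange(images.size(0))
        targets[idx, idx] = lam
        targets[idx, perm] += 1 - lam
        return -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()

Each mixed sample thus supplies two weighted positives instead of one, which is what makes mixing act as data augmentation for the contrastive objective.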

Hallucination Augmented Contrastive Learning for Multimodal Large Language Model

1 code implementation • 12 Dec 2023 • Chaoya Jiang, Haiyang Xu, Mengfan Dong, Jiaxing Chen, Wei Ye, Ming Yan, Qinghao Ye, Ji Zhang, Fei Huang, Shikun Zhang

We first analyzed the representation distribution of textual and visual tokens in MLLMs, revealing two important findings: 1) there is a significant gap between textual and visual representations, indicating unsatisfactory cross-modal representation alignment; 2) representations of texts that do and do not contain hallucinations are entangled, making it challenging to distinguish them.

Contrastive Learning · Hallucination · +4
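
A minimal sketch of the recipe the title suggests: image-text contrastive learning where a hallucinated rewrite of each caption serves as an extra hard negative. The encoders, shapes, and function name are assumptions, not the released implementation:

    import torch
    import torch.nn.functional as F

    def hallucination_augmented_loss(img_emb, txt_emb, hall_txt_emb, tau=0.07):
        # img_emb: (B, D) image features; txt_emb: (B, D) ground-truth captions;
        # hall_txt_emb: (B, D) hallucinated rewrites used as hard negatives.
        v = F.normalize(img_emb, dim=-1)
        candidates = F.normalize(torch.cat([txt_emb, hall_txt_emb], dim=0), dim=-1)  # (2B, D)
        logits = v @ candidates.t() / tau  # (B, 2B): true captions first, then hallucinated
        labels = torch.arange(v.size(0), device=v.device)  # positive = true caption i
        return F.cross_entropy(logits, labels)

Pushing images away from hallucinated captions is one direct way to disentangle the hallucinated and non-hallucinated text representations the analysis above found entangled.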

BUS: Efficient and Effective Vision-language Pre-training with Bottom-Up Patch Summarization

no code implementations • 17 Jul 2023 • Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Songfang Huang

Specifically, we incorporate a Text-Semantics-Aware Patch Selector (TSPS) into the ViT backbone to perform coarse-grained visual token extraction, and then attach a flexible Transformer-based Patch Abstraction Decoder (PAD) on top of the backbone for top-level visual abstraction.

Text Summarization
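
A minimal sketch of text-aware patch selection in the spirit of the TSPS, assuming pooled text embeddings and a cosine-similarity score; the top-k rule, keep ratio, and names are illustrative assumptions rather than the paper's exact selector:

    import torch
    import torch.nn.functional as F

    def select_text_relevant_patches(patch_tokens, text_emb, keep_ratio=0.5):
        # patch_tokens: (B, N, D) ViT patch embeddings; text_emb: (B, D) pooled text.
        scores = (F.normalize(patch_tokens, dim=-1)
                  @ F.normalize(text_emb, dim=-1).unsqueeze(-1)).squeeze(-1)  # (B, N)
        k = max(1, int(patch_tokens.size(1) * keep_ratio))
        top_idx = scores.topk(k, dim=1).indices                 # (B, k) most text-relevant
        batch_idx = torch.arange(patch_tokens.size(0)).unsqueeze(-1)
        return patch_tokens[batch_idx, top_idx]                 # (B, k, D)

Shortening the visual sequence before cross-modal fusion is where the efficiency gain comes from; a decoder like the PAD can then abstract the kept tokens further.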

PandaLM: An Automatic Evaluation Benchmark for LLM Instruction Tuning Optimization

2 code implementations • 8 Jun 2023 • Yidong Wang, Zhuohao Yu, Zhengran Zeng, Linyi Yang, Cunxiang Wang, Hao Chen, Chaoya Jiang, Rui Xie, Jindong Wang, Xing Xie, Wei Ye, Shikun Zhang, Yue Zhang

To ensure the reliability of PandaLM, we collect a diverse human-annotated test dataset, where all contexts are generated by humans and labels are aligned with human preferences.

Language Modelling · Large Language Model

Exploiting Pseudo Image Captions for Multimodal Summarization

no code implementations • 9 May 2023 • Chaoya Jiang, Rui Xie, Wei Ye, Jinan Sun, Shikun Zhang

Cross-modal contrastive learning in vision language pretraining (VLP) faces the challenge of (partial) false negatives.

Common Sense Reasoning · Contrastive Learning · +1
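
For context, one standard way to soften (partial) false negatives is to mask near-duplicate captions out of the contrastive denominator. This sketch shows that generic recipe, not the paper's pseudo-caption method; the threshold and all names are assumptions:

    import torch
    import torch.nn.functional as F

    def infonce_masking_false_negatives(img_emb, txt_emb, tau=0.07, sim_thresh=0.9):
        v = F.normalize(img_emb, dim=-1)  # (B, D)
        t = F.normalize(txt_emb, dim=-1)  # (B, D)
        logits = v @ t.t() / tau          # (B, B) image-to-text similarities
        # Off-diagonal captions nearly identical to the positive caption are
        # likely false negatives; drop them from the softmax denominator.
        txt_sim = t @ t.t()
        false_neg = (txt_sim > sim_thresh) & ~torch.eye(len(t), dtype=torch.bool, device=t.device)
        logits = logits.masked_fill(false_neg, float('-inf'))
        labels = torch.arange(len(v), device=v.device)
        return F.cross_entropy(logits, labels)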

BUS: Efficient and Effective Vision-Language Pre-Training with Bottom-Up Patch Summarization.

no code implementations • ICCV 2023 • Chaoya Jiang, Haiyang Xu, Wei Ye, Qinghao Ye, Chenliang Li, Ming Yan, Bin Bi, Shikun Zhang, Fei Huang, Songfang Huang

In this paper, we propose a Bottom-Up Patch Summarization approach named BUS, inspired by the document summarization task in NLP, to learn a concise visual summary of lengthy visual token sequences, guided by textual semantics.

Abstractive Text Summarization · Document Summarization

Similarity Learning for Cover Song Identification Using Cross-Similarity Matrices of Multi-Level Deep Sequences

no code implementations • 14 May 2020 • Chaoya Jiang, Deshun Yang, Xiaoou Chen

One part is a network for learning the deep sequence representation of music tracks, and the other is a similarity estimation network which takes as input the cross-similarity matrices calculated from the deep sequences of a pair of tracks.

Cover song identification · Metric Learning
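
A minimal sketch of the cross-similarity matrix that bridges the two parts, assuming frame-level deep features and cosine similarity; the similarity-estimation network it feeds is not shown:

    import torch
    import torch.nn.functional as F

    def cross_similarity_matrix(seq_a, seq_b):
        # seq_a: (Ta, D), seq_b: (Tb, D) frame-level deep features of two tracks.
        a = F.normalize(seq_a, dim=-1)
        b = F.normalize(seq_b, dim=-1)
        return a @ b.t()  # (Ta, Tb) matrix fed to the similarity-estimation network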
