Search Results for author: Mengzhao Jia

Found 6 papers, 3 with code

Describe-then-Reason: Improving Multimodal Mathematical Reasoning through Visual Comprehension Training

no code implementations · 22 Apr 2024 · Mengzhao Jia, Zhihan Zhang, Wenhao Yu, Fangkai Jiao, Meng Jiang

Open-source multimodal large language models (MLLMs) excel in various tasks involving textual and visual inputs but still struggle with complex multimodal mathematical reasoning, lagging behind proprietary models like GPT-4V(ision) and Gemini-Pro.

Math · Mathematical Reasoning

Debiasing Multimodal Sarcasm Detection with Contrastive Learning

no code implementations · 16 Dec 2023 · Mengzhao Jia, Can Xie, Liqiang Jing

We propose a debiasing multimodal sarcasm detection framework based on contrastive learning, which mitigates the harmful effect of biased textual factors for robust out-of-distribution (OOD) generalization.

Contrastive Learning · counterfactual · +2
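The snippet above names contrastive learning as the debiasing mechanism but gives no further detail. For orientation, here is a minimal PyTorch sketch of a generic InfoNCE-style contrastive loss; the pairing scheme, function name, and temperature value are illustrative assumptions, not the paper's actual objective.

```python
import torch
import torch.nn.functional as F

def info_nce_loss(anchors: torch.Tensor, positives: torch.Tensor,
                  temperature: float = 0.07) -> torch.Tensor:
    """Generic InfoNCE contrastive loss: each anchor is pulled toward its
    matching positive and pushed away from every other positive in the batch.
    A sketch of contrastive learning in general, not this paper's method."""
    # L2-normalize so dot products become cosine similarities
    anchors = F.normalize(anchors, dim=-1)
    positives = F.normalize(positives, dim=-1)
    # Similarity matrix: entry (i, j) compares anchor i with positive j
    logits = anchors @ positives.T / temperature
    # The matching pair for anchor i sits on the diagonal (index i)
    targets = torch.arange(anchors.size(0), device=anchors.device)
    return F.cross_entropy(logits, targets)

# Toy usage: 8 embedding pairs of dimension 128
emb_a = torch.randn(8, 128)
emb_b = torch.randn(8, 128)
print(info_nce_loss(emb_a, emb_b).item())
```

In a debiasing setup like the one described, the positive pairs would presumably be chosen so that representations agree across the biased textual factor, but the listing does not specify the pairing.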

PLUG: Leveraging Pivot Language in Cross-Lingual Instruction Tuning

1 code implementation · 15 Nov 2023 · Zhihan Zhang, Dong-Ho Lee, Yuwei Fang, Wenhao Yu, Mengzhao Jia, Meng Jiang, Francesco Barbieri

Instruction tuning has remarkably advanced large language models (LLMs) in understanding and responding to diverse human instructions.

Instruction Following

FAITHSCORE: Evaluating Hallucinations in Large Vision-Language Models

1 code implementation · 2 Nov 2023 · Liqiang Jing, Ruosen Li, Yunmo Chen, Mengzhao Jia, Xinya Du

We introduce FAITHSCORE (Faithfulness to Atomic Image Facts Score), a reference-free and fine-grained evaluation metric that measures the faithfulness of the generated free-form answers from large vision-language models (LVLMs).

Descriptive · Instruction Following
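The metric's name and description imply a decompose-then-verify recipe: break a free-form answer into atomic facts, check each against the image, and score the verified fraction. Below is a minimal Python sketch of that recipe; `extract_atomic_facts` and `verify_against_image` are hypothetical stand-ins for the paper's actual LLM-based fact extractor and visual verifier.

```python
from typing import Callable, List

def faithfulness_score(
    answer: str,
    image_path: str,
    extract_atomic_facts: Callable[[str], List[str]],   # hypothetical: answer -> atomic facts
    verify_against_image: Callable[[str, str], bool],   # hypothetical: (fact, image) -> supported?
) -> float:
    """Fraction of atomic facts in the answer that the image supports.
    A sketch of the decompose-then-verify idea behind FAITHSCORE,
    not the authors' implementation."""
    facts = extract_atomic_facts(answer)
    if not facts:
        return 1.0  # an answer with no checkable facts hallucinates nothing
    supported = sum(verify_against_image(fact, image_path) for fact in facts)
    return supported / len(facts)
```

Being reference-free, a score of this shape needs no gold answer: only the image and the model's own output enter the computation.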

Knowledge-enhanced Memory Model for Emotional Support Conversation

no code implementations · 11 Oct 2023 · Mengzhao Jia, Qianglong Chen, Liqiang Jing, Dawei Fu, Renyu Li

The prevalence of mental disorders has become a significant issue, drawing increased attention to Emotional Support Conversation as an effective supplement to mental health support.

Response Generation

Multi-source Semantic Graph-based Multimodal Sarcasm Explanation Generation

1 code implementation · 29 Jun 2023 · Liqiang Jing, Xuemeng Song, Kun Ouyang, Mengzhao Jia, Liqiang Nie

Multimodal Sarcasm Explanation (MuSE) is a new and challenging task that aims to generate a natural-language sentence explaining why a multimodal social post (an image and its caption) contains sarcasm.

Explanation Generation · Object · +1
